fall 2021 SFI documentation changes

2021-11-16 14:32:42 -05:00 · d259c13c0b
commit d259c13c0b
parent 8584da6a7f
22 changed files with 66 additions and 82 deletions
--- a/sfi/images/gif_crash_reproduce_demo.gif
+++ b/sfi/images/gif_crash_reproduce_demo.gif
--- a/sfi/images/gif_crash_reproduce_demo_small.gif
+++ b/sfi/images/gif_crash_reproduce_demo_small.gif
--- a/sfi/images/gif_fuzz_debug.gif
+++ b/sfi/images/gif_fuzz_debug.gif
--- a/sfi/images/gif_fuzz_demo_small.gif
+++ b/sfi/images/gif_fuzz_demo_small.gif
--- a/sfi/images/gif_fuzz_demo.gif
+++ b/sfi/images/gif_fuzz_demo.gif
--- a/sfi/images/img_fuzz_pserv_screenshot1.png
+++ b/sfi/images/img_fuzz_pserv_screenshot1.png
--- a/sfi/images/img_fuzz_pserv_screenshot2.png
+++ b/sfi/images/img_fuzz_pserv_screenshot2.png
--- a/sfi/images/img_fuzz_pserv_screenshot3.png
+++ b/sfi/images/img_fuzz_pserv_screenshot3.png
--- a/sfi/images/img_fuzz_results_screenshot1.png
+++ b/sfi/images/img_fuzz_results_screenshot1.png
--- a/sfi/images/img_fuzz_results_screenshot2.png
+++ b/sfi/images/img_fuzz_results_screenshot2.png
--- a/sfi/images/img_fuzz_results_screenshot3.png
+++ b/sfi/images/img_fuzz_results_screenshot3.png
--- a/sfi/images/img_fuzz_results_screenshot4.png
+++ b/sfi/images/img_fuzz_results_screenshot4.png
--- a/sfi/images/img_fuzz_utils_screenshot1.png
+++ b/sfi/images/img_fuzz_utils_screenshot1.png
--- a/sfi/images/img_fuzz_utils_screenshot2.png
+++ b/sfi/images/img_fuzz_utils_screenshot2.png
--- a/sfi/images/img_fuzz_utils_screenshot3.png
+++ b/sfi/images/img_fuzz_utils_screenshot3.png
--- a/sfi/images/img_fuzz_utils_screenshot4.png
+++ b/sfi/images/img_fuzz_utils_screenshot4.png
--- a/sfi/sfi_after_fuzzing.md
+++ b/sfi/sfi_after_fuzzing.md
@@ -1,46 +1,33 @@
-# What do I do after fuzzing? (`fuzz-utils.py`)
+# What do I do after fuzzing?

-Once you've completed a fuzzing run, you'll most likely have a few output files whose contents caused your server to crash or hang. (If the fuzzer didn't report any, congratulations! Your server must be pretty robust.) Each of these files contains the exact input that was sent to your server that caused the issue. For example, from the example shown in `how_to_fuzz.md`, we can see a few of the crash files in `./fuzz_out/fuzz0/crashes`:
+Once you've completed a fuzzing run, you'll most likely have a few output files whose contents caused your server to crash or hang. (If the fuzzer didn't report any, congratulations! Your server must be pretty robust.) Each of these files contains the input that was sent to your server that caused the issue. From the example shown in `how_to_fuzz.md`, we can see a few of the crash files in `./fuzz_out_2021-11-16_09-42-39/fuzz1/crashes`:

-![](./images/img_fuzz_utils_screenshot1.png)
-  
-We can open one of these files up. Running `cat ./fuzz_out/fuzz0/crashes/id:000000,sig:11,src:000008+000068,time:193665,op:splice,rep:4` shows us the exact message that caused the server to crash:
+![](./images/img_fuzz_results_screenshot1.png)

+The `LD_PRELOAD` library ("sockstorm") developed for this purpose uses a special file format to represent several connections' data in a single run. Because of this, sending the file straight to your server won't reproduce the exact behavior found by the fuzzer.
+
+(If you'd like to see the details of one of these **comux** files, run `~cs3214/bin/comux -s -i PATH_TO_FILE [-v]` on one to show a summary of how many connections are represented in the file, and what data will be sent to the server.)
+
+Let's look at `./fuzz_out_2021-11-16_09-42-39/fuzz1/crashes/id:000000,sig:11,src:000188+000106,time:86,ss_chunk_havoc`. Based on the file's name (we can see `sig:11`), the fuzzer indicated the server crashed by receiving a SIGSEGV signal (a segmentation fault) when this file's contents were sent to the server.
+
+## Reproducing a Crash or Hang
+
+Once the fuzzer has found a bug in your code, the next logical step is to reproduce it and debug. This is made easy by the scripts generated inside the fuzzer's output directory (`fuzz-rerun.sh` and `fuzz-rerun-gdb.sh`). Let's say we want to try recreating the supposed SIGSEGV the server received when that file's contents were sent to the server.
+
+We can take the command straight from the fuzzing summary and modify it to point to the file we're interested in:
+
+```bash
+$ ./fuzz_out_2021-11-16_09-42-39/fuzz-rerun.sh ./fuzz_out_2021-11-16_09-42-39/fuzz1/crashes/id:000000,sig:11,src:000188+000106,time:86,ss_chunk_havoc
 ```
-GET /Opi/login HTTP/1.1
-Host: hornbe/1.1
-Host: hornbeam.rloginJ21564
-Accept-Encoding: identity
-Content-Length:
-GET /api/log 64

-{"username": "user0
+![](./images/img_fuzz_results_screenshot2.png)
+
+As promised, a segmentation fault occurred! Now, let's try running it in GDB:
+
+```bash
+$ ./fuzz_out_2021-11-16_09-42-39/fuzz-rerun-gdb.sh ./fuzz_out_2021-11-16_09-42-39/fuzz1/crashes/id:000000,sig:11,src:000188+000106,time:86,ss_chunk_havoc
 ```
-  
-Based on the name of the file (we can see `sig:11`), your server received a SIGSEGV signal (a segmentation fault). Using `fuzz-utils.py`, we can take a closer look at the file's contents and use it to recreate the same crash.

-## Post-Fuzzing Utilities
+![](./images/img_fuzz_results_screenshot3.png)

-Running `fuzz-utils.py` will produce a help menu, with a few tools you can use in tandem with these crash files:
-
-![](./images/img_fuzz_utils_screenshot2.png)
-
-With the `--tool` switch, you can specify a tool to invoke. They're discussed below.
-
-## Hexdump Tool
-
-To get a better idea of what's inside each crash/hang-inducing output file, a hexdump tool is built into `fuzz-utils.py`. The contents of a file are printed, in hexadecimal, when the script is invoked like so: `fuzz-utils.py --tool hex <path_to_file>`.
-  
-For example, if we take the crash file shown above and run it through the hexdump tool, we get:  
-
-![](./images/img_fuzz_utils_screenshot3.png)
-
-## Sending Tool
-
-To actually reproduce a crash/hang found by AFL++, you'll want to send the exact same input to your server. To do this, launch your server in one terminal, and run this in another: `fuzz-utils.py --tool send <server_address> <server_port>`. The script will read STDIN and send it through a socket to your server. To send the contents of a file, simply use I/O redirection: `fuzz-utils.py --tool send <server_address> <server_port> < <path_to_file>`.
-  
-If we consider the same example as before, we can start our server (something like: `./server -p 13650`), then, on the same machine, run `fuzz-utils.py --tool send 127.0.0.1 13650 < ./fuzz_out/fuzz0/crashes/id:000000,sig:11,src:000008+000068,time:193665,op:splice,rep:4`. When the server tries to parse the input, the segmentation fault will occur.
-
-![](./images/img_fuzz_utils_screenshot4.png)
-  
-With this, you'll be able to debug: launch your server in GCC in one terminal, send the crash file's contents in another, and go to town.
+Again, the SIGSEGV occurred. From here, you can debug in GDB to discover the source of your bug.
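The crash file's name carries useful metadata before you even open a debugger. As a minimal sketch, assuming the comma-separated `key:value` naming seen in the example file name above, the fields can be split apart and the signal number translated into a symbolic name. The helper names `parse_crash_name` and `signal_name` are hypothetical and are not part of the fuzzing scripts.

```python
import signal


def parse_crash_name(path):
    """Split a crash file name into key:value fields and bare tags."""
    name = path.rsplit("/", 1)[-1]  # drop any leading directories
    fields, tags = {}, []
    for part in name.split(","):
        key, sep, value = part.partition(":")
        if sep:
            fields[key] = value   # e.g. "sig" -> "11"
        else:
            tags.append(part)     # e.g. "ss_chunk_havoc"
    return fields, tags


def signal_name(fields):
    """Translate the "sig" field into a symbolic signal name, if present."""
    if "sig" not in fields:
        return None  # this file name carries no "sig" field
    return signal.Signals(int(fields["sig"])).name


crash = ("./fuzz_out_2021-11-16_09-42-39/fuzz1/crashes/"
         "id:000000,sig:11,src:000188+000106,time:86,ss_chunk_havoc")
fields, tags = parse_crash_name(crash)
print(fields["id"], signal_name(fields), tags)
```

Running this over every file in a `crashes` directory is a quick way to group the fuzzer's findings by signal before choosing which one to rerun.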
--- a/sfi/sfi_concepts_fuzzing.md
+++ b/sfi/sfi_concepts_fuzzing.md
@@ -10,11 +10,11 @@ Now, to make sure this fruit slicer works correctly, you decide it's best to tes

 ![](./images/img_fuzzing_analogy1.png)
  
-At this point, you might think your machine is perfect: it cuts every kind of fruit flawlessly. Enter, fuzzing. What happens if you put something into the fruit slicer that it isn't expecting? Some unexpected inputs might be: water, bread, a paper towel, a rock, or a spoon. These aren't fruits, and some of them aren't even food. Will your machine break?
+At this point, you might think your machine is perfect: it cuts every kind of fruit flawlessly. Enter fuzzing. Take the fruit and surround it with a bunch of rocks, then glue it all together and bake it in the oven for twelve hours. We've "mutated" the original input into something odd and unexpected. Will your machine break?

 ![](./images/img_fuzzing_analogy2.png)
  
-By fuzzing, we can uncover bugs that might later be revealed by attackers or flawed inputs. Identifying such bugs and fixing them will make our programs more robust and secure.
+Now, mutate something else (or the same input a second time), and try it again. Now do it again. Over and over. This is fuzzing: creating unexpected inputs and feeding them to our target repeatedly, in some automated fashion. By fuzzing, we can uncover bugs that might later be revealed by attackers or flawed inputs. Identifying such bugs and fixing them will make our programs more robust and secure.

 ## Fuzzing HTTP Servers

@@ -37,4 +37,4 @@ User-Agent: Mozilla/4.0
 Conn---ection: Keep-Alivey
 ```

-As is common with most fuzzers, the input was "mutated" to produce this odd, incorrect HTTP request message. Will your server see the `GONT` and recognize it as an invalid method? Will it realize `HTTP/1.111` is an invalid version number? If your program handles this badly, and ends up crashing or doing something it shouldn't, then fuzzing was a success. An attacker could take advantage of this vulnerability to bring your server down while it's doing important work! Of course, a proactive programmer could also take this knowledge and patch up the problem to prevent future crashes.
+As is common with most fuzzers, the input was "mutated" to produce this odd, incorrect HTTP request message. Will your server see the `GONT` and recognize it as an invalid method? Will it realize `HTTP/1.111` is an invalid version number? If your program handles this badly, and ends up crashing or doing something it shouldn't, then fuzzing was a success. An attacker could take advantage of this vulnerability to bring your server down while it's doing important work. Of course, a proactive programmer could also take this knowledge and patch up the problem to prevent future crashes.
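The mutate-and-retry loop described above can be sketched in a few lines. This is a toy illustration only, not how AFL++ works internally: the `mutate`, `fuzz`, and `fragile_parser` names are hypothetical, the "server" is a stand-in function that naively unpacks the request line, and a Python exception plays the role of a crash.

```python
import random


def mutate(data, rng):
    """Apply one random byte-level mutation: flip a bit, insert, or delete."""
    buf = bytearray(data)
    op = rng.choice(["flip", "insert", "delete"]) if buf else "insert"
    if op == "flip":
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    elif op == "insert":
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target, seed, iterations, rng):
    """Feed mutated copies of seed to target; collect the inputs that raise."""
    crashes = []
    for _ in range(iterations):
        candidate = seed
        for _ in range(rng.randint(1, 4)):  # stack a few mutations
            candidate = mutate(candidate, rng)
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)  # keep the input for later debugging
    return crashes


def fragile_parser(request):
    """Toy stand-in for a server: assumes "METHOD PATH VERSION" on line one."""
    method, path, version = request.split(b"\r\n")[0].split(b" ")
    return method, path, version


seed = b"GET /index.html HTTP/1.1\r\nConnection: Keep-Alive\r\n\r\n"
found = fuzz(fragile_parser, seed, 1000, random.Random(0))
print(f"{len(found)} of 1000 mutated requests crashed the toy parser")
```

Saving each crashing input, as `fuzz` does here, mirrors what a real fuzzer does with its crash files: the exact bytes are kept so the failure can be replayed later.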
--- a/sfi/sfi_concepts_sockfuzz.md
+++ b/sfi/sfi_concepts_sockfuzz.md
@@ -1,19 +0,0 @@
-# Concepts: What is sockfuzz?
-
-AFL and AFL++ are excellent at what they do, but they have limitations. One such limitation is how AFL feeds input to the target program: it only works with programs that read from STDIN or from a file. In many cases, this is sufficient; lots of C programs take their input from STDIN or a file.
-  
-However, this project is about creating a HTTP server. Servers don't read input through a file or STDIN - they read from network sockets. So, the question becomes: how can we force a HTTP server to read input from STDIN, so we can fuzz it with AFL? Additionally, how can we do this without modifying your source code?
-  
-Sockfuzz is a small C library I developed to solve this problem. It works by "overloading" the `accept` system call and running some extra code to establish an internal connection to your server. Using the special `LD_PRELOAD` environment variable, it can convince your server to use sockfuzz's copy of `accept`, rather than the actual system call.
-
-![](./images/img_sockfuzz_diagram1.png)
-  
-Once called, sockfuzz only allows one thread to finish the call to `accept`. The others are forced to block on a call to `sem_wait`. The one thread that is allowed through runs code that makes a connection to the server, spawns two threads, and calls the _real_ `accept` system call, returning its value. From your point of view, your server behaves just about the same when preloaded with sockfuzz, apart from using only one its threads and setting up that internal connection.
-  
-A screenshot of sockfuzz's overloaded `accept` function shows what your server's threads will do when they call sockfuzz's version of the function:
-
-![](./images/img_sockfuzz_code1.png)
-  
-The two threads that get spawned are designated as the "input thread" and the "output thread." The input thread's job is to read STDIN (until EOF is reached) and feed it through the open network socket to the server. Once STDIN is exhausted, it exits. The output thread's job is to receive bytes from the network socket and send them straight to STDOUT. Once the connection is closed, this thread exits. Collectively, these two threads form a system to send the contents of STDIN to your server and dump the server's response to STDOUT.
-
-![](./images/img_sockfuzz_example1.png)
--- a/sfi/sfi_concepts_sockstorm.md
+++ b/sfi/sfi_concepts_sockstorm.md
@@ -0,0 +1,17 @@
+# Concepts: What is sockstorm?
+
+AFL and AFL++ are excellent at what they do, but they have limitations. One such limitation is how AFL feeds input to the target program: it only works with programs that read from STDIN or from a file. In many cases, this is sufficient; lots of C programs take their input from STDIN or a file.
+  
+However, this project is about creating an HTTP server. Servers don't read input through a file or STDIN - they read from network sockets. So, the question becomes: how can we force an HTTP server to read input from STDIN, so we can fuzz it with AFL? Additionally, how can we do this without modifying your source code?
+  
+Sockstorm is a C library I developed to solve this problem. It works by "overloading" the `accept` system call and running some extra code to establish an internal connection to your server. Using the special `LD_PRELOAD` environment variable, it can convince your server to use sockstorm's copy of `accept`, rather than the actual system call.
+
+## Connection Multiplexing
+
+Once called, sockstorm's version of the `accept` system call spawns a controller thread. This controller thread reads input via stdin, expecting a specific file format (dubbed the **comux** file format). These comux files are designed to specify the data to be sent to the target server across multiple connections. The controller thread parses the input file, then spawns individual threads to send "chunks" of data to the target server across specific connections.
+
+This approach allows for multiple internal client connections to be made to your server, increasing the probability of finding multithreading-related bugs. As a bonus, it requires *zero* modification to your source code. All you have to do is prepend `LD_PRELOAD=/path/to/sockstorm-preload.so` to your command-line invocation of your server, then pipe one of these comux files to your process via stdin.
+
+## AFL++ Custom Mutator
+
+The other half of sockstorm is an AFL++ custom mutator. AFL++ does great when fuzzing many programs on its own, but for more complex file formats (such as the **comux** files being used here), a custom mutator can be implemented to ensure the file's structure doesn't get overwritten during fuzzing. Sockstorm's mutator (`sockstorm-mutator.so`) does just that; it maintains the structure of each comux file while also randomly modifying (fuzzing) the connection data to be sent to the target server.
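The `LD_PRELOAD` invocation described under "Connection Multiplexing" can also be scripted. The following is a minimal sketch under stated assumptions: the helper names `preload_env` and `run_preloaded` are hypothetical, and the server command and file paths in the commented usage are placeholders to replace with your own.

```python
import os
import subprocess


def preload_env(library, base_env=None):
    """Return a copy of the environment with library added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # The dynamic linker accepts a colon-separated list of libraries.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


def run_preloaded(server_cmd, library, comux_path):
    """Run the server with the library preloaded, feeding the file on stdin."""
    with open(comux_path, "rb") as comux_file:
        return subprocess.run(server_cmd, stdin=comux_file,
                              env=preload_env(library))


# Example usage (placeholder command and paths):
# result = run_preloaded(["./server", "-p", "13650"],
#                        "/path/to/sockstorm-preload.so",
#                        "/path/to/comux/file")
# print("server exit status:", result.returncode)
```

On POSIX systems, a negative `returncode` from `subprocess.run` means the process was terminated by that signal number, which is a quick way to tell a crash from a clean exit.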
--- a/sfi/sfi_how_to_fuzz.md
+++ b/sfi/sfi_how_to_fuzz.md
@@ -8,22 +8,22 @@ Fuzzing your server is as simple as typing `fuzz-pserv.py --src-dir <your_src_di
  
 To understand everything displayed on the status screen, check out [AFL++'s documentation](https://aflplus.plus/docs/status_screen/). You'll probably be most interested in the "overall results" section of the status screen, displayed in the top-right corner. This gives a report of all unique crashes and hangs, as well as how many "paths" the fuzzer has discovered. (A "path" describes a unique path of code executed by your server. A "unique" crash/hang describes a crash/hang that was found on one such path.)

-## Parallel Fuzzing
+## Research Participation

-By default, this script invokes AFL++ using a single core on the system. However, you can specify any number of cores (up to the maximum) to spawn _multiple_ AFL++ processes (one on each core). These processes work together to find crashes/hangs - as a whole, they can typically find more bugs faster than a single process on a single core.
-  
-You can use the `--fuzz-cores` switch to specify the number of cores you wish to use.
+When the fuzzing script starts, you'll be presented with a brief menu asking about research participation. This project is part of Connor Shugg's M.S. Thesis project work. If you choose to grant consent, your source code and fuzzer results will be collected and stored in a secure location for research purposes. Before making a decision, please read the forum post and full consent form displayed on the course forum and course website.

-### Timing/Core Limits
+## Fuzzing Results

-As you might know, RLogin can get pretty cluttered as we move closer to project deadlines. Parallel fuzzing is very effective, but using too many cores on a machine can prevent others from getting work done. Because of this, limits are established to prevent any one student from fuzzing with too many cores for too long.
-  
-This limit is described in "CPU-Seconds" - a maximum number of time you can fuzz that varies with the number of cores you use. The more cores you specify with `--fuzz-cores`, the less maximum time you'll be allowed to run AFL++. Using a single core (the default), you can run AFL++ for the longest time. Using two cores, you can run AFL++ for half that time. With three cores, you can run for a third of that time. (And so on.)
-
-### Fuzzing Results
-
-Once AFL++ has terminated (either by timeout or by Ctrl-C), the script will print a summary of the crashes/hangs that were found. By default, the output directory will be placed in your pserv's src directory (specified by `--src-dir`). However, you can use the `--out-dir` switch to specify otherwise.
+Once AFL++ has terminated (either when finding a bug or after a set timeout), the script will print a summary of the crashes/hangs that were found. By default, the output directory will be placed in your pserv's src directory (specified by `--src-dir`). However, you can use the `--out-dir` switch to specify otherwise.

 ![](./images/img_fuzz_pserv_screenshot2.png)
-  
-If crashes or hangs are found, the directories containing the crash-inducing input files are listed in the summary. Time to investigate those bugs!
+
+If crashes or hangs are found, the directories containing the crash-inducing input files are listed in the summary. Two shell scripts will be generated and placed within the output directory. Simply give those a run to reproduce the crashes and get debugging!
+
+## Extra Credit
+
+Once fuzzing has finished and the fuzzing summary has been printed, you might notice a message regarding extra credit being printed:
+
+![](./images/img_fuzz_pserv_screenshot3.png)
+
+Using this fuzzer gives you the chance to earn extra credit on project 4. This extra credit scores you for how well your server performed under fuzzing for certain time periods. A `.tar` file is produced after each fuzzing run and can be submitted to a special `p4-ex` grading endpoint for extra credit evaluation.
--- a/sfi/sfi_overview.md
+++ b/sfi/sfi_overview.md
@@ -4,7 +4,7 @@ The security of computer systems is extremely important. If vulnerabilities exis
  
 Web servers are one such type of computer system, and since most are directly connected to the internet, they're tested (and often deliberately attacked) every day by thousands of users. How can we be sure a web server can gracefully handle any sort of input?
  
-Some may argue that it's impossible to uncover _every_ bug in a system. But, we as computer scientists and computer engineers can use some effective techniques to catch most of them. Fuzzing is one such technique. This "fuzzing interface" allows you to utilize AFL++ (an advanced fuzzer) along with a special `LD_PRELOAD` library (called "sockfuzz") to fuzz your pserv implementation. This will help you uncover any bugs in your code that cause your server to crash or hang.
+Some may argue that it's impossible to uncover _every_ bug in a system. But, we as computer scientists and computer engineers can use some effective techniques to catch most of them. Fuzzing is one such technique. This "fuzzing interface" allows you to utilize AFL++ (an advanced fuzzer) along with a special `LD_PRELOAD` library (called "sockstorm") to fuzz your pserv implementation. This will help you uncover any bugs in your code that cause your server to crash or hang.

 A quick crash-course on how to get started is below. However, many more useful details can be found throughout the documentation.

@@ -14,30 +14,29 @@ A quick crash-course on how to get started is below. However, many more useful d

 - [What is fuzzing?](./sfi_concepts_fuzzing.md)
 - [What is AFL++?](./sfi_concepts_afl.md)
-- [What is sockfuzz?](./sfi_concepts_sockfuzz.md)
+- [What is sockstorm?](./sfi_concepts_sockstorm.md)

 ### **Fuzzing Interface**

 - [How do I fuzz my server?](./sfi_how_to_fuzz.md) (`fuzz-pserv.py`)
-- [What do I do after fuzzing?](./sfi_after_fuzzing.md) (`fuzz-utils.py`)
+- [What do I do after fuzzing?](./sfi_after_fuzzing.md)

 ## Quickstart: Fuzzing your Server

 To fuzz your server, do the following:

-1.  Run `fuzz-pserv.py --src-dir <path_to_your_pserv_src_dir>`
+1.  Run `fuzz-pserv.py --src-dir /path/to/your/pserv/src`
 2.  Wait for it to finish (might be a while), or hit Ctrl-C once you're satisfied.
 3.  Look at the summary: if any crashes or hangs are found, use these files to debug.

-![](./images/gif_fuzz_demo.gif)
+![](./images/gif_fuzz_run.gif)

 ## Quickstart: Reproducing a Crash/Hang

 To reproduce a crash found by the fuzzer, do the following:

 1.  Choose one of the crash files in the output directory.
-2.  Run your server in one terminal.
-3.  Open another terminal and run: `fuzz-utils.py --tool send <server_address> <server_port> < <path_to_crash_file>`
+2.  Run the following command: `/path/to/fuzzer_output/fuzz-rerun-gdb.sh < /path/to/crash/file`
 4.  Investigate! Your server should crash or hang, depending on the file you chose.

-![](./images/gif_crash_reproduce_demo.gif)
+![](./images/gif_fuzz_debug.gif)