Fuzzing Documentation

2021-04-14 00:33:52 -04:00 · 2021-04-14 00:33:52 -04:00 · da34c84933
commit da34c84933
parent 9919b5b4b9
26 changed files with 238 additions and 5 deletions
--- a/README.md
+++ b/README.md
@ -1,10 +1,12 @@

+# Personal Server
 This repository contains the base files for the CS 3214
 "Personal Secure Server" project.

-To get started, run the script:
-
-. install-dependencies.sh
-
-Then cd into src and type make.
+- `src` - contains the base code's source files.
+- `tests` - contains unit tests, performance tests, and associated files.
+- `react-app` - contains a JavaScript web app.
+- `fuzz` - contains documentation for the 'server fuzzing interface'.

+## Get Started
+Run the script: `./install-dependencies.sh`. Then, `cd` into `src` and type `make` to build the base code.
--- a/install-dependencies.sh
+++ b/install-dependencies.sh
--- a/sfi/images/gif_crash_reproduce_demo.gif
+++ b/sfi/images/gif_crash_reproduce_demo.gif
--- a/sfi/images/gif_crash_reproduce_demo_small.gif
+++ b/sfi/images/gif_crash_reproduce_demo_small.gif
--- a/sfi/images/gif_fuzz_demo.gif
+++ b/sfi/images/gif_fuzz_demo.gif
--- a/sfi/images/gif_fuzz_demo_small.gif
+++ b/sfi/images/gif_fuzz_demo_small.gif
--- a/sfi/images/img_afl_instrumentation.png
+++ b/sfi/images/img_afl_instrumentation.png
--- a/sfi/images/img_afl_processes.png
+++ b/sfi/images/img_afl_processes.png
--- a/sfi/images/img_afl_status_screen.png
+++ b/sfi/images/img_afl_status_screen.png
--- a/sfi/images/img_fuzz_pserv_screenshot1.png
+++ b/sfi/images/img_fuzz_pserv_screenshot1.png
--- a/sfi/images/img_fuzz_pserv_screenshot2.png
+++ b/sfi/images/img_fuzz_pserv_screenshot2.png
--- a/sfi/images/img_fuzz_utils_screenshot1.png
+++ b/sfi/images/img_fuzz_utils_screenshot1.png
--- a/sfi/images/img_fuzz_utils_screenshot2.png
+++ b/sfi/images/img_fuzz_utils_screenshot2.png
--- a/sfi/images/img_fuzz_utils_screenshot3.png
+++ b/sfi/images/img_fuzz_utils_screenshot3.png
--- a/sfi/images/img_fuzz_utils_screenshot4.png
+++ b/sfi/images/img_fuzz_utils_screenshot4.png
--- a/sfi/images/img_fuzzing_analogy1.png
+++ b/sfi/images/img_fuzzing_analogy1.png
--- a/sfi/images/img_fuzzing_analogy2.png
+++ b/sfi/images/img_fuzzing_analogy2.png
--- a/sfi/images/img_sockfuzz_code1.png
+++ b/sfi/images/img_sockfuzz_code1.png
--- a/sfi/images/img_sockfuzz_diagram1.png
+++ b/sfi/images/img_sockfuzz_diagram1.png
--- a/sfi/images/img_sockfuzz_example1.png
+++ b/sfi/images/img_sockfuzz_example1.png
--- a/sfi/sfi_after_fuzzing.md
+++ b/sfi/sfi_after_fuzzing.md
@ -0,0 +1,46 @@
+# What do I do after fuzzing? (`fuzz-utils.py`)
+
+Once you've completed a fuzzing run, you'll most likely have a few output files whose contents caused your server to crash or hang. (If the fuzzer didn't report any, congratulations! Your server must be pretty robust.) Each of these files contains the exact input that was sent to your server and caused the issue. For example, in the run shown in `sfi_how_to_fuzz.md`, we can see a few of the crash files in `./fuzz_out/fuzz0/crashes`:
+
+![](./images/img_fuzz_utils_screenshot1.png)
+  
+We can open one of these files up. Running `cat ./fuzz_out/fuzz0/crashes/id:000000,sig:11,src:000008+000068,time:193665,op:splice,rep:4` shows us the exact message that caused the server to crash:
+
+```
+GET /Opi/login HTTP/1.1
+Host: hornbe/1.1
+Host: hornbeam.rloginJ21564
+Accept-Encoding: identity
+Content-Length:
+GET /api/log 64
+
+{"username": "user0
+```
+  
+The file name includes `sig:11`, which tells us your server was killed by signal 11, SIGSEGV (a segmentation fault). Using `fuzz-utils.py`, we can take a closer look at the file's contents and use it to recreate the same crash.
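
The fields in a crash file's name can also be read mechanically. The sketch below is only an illustration and is not part of `fuzz-utils.py`; `parse_crash_name` and `signal_name` are hypothetical helper names. It splits the comma-separated `key:value` fields and looks up the signal number with Python's standard `signal` module.

```python
import signal

def parse_crash_name(name):
    """Split an AFL-style crash file name into a dict of its key:value fields."""
    fields = {}
    for part in name.split(","):
        key, _, value = part.partition(":")
        fields[key] = value
    return fields

def signal_name(number):
    """Translate a signal number into its symbolic name."""
    return signal.Signals(number).name

name = "id:000000,sig:11,src:000008+000068,time:193665,op:splice,rep:4"
fields = parse_crash_name(name)
print(fields["sig"], signal_name(int(fields["sig"])))  # on Linux: 11 SIGSEGV
```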
+
+## Post-Fuzzing Utilities
+
+Running `fuzz-utils.py` will produce a help menu, with a few tools you can use in tandem with these crash files:
+
+![](./images/img_fuzz_utils_screenshot2.png)
+
+With the `--tool` switch, you can specify a tool to invoke. They're discussed below.
+
+## Hexdump Tool
+
+To get a better idea of what's inside each crash/hang-inducing output file, a hexdump tool is built into `fuzz-utils.py`. The contents of a file are printed, in hexadecimal, when the script is invoked like so: `fuzz-utils.py --tool hex <path_to_file>`.
+  
+For example, if we take the crash file shown above and run it through the hexdump tool, we get:  
+
+![](./images/img_fuzz_utils_screenshot3.png)
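
If you're curious how such a dump is put together, here is a minimal sketch. It is an assumption about the general technique, not the code inside `fuzz-utils.py`; the `hexdump` function and the sample bytes are made up for illustration. Each output line shows an offset, the bytes in hexadecimal, and the printable characters.

```python
def hexdump(data, width=16):
    """Return a list of lines: offset, hex bytes, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes (such as \r and \n) are shown as dots.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{text_part}|")
    return lines

# Hypothetical sample: the start of a request, with CRLF line endings assumed.
sample = b"GET /Opi/login HTTP/1.1\r\nHost: hornbe/1.1\r\n"
for line in hexdump(sample):
    print(line)
```

A hex view like this makes invisible bytes (stray `\r`, `\0`, or non-ASCII characters) easy to spot, which plain `cat` output hides.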
+
+## Sending Tool
+
+To actually reproduce a crash/hang found by AFL++, you'll want to send the exact same input to your server. To do this, launch your server in one terminal, and run this in another: `fuzz-utils.py --tool send <server_address> <server_port>`. The script will read STDIN and send it through a socket to your server. To send the contents of a file, simply use I/O redirection: `fuzz-utils.py --tool send <server_address> <server_port> < <path_to_file>`.
+  
+If we consider the same example as before, we can start our server (something like: `./server -p 13650`), then, on the same machine, run `fuzz-utils.py --tool send 127.0.0.1 13650 < ./fuzz_out/fuzz0/crashes/id:000000,sig:11,src:000008+000068,time:193665,op:splice,rep:4`. When the server tries to parse the input, the segmentation fault will occur.
+
+![](./images/img_fuzz_utils_screenshot4.png)
+  
+With this, you'll be able to debug: launch your server in GDB in one terminal, send the crash file's contents in another, and go to town.
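
For reference, replaying a file over a socket takes only a few lines. This is a hedged sketch of the idea, not the actual sending tool: `send_over` and `send_bytes` are hypothetical names, and closing the write side of the connection after sending is an assumption about how you'd want to signal end-of-input.

```python
import socket

def send_over(conn, payload):
    """Send payload on a connected socket and return whatever comes back."""
    conn.sendall(payload)
    conn.shutdown(socket.SHUT_WR)      # assumption: tell the peer no more input follows
    reply = bytearray()
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:              # peer closed the connection
                break
            reply.extend(chunk)
    except (socket.timeout, ConnectionError):
        pass                           # a crashed or hung server may never answer
    return bytes(reply)

def send_bytes(host, port, payload, timeout=5.0):
    """Connect to host:port over TCP and replay payload."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return send_over(conn, payload)
```

With the server from the example listening on port 13650, calling `send_bytes("127.0.0.1", 13650, open(path, "rb").read())` would replay the crash file stored at `path`.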
--- a/sfi/sfi_concepts_afl.md
+++ b/sfi/sfi_concepts_afl.md
@ -0,0 +1,54 @@
+# Concepts: What is AFL++?
+
+AFL++ is a **fuzzer**: a program specifically designed to craft crash-inducing inputs and feed them to a target program (your server). It's an upgraded, supercharged spinoff of AFL (short for "American Fuzzy Lop"). The original AFL was developed by [Michał "lcamtuf" Zalewski](https://lcamtuf.coredump.cx/), a Polish white-hat hacker and security expert.
+
+*   **AFL++**: [Website](https://aflplus.plus/)
+*   **AFL++**: [GitHub Repository](https://github.com/AFLplusplus/AFLplusplus)
+*   **AFL**: [Website](https://lcamtuf.coredump.cx/afl/)
+*   **AFL**: [GitHub Repository](https://github.com/google/AFL)
+
+(For the sake of this explanation, we'll refer to both AFL++ and AFL as just "AFL".)
+
+## Source-Code Instrumentation
+
+AFL is a fuzzer designed specifically for C programs. With a special compiler, it instruments the target program at compile-time with small blocks of assembly. This assembly acts as a series of indicators to help AFL understand how the program behaves when executed with a certain input.
+
+![](./images/img_afl_instrumentation.png)
+  
+This instrumentation accesses a shared memory space between the target program and AFL, which is used to count the executions of basic blocks. A **basic block** is a straight-line code sequence with a single entry point and a single exit point. (In assembly, this is a chunk of code with no branch instructions, save the entry and exit branches. In C, you can roughly think of these as small chunks of code inside if-statements, for-loops, and other control structures.)
+  
+Thanks to this, AFL is able to determine exactly how the target program behaves when supplied with a certain input, down to each control structure the programmer wrote.
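
To make the counting concrete, here is a small Python simulation. AFL's own technical notes describe the injected code as roughly `shared_mem[cur_location ^ prev_location]++; prev_location = cur_location >> 1;`, so each counter is indexed by a pair of blocks rather than a single one. The `CoverageMap` class and the block IDs below are hypothetical stand-ins for the real shared memory and the random IDs assigned at compile time.

```python
MAP_SIZE = 1 << 16  # classic AFL uses a 64 KiB map

class CoverageMap:
    def __init__(self):
        self.counts = bytearray(MAP_SIZE)
        self.prev = 0

    def visit(self, block_id):
        """Record a transition from the previous block into block_id."""
        index = (block_id ^ self.prev) % MAP_SIZE
        self.counts[index] = (self.counts[index] + 1) % 256  # 8-bit counters wrap
        self.prev = block_id >> 1

    def hit_indices(self):
        """Return the set of map slots touched at least once."""
        return {i for i, count in enumerate(self.counts) if count}

def trace(block_ids):
    """Replay a sequence of block IDs and return the slots it touched."""
    cov = CoverageMap()
    for block_id in block_ids:
        cov.visit(block_id)
    return cov.hit_indices()

A, B, C = 0x1234, 0x5678, 0x9ABC  # hypothetical compile-time block IDs
print(trace([A, B, C]) == trace([A, C, B]))  # False: the order of blocks matters
```

Shifting the previous ID by one bit is what lets the map tell `A -> B` apart from `B -> A`, so two inputs that visit the same blocks in a different order still look different to the fuzzer.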
+
+## Input Generation
+
+Many "blind" fuzzers (those that _don't_ benefit from source-code instrumentation) generate random inputs to send to a target program. While this can have some success, it largely depends on the luck of the draw. AFL isn't blind: it is able to learn from the instrumented binary by watching which inputs create new behaviors. Armed with this, it recycles old "interesting" inputs and further mutates them to narrow down specific problems in the target program.
+  
+In order to do this, there's one caveat: AFL needs an "input corpus" - that is, a set of files that depict typical input for the target program. For example: if we were fuzzing a program that reads in two numbers from STDIN, we might provide these "typical inputs" to AFL:
+
+```
+2 3
+```
+
+```
+-7 12
+```
+
+```
+506 2323
+```
+
+New "interesting" inputs (inputs that create new behavior in the target program) are stored in a queue, from/to which AFL pulls/pushes while fuzzing.
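
The feedback loop described above can be sketched in a few lines. Everything here is a toy: `toy_target` stands in for an instrumented program by returning the set of branches it took, and `mutate` and `fuzz` are hypothetical names. Real AFL uses far more mutation strategies and scheduling heuristics than this.

```python
import random

def toy_target(data):
    """Stand-in for an instrumented program: returns the set of branches taken."""
    seen = {"start"}
    if data[:1] == b"-":
        seen.add("negative")
    if b" " in data:
        seen.add("two_fields")
    if len(data) > 8:
        seen.add("long_input")
    return frozenset(seen)

def mutate(data, rng):
    """Apply one random byte-level change: bit flip, insert, or delete."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    pos = rng.randrange(len(buf) + 1)
    if choice == 0 and buf:
        buf[pos % len(buf)] ^= 1 << rng.randrange(8)
    elif choice == 1:
        buf.insert(pos, rng.randrange(256))
    elif buf:
        del buf[pos % len(buf)]
    return bytes(buf)

def fuzz(seeds, rounds, rng):
    """Keep any mutated input that produces behavior not seen before."""
    queue = list(seeds)
    known = {toy_target(seed) for seed in seeds}
    for _ in range(rounds):
        candidate = mutate(rng.choice(queue), rng)
        behavior = toy_target(candidate)
        if behavior not in known:      # new behavior: this input is "interesting"
            known.add(behavior)
            queue.append(candidate)
    return queue

corpus = [b"2 3", b"-7 12", b"506 2323"]
queue = fuzz(corpus, rounds=2000, rng=random.Random(0))
print(len(queue), "inputs in the queue")
```

Inputs that reveal nothing new are thrown away, so the queue only grows when a mutation reaches code the earlier inputs did not.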
+
+### Fuzzing the Target
+
+Once started, AFL runs indefinitely, unless specified otherwise. As it runs, it executes the target program many times per second, supplying a unique input each time and watching how the program reacts. If a crash or timeout occurs, the input that caused the issue is stored in a separate location as part of the fuzzer's output.
+  
+While it's running, the user is presented with a neat status screen that depicts the fuzzer's progress:
+
+![](./images/img_afl_status_screen.png)
+  
+AFL works by repeatedly spawning a child process to execute the target program, then waiting for the child process to terminate. If AFL sees that the child process was killed by a fatal signal, it flags the run as a crash. If the child process hasn't exited after a certain amount of time, AFL flags the run as a hang.
+
+![](./images/img_afl_processes.png)
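
The same spawn-and-wait logic can be imitated with Python's `subprocess` module. This is a simplified sketch that assumes a POSIX system (where a negative return code means the child was killed by a signal); `run_once` is a hypothetical name, and the three child commands are tiny Python one-liners standing in for a real target.

```python
import subprocess
import sys

def run_once(command, payload, timeout=5.0):
    """Run the target once with payload on stdin and classify the outcome."""
    try:
        result = subprocess.run(command, input=payload, timeout=timeout,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return "hang"                  # the child was still running at the deadline
    if result.returncode < 0:          # POSIX: killed by signal number -returncode
        return f"crash (signal {-result.returncode})"
    return "ok"

# Stand-in targets: one exits normally, one kills itself with SIGSEGV, one sleeps.
OK_CMD = [sys.executable, "-c", "import sys; sys.stdin.read()"]
CRASH_CMD = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
HANG_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_once(OK_CMD, b"2 3\n"))
print(run_once(CRASH_CMD, b"2 3\n"))
print(run_once(HANG_CMD, b"2 3\n", timeout=0.5))
```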
+  
+Once fuzzing has completed, the user can view the crash-inducing and hang-inducing input files AFL found. They can be used to recreate (and hopefully fix!) the discovered bugs.
--- a/sfi/sfi_concepts_fuzzing.md
+++ b/sfi/sfi_concepts_fuzzing.md
@ -0,0 +1,40 @@
+# Concepts: What is Fuzzing?
+
+**Fuzzing** is a software testing and security technique that involves giving a program unexpected input, with the intention of crashing the program or altering its behavior. It's somewhat comparable to unit testing, but there's an important difference.
+
+## An Analogy: Slicing Fruit
+
+Let's say you've created a machine, and you've called it the "fruit slicer." You created it to accomplish one specific task: cut fruit into slices. You can put fruit in it, and it's supposed to cut it up and spit the fruit slices out. (In our analogy, the fruit slicer is like a C program.)
+  
+Now, to make sure this fruit slicer works correctly, you decide it's best to test it out on various types of fruits. You put in an apple, a pear, a peach, some grapes, a banana, and every other fruit you can get your hands on. The fruit slicer works great! (This would be considered **unit testing**.)
+
+![](./images/img_fuzzing_analogy1.png)
+  
+At this point, you might think your machine is perfect: it cuts every kind of fruit flawlessly. Enter fuzzing. What happens if you put something into the fruit slicer that it isn't expecting? Some unexpected inputs might be: water, bread, a paper towel, a rock, or a spoon. These aren't fruits, and some of them aren't even food. Will your machine break?
+
+![](./images/img_fuzzing_analogy2.png)
+  
+By fuzzing, we can uncover bugs that might later be revealed by attackers or flawed inputs. Identifying such bugs and fixing them will make our programs more robust and secure.
+
+## Fuzzing HTTP Servers
+
+Since this project is about writing an HTTP server, let's look at a simple example of what fuzzing might look like with HTTP messages. Consider this simple message:
+
+```
+GET /index.html HTTP/1.1
+Host: localhost:13650
+User-Agent: Mozilla/4.0
+Connection: Keep-Alive
+```
+
+This is a simple GET request we might send to an HTTP server to retrieve the contents of `index.html`. Any correctly-implemented server should handle this just fine. What happens if we "fuzz" this input before sending it to the server?
+
+```
+GONT /index.htlol HTTP/1.111
+H_?ost: localhost:13650
+User-Agent: Mozilla/4.0
+  
+Conn---ection: Keep-Alivey
+```
+
+As is common with most fuzzers, the input was "mutated" to produce this odd, incorrect HTTP request message. Will your server see the `GONT` and recognize it as an invalid method? Will it realize `HTTP/1.111` is an invalid version number? If your program handles this badly, and ends up crashing or doing something it shouldn't, then fuzzing was a success. An attacker could take advantage of this vulnerability to bring your server down while it's doing important work! Of course, a proactive programmer could also take this knowledge and patch up the problem to prevent future crashes.
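
As an illustration of the kind of check those questions point at, here is a small sketch of request-line validation. Your server is written in C, so treat this as pseudocode in Python form; `check_request_line`, the set of methods, and the returned strings are all assumptions made for the example, not requirements of the project.

```python
import re

# Assumption: the methods your server chooses to support; adjust as needed.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE"}
VERSION_RE = re.compile(r"HTTP/1\.[01]")

def check_request_line(line):
    """Classify the first line of a request without trusting any part of it."""
    parts = line.split(" ")
    if len(parts) != 3:
        return "malformed request line"
    method, target, version = parts
    if method not in KNOWN_METHODS:
        return "unknown method"
    if not target.startswith("/"):
        return "malformed target"
    if VERSION_RE.fullmatch(version) is None:   # fullmatch rejects "HTTP/1.111"
        return "unsupported version"
    return "ok"

print(check_request_line("GET /index.html HTTP/1.1"))
print(check_request_line("GONT /index.htlol HTTP/1.111"))
```

The point is that every field is checked before it is used, so a mutated request ends in an error response instead of undefined behavior.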
--- a/sfi/sfi_concepts_sockfuzz.md
+++ b/sfi/sfi_concepts_sockfuzz.md
@ -0,0 +1,19 @@
+# Concepts: What is sockfuzz?
+
+AFL and AFL++ are excellent at what they do, but they have limitations. One such limitation is how AFL feeds input to the target program: it only works with programs that read from STDIN or from a file. In many cases, this is sufficient; lots of C programs take their input from STDIN or a file.
+  
+However, this project is about creating an HTTP server. Servers don't read input through a file or STDIN - they read from network sockets. So, the question becomes: how can we force an HTTP server to read input from STDIN, so we can fuzz it with AFL? Additionally, how can we do this without modifying your source code?
+  
+Sockfuzz is a small C library I developed to solve this problem. It works by "overloading" the `accept` system call and running some extra code to establish an internal connection to your server. Using the special `LD_PRELOAD` environment variable, it can convince your server to use sockfuzz's copy of `accept`, rather than the actual system call.
+
+![](./images/img_sockfuzz_diagram1.png)
+  
+Once called, sockfuzz only allows one thread to finish the call to `accept`. The others are forced to block on a call to `sem_wait`. The one thread that is allowed through runs code that makes a connection to the server, spawns two threads, and calls the _real_ `accept` system call, returning its value. From your point of view, your server behaves just about the same when preloaded with sockfuzz, apart from using only one of its threads and setting up that internal connection.
+  
+A screenshot of sockfuzz's overloaded `accept` function shows what your server's threads will do when they call sockfuzz's version of the function:
+
+![](./images/img_sockfuzz_code1.png)
+  
+The two threads that get spawned are designated as the "input thread" and the "output thread." The input thread's job is to read STDIN (until EOF is reached) and feed it through the open network socket to the server. Once STDIN is exhausted, it exits. The output thread's job is to receive bytes from the network socket and send them straight to STDOUT. Once the connection is closed, this thread exits. Collectively, these two threads form a system to send the contents of STDIN to your server and dump the server's response to STDOUT.
+
+![](./images/img_sockfuzz_example1.png)
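
The input/output thread pair can be modeled in Python to show the data flow. This is not sockfuzz's code (sockfuzz is a C library); `pump_in`, `pump_out`, `relay`, and `fake_server` are hypothetical names, a `socketpair` stands in for the internal connection, and half-closing the socket once the input is exhausted is an assumption made so the fake server knows when to reply.

```python
import io
import socket
import threading

def pump_in(source, conn):
    """Input thread: copy everything from source into the socket."""
    while True:
        chunk = source.read(4096)
        if not chunk:                  # EOF on the input stream
            break
        conn.sendall(chunk)
    conn.shutdown(socket.SHUT_WR)      # assumption: signal end of input

def pump_out(conn, sink):
    """Output thread: copy everything the peer sends into sink until it closes."""
    while True:
        chunk = conn.recv(4096)
        if not chunk:                  # connection closed
            break
        sink.write(chunk)

def relay(source, sink, conn):
    """Run both threads and wait for them to finish."""
    threads = [threading.Thread(target=pump_in, args=(source, conn)),
               threading.Thread(target=pump_out, args=(conn, sink))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

def fake_server(conn):
    """Stand-in for your server: read the whole request, then answer."""
    data = bytearray()
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        data.extend(chunk)
    conn.sendall(b"echo: " + bytes(data))
    conn.close()

client, server = socket.socketpair()
threading.Thread(target=fake_server, args=(server,)).start()
captured = io.BytesIO()                # stands in for STDOUT
relay(io.BytesIO(b"GET / HTTP/1.1\r\n\r\n"), captured, client)  # BytesIO stands in for STDIN
client.close()
print(captured.getvalue())
```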
--- a/sfi/sfi_how_to_fuzz.md
+++ b/sfi/sfi_how_to_fuzz.md
@ -0,0 +1,29 @@
+# How do I fuzz my server? (`fuzz-pserv.py`)
+
+Fuzzing your server can be done using the `fuzz-pserv.py` Python script, located in the CS 3214 bin folder on RLogin (`/home/courses/cs3214/bin/fuzz-pserv.py`). To get started, simply run `fuzz-pserv.py` - you'll be presented with a help menu:
+
+![](./images/img_fuzz_pserv_screenshot1.png)
+  
+Fuzzing your server is as simple as typing `fuzz-pserv.py --src-dir <your_src_dir>`. The script will compile your code with AFL++'s compiler, perform a small test run, then launch AFL++. You'll be presented with the AFL++ status screen. You can choose to either wait until the fuzzer times out (this time varies - see below), or you can use Ctrl-C to terminate it.
+  
+To understand everything displayed on the status screen, check out [AFL++'s documentation](https://aflplus.plus/docs/status_screen/). You'll probably be most interested in the "overall results" section of the status screen, displayed in the top-right corner. This gives a report of all unique crashes and hangs, as well as how many "paths" the fuzzer has discovered. (A "path" describes a unique path of code executed by your server. A "unique" crash/hang describes a crash/hang that was found on one such path.)
+
+## Parallel Fuzzing
+
+By default, this script invokes AFL++ using a single core on the system. However, you can specify any number of cores (up to the maximum) to spawn _multiple_ AFL++ processes (one on each core). These processes work together to find crashes/hangs - as a whole, they can typically find more bugs faster than a single process on a single core.
+  
+You can use the `--fuzz-cores` switch to specify the number of cores you wish to use.
+
+### Timing/Core Limits
+
+As you might know, RLogin can get pretty cluttered as we move closer to project deadlines. Parallel fuzzing is very effective, but using too many cores on a machine can prevent others from getting work done. Because of this, limits are established to prevent any one student from fuzzing with too many cores for too long.
+  
+This limit is described in "CPU-Seconds" - a maximum amount of time you can fuzz that varies with the number of cores you use. The more cores you specify with `--fuzz-cores`, the shorter the maximum time you'll be allowed to run AFL++. Using a single core (the default), you can run AFL++ for the longest time. Using two cores, you can run AFL++ for half that time. With three cores, you can run for a third of that time. (And so on.)
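
The arithmetic is just a fixed budget divided by the number of cores. The sketch below uses a made-up budget of 3600 CPU-seconds purely for illustration; the real value is set by the script, and `max_wall_seconds` is a hypothetical helper.

```python
def max_wall_seconds(cpu_second_budget, cores):
    """Wall-clock time allowed when the CPU-second budget is split across cores."""
    if cores < 1:
        raise ValueError("at least one core is required")
    return cpu_second_budget / cores

BUDGET = 3600  # hypothetical budget, not the actual course limit
for cores in (1, 2, 3):
    print(cores, "core(s):", max_wall_seconds(BUDGET, cores), "seconds")
```

With this example budget, one core gets 3600 seconds, two cores get 1800 seconds each, and three cores get 1200 seconds each: the total CPU time consumed stays the same.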
+
+### Fuzzing Results
+
+Once AFL++ has terminated (either by timeout or by Ctrl-C), the script will print a summary of the crashes/hangs that were found. By default, the output directory will be placed in your pserv's src directory (specified by `--src-dir`). However, you can use the `--out-dir` switch to specify otherwise.
+
+![](./images/img_fuzz_pserv_screenshot2.png)
+  
+If crashes or hangs are found, the directories containing the crash-inducing input files are listed in the summary. Time to investigate those bugs!
--- a/sfi/sfi_overview.md
+++ b/sfi/sfi_overview.md
@ -0,0 +1,43 @@
+# PServ Fuzzing: Overview
+
+The security of computer systems is extremely important. If vulnerabilities exist in the underlying systems used to complete tasks, exchange important information, communicate with others, etc., a cunning attacker could deal some serious damage.
+  
+Web servers are one such type of computer system, and since most are directly connected to the internet, they're tested (and often deliberately attacked) every day by thousands of users. How can we be sure a web server can gracefully handle any sort of input?
+  
+Some may argue that it's impossible to uncover _every_ bug in a system. But, we as computer scientists and computer engineers can use some effective techniques to catch most of them. Fuzzing is one such technique. This "fuzzing interface" allows you to utilize AFL++ (an advanced fuzzer) along with a special `LD_PRELOAD` library (called "sockfuzz") to fuzz your pserv implementation. This will help you uncover any bugs in your code that cause your server to crash or hang.
+
+A quick crash-course on how to get started is below. However, many more useful details can be found throughout the documentation.
+
+## Table of Contents
+
+### **Concepts**
+
+- [What is fuzzing?](./sfi_concepts_fuzzing.md)
+- [What is AFL++?](./sfi_concepts_afl.md)
+- [What is sockfuzz?](./sfi_concepts_sockfuzz.md)
+
+### **Fuzzing Interface**
+
+- [How do I fuzz my server?](./sfi_how_to_fuzz.md) (`fuzz-pserv.py`)
+- [What do I do after fuzzing?](./sfi_after_fuzzing.md) (`fuzz-utils.py`)
+
+## Quickstart: Fuzzing your Server
+
+To fuzz your server, do the following:
+
+1.  Run `fuzz-pserv.py --src-dir <path_to_your_pserv_src_dir>`
+2.  Wait for it to finish (might be a while), or hit Ctrl-C once you're satisfied.
+3.  Look at the summary: if any crashes or hangs are found, use these files to debug.
+
+![](./images/gif_fuzz_demo.gif)
+
+## Quickstart: Reproducing a Crash/Hang
+
+To reproduce a crash or hang found by the fuzzer, do the following:
+
+1.  Choose one of the crash files in the output directory.
+2.  Run your server in one terminal.
+3.  Open another terminal and run: `fuzz-utils.py --tool send <server_address> <server_port> < <path_to_crash_file>`
+4.  Investigate! Your server should crash or hang, depending on the file you chose.
+
+![](./images/gif_crash_reproduce_demo.gif)