skip to content
← Go back

Oh For Fuzz Sakes,

Oh For Fuzz Sakes,

One of the most powerful tools that a program analysis tool can use is fuzzing, where we generate a multitude of inputs against the execution a source code.

import MailingList from ”@/components/blog/NewsletterForm”;

https://twitter.com/lmc_security/status/1677279710793932800?s=20

Formal Representation

todo…

Invariant Testing

The only way to generalise invariant generation is to have a main goal - can I extract ETH, tokens or mutate the state to my will.

Data Flow Analysis

Symbolic Analysis

invariants and symbolic execution are hugely intertwined. and fuzzing leverages invariants as what to look for.

Static Analysis Intro

Static Analysis Techniques

  1. Bound Analysis: Discovers the upper and lower bounds or limits that a variable can be.
  2. Taint Analysis: Tracks the flow of data and identifies which variables were influenced beyond that point.
  3. Abstract Interpretation: Turning each variable algebretic expressions. E.g. if x has the bounds [a, b] and we were to add z to x with [c, d] bounds. Then the resulting z will be [a + c, b + d]. This gives us the possible range of values z can possibly be. Anything over that is an error or unexpected bug - e.g. when an attack might influence w/ division by zero or overflow/underflows.
  4. Access Control Analysis: Identifies which control flows are accessible through authorization and which ones are not. E.g. finding if something uses onlyOwner() modifier.
  5. Automated Theorem Proving (ATP): Once the bytecode is in a suitable formal representation, ATP techniques are used to analyze and reason about the properties of interest. This may involve applying logical inference rules, employing automated deduction algorithms, or utilizing specialized techniques such as model checking or symbolic execution.

Fuzzing Intro

There is an astounding difference a normal fuzzer and a smart one. It’s quite simple and easy to create a fuzzer, but to make it smart and efficient is a whole other ball game.

The most effective fuzzing technique depends on the specific goals and constraints. However, input quality and generation significantly impact the effectiveness of a fuzzer.iv

Fuzzing Techniques

  1. Mutation Based Fuzzing: Create random input parameters and slightly adjust existing ones to run that against the execution code to see if it yields a different result. They user defined constraints and rules to form the mutations. E.g, if execution reverts add 1 to each parameter and run again and break out if we reach maximum number.
  2. Generation Based Fuzzing: Using a set of predefined specifications the program knows what ranges of inputs to create from scratch. E.g. for an ERC20, we know there is the function transferFrom(address from, address to, uint256 amount) and so our specification will be address, address and uint256, or the bounds will be 0 to 20 bytes for the first 2 parameters and 0 to 32 bytes for the last parameter, and this will never change. Or on a grander scale, we know UniswapV2 has [swap, mint, burn] for their core functions that interact with the ERC20 standard. So we can generate order of operations in a way that adheres to this specification of all functions in UniswapV2 and ERC20, e.g. one that will pass will be [approve, transferFrom, swap] and one that will fail is [transferFrom, swap] (since no approve to move the tokens).
  3. Coverage Guided Fuzzing: Using a feedback system of how far a specific input set got down a pathway in the control flow, it adjusts to try and discover unexplored pathways in the control flow. Once a new flow is identified it will try to follow all if statement branches it found. So when we reach an if statement there are two possibilities that can occur - true or false. The aim is to cover both and see where it leads us.
  4. Evolutionary Fuzzing: By generating and “evolving” test cases iteratively based on feedback derived from previous executions, the fuzzer is able to evaluate whether an action was success or not and mutate it to attempt to go deeper into the path it’s taken or realize it should switch to another path. The fuzzer initially populates the inputs which can be randomly generated or derived from known valid/invalid inputs. These are then executed and evaluated, given a fitness score based on specific criteria, e.g. errors encountered, state changes or code coverage. Tests w/ higher scores are selected as the contenders of the next population as they yield the most potential for progress. The new contenders are mutated slightly to attempt to progress further, involving characteristics from the parent population. This process is repeated over multiple generations to get our fuzzer to a state of improved understanding.
  5. Stateful Fuzzing: Explicitly tracks the contract’s previous state and interactions with the previous inputs as part of the fuzzing strategy. For example, the initial input caused an SSTORE to store one element on an enum and a mutate version of the initial input triggers another element of the enum. We’re checking for different state changes.
  6. Non-stateful Fuzzing: Doesn’t explicit track the previous state and inputs of the last fuzz round. Each input is executed independently of the previous state of the contract. E.g. we wouldn’t remember there was an input that generated another element of the enum. This tactic allows for faster exploration than stateful fuzzing because it doesn’t need to check previous states, however, it may not be suitable for uncovering vulnerabilities that specifically rely on the contract’s state or interactions that span multiple inputs or transactions.
  7. Ad-hoc Fuzzing: An unstructured fuzzing technique where inputs are generated and executed without a predefined strategy. Typically, inputs undergo minimal modifications from existing inputs.
  8. Structured Fuzzing: A technique that applies structured input generation strategies (e.g. increment by 1 until 10 then increment by 10 until 100, etc) to systematically explore the input space, as apposed to random or ad-hoc fuzzing.
  9. Differential Fuzzing: Involves comparing the behavior of multiple implementations of the same protocol in different contexts. E.g. a yield farming vault will have the same traits but the underlying tokens being used have their own unique functionality (e.g. tax or rebase) which may cause the vault to have some unexpected edge-cases.
  10. In-Memory Fuzzing: Switching out variables of a running program to see how different states react to altered inputs. So, instead of running a program from the start you can save the context and alter a memory slot, for example to see if it causes a condition to pass. Then you know what memory is required to bypass the condition.

Fuzzing Architectures

  1. Hybrid Fuzzers: A combination of different fuzzing techniques to leverage their strengths. For example we can use mutation based fuzzing to generate the inputs for external calls that occur within our specification of a custom contract. Then we will go down each path possible to cover as much as we can.
  2. Intelligent Fuzzers: These are the ones that make fuzzers effective. You will have to employ advanced techniques such as symbolic execution, to effectively fuzz what’s fuzzable and something along the lines of constraint solver to satisfy conditions to reach the paths you wouldn’t be able to if you didn’t pass the check. The only downside is the time complexity. Due to all the possibilities you will need to gather as much information about the program as possible to employ a time effective strategy to prevent waiting days for the program to solve every relevant possibility.
  3. Greybox Fuzzers: Typically these have limited information, namely just the source code’s binary/bytecode. They use static analysers and symbolic execution to collect information about the program to guide the creation and mutation of inputs intelligently. E.g. you may have the contract’s bytecode and state from onchain and that’s it.
  4. Blackbox Fuzzers: With no knowledge of the structure of the program, these focus on generating inputs to try and extract information out of the program via seeing what type of error occurs, if it succeeds, etc. Then from these outputs it can formulate code coverage and so forth. Tldr; given a binary containing multiple embedded languages and good luck.

Final

I appreciate you for taking the time to read this article. I hope you found value in this, anon! , please share with your frens and free to support me 0x82828f6aFf831e0D8b366D7b33caf12B39232772 :)

Although it’s a short article, I hope you find value in this technique and use it in your endeavours!

Resources

Share this Article

Recent Articles