Smart Contract Obfuscation Techniques
How do you prevent MEV frontrunners from stealing your transactions, copying your smart contracts, and understanding the strategies built into them on-chain? Let me take you into the depths of the dark forest, where bleeding-edge smart contract bytecode obfuscation techniques are developed to keep your secrets hidden for longer.
Intro
For the past couple of months I’ve dived into the world of reverse engineering by building a bytecode decoder capable of constructing the storage layout of a contract, discovering all possible function pathways, finding vulnerabilities, and more. Whilst working on this I’ve faced many challenges that sent my brain into overdrive. One of those challenges, and one of the more interesting concepts I’ve discovered, is bytecode obfuscation: placing specific pieces of bytecode in specific locations to hinder people like me from understanding the nuances of an unverified smart contract’s bytecode, so that alpha (e.g. a strategy of some sort) isn’t leaked.
To come up with obfuscation techniques, you need to know how to write low-level contracts, with either Huff or raw bytecode. The reason is that compilers set things up deterministically, e.g. jump tables and optimization techniques, and a decoder can easily be taught to detect those patterns. Only when you understand the standard(s) are you able to come up with out-of-the-box problems and solutions.
Having said this, I hope to inspire you to release new techniques for me to decode :P (I know, how selfish of me!).
Let’s twiddle with some bytecode bits and bytes!
Function Selector Matching
Currently, compiled contracts create jump matchers that cycle through all function selectors within the contract, checking whether the provided calldata’s 4 leftmost bytes (0x00000000) are equal to an existing 4-byte selector. If no match is identified, the provided calldata doesn’t target any function within the contract.
To learn more about calldata, feel free to read my article Reversing The EVM: Raw Calldata. It’ll teach you how to read raw calldata with step-by-step examples and help you understand the rest of the article :)
Let’s begin by analysing the mnemonics of this Solidity-compiled contract to understand the fundamentals before we go into writing custom bytecode:
First of all, let’s assume each function has no parameters. We want to call the function selector b3bcfa82, so we need to build calldata that tells the bytecode we’re interacting with this function:
Now let’s see how this is processed and how the function selector is extracted from calldata. To understand this I’ll walk you through what’s happening:
Great, now our stack is:
And we begin our function matching sequence, beginning with:
As you can see, this matching sequence continues until:
- A function selector match is found.
- No function selector match occurs and we reach the end of the match:
Terrific! Now we know how function matchers work :D
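The matching sequence can be modelled in a few lines of Python. This is a minimal sketch rather than real compiler output: b3bcfa82 is the selector from the walkthrough, while the other selectors and every jump destination are made-up placeholders. The 224-bit shift assumes the selector is pulled from the first calldata word by dropping its lower 28 bytes.

```python
# Model of a Solidity-style linear selector matcher.
# Only b3bcfa82 comes from the walkthrough; the other selectors and every
# jump destination below are hypothetical placeholders.
SELECTOR_TABLE = [
    (0x11111111, 0x0040),
    (0x22222222, 0x0060),
    (0xB3BCFA82, 0x0080),
]


def extract_selector(calldata: bytes) -> int:
    """Load the first 32-byte word and keep only its 4 leftmost bytes."""
    word = calldata[:32].ljust(32, b"\x00")  # reads past the end are zero-padded
    return int.from_bytes(word, "big") >> 224


def dispatch(calldata: bytes):
    """Return (jump destination or None, number of comparisons made)."""
    selector = extract_selector(calldata)
    for checks, (candidate, dest) in enumerate(SELECTOR_TABLE, start=1):
        if selector == candidate:  # one EQ + JUMPI pair per function
            return dest, checks
    return None, len(SELECTOR_TABLE)  # fell through: no function targeted


calldata = bytes.fromhex("b3bcfa82")  # a call with no parameters
print(dispatch(calldata))
```

The comparison count grows with the position of the function in the table, which is exactly the inefficiency worth keeping in mind.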
Notice how this strategy is extremely inefficient, especially in contracts that have a lot of functions! Imagine cycling through 10+ functions…what a nightmare. Can you think of any techniques we can implement to make this more efficient? Keep that in mind! Now that you have the fundamental understanding, I can explain our first technique, using a program counter (pc) matcher.
The idea behind this is to scramble our function selectors so that basic decoders can’t easily identify our functions.
Therefore, instead of doing the traditional selector matcher:
We can do a slight modification to use only a single byte. Let’s provide the following calldata:
And send this to our bytecode:
Single Word Jumptable
Alternatively, we can create a highly optimised version: packing all of our locators into a single PUSH32, where each 0000 is a program counter’s locator to a function’s body, like:
We would access these locators with a single byte from our calldata which will represent the SHR amount we want to apply to our jump table. After we shift the desired bits we then mask the pc locator and boom, we have access to our fn!
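The shift-and-mask lookup can be sketched in Python as follows. It assumes 16-bit locators packed into one 256-bit word with index 0 in the lowest bits; the locator values themselves are hypothetical and only there to make the example runnable.

```python
# Sketch of a single-word jump table. All locator values are hypothetical.
LOCATORS = [0x0040, 0x0052, 0x0068, 0x007A]  # pc of each function body (made up)


def pack_table(locators):
    """Pack 16-bit locators into one 256-bit word; index 0 sits in the lowest bits."""
    assert len(locators) <= 16, "a 32-byte word holds at most sixteen 2-byte locators"
    word = 0
    for index, pc in enumerate(locators):
        assert 0 <= pc <= 0xFFFF
        word |= pc << (16 * index)
    return word


def lookup(table: int, calldata: bytes) -> int:
    """Shift the table right by the calldata byte, then mask the low 16 bits."""
    shift = calldata[0]  # the single calldata byte is the SHR amount
    return (table >> shift) & 0xFFFF


TABLE = pack_table(LOCATORS)
print(hex(lookup(TABLE, bytes([0x20]))))  # a 32-bit shift selects index 2
```

A shift that isn’t a multiple of 16 straddles two locators and yields a meaningless destination, so the caller has to supply exactly the right byte.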
Let’s see this practically, starting with our calldata containing the SHR value.
Since we’re making custom bytecode there’s no 4-byte selector required in the calldata!
Now this calldata will be passed into our custom contract’s bytecode. Let’s examine what’s happening with the following:
As you can see we only used 9 opcodes and 32 gas (3,3,3,3,3,3,3,3,8) for our custom function selector jump table!
But, why is this significant?
The Solidity compiler uses the following block for every function in a contract. Each block costs 22 gas to execute and 2,200 gas to deploy (11 bytes at 200 gas per byte).
If there were 16 functions in a compiled contract’s bytecode using this format, it would cost up to 352 gas to run (when every block is checked) and 35,200 gas to deploy, compared to our 32 gas and 9,000 deployment gas!
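The arithmetic behind these figures can be checked with a short Python sketch. It assumes the per-function block is the usual DUP1, PUSH4, EQ, PUSH2, JUMPI sequence, which is consistent with the 22 gas and 11 bytes quoted above; the custom jump table numbers are taken from the article as-is.

```python
# (opcode, execution gas, size in bytes including any immediate data)
# Assumed layout of the per-function matcher block.
MATCHER_BLOCK = [
    ("DUP1", 3, 1),
    ("PUSH4", 3, 5),   # 1 opcode byte + 4 selector bytes
    ("EQ", 3, 1),
    ("PUSH2", 3, 3),   # 1 opcode byte + 2 destination bytes
    ("JUMPI", 10, 1),
]
CODE_DEPOSIT_GAS_PER_BYTE = 200


def block_cost(block):
    """Return (execution gas, deployment gas) for one block."""
    execution = sum(gas for _, gas, _ in block)
    deployment = sum(size for _, _, size in block) * CODE_DEPOSIT_GAS_PER_BYTE
    return execution, deployment


def matcher_cost(function_count):
    """Worst case: every block is executed before the last selector matches."""
    execution, deployment = block_cost(MATCHER_BLOCK)
    return execution * function_count, deployment * function_count


CUSTOM_TABLE_COST = (32, 9_000)  # figures quoted in the article

print(block_cost(MATCHER_BLOCK))
print(matcher_cost(16), "vs", CUSTOM_TABLE_COST)
```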
Scrambling Calldata
This alone is great for bytecode optimisation and adding on some more work for decoders…but what if we wanted to scramble calldata decoders too?
It’s common knowledge (if you’re into this stuff) that function selectors are always represented at the start of calldata, e.g.
We changed the selector to a single byte:
We can further confuse reverse engineers by putting the new “selector” on the other end:
Why is this significant?
If you’ve read my article on calldata titled Reversing The EVM: Raw Calldata, you’ll know that a decoder reads the word as a uint-like variable. Once converted to uint, it represents the value 192, so it looks like an ordinary argument rather than a selector. This is a significant obfuscation technique that deviates from the standard and is rarely used (I haven’t come across any contract that uses it).
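To make the 192 concrete, here is a small Python sketch. It assumes the single-byte selector is 0xc0, which is inferred from the decoded value rather than stated outright, and that it sits in a 32-byte word padded with zeros.

```python
SELECTOR_BYTE = 0xC0  # assumed: inferred from the decoded value 192

leading = bytes([SELECTOR_BYTE]) + bytes(31)   # selector at the far left
trailing = bytes(31) + bytes([SELECTOR_BYTE])  # selector at the far right


def as_uint(word: bytes) -> int:
    """Read a 32-byte word the way a calldata decoder reads a uint."""
    return int.from_bytes(word, "big")


def read_trailing_selector(word: bytes) -> int:
    """What the custom bytecode would do instead: mask the lowest byte."""
    return as_uint(word) & 0xFF


print(as_uint(trailing))                   # small, argument-like number
print(hex(as_uint(leading)))               # huge value with a conspicuous leading byte
print(hex(read_trailing_selector(trailing)))
```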
CFG Spammer
Reverse engineers typically attempt to identify every possible flow path that a contract can produce through the use of the JUMPI and JUMP opcodes. These opcodes allow the program to jump to a destination indicated by the JUMPDEST opcode.
There are important differences between the JUMPI and JUMP opcodes in Ethereum.
JUMPI is a conditional jump opcode, which means that it checks a prior condition, and if that condition is true, it jumps to the JUMPDEST opcode. If the condition is false, it continues with the current flow and ignores the JUMPI.
JUMP, on the other hand, always jumps to the provided JUMPDEST opcode. If the input to JUMP is dynamic, the destination can be anything. If the input is hardcoded, the destination is obvious and doesn’t create multiple potential flows.
You’re probably thinking “this is so trivial…I can gather all the existing JUMPDEST opcodes to discover all the potential flows”.
Not so fast normie reverse engineer, that’s not the chadest of solutions. If someone is purposely creating arbitrary paths to bamboozle your shitty decoder you bet they would have thought of this.
Enter plotting JUMPDESTs in specific locations.
These obfuscators will be putting JUMPDESTs, JUMPIs and JUMPs in places that go back and forth to normal flows, as well as scattering JUMPDESTs everywhere to make the jumps blow up your CFG generator. Maybe they’ll throw in a couple of infinite loops that jump to other infinite loops just to brighten your day a bit :)
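To see why “just collect every JUMPDEST” falls short, here is a minimal Python sketch of a linear sweep that lists valid JUMPDEST positions while skipping PUSH data. Both bytecode samples are hypothetical and hand-assembled for illustration.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def jumpdests(code: bytes) -> list:
    """Return every pc holding a real JUMPDEST, ignoring bytes inside PUSH data."""
    found = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            found.append(pc)
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the 1..32 immediate bytes
        pc += 1
    return found


# PUSH1 0x5b, JUMPDEST, STOP: the 0x5b inside the PUSH is data, not a destination.
plain = bytes.fromhex("605b5b00")

# Three decoy JUMPDESTs, then PUSH1 0x07, JUMP, STOP, JUMPDEST, STOP.
# Only pc 7 is ever jumped to, yet a naive decoder sees four candidates.
spammed = bytes.fromhex("5b5b5b600756005b00")

print(jumpdests(plain))
print(jumpdests(spammed))
```

Every decoy adds another candidate node, and once the JUMP inputs are dynamic a decoder has to treat each one as a possible target.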
All in all, these are rare cases. Only exploiters and bot operators will be using these techniques, so that you can’t yoink their strategies. So if you find someone using these techniques you’ve most likely struck gold, ser.
Function Body Logic
The most difficult thing to determine without a sophisticated system is dynamic inputs. For example, using a 1-byte MOD operation to get another 1-byte value that serves as the SHR value to calculate the PC value in the jump table we created earlier.
This is more of a thinking exercise to get you in the mindset of obfuscating further from the techniques discussed here. Try to create ways that generate potential pathways that are hard to traverse backwards from without a reference - hint: usually something to do with math and bitwise operations ;)
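One way to picture the MOD example, purely as a sketch: the modulus, the multiply-by-16 scaling and the locator values below are all assumptions made for illustration, not a known implementation.

```python
# Hypothetical jump table: four 16-bit locators, index 0 in the lowest bits.
LOCATORS = [0x0040, 0x0052, 0x0068, 0x007A]
TABLE = sum(pc << (16 * index) for index, pc in enumerate(LOCATORS))
MODULUS = 4  # hypothetical: one residue per function


def resolve(input_byte: int) -> int:
    """Derive the SHR amount from the input instead of reading it directly."""
    slot = input_byte % MODULUS  # MOD folds many inputs onto one slot
    shift = slot * 16            # scale the slot to a 16-bit boundary
    return (TABLE >> shift) & 0xFFFF


# 0x02 and 0xFE look unrelated but land on the same function body.
print(hex(resolve(0x02)), hex(resolve(0xFE)))
```

Because many inputs map to the same destination, an observer can’t recover a stable “selector” by watching calldata alone.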
Address Scattering
To deter reverse engineers from analyzing your bytecode and to prevent frontrunners from cloning your contract and replacing your addresses with their own, you can use an address obfuscation technique.
I thought of this while writing the article
There are 2 ways developers tend to implement an address into their contract(s):
- Importing via calldata.
This approach can be picked up by a frontrunner using basic heuristics to clone the tx, taking the opportunity for themselves…and reading it is quite simple.
- Hardcoding in the bytecode.
This approach, however, is slightly more advanced. An engineer can read it quite easily, but a frontrunner needs to disassemble the bytecode and replace the address with one of their own. For a deeper dive on this, read Memware: Generalised Frontrunners.
Now that we know the most common strategies, we can formalise a new off-meta technique: providing a portion of an address via calldata and hardcoding the other portion.
Let me explain…
We want the following address to be the destination our funds are sent to, without it being easily replaced by frontrunners via calldata or bytecode:
0x82828f6aFf831e0D8b366D7b33caf12B39232772
We break this up into 4 bytes and 16 bytes respectively:
- 82828f6a is passed in via calldata.
- Ff831e0D8b366D7b33caf12B39232772 is hardcoded in.
But why are we grabbing 4-bytes from the address?
Most decoders will scan for function selectors at the start, therefore we can create the illusion that we’re calling this “function selector”. Using our custom jump table and calldata ordering technique we can shift everything around to completely ruin a reverse engineer’s day :D
Then at the bytecode level we would do something like this:
Now we have an effective strategy to prevent frontrunners from both replacing our calldata with their own address and cloning our contract and replacing the address to steal our tx.
Unless they have an extremely sophisticated system, I’m guessing they won’t have the heuristics to deal with this yet, especially since it’s such a niche technique.
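The recombination itself is just a shift and an OR. Here is a Python sketch using the two halves from above; the function name is hypothetical, and on-chain this would be done with the equivalent shift and bitwise-OR opcodes rather than Python integers.

```python
CALLDATA_PART = bytes.fromhex("82828f6a")                           # supplied by the caller
HARDCODED_PART = bytes.fromhex("Ff831e0D8b366D7b33caf12B39232772")  # baked into the bytecode


def recombine(calldata_part: bytes, hardcoded_part: bytes) -> str:
    """Rebuild the 20-byte address from its 4-byte and 16-byte halves."""
    assert len(calldata_part) == 4 and len(hardcoded_part) == 16
    high = int.from_bytes(calldata_part, "big") << 128  # make room for 16 bytes
    low = int.from_bytes(hardcoded_part, "big")
    return "0x" + format(high | low, "040x")


print(recombine(CALLDATA_PART, HARDCODED_PART))
```

Neither half is a valid address on its own, so swapping out only the calldata or only the hardcoded bytes produces a destination nobody controls.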
Anti Frontrunner Replication
A simple tactic to prevent frontrunners from taking your profits is to make sure that the address the funds are being sent to is a contract, by checking that the destination address has a non-zero code size. If the code size is 0, we revert the transaction. This forces the frontrunner to be sophisticated enough to create a contract to send the funds to that has a withdraw or selfdestruct function!
laughs at loser frontrunner
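The code-size guard can be modelled in Python as follows. This is only a toy model: the state dictionary stands in for the chain’s account code, and the addresses and the Revert exception are hypothetical names for illustration.

```python
class Revert(Exception):
    """Stand-in for an EVM revert."""


def send_profit(destination: str, code_by_address: dict) -> str:
    """Only pay out to addresses that have deployed code."""
    code = code_by_address.get(destination, b"")
    if len(code) == 0:  # mirrors a code-size check returning 0
        raise Revert("destination has no code")
    return destination  # the transfer would happen here


# Hypothetical state: one contract, and an EOA that appears nowhere in it.
state = {"0xc0ffee": bytes.fromhex("00")}

print(send_profit("0xc0ffee", state))
try:
    send_profit("0xdead", state)
except Revert as err:
    print("reverted:", err)
```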
Final
All in all, obfuscation techniques aren’t a silver bullet that stops reverse engineers from figuring out the control flow. However, they will more than likely hinder their progress (depending on how advanced they are) by requiring them to add more sophistication to their programs and forcing them to sacrifice more time to read and understand what’s happening.
Whether you use the techniques or not, I hope you learned something new and have gained inspiration to create your own techniques and/or create systems that are capable of reverse engineering those that implement bleeding-edge obfuscation techniques!
I never imagined I would be talking about smart contract obfuscation techniques. I thought these were only used in malware to avoid detection. What a crazy world crypto is.
If you enjoy my content, please share it with your frens and feel free to support me at 0x82828f6aFf831e0D8b366D7b33caf12B39232772 :)