Generating Custom Assembly Smart Contracts
Two years ago I wrote a component that let me generate any custom assembly smart contract on the fly, automatically creating exploits without any manual work. I had a monitoring system that would provide the inputs, and the bytecode generator would chug along and spit out an executable program that I could deploy and call with a bundle of transactions. In this article I'll share the core of that codebase to get you up to speed! Buckle up, anon. There isn't any other article like this revealing these trade secrets!
Foreword
By far one of the most interesting programs I’ve ever written was my smart contract code generator. It was such a fascinating problem. The objective was to create a program that creates other programs, giving some level of autonomy to the father program. I think this concept is even more interesting in the realm of ML/AI and is the core reason why I’m pursuing mathematics, to go down the route of modelling evolutionary-based systems that criticise and adapt their own codebases.
Since I no longer participate in this endeavour I thought it would be quite exciting to share what I kept away for almost 2 years, especially since I made it bleeding edge with custom obfuscation and evasion tactics to deter automated detection tools. The even more interesting part of this is how it can be applied to MEV and generalised frontrunning alongside exploit generation (the original use case). I’m not sure if there are any resources on this matter and therefore hope to serve your interests well.
The Problem
When dealing with the combinatorial hell of repeatable function permutations, e.g.,
you hit a wall of state explosion and need some way to bound your action space. The obvious bounds are computational complexity and time, and in every case you work within them to achieve your objective. The longest sequence I’ve seen across all the blockchain hacks I’ve observed is ~20 sequential functions, in the Pickle Finance hack. But when you find some sequence that gets you the desired outcome of profit, e.g.,
then you need some way of generating this custom functionality on the fly since you’ll be making many permutations for very specific use cases.
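To put a rough number on that state explosion, here's a minimal sketch that counts how many ordered call sequences exist when functions can repeat. The function name `sequences_up_to` and the choice of 10 candidate functions are my own assumptions for illustration; only the ~20 call depth comes from the text above.

```rust
/// Number of ordered call sequences of length 1..=max_len that can be built
/// from `n` functions when repeats are allowed: n + n^2 + ... + n^max_len.
/// Returns None if the count overflows u128.
fn sequences_up_to(n: u128, max_len: u32) -> Option<u128> {
    let mut total: u128 = 0;
    for len in 1..=max_len {
        // checked_pow / checked_add bail out with None instead of wrapping.
        total = total.checked_add(n.checked_pow(len)?)?;
    }
    Some(total)
}
```

With just 10 candidate functions and sequences of up to 20 calls, `sequences_up_to(10, 20)` already lands around 1.1 × 10^20, which is why you need a bound.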
The Solution
Generic Generation
Luckily, the EVM is super simple and can be generalised quite easily with generic code. Only a few opcodes are actually needed for this generic contract generator to work in most cases, but it can get quite advanced if you’re generating many contracts that return values to other generated contracts and have to deal with returndatasize and returndatacopy to chain functions together.
However, many of the contracts that will tend to be generated for exploits will use something like this
So basically you can build up a contract using symbols that were spat out from the exploit generation system, whether that was a generalised frontrunner detecting “oh we should replace their address in this calldata to this address and steal their contract” or creating an entirely new function sequence from scratch.
For example, if we wanted to generate a contract with a sequence of: [approve, transfer, deposit] then we would first create an approve module based off the opcodes
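As a rough idea of what an approve module boils down to, here's a minimal sketch that emits the raw opcodes as a hex string. The name `approve_module`, the memory layout (selector word at 0x00, arguments at 0x20 and 0x40, call data read from 0x1c) and the choice to hardcode all three values are my assumptions for illustration, not necessarily how the original generator laid things out. The opcode bytes and the `approve(address,uint256)` selector `0x095ea7b3` are standard.

```rust
// Opcode bytes as hex strings (values from the EVM opcode table).
const PUSH1: &str = "60";
const PUSH4: &str = "63";
const PUSH20: &str = "73";
const PUSH32: &str = "7f";
const MSTORE: &str = "52";
const GAS: &str = "5a";
const CALL: &str = "f1";

// First 4 bytes of keccak256("approve(address,uint256)").
const APPROVE_SELECTOR: &str = "095ea7b3";

/// Hypothetical helper: bytecode (hex, no 0x prefix) that calls
/// token.approve(spender, amount) with every value hardcoded.
fn approve_module(token: &str, spender: &str, amount: &str) -> String {
    assert_eq!(token.len(), 40, "token must be 20 bytes of hex");
    assert_eq!(spender.len(), 40, "spender must be 20 bytes of hex");
    assert!(amount.len() <= 64, "amount must fit in 32 bytes");
    // Left-pad the amount to a full 32-byte word.
    let amount = format!("{:0>64}", amount);
    [
        // MSTORE right-aligns the selector in the word at 0x00,
        // so its 4 bytes end up at memory 0x1c..0x20.
        PUSH4, APPROVE_SELECTOR, PUSH1, "00", MSTORE,
        // Spender lands left-padded in 0x20..0x40.
        PUSH20, spender, PUSH1, "20", MSTORE,
        // Amount in 0x40..0x60.
        PUSH32, amount.as_str(), PUSH1, "40", MSTORE,
        // CALL pops: gas, address, value, argsOffset, argsSize, retOffset, retSize,
        // so we push them in reverse. argsSize = 4 + 32 + 32 = 0x44.
        PUSH1, "00", PUSH1, "00", PUSH1, "44", PUSH1, "1c", PUSH1, "00",
        PUSH20, token, GAS, CALL,
    ]
    .concat()
}
```

The result is 101 bytes of bytecode. Note it leaves the CALL success flag on the stack, so a real module would want to check or POP it.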
Perfect, so now we want to create a module to automate this process. Note, I’m going to use Strings to make this much easier for myself; you can optimise it by using [u8; 32] or [u8; 20] as you please.
Defining The Struct
Let’s start off by defining our struct, which really only has 3 important things
Calldata Generator
Then we can add our calldata extender which is essentially adding new dynamic inputs into the calldata that we pass into the contract we create. This could be amounts for a transfer, receiver addresses, really anything that is dynamic.
Notice how we can pass in basically anything and it’ll add it. This is a feature, not a bug: it lets us bitpack our calldata to reduce the amount of gas being used (calldata costs a fuck load for each byte used, especially non-zero bytes).
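Here's a minimal sketch of what a bitpacking calldata extender could look like, plus a quick gas comparison. The names `CalldataBuilder`, `extend` and `calldata_gas` are hypothetical, and the pricing assumes the EIP-2028 schedule (16 gas per non-zero byte, 4 gas per zero byte), so treat the numbers as illustrative.

```rust
/// Hypothetical builder holding bitpacked calldata as hex (no 0x prefix).
#[derive(Default)]
struct CalldataBuilder {
    calldata: String,
}

impl CalldataBuilder {
    /// Append raw hex with no padding at all and return the byte offset
    /// where the new value starts, so generated code can find it later.
    fn extend(&mut self, hex: &str) -> usize {
        let hex = hex.strip_prefix("0x").unwrap_or(hex);
        assert!(hex.len() % 2 == 0, "hex must describe whole bytes");
        let offset = self.calldata.len() / 2;
        self.calldata.push_str(hex);
        offset
    }
}

/// Calldata gas for a hex string, assuming EIP-2028 pricing:
/// 4 gas per zero byte, 16 gas per non-zero byte.
fn calldata_gas(hex: &str) -> u64 {
    hex.as_bytes()
        .chunks(2)
        .map(|pair| if pair == b"00" { 4 } else { 16 })
        .sum()
}
```

An 8-byte amount like `0de0b6b3a7640000` costs 104 gas packed versus 200 gas when left-padded to a full 32-byte word, and that saving applies to every argument you pass.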
However, when we pass parameters to a function we can’t pass this bitpacked data in directly, because the ABI expects each parameter to be left-padded to 32 bytes. Therefore, we need a function that converts each value from right-padded (which is how it comes out of the packed calldata) to left-padded.
Auto Padder
Since we’re passing our calldata into functions we need to store it in memory using the MSTORE opcode, then specify the offset (which we’ll always make 0x00 to make it easier; you can optimise further later) and the size, followed by CALL.
Since this is a standard function we’ll reuse, we can just hardcode what’s happening. The core of it is figuring out how many bytes we used, calculating how many bytes we need to shift it over to the right (to pad it left), and then using MSTORE so we can inject it before our call.
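The shift-and-store step described above can be sketched like this. The function name `load_padded` and the single-byte (PUSH1) offsets are my assumptions, and it uses SHR, so it assumes a post-Constantinople EVM.

```rust
/// Hypothetical padder: emits bytecode (hex, no 0x prefix) that reads `len`
/// bytes from calldata at `calldata_offset`, left-pads them to 32 bytes and
/// stores the word at `mem_offset`.
fn load_padded(calldata_offset: u8, len: u8, mem_offset: u8) -> String {
    assert!((1..=32).contains(&len), "a word holds 1 to 32 bytes");
    // PUSH1 offset, CALLDATALOAD: our value sits in the top `len` bytes
    // of the loaded word, followed by whatever calldata comes next.
    let mut code = format!("60{:02x}35", calldata_offset);
    // Shifting right by the unused bits drops the trailing bytes and
    // leaves the value right-aligned, i.e. left-padded with zeros.
    let shift_bits = (32 - len) as u16 * 8; // at most 248, fits PUSH1
    if shift_bits > 0 {
        code += &format!("60{:02x}1c", shift_bits); // PUSH1 shift, SHR
    }
    code += &format!("60{:02x}52", mem_offset); // PUSH1 mem, MSTORE
    code
}
```

For a 20-byte address at calldata offset 0 this emits `PUSH1 00, CALLDATALOAD, PUSH1 60, SHR, PUSH1 20, MSTORE`, where 0x60 is the 96 bits (12 bytes) of padding.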
So now we have everything to build the core function call generation part. We’ll just do the approve function and leave the rest to you to figure out (that’s how you actually learn). You can also generalise this if you please, but it may take some thinking to get it right. You’ll need a generalised version down the line anyway for custom CALLs, but that’ll be left for you to do. I’m merely showing you the door, you must walk through it.
Approve Generator
So now we have our approve_token function that takes in 3 dynamic variables: token, to, amount. The token will be hardcoded into the contract (although you can modify it to be obfuscated or dynamically inputted). The to and amount are passed as calldata because why not. We could technically hardcode everything, but keeping them dynamic is useful so our deployment costs aren’t enormous (especially in a bull market). Good to learn, you know.
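Putting the pieces together, an approve generator along these lines might look like the sketch below. The `Generator` struct, its two fields and the helper names are my own guesses at a reasonable shape, not the original code. It hardcodes the token and the `approve(address,uint256)` selector, bitpacks `to` and `amount` into calldata, and assumes all offsets fit in a single byte.

```rust
/// Hypothetical generator state: both fields are hex strings without 0x.
#[derive(Default)]
struct Generator {
    bytecode: String,
    calldata: String,
}

impl Generator {
    /// Append bitpacked hex to the calldata; returns (byte offset, byte length).
    fn extend_calldata(&mut self, hex: &str) -> (usize, usize) {
        assert!(hex.len() % 2 == 0, "hex must describe whole bytes");
        let offset = self.calldata.len() / 2;
        self.calldata.push_str(hex);
        (offset, hex.len() / 2)
    }

    /// Emit: CALLDATALOAD at `offset`, SHR to left-pad, MSTORE at `mem`.
    fn load_padded(&mut self, offset: usize, len: usize, mem: usize) {
        assert!(offset <= 0xff && mem <= 0xff, "sketch only handles PUSH1 offsets");
        assert!((1..=32).contains(&len), "a word holds 1 to 32 bytes");
        self.bytecode += &format!("60{:02x}35", offset);
        let shift_bits = (32 - len) * 8;
        if shift_bits > 0 {
            self.bytecode += &format!("60{:02x}1c", shift_bits);
        }
        self.bytecode += &format!("60{:02x}52", mem);
    }

    /// Hardcoded token, calldata-supplied `to` and `amount`.
    fn approve_token(&mut self, token: &str, to: &str, amount: &str) {
        assert_eq!(token.len(), 40, "token must be 20 bytes of hex");
        // PUSH4 selector, PUSH1 00, MSTORE: selector ends up at 0x1c..0x20.
        self.bytecode += "63095ea7b3600052";
        let (off, len) = self.extend_calldata(to);
        self.load_padded(off, len, 0x20);
        let (off, len) = self.extend_calldata(amount);
        self.load_padded(off, len, 0x40);
        // retSize 0, retOffset 0, argsSize 0x44, argsOffset 0x1c, value 0,
        // then PUSH20 token, GAS, CALL.
        self.bytecode += &format!("600060006044601c600073{}5af1", token);
    }
}
```

After calling `approve_token`, `bytecode` holds the module to splice into the contract and `calldata` holds the packed bytes to send with the transaction.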
And there we fucking have it!
A domain expert system that generates custom bytecode to execute functions! You can use this how you please, whether it’s with MEV or cybersecurity. This project will be fantastic for any resume, in and out of crypto. It’s quite unique to be able to generate assembly dynamically and you probably won’t find too many resources like this one out there.
Test + Output
And just to play around with it we can write a lil unit test or throw it in the main function. Just plug in our token address, to address and amount uint256 into our approve token function to construct the first component of our generated exploit contract
Which outputs the following printed lines for some visualisation of what’s happening
You can see the calldata extending part: with each iteration it’s adding a new piece. The signature comes first in iteration 01, then the address, and finally the amount to approve. How fucking cool is that?!
Then we have the calldata unpacker, which left-pads our gas-saving bitpacked calldata so it can be put into memory and passed into functions via CALL.
And finally the assembly sequence it generates to add onto the start of the approve function, which gets kind of long because of the hardcoded token address.
Magical! Code generation without an LLM, only domain expertise. I had to rewrite this entire project from scratch which took ~3:30hrs just for this article, lmao.
Full Code
You can either copy it here or reference it on GitHub
Going Further
Nonetheless you can extend this further with many functions like a transfer(), balanceOf() using STATICCALL, a way to handle receiving ETH, an onlyOwner() guard with JUMPI and JUMPDEST, and then continue on into a generalised version with generic call calldata handling (shouldn’t be too hard with the calldata handling functions I added). Struct arrays are not going to be fun, though; I have another article that explains how you can do them: “Reversing The EVM: Reading Raw EVM Calldata”.
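As one example of those extensions, here's a minimal sketch of an onlyOwner-style guard built from JUMPI and JUMPDEST. The function name `only_owner_guard` and the `base` parameter (the byte position in the final bytecode where the guard starts, assumed to fit in a PUSH1) are hypothetical.

```rust
/// Hypothetical guard: reverts unless msg.sender equals the hardcoded owner.
/// `base` is the byte offset in the final bytecode where this guard begins,
/// which we need because JUMPI takes an absolute destination.
fn only_owner_guard(owner: &str, base: u8) -> String {
    assert_eq!(owner.len(), 40, "owner must be 20 bytes of hex");
    // Layout: CALLER(1) PUSH20(21) EQ(1) PUSH1(2) JUMPI(1)
    //         PUSH1 0, PUSH1 0, REVERT(5) JUMPDEST(1) = 32 bytes,
    // so the JUMPDEST is the 32nd byte, at base + 31.
    let dest = base
        .checked_add(31)
        .expect("sketch only handles destinations reachable with PUSH1");
    format!(
        // CALLER, PUSH20 owner, EQ, PUSH1 dest, JUMPI, PUSH1 0, PUSH1 0, REVERT, JUMPDEST
        "3373{}1460{:02x}5760006000fd5b",
        owner, dest
    )
}
```

Placed at the very start of the contract (`base = 0`), the jump target is 0x1f. Execution carries on after the JUMPDEST only when the caller matches.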
You can also add automated obfuscation using my articles “Smart Contract Obfuscation Techniques” and “Swimming Safely In The Public Mempool: MEV Smart Contract Obfuscation Techniques” as references (this is what I did in my private messy version). You can also add a fake compiler version at the front and then POP all of the opcodes if need be, and possibly dynamic JUMPIs to do state explosions to fuck with detector tools.
AI Endgame
What I imagine the future of infosec to be is to have this generic code generation be dynamically adjusted on the fly with neural nets, redesigning obfuscation techniques that don’t show patterns, mix up language (such as variables, comments, etc) used in source code, and adapt to systems as they come about — whether it’s a database, enterprise website, social engineering through emails with LLMs to reference old emails they have if there was an event happening and using GANs to generate documents from higher ups with their signatures, spoofing names, etc. This is merely the beginning of a devastating future imo. I wanted to just showcase the coolness of the profession :)
Final
I hope you found this as fascinating as I did when I first attempted to create this around 2 years ago. Whether you’re looking to beef up your resume or try to do some cool shit, this is definitely a unique project that will get you to the next level. I don’t think I’ve met anyone that does code generation like this. And with AI in play you’ll have a better idea of how things worked before the new age and can use this knowledge to be more creative :) Assembly is always a good language to learn!
I didn’t touch on contract deployment because that’s a transaction-based event, as is everything else relating to execution. Maybe in a future article I’ll explain that step, but for now I just wanted to explain how to actually generate code and not give you all the answers (where’s the fun in that?!).
Anyway, glhf with the future, anon-kun.