Generating Custom Assembly Smart Contracts

Foreword

By far one of the most interesting programs I’ve ever written was my smart contract code generator. It was such a fascinating problem. The objective was to create a program that creates other programs, giving some level of autonomy to the parent program. I think this concept is even more interesting in the realm of ML/AI and is the core reason why I’m pursuing mathematics: to go down the route of modelling evolutionary systems that criticise and adapt their own codebases.

Since I no longer participate in this endeavour I thought it would be quite exciting to share what I kept away for almost 2 years, especially since I made it bleeding edge with custom obfuscation and evasion tactics to deter automated detection tools. The even more interesting part of this is how it can be applied to MEV and generalised frontrunning alongside exploit generation (the original use case). I’m not sure if there are any resources on this matter and therefore hope to serve your interests well.

The Problem

When dealing with the combinatorial hell of repeatable function permutations, e.g.,

\{ A, A, B, A, C \}

you hit this wall of state explosion and need some way to limit your action space with a bound. The obvious bounds are computational complexity and time, and in every single case you work within these bounds to achieve your objective. The longest sequence I’ve seen in all of the blockchain hacks I’ve observed has to be ~20 sequential functions max, with the Pickle Finance hack. But when you achieve some sequence that gets you the desired outcome of profit, e.g.,

\{ \text{deployContract}, \ \text{approve}, \ \text{transfer}, \ \text{deposit}, \ \text{rebalance}, \ \text{withdraw} \}

then you need some way of generating this custom functionality on the fly since you’ll be making many permutations for very specific use cases.
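To get a feel for why a bound is needed, here is a minimal sketch (not part of the original generator) that counts and enumerates call sequences with repetition. The names `sequence_count` and `sequences` are made up for illustration, and the only assumption is that any function may appear at any position.

```rust
/// Number of call sequences of length 1..=max_len over `n_funcs` functions,
/// where functions may repeat. Returns None if the count overflows a u64.
fn sequence_count(n_funcs: u64, max_len: u32) -> Option<u64> {
    let mut total: u64 = 0;
    for len in 1..=max_len {
        // n^len sequences of exactly this length
        total = total.checked_add(n_funcs.checked_pow(len)?)?;
    }
    Some(total)
}

/// Enumerate every sequence of exactly `len` calls (with repetition).
fn sequences<'a>(funcs: &[&'a str], len: usize) -> Vec<Vec<&'a str>> {
    let mut out: Vec<Vec<&'a str>> = vec![Vec::new()];
    for _ in 0..len {
        let mut next = Vec::with_capacity(out.len() * funcs.len());
        for prefix in &out {
            for f in funcs {
                // extend each existing prefix by one more call
                let mut seq = prefix.clone();
                seq.push(*f);
                next.push(seq);
            }
        }
        out = next;
    }
    out
}
```

With only 10 candidate functions and a depth of 20 the count no longer fits in a `u64`, which is why some bound on depth or time is unavoidable.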

The Solution

Generic Generation

Luckily, the EVM is super simple and can be generalised quite easily with generic code. Only a few opcodes are actually needed to get this generic contract generator working in most cases, but it can get quite advanced if you’re dealing with generating many contracts that return values to other generated contracts and have to use returndatasize and returndatacopy to chain functions together.

However, many of the contracts that tend to be generated for exploits will use something like this:

pub enum CommonCalls {
    Erc20BalanceOf,
    Erc20Approve,
    Erc20Transfer,
    Erc20TransferFrom,
}
pub enum CodeBlock {
    Ether {
        to: [u8; 20],
        amount: [u8; 32],
    },
    Erc20BalanceOf {
        token: [u8; 20],
        of: [u8; 20],
    },
    Erc20Approve {
        token: [u8; 20],
        to: [u8; 20],
        amount: [u8; 32],
    },
    Erc20Transfer {
        token: [u8; 20],
        to: [u8; 20],
        amount: [u8; 32],
    },
    Call {
        cd_offset: [u8; 2],
        cd_size: [u8; 2],
        to: [u8; 20],
        // execution_num: Option<usize>,
        calldata: Vec<u8>,
        callvalue: [u8; 32],
    },
    Revert,
    Stop,
}

So basically you can build up a contract using symbols that were spat out from the exploit generation system, whether that was a generalised frontrunner detecting “oh we should replace their address in this calldata to this address and steal their contract” or creating an entirely new function sequence from scratch.
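As a rough illustration of turning such symbols into bytecode, the sketch below maps a trimmed-down, hypothetical `Block` enum (three variants borrowed from `CodeBlock`) to raw bytes. The `emit` and `assemble` helpers are assumptions made for this example, not the original implementation.

```rust
/// Hypothetical, trimmed-down cousin of the article's `CodeBlock`.
enum Block {
    Ether { to: [u8; 20], amount: [u8; 32] },
    Revert,
    Stop,
}

/// Emit the bytecode for a single block.
fn emit(block: &Block) -> Vec<u8> {
    match block {
        Block::Ether { to, amount } => {
            // retSize, retOffset, argSize, argOffset: all PUSH1 0x00
            let mut code = vec![0x60, 0x00, 0x60, 0x00, 0x60, 0x00, 0x60, 0x00];
            code.push(0x7f); // PUSH32 value
            code.extend_from_slice(amount);
            code.push(0x73); // PUSH20 receiver
            code.extend_from_slice(to);
            code.extend_from_slice(&[0x5a, 0xf1, 0x50]); // GAS, CALL, POP
            code
        }
        // PUSH1 0x00, PUSH1 0x00, REVERT
        Block::Revert => vec![0x60, 0x00, 0x60, 0x00, 0xfd],
        Block::Stop => vec![0x00], // STOP
    }
}

/// Concatenate the blocks in order into one runtime bytecode blob.
fn assemble(blocks: &[Block]) -> Vec<u8> {
    blocks.iter().flat_map(emit).collect()
}
```

The point of the design is that whatever system produces the sequence only has to output symbols; the byte-level details live in one `match`.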

For example, if we wanted to generate a contract with a sequence of: [approve, transfer, deposit] then we would first create an approve module based off the opcodes

PUSH1 0x00 // return size
PUSH1 0x00 // return offset
PUSH1 0x.. // argSize: dynamic
PUSH1 0x.. // argOffset: dynamic, we usually want 0x00
PUSH1 0x00 // ether, approve so none
PUSH20 0x.. // receiver, who we send the payload to
GAS // default gas for tx
CALL // call the receiver with the payload
POP // discard the success flag so it doesn't pile up on the stack

// ------ Converted to hex -------

// ".." marks the dynamic parts
0x60 00 60 00 60 .. 60 .. 60 00 73 .. 5A F1 50

Perfect, so now we want to create a module to automate this process. Note, I’m going to use Strings to make this much easier for myself, you can optimise it by using [u8; 32] or [u8; 20] as you please.
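For comparison, a byte-oriented take on the same CALL scaffold could look like the sketch below. `call_scaffold` and `to_hex` are hypothetical names used only here, and it assumes the argument size and offset each fit in a single PUSH1.

```rust
/// Build the CALL scaffold from the opcode listing above as raw bytes.
/// Assumes arg_size and arg_offset fit in one byte (PUSH1).
fn call_scaffold(arg_size: u8, arg_offset: u8, receiver: &[u8; 20]) -> Vec<u8> {
    let mut code = Vec::with_capacity(34);
    code.extend_from_slice(&[0x60, 0x00]); // PUSH1 0x00, return size
    code.extend_from_slice(&[0x60, 0x00]); // PUSH1 0x00, return offset
    code.extend_from_slice(&[0x60, arg_size]); // PUSH1 argSize
    code.extend_from_slice(&[0x60, arg_offset]); // PUSH1 argOffset
    code.extend_from_slice(&[0x60, 0x00]); // PUSH1 0x00, no ether sent
    code.push(0x73); // PUSH20 receiver
    code.extend_from_slice(receiver);
    code.extend_from_slice(&[0x5a, 0xf1, 0x50]); // GAS, CALL, POP
    code
}

/// Lowercase hex encoding, so the result can be compared with string output.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Working on bytes avoids the "two characters per byte" bookkeeping, at the cost of being less convenient to print and eyeball.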

Defining The Struct

Let’s start off with defining our struct, which really only has 3 important things:

#[derive(Default)]
struct Contract {
    /// All the calldata related to our custom contract
    calldata: String,
    /// The offset of each variable in our calldata
    calldata_offsets: Vec<usize>,
    /// The contract we generate
    source: String,
}
impl Contract {
    // All the opcodes we'll need for this implementation
    pub const P1_OP: &'static str = "60";
    pub const P20_OP: &'static str = "73";
    pub const GAS_OP: &'static str = "5A";
    pub const CALL_OP: &'static str = "F1";
    pub const POP_OP: &'static str = "50";
    pub const CD_LOAD_OP: &'static str = "35";
    pub const MSTORE_OP: &'static str = "52";
    pub const SHR_OP: &'static str = "1C";

    pub const APPROVE_SIG: &'static str = "095ea7b3";

    pub fn default() -> Self {
        Self {
            calldata: String::new(),
            calldata_offsets: vec![],
            source: String::new(),
        }
    }
}

Calldata Generator

Then we can add our calldata extender which is essentially adding new dynamic inputs into the calldata that we pass into the contract we create. This could be amounts for a transfer, receiver addresses, really anything that is dynamic.

// Adds onto the end of our existing calldata with new PACKED inputs
// Meaning they aren't left side padded to save gas and be more compact
pub fn extend_calldata(&mut self, new_calldata: Vec<&str>) {
    println!("\n[Extending Calldata: {}]", new_calldata.len());
    println!("- [00] Old: {}", &self.calldata);

    for (i, item) in new_calldata.iter().enumerate() {

        // Add it onto the end of our existing calldata
        self.calldata.extend([*item]);

        // Record the offset of where we just added so we can ref it later
        match self.calldata_offsets.len() == 0 {
            true => {
                // We don't init with anything since we wont have calldata (duh)
                // So we add it here to initialise + our own calldata
                self.calldata_offsets.push(0);
                // Since we're dealing with strings it'll be double the
                // len -- we want the amount of bytes instead so 1/2 it
                self.calldata_offsets.push(self.calldata.len() / 2);
            }
            false => self.calldata_offsets.push(self.calldata.len() / 2),
        }

        // println!("self.calldata_offsets {:?}", &self.calldata_offsets);
        println!("- [{:02x}] New: {}", i + 1, &self.calldata);
    }
}

Notice how we can pass in basically anything and it’ll add it. This is a feature, not a bug, to bitpack our calldata to reduce the amount of gas being used (calldata costs a fuck load for each byte used, espc non zeros).
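To put a rough number on the saving, here is a small sketch comparing packed and ABI-encoded calldata for the approve example used later in the article. It assumes the EIP-2028 pricing of 4 gas per zero byte and 16 gas per non-zero byte and ignores the base transaction cost and any newer pricing rules; `hex_to_bytes`, `calldata_gas` and `approve_calldata` are names invented for this example.

```rust
/// Decode a hex string (no 0x prefix) into bytes; None on malformed input.
fn hex_to_bytes(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(hex.get(i..i + 2)?, 16).ok())
        .collect()
}

/// Calldata cost, assuming 4 gas per zero byte and 16 per non-zero byte.
fn calldata_gas(bytes: &[u8]) -> u64 {
    bytes.iter().map(|b| if *b == 0 { 4 } else { 16 }).sum()
}

/// approve(to, amount) calldata, either packed or with 32-byte ABI words.
fn approve_calldata(to: &str, amount: &str, packed: bool) -> String {
    if packed {
        format!("095ea7b3{}{}", to, amount)
    } else {
        // left-pad each argument with zeros to 64 hex characters
        format!("095ea7b3{:0>64}{:0>64}", to, amount)
    }
}
```

Keep in mind that the unpacking itself costs a little execution gas, so the net saving depends on how many zero bytes the padding would have added.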

When we pass parameters to a function we can’t hand over this packed data as-is, because the ABI expects every argument to be left-padded to a 32-byte word. Therefore, we need a function that turns each right-padded load into a left-padded word.

Auto Padder

Since we’re passing our calldata into functions, we need to store it in memory with the MSTORE opcode and then reference that memory from the CALL by specifying the argument offset (which we’ll always make 0x00 to make it easier; you can optimise further later) and the argument size.

// Convert our packed calldata into left side padded uint
// This is what functions normally take:
// 0x000000000000000000000000000000000000000000000000000000AABBCCDD
//
// instead of right padded
// 0xAABBCCDD000000000000000000000000000000000000000000000000000000
//
// But signatures for protocols are always 4 bytes long at the start
// of the calldata
pub fn pad_cd_to_mem(&self, cd_from: usize, cd_to: usize) -> String {
    println!("\n---\n");
    println!("[Unpack Calldata]");

    let mut seq = String::new();
    let mut mem_offset: usize = 4; // start from 4 bc we'll skip the sig

    // Load and store sig far left
    // 0x AABBCCDD 000000000000000000000000000000000000000000000000000000
    let sig_offset = self.calldata_offsets[cd_from];
    seq.extend([format!(
        "{}{:02x}{}{}00{}",
        Self::P1_OP,
        sig_offset,
        Self::CD_LOAD_OP,
        Self::P1_OP,
        Self::MSTORE_OP,
    )]);

    // Skip the first offset because that'll be our signature
    // and the following offsets are the variables we use for said
    // function call
    //
    // You can extend this to be more optimised, of course, but it
    // doesn't serve too much purpose aside from cost of tx
    for (o, offset) in self.calldata_offsets[cd_from..cd_to]
        .iter()
        .enumerate()
        .skip(1)
    {
        // edit: realised i did this in the most complex way possible lmao
        // `o` is relative to the slice, so add cd_from to index the full Vec
        let next_offset = self.calldata_offsets[cd_from + o + 1];
        let mut word = self.calldata.split_at(*offset * 2).1;
        word = word.split_at((next_offset - offset) * 2).0;
        let padded = format!("{:0>64}", word);

        // how much we'll shift our right padded word to be left padded
        // and therefore function calldata compatible. SHR counts in bits
        // and each hex character is a nibble, so 4 bits per missing char
        let shr_amt = (64 - word.len()) * 4;

        // every immediate needs its own PUSH1 in front of it
        let calldata_load = format!("{}{:02x}{}", Self::P1_OP, offset, Self::CD_LOAD_OP);
        let shr = format!("{}{:02x}{}", Self::P1_OP, shr_amt, Self::SHR_OP);
        let mstore = format!("{}{:02x}{}", Self::P1_OP, mem_offset, Self::MSTORE_OP);
        let sub_seq = format!("{}{}{}", calldata_load, shr, mstore);

        println!("[SHR 0x{:02x} Word {}]", shr_amt, word);
        println!("- [Calldata To MSTORE: {}] {}", o, sub_seq);
        println!("- [x] Padded Word {}\n", padded);

        // move onto the next word
        mem_offset += 32;

        seq.extend([sub_seq]);
    }

    println!("[Calldata To MSTORE: Sequence] {}", seq);
    println!("\n---");
    seq
}

Since this is a standard function we’ll reuse, we can just hardcode what’s happening. The core of it is figuring out how many bytes we used, calculating how far we need to shift the word to the right (to pad it left) and then MSTORE so we can then inject it before our call.
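The shift arithmetic is easy to get wrong because SHR takes its shift amount in bits, not bytes or hex characters. The sketch below simulates CALLDATALOAD followed by SHR on a 32-byte word; `shr_bits` and `load_and_shift` are hypothetical helpers written for this illustration and only handle whole-byte values.

```rust
/// Bits to shift right so that a `byte_len`-byte value sitting at the left
/// of a 32-byte word ends up right-aligned (i.e. left-padded with zeros).
fn shr_bits(byte_len: usize) -> Option<usize> {
    if byte_len == 0 || byte_len > 32 {
        return None;
    }
    Some((32 - byte_len) * 8)
}

/// Simulate `CALLDATALOAD(offset)` followed by `SHR(shr_bits(byte_len))`.
fn load_and_shift(calldata: &[u8], offset: usize, byte_len: usize) -> Option<[u8; 32]> {
    let bits = shr_bits(byte_len)?;

    // CALLDATALOAD reads 32 bytes and zero-fills past the end of calldata
    let mut word = [0u8; 32];
    for (i, slot) in word.iter_mut().enumerate() {
        if let Some(b) = calldata.get(offset + i) {
            *slot = *b;
        }
    }

    // The shift is always a whole number of bytes here, so move bytes directly.
    // Whatever followed the value in calldata falls off the right-hand side.
    let shift = bits / 8;
    let mut out = [0u8; 32];
    out[shift..].copy_from_slice(&word[..32 - shift]);
    Some(out)
}
```

For a 20-byte address this gives a shift of 96 bits (0x60), and for the 9-byte amount used in the test further down it gives 184 bits (0xb8).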

So now we have everything to build the core function call generation part. We’ll just do the approve function and leave the rest to you to figure out (that’s how you actually learn). You can also generalise this if you please, but it may take some thinking to get it right. You’ll need to make a generalised version down the line anyway for custom CALLs, but that’ll be left for you to do. I’m merely showing you the door, you must walk through it.

Approve Generator

So now we have our approve_token function that takes in 3 dynamic variables: token, to, amount. The token will be hardcoded into the contract (although you can modify it to be obfuscated or dynamically inputted). The to and amount are calldata because why not. We could technically hardcode everything, but this will be useful to have so our deployment costs aren’t enormous (especially in the bull market). Good to learn, you know.

// Modularised approve token component. Whenever you want an approve
// call this helper function to generate the CALL for an ERC20 approve
//
// You could definitely make a single generalised function and have
// your monitoring system spit out the variables instead of hardcoding
// like i did here
pub fn approve_token(&mut self, token: &str, to: &str, amount: &str) {

    // Where this call's calldata starts, as an index into calldata_offsets
    // (not a byte offset), taken before we add to the calldata. The last
    // recorded offset is the end of the previous item, i.e. our start
    let latest_offset = self.calldata_offsets.len().saturating_sub(1);

    // Each nibble = 1 character (pretty convenient right?!)
    let calldata_len: usize = (Self::APPROVE_SIG.len() + to.len() + amount.len()) / 2;
    self.extend_calldata(vec![Self::APPROVE_SIG, to, amount]);

    // Calldata To Memory before CALL (which it references)
    let cd_to_mem = self.pad_cd_to_mem(latest_offset, latest_offset + 3);

    // CALL arguments
    // selector, word, word
    let mem_size: String = format!("{:02x}", 4 + 32 + 32);
    let ret: String = format!("{}00{}00", Self::P1_OP, Self::P1_OP);
    let arg: String = format!(
        "{}{}{}00",
        Self::P1_OP,
        mem_size,
        Self::P1_OP, // our memory
    );
    // CALL pops 7 items, so the ether value has to be pushed as well
    let value: String = format!("{}00", Self::P1_OP); // approve so none
    let token: String = format!("{}{}", Self::P20_OP, token); // assuming it isn't passed in as '0x...'
    let gas_call_pop: String = format!("{}{}{}", Self::GAS_OP, Self::CALL_OP, Self::POP_OP);

    let seq = format!(
        "{}{}{}{}{}{}",
        // we inject the "calldata to memory" anywhere to make manual
        // reversing cancer to do, but here makes it clearer rn
        cd_to_mem,
        ret,
        arg,
        value,
        token,
        gas_call_pop
    );

    let source_b = self.source.clone();
    self.source.extend([seq]);
    println!(
        "\n[Extending Source] Token Approval\n- [ ] Old {}\n- [ ] New {}\n",
        source_b, &self.source
    );
}

And there we fucking have it!

A domain expert system that generates custom bytecode to execute functions! You can use this how you please, whether it’s with MEV or cybersecurity. This project will be fantastic for any resume, in and out of crypto. It’s quite unique to be able to generate assembly dynamically and you probably won’t find too many resources like this one out there.

Test + Output

And just to play around with it we can write a lil unit test or throw it in the main function. Just plug our token address, to address and uint256 amount into our approve token function to construct the first component of our generated exploit contract:

/*
cargo test test -- --nocapture
*/
#[cfg(test)]
mod test {
    use super::*;

    /*
    cargo test test::test_approve_token -- --nocapture
     */
    #[test]
    fn test_approve_token() {
        let mut contract = Contract::default();
        contract.approve_token(
            "C02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2", // weth
            "9FC3da866e7DF3a1c57adE1a97c9f00a70f010c8", // some randos addy
            "3635C9ADC5DEA00000", // 1,000 tokens
        );

        println!("\n[Calldata]\n{}", contract.calldata);
    }
}

Which outputs the following printed lines for some visualisation of what’s happening

You can see the extending calldata part, where each iteration adds a new piece: the signature first in iteration 01, then the address, and finally the amount to approve. How fucking cool is that?!

Then we have the calldata bitpacking unpacker giving our gas saved calldata left padding to be able to be put into memory and passed into functions via CALL.

And finally the assembly sequence it generates to add onto the start of the approve function, which gets kind of long because of the hardcoded token address.
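Reading a long hex sequence by eye gets tiring, so a tiny disassembler can help when checking what was generated. This is a minimal sketch that only knows the handful of opcodes used in this article plus the PUSH family; `disassemble` is a made-up helper, and it expects raw bytes, so a hex string would need decoding first.

```rust
/// Turn raw bytecode into one mnemonic per instruction. Only the opcodes
/// used in this article are named; everything else is printed as UNKNOWN.
fn disassemble(code: &[u8]) -> Vec<String> {
    let mut out = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        pc += 1;
        let line = match op {
            // PUSH1 (0x60) through PUSH32 (0x7f) carry 1..=32 immediate bytes
            0x60..=0x7f => {
                let n = (op - 0x5f) as usize;
                let end = (pc + n).min(code.len());
                let imm: String = code[pc..end].iter().map(|b| format!("{:02x}", b)).collect();
                pc = end;
                format!("PUSH{} 0x{}", n, imm)
            }
            0x00 => "STOP".to_string(),
            0x1c => "SHR".to_string(),
            0x35 => "CALLDATALOAD".to_string(),
            0x50 => "POP".to_string(),
            0x52 => "MSTORE".to_string(),
            0x5a => "GAS".to_string(),
            0xf1 => "CALL".to_string(),
            other => format!("UNKNOWN 0x{:02x}", other),
        };
        out.push(line);
    }
    out
}
```

An `UNKNOWN` line in the middle of a module is usually a hint that an immediate was emitted without its PUSH opcode in front of it.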

Magical! Code generation without an LLM, only domain expertise. I had to rewrite this entire project from scratch which took ~3:30hrs just for this article, lmao.

Full Code

You can either copy it here or reference it on GitHub:

// No imports needed: everything below only uses the standard prelude

fn main() {}

// Implement this to plug into `Contract::build` to generate the contracts
pub enum Cmds {
    ApproveErc20,
    TransferErc20,
    CustomCall,
}

#[derive(Default)]
struct Contract {
    calldata: String,
    calldata_offsets: Vec<usize>,
    source: String,
}
impl Contract {
    // Add more as you please/need
    pub const P1_OP: &'static str = "60";
    pub const P20_OP: &'static str = "73";
    pub const GAS_OP: &'static str = "5A";
    pub const CALL_OP: &'static str = "F1";
    pub const POP_OP: &'static str = "50";
    pub const CD_LOAD_OP: &'static str = "35";
    pub const MSTORE_OP: &'static str = "52";
    pub const SHR_OP: &'static str = "1C";

    pub const APPROVE_SIG: &'static str = "095ea7b3";

    pub fn default() -> Self {
        Self {
            calldata: String::new(),
            calldata_offsets: vec![],
            source: String::new(),
        }
    }

    pub fn build(cmds: Vec<Cmds>) -> Self {
        let mut contract = Self::default();

        // implement cmd handling
        // ...

        contract
    }

    // Adds onto the end of our existing calldata with new PACKED inputs
    // Meaning they aren't left side padded to save gas and be more compact
    pub fn extend_calldata(&mut self, new_calldata: Vec<&str>) {
        println!("\n[Extending Calldata: {}]", new_calldata.len());
        println!("- [00] Old: {}", &self.calldata);

        for (i, item) in new_calldata.iter().enumerate() {

            // Add it onto the end of our existing calldata
            self.calldata.extend([*item]);

            // Record the offset of where we just added so we can ref it later
            match self.calldata_offsets.len() == 0 {
                true => {
                    // We don't init with anything since we wont have calldata (duh)
                    // So we add it here to initialise + our own calldata
                    self.calldata_offsets.push(0);
                    // Since we're dealing with strings it'll be double the
                    // len -- we want the amount of bytes instead so 1/2 it
                    self.calldata_offsets.push(self.calldata.len() / 2);
                }
                false => self.calldata_offsets.push(self.calldata.len() / 2),
            }

            // println!("self.calldata_offsets {:?}", &self.calldata_offsets);
            println!("- [{:02x}] New: {}", i + 1, &self.calldata);
        }
    }

    // Convert our packed calldata into left side padded uint
    // This is what functions normally take:
    // 0x000000000000000000000000000000000000000000000000000000AABBCCDD
    //
    // instead of right padded
    // 0xAABBCCDD000000000000000000000000000000000000000000000000000000
    //
    // But signatures for protocols are always 4 bytes long at the start
    // of the calldata
    pub fn pad_cd_to_mem(&self, cd_from: usize, cd_to: usize) -> String {
        println!("\n---\n");
        println!("[Unpack Calldata]");

        let mut seq = String::new();
        let mut mem_offset: usize = 4; // start from 4 bc we'll skip the sig

        // Load and store sig far left
        // 0x AABBCCDD 000000000000000000000000000000000000000000000000000000
        let sig_offset = self.calldata_offsets[cd_from];
        seq.extend([format!(
            "{}{:02x}{}{}00{}",
            Self::P1_OP,
            sig_offset,
            Self::CD_LOAD_OP,
            Self::P1_OP,
            Self::MSTORE_OP,
        )]);

        // Skip the first offset because that'll be our signature
        // and the following offsets are the variables we use for said
        // function call
        //
        // You can extend this to be more optimised, of course, but it
        // doesn't serve too much purpose aside from cost of tx
        for (o, offset) in self.calldata_offsets[cd_from..cd_to]
            .iter()
            .enumerate()
            .skip(1)
        {
            // edit: realised i did this in the most complex way possible lmao
            // `o` is relative to the slice, so add cd_from to index the full Vec
            let next_offset = self.calldata_offsets[cd_from + o + 1];
            let mut word = self.calldata.split_at(*offset * 2).1;
            word = word.split_at((next_offset - offset) * 2).0;
            let padded = format!("{:0>64}", word);

            // how much we'll shift our right padded word to be left padded
            // and therefore function calldata compatible. SHR counts in bits
            // and each hex character is a nibble, so 4 bits per missing char
            let shr_amt = (64 - word.len()) * 4;

            // every immediate needs its own PUSH1 in front of it
            let calldata_load = format!("{}{:02x}{}", Self::P1_OP, offset, Self::CD_LOAD_OP);
            let shr = format!("{}{:02x}{}", Self::P1_OP, shr_amt, Self::SHR_OP);
            let mstore = format!("{}{:02x}{}", Self::P1_OP, mem_offset, Self::MSTORE_OP);
            let sub_seq = format!("{}{}{}", calldata_load, shr, mstore);

            println!("[SHR 0x{:02x} Word {}]", shr_amt, word);
            println!("- [Calldata To MSTORE: {}] {}", o, sub_seq);
            println!("- [x] Padded Word {}\n", padded);

            // move onto the next word
            mem_offset += 32;

            seq.extend([sub_seq]);
        }

        println!("[Calldata To MSTORE: Sequence] {}", seq);
        println!("\n---");
        seq
    }


    // Modularised approve token component. Whenever you want an approve
    // call this helper function to generate the CALL for an ERC20 approve
    //
    // You could definitely make a single generalised function and have
    // your monitoring system spit out the variables instead of hardcoding
    // like i did here
    pub fn approve_token(&mut self, token: &str, to: &str, amount: &str) {

        // Where this call's calldata starts, as an index into calldata_offsets
        // (not a byte offset), taken before we add to the calldata. The last
        // recorded offset is the end of the previous item, i.e. our start
        let latest_offset = self.calldata_offsets.len().saturating_sub(1);

        // Each nibble = 1 character (pretty convenient right?!)
        let calldata_len: usize = (Self::APPROVE_SIG.len() + to.len() + amount.len()) / 2;
        self.extend_calldata(vec![Self::APPROVE_SIG, to, amount]);

        // Calldata To Memory before CALL (which it references)
        let cd_to_mem = self.pad_cd_to_mem(latest_offset, latest_offset + 3);

        // CALL arguments
        // selector, word, word
        let mem_size: String = format!("{:02x}", 4 + 32 + 32);
        let ret: String = format!("{}00{}00", Self::P1_OP, Self::P1_OP);
        let arg: String = format!(
            "{}{}{}00",
            Self::P1_OP,
            mem_size,
            Self::P1_OP, // our memory
        );
        // CALL pops 7 items, so the ether value has to be pushed as well
        let value: String = format!("{}00", Self::P1_OP); // approve so none
        let token: String = format!("{}{}", Self::P20_OP, token); // assuming it isn't passed in as '0x...'
        let gas_call_pop: String = format!("{}{}{}", Self::GAS_OP, Self::CALL_OP, Self::POP_OP);

        let seq = format!(
            "{}{}{}{}{}{}",
            // we inject the "calldata to memory" anywhere to make manual
            // reversing cancer to do, but here makes it clearer rn
            cd_to_mem,
            ret,
            arg,
            value,
            token,
            gas_call_pop
        );

        let source_b = self.source.clone();
        self.source.extend([seq]);
        println!(
            "\n[Extending Source] Token Approval\n- [ ] Old {}\n- [ ] New {}\n",
            source_b, &self.source
        );
    }
}

/*
cargo test test -- --nocapture
*/
#[cfg(test)]
mod test {
    use super::*;

    /*
    cargo test test::test_approve_token -- --nocapture
     */
    #[test]
    fn test_approve_token() {
        let mut contract = Contract::default();
        contract.approve_token(
            "C02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2", // weth
            "9FC3da866e7DF3a1c57adE1a97c9f00a70f010c8", // some randos addy
            "3635C9ADC5DEA00000", // 1,000 tokens
        );

        println!("\n[Calldata]\n{}", contract.calldata);
    }
}

Going Further

Nonetheless you can extend this further with many functions like a transfer(), balanceOf() using STATICCALL, a way to handle receiving ETH, an onlyOwner() guard with JUMPI and JUMPDEST, and then continue on into a generalised version with generic call calldata handling (shouldn’t be too hard with the calldata handling functions I added). Struct arrays are not going to be fun, although I have another article that explains how to handle them: “Reversing The EVM: Reading Raw EVM Calldata”.
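As one example of such an extension, an onlyOwner-style guard can be emitted as a fixed 32-byte prefix. The sketch below is an assumption-laden illustration, not the original code: `only_owner_guard` is a hypothetical helper, `base_pc` is where the guard starts in the runtime bytecode, and the jump target has to fit in a PUSH1.

```rust
/// Emit a guard that reverts unless CALLER equals `owner`.
/// `base_pc` is the position of the guard's first byte in the runtime code;
/// returns None if the jump destination does not fit in a single byte.
fn only_owner_guard(owner: &[u8; 20], base_pc: u8) -> Option<Vec<u8>> {
    // The guard is 32 bytes long and its JUMPDEST is the last byte
    let dest = base_pc.checked_add(31)?;

    let mut code = Vec::with_capacity(32);
    code.push(0x33); // CALLER
    code.push(0x73); // PUSH20 owner
    code.extend_from_slice(owner);
    code.push(0x14); // EQ, leaves 1 on the stack if caller == owner
    code.extend_from_slice(&[0x60, dest]); // PUSH1 dest
    code.push(0x57); // JUMPI, jumps to dest when the condition is non-zero
    code.extend_from_slice(&[0x60, 0x00, 0x60, 0x00, 0xfd]); // revert(0, 0)
    code.push(0x5b); // JUMPDEST, execution continues here for the owner
    Some(code)
}
```

Placed at the very start of the runtime code (`base_pc` of 0), everything after the guard only runs for the owner; any other caller hits the REVERT.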

You can also add automated obfuscation using my articles “Smart Contract Obfuscation Techniques” and “Swimming Safely In The Public Mempool: MEV Smart Contract Obfuscation Techniques” as references (this is what I did in my private messy version). You can also add a fake compiler version at the front and then POP all of the opcodes if need be, and possibly dynamic JUMPIs to cause state explosions to fuck with detector tools.

AI Endgame

What I imagine the future of infosec to be is to have this generic code generation be dynamically adjusted on the fly with neural nets, redesigning obfuscation techniques that don’t show patterns, mix up language (such as variables, comments, etc) used in source code, and adapt to systems as they come about — whether it’s a database, enterprise website, social engineering through emails with LLMs to reference old emails they have if there was an event happening and using GANs to generate documents from higher ups with their signatures, spoofing names, etc. This is merely the beginning of a devastating future imo. I wanted to just showcase the coolness of the profession :)

Final

I hope you found this as fascinating as I did when I first attempted to create this around 2 years ago. Whether you’re looking to beef up your resume or try to do some cool shit, this is definitely a unique project that will get you to the next level. I don’t think I’ve met anyone who does code generation like this. And with AI in play you’ll have a better idea of how things worked before the new age and can use this knowledge to be more creative :) Assembly is always a good language to learn!

I didn’t touch on contract deployment because that’s a transaction-based event, as is everything else relating to execution. Maybe in a future article I’ll explain that step, but for now I just wanted to explain how to actually generate code and not give you all the answers (where’s the fun in that?!).

Anyway, glhf with the future, anon-kun.