Reversing The EVM: Raw Calldata

You may have wondered how to decipher EVM calldata, then tried to read the calldata of an Ethereum smart contract transaction, only to get lost at a certain point. Contracts on the EVM (and on other L1s that fork it) encode and decode calldata according to the contract ABI specification, which treats static and dynamic types differently, and that can be confusing at first. In this article, we will walk through how calldata is encoded so that you can read the bytes of any transaction, whether the contract is verified or not. By the end, I hope you will be able to craft your own raw calldata too.

What Is Calldata?

Calldata is the encoded input we send along with a call, in this case to a smart contract function on the Ethereum Virtual Machine (EVM). After the 4-byte function selector (covered below), the arguments are laid out in 32-byte words, each 64 hex characters long. The types being encoded fall into two groups: static and dynamic.

Static variables are fairly straightforward to understand. Dynamic variables, on the other hand, are much more complex, and this is likely the reason why you may have difficulty reading raw calldata intuitively. However, once we go through how dynamic variables work, you will be able to read raw calldata with ease.

To begin, let’s understand how calldata is encoded and decoded to establish a foundation of how it all works.

Encoding Calldata

To encode values, pass them to `abi.encode(parameters)`, which returns their ABI-encoded bytes (without any function selector).

If you want to encode calldata for a specific function on an interface, use abi.encodeWithSelector(selector, parameters). The result is the same calldata a direct call to that function with those parameters would send: the 4-byte selector followed by the ABI-encoded parameters.

For example:

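// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

interface A {
  function transfer(uint256[] calldata ids, address to) external;
}

contract B {
  // Returns the same bytes a direct call to A.transfer(ids, to) would send.
  function a(uint256[] memory ids, address to) external pure returns (bytes memory) {
    return abi.encodeWithSelector(A.transfer.selector, ids, to);
  }
}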

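The `.selector` member gives the 4-byte function selector for that function on the interface. Placing it at the start of the calldata tells the receiving contract which function we want to run. A similar idea, handing a contract pre-encoded bytes to act on, is how UniswapV2 enables flash swaps: the caller passes the pair contract an arbitrary `data` payload, which the pair forwards to the borrower's `uniswapV2Call` callback.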

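There is also abi.encodePacked(...), which concatenates its arguments using the minimum number of bytes: most values are not padded to 32 bytes, and dynamic values such as strings are written in place without a length. The problem is that this encoding is ambiguous, so different inputs can produce the same bytes (abi.encodePacked("a", "bc") equals abi.encodePacked("ab", "c")). It should only be used when you know for certain the types and lengths of the parameters.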
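To see the collision concretely, here is a minimal Python sketch that mimics both layouts for the special case of two `string` arguments only. The helper names are hypothetical, and this models the encodings rather than calling Solidity: packed mode just concatenates the raw bytes, while the standard ABI layout records an offset and a length for each string, which keeps the boundaries unambiguous.

```python
# Models abi.encodePacked vs abi.encode for two `string` arguments only.
# Hypothetical helpers for illustration, not Solidity's implementation.


def word(n: int) -> bytes:
    """One 32-byte big-endian word holding a non-negative integer."""
    return n.to_bytes(32, "big")


def pad_right(data: bytes) -> bytes:
    """Right-pad with zero bytes up to a multiple of 32."""
    return data + b"\x00" * (-len(data) % 32)


def encode_packed_strings(a: str, b: str) -> bytes:
    # Packed: raw UTF-8 bytes back to back, no lengths, no padding.
    return a.encode("utf-8") + b.encode("utf-8")


def encode_strings(a: str, b: str) -> bytes:
    # Standard ABI: a head of two offsets, then each string's length + padded data.
    a_bytes, b_bytes = a.encode("utf-8"), b.encode("utf-8")
    tail_a = word(len(a_bytes)) + pad_right(a_bytes)
    tail_b = word(len(b_bytes)) + pad_right(b_bytes)
    head = word(64) + word(64 + len(tail_a))  # the head itself is 2 words = 64 bytes
    return head + tail_a + tail_b


print(encode_packed_strings("a", "bc") == encode_packed_strings("ab", "c"))  # True: collision
print(encode_strings("a", "bc") == encode_strings("ab", "c"))  # False: lengths differ
```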

Decoding calldata

So you have your calldata, how do you decode it?

If the data was created with abi.encode(...), we can decode it with abi.decode(...) by passing in the data along with a tuple of the types we want to decode it into.

For example:

(uint256 a, uint256 b) = abi.decode(data, (uint256, uint256));

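Here, data holds the encoded arguments. If you are decoding a function's raw calldata, skip its 4-byte selector first, for example with abi.decode(msg.data[4:], (uint256, uint256)).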

Now that we understand how to encode and decode parameters, we can move on to the different variable types and how they are reflected in the calldata output.

Static Variables

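Static variables are encoded in place, in the order the parameters appear. The elementary static types each fill exactly one 32-byte word: uints, ints, address, bool, and bytes1 to bytes32 (a function selector, for instance, is a bytes4). Fixed-size arrays and tuples are static too, as long as everything inside them is static; they take one word per elementary value inside them. Numbers and addresses are right-aligned and padded with zeros on the left (negative signed integers are padded with `f`s instead, because they use two's complement), while bytes1 to bytes32 are left-aligned and padded with zeros on the right.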
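Here is a minimal Python sketch of those padding rules, assuming each value already fits its Solidity type. The helper names are hypothetical and only the elementary static types are covered; each helper returns one 32-byte word as hex without the 0x prefix.

```python
# Hypothetical helpers showing how elementary static types fill one 32-byte word.
# Assumes each value already fits its Solidity type; returns hex without "0x".


def enc_uint(value: int) -> str:
    # uint<N>: big-endian, zero-padded on the left.
    return value.to_bytes(32, "big").hex()


def enc_int(value: int) -> str:
    # int<N>: two's complement, so negative values are padded on the left with 0xff.
    return value.to_bytes(32, "big", signed=True).hex()


def enc_address(addr: str) -> str:
    # address: a 20-byte number, zero-padded on the left like a uint.
    return enc_uint(int(addr, 16))


def enc_bool(flag: bool) -> str:
    # bool: encoded as the integer 0 or 1.
    return enc_uint(1 if flag else 0)


def enc_bytes_n(data: bytes) -> str:
    # bytes1..bytes32: left-aligned, zero-padded on the right.
    if not 1 <= len(data) <= 32:
        raise ValueError("bytesN must be 1 to 32 bytes long")
    return data.ljust(32, b"\x00").hex()


print(enc_uint(1300655506))
print(enc_int(-331400))
print(enc_bytes_n(bytes.fromhex("b7760c8f")))
```

For instance, enc_int(-331400) yields a word of `f`s ending in ...fffaf178, which is the sign-extension pattern that shows up in the multicall example later on.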

For example, let's say we're interacting with the following contract:

pragma solidity 0.8.17;
interface Example {
    function transfer(uint256 amount, address to) external;
}

With the input parameters:

amount: 1300655506
to: 0x68b3465833fb72A70ecDF485E0e4C7bD8665Fc45

We would generate the calldata: 0x000000000000000000000000000000000000000000000000000000004d866d9200000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45

But… how tf do we read this?

Well, let's chop it into readable pieces: first remove the 0x prefix, then split the rest into 64-character (32-byte) words.

0x
// uint256
000000000000000000000000000000000000000000000000000000004d866d92
// address
00000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45

Cool, now we can see that the first 32 bytes hold the uint256 amount (0x4d866d92 is 1300655506 in decimal) and the second 32 bytes hold the address to, both padded with zeros on the left.
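The same chopping can be done programmatically. This is a small Python sketch (the helper names are hypothetical) that splits the example data into 32-byte words and decodes the two static values:

```python
# Hypothetical helpers: split ABI-encoded arguments into 32-byte words and
# decode the static `amount` and `to` values from the example above.

CALLDATA = (
    "0x000000000000000000000000000000000000000000000000000000004d866d92"
    "00000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45"
)


def split_words(hex_data: str) -> list[str]:
    """Drop the 0x prefix and cut the rest into 64-character (32-byte) words."""
    body = hex_data[2:] if hex_data.startswith("0x") else hex_data
    if len(body) % 64 != 0:
        raise ValueError("ABI-encoded arguments are a whole number of 32-byte words")
    return [body[i:i + 64] for i in range(0, len(body), 64)]


def decode_uint256(word_hex: str) -> int:
    return int(word_hex, 16)


def decode_address(word_hex: str) -> str:
    # The address is the last 20 bytes (40 hex characters) of the word.
    return "0x" + word_hex[-40:]


words = split_words(CALLDATA)
amount = decode_uint256(words[0])
to = decode_address(words[1])
print(amount, to)  # 1300655506 0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45
```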

Functions

But what if we wanted to call the transfer function directly?

We would need the function's signature: its name followed by its parameter types, in order, separated by commas with no spaces or parameter names. We then hash that string with keccak256, which turns any input into a 32-byte hash.

In this case, to hash:

function transfer(uint256 amount, address to) external;

We would do:

keccak256("transfer(uint256,address)");

Which would return the following 32-byte hash:

0xb7760c8fd605b6ef5a068e1720c115665f9699a5c439e3c0ee9709290ff8a3bb

To get the function selector, we only need the first 4 bytes of the hash (8 characters, excluding the 0x prefix): b7760c8f.

This 4-byte selector, b7760c8f, tells the contract which function we're calling; the calldata that follows it is passed in as that function's parameters.
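A minimal Python sketch of the two steps, with hypothetical helper names. One caveat worth knowing: Python's built-in hashlib.sha3_256 is the standardized SHA-3, which pads its input differently from Ethereum's keccak256, so it will not reproduce the hash above. The sketch therefore takes the digest quoted above as given; to compute it yourself you would need a Keccak implementation, for example pycryptodome's Crypto.Hash.keccak.

```python
# Hypothetical helpers: build the canonical signature string and take the
# selector from its keccak256 digest. The digest is the one quoted above;
# Python's hashlib.sha3_256 is NOT keccak256, so it cannot reproduce it.


def canonical_signature(name: str, param_types: list[str]) -> str:
    # Types only, in declaration order, comma-separated, no spaces or names.
    return f"{name}({','.join(param_types)})"


def selector_from_digest(digest_hex: str) -> str:
    # The selector is the first 4 bytes (8 hex characters) of the hash.
    body = digest_hex[2:] if digest_hex.startswith("0x") else digest_hex
    return body[:8]


sig = canonical_signature("transfer", ["uint256", "address"])
digest = "0xb7760c8fd605b6ef5a068e1720c115665f9699a5c439e3c0ee9709290ff8a3bb"  # keccak256(sig), quoted above
print(sig)                           # transfer(uint256,address)
print(selector_from_digest(digest))  # b7760c8f
```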

For example, to call transfer with the same parameters as in the Static Variables section, we grab the existing calldata:

0x000000000000000000000000000000000000000000000000000000004d866d9200000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45

And prepend b7760c8f, so the 4-byte selector sits in front of the first 32-byte word:

0xb7760c8f000000000000000000000000000000000000000000000000000000004d866d9200000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45

0x
b7760c8f
000000000000000000000000000000000000000000000000000000004d866d92
00000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45
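Building this by hand is just concatenation. A tiny Python sketch (the helper name is hypothetical) of what abi.encodeWithSelector returns for these static arguments:

```python
# Hypothetical helper: calldata for a call = 4-byte selector + ABI-encoded arguments,
# mirroring what abi.encodeWithSelector returns.

SELECTOR = "b7760c8f"
ARGS = (
    "000000000000000000000000000000000000000000000000000000004d866d92"
    "00000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45"
)


def with_selector(selector_hex: str, args_hex: str) -> str:
    if len(selector_hex) != 8:
        raise ValueError("a selector is exactly 4 bytes (8 hex characters)")
    return "0x" + selector_hex + args_hex


calldata = with_selector(SELECTOR, ARGS)
print(calldata)
```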

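You may be wondering: how does the function actually get its parameters when the calldata starts with the selector?

The answer is that the contract's bytecode begins with a dispatcher. It reads the first 4 bytes of calldata, compares them against the selectors of the contract's functions, and jumps to the matching one (here b7760c8f). That function then reads its parameters starting at byte 4, right after the selector: the first parameter from bytes 4 to 35, the second from bytes 36 to 67, and so on. Nothing is replaced or zeroed out; the parameter offsets simply start 4 bytes in.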
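Here is a rough Python model of that dispatch step. The dispatch table and the handler are hypothetical stand-ins for what the compiler generates as bytecode; calldataload mimics the CALLDATALOAD opcode, which returns 32 bytes from an offset and fills with zeros past the end of calldata.

```python
# Rough model of a compiled contract's dispatcher, written in Python. The table,
# handler and names are hypothetical; real contracts do this in bytecode.

CALLDATA = bytes.fromhex(
    "b7760c8f"
    "000000000000000000000000000000000000000000000000000000004d866d92"
    "00000000000000000000000068b3465833fb72a70ecdf485e0e4c7bd8665fc45"
)


def calldataload(data: bytes, offset: int) -> bytes:
    # Like the CALLDATALOAD opcode: 32 bytes from `offset`, zero-filled past the end.
    return data[offset:offset + 32].ljust(32, b"\x00")


def transfer(data: bytes) -> tuple[int, str]:
    amount = int.from_bytes(calldataload(data, 4), "big")  # 1st parameter: bytes 4..35
    to = "0x" + calldataload(data, 36)[12:].hex()          # 2nd parameter: bytes 36..67
    return amount, to


DISPATCH = {bytes.fromhex("b7760c8f"): transfer}


def dispatch(data: bytes):
    selector = data[:4]  # only read for the lookup; calldata itself is never modified
    handler = DISPATCH.get(selector)
    if handler is None:
        raise ValueError(f"no function for selector 0x{selector.hex()}")
    return handler(data)


print(dispatch(CALLDATA))  # (1300655506, '0x68b3465833fb72a70ecdf485e0e4c7bd8665fc45')
```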

Dynamic Variables

Dynamic variables are types whose encoded size isn't fixed: bytes, string, and dynamic arrays <T>[], as well as fixed-size arrays <T>[N] and tuples when they contain a dynamic type.

A dynamic parameter isn't stored in place. Its slot holds an offset: the number of bytes, written in hex, from the start of the encoded parameters to where the dynamic data begins. For example, a hex of 20 means 32 bytes. At that location, the first 32-byte word is the length of the value, followed by its contents.

TL;DR for a single dynamic parameter: 1st 32 bytes = offset, 2nd 32 bytes = length, the rest are the contents.

For arrays, the length is the number of elements in the array. For bytes and string, it is the number of bytes: the string "Hello World!" is 12 bytes long, since each ASCII character takes 1 byte (non-ASCII characters take more in UTF-8). Keep in mind that bytes and string contents are left-aligned, starting on the left-hand side of the word and padded with zeros on the right, rather than right-aligned like numbers and addresses.
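As a sketch of that layout, here is a hypothetical Python helper that ABI-encodes a single string argument the way abi.encode(text) would, assuming it is the only parameter:

```python
# Hypothetical helper: ABI-encode a single `string` argument (as abi.encode(text)
# would), returning hex without the 0x prefix.


def encode_string(text: str) -> str:
    data = text.encode("utf-8")
    offset = (32).to_bytes(32, "big")            # data starts right after this word
    length = len(data).to_bytes(32, "big")       # length in bytes, not characters
    padded = data + b"\x00" * (-len(data) % 32)  # left-aligned, zero-padded on the right
    return (offset + length + padded).hex()


encoded = encode_string("Hello World!")
for i in range(0, len(encoded), 64):
    print(encoded[i:i + 64])
```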

For example, here’s the string "Hello World!" encoded:

0x
0000000000000000000000000000000000000000000000000000000000000020
000000000000000000000000000000000000000000000000000000000000000c
48656c6c6f20576f726c64210000000000000000000000000000000000000000

Observe how the first 32 bytes hold the hexadecimal offset 20, which is 32 in decimal. So we skip 32 bytes from the start of the first line (the offset word itself), which brings us to the next line: a hex of 0c, a decimal of 12, the length of our string in bytes. Now when we take the next 12 bytes, 48656c6c6f20576f726c6421, and convert them to a string, we get back our original value; the zeros after them are just padding.
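The same three steps in a small Python sketch (hypothetical helper name), assuming the data encodes exactly one string parameter:

```python
# Hypothetical helper: decode the single-string encoding shown above by
# following the offset, reading the length, and ignoring the padding.

ENCODED = (
    "0000000000000000000000000000000000000000000000000000000000000020"
    "000000000000000000000000000000000000000000000000000000000000000c"
    "48656c6c6f20576f726c64210000000000000000000000000000000000000000"
)


def decode_single_string(hex_data: str) -> str:
    data = bytes.fromhex(hex_data)
    offset = int.from_bytes(data[0:32], "big")                # 0x20 -> 32
    length = int.from_bytes(data[offset:offset + 32], "big")  # 0x0c -> 12
    start = offset + 32                                       # contents follow the length word
    return data[start:start + length].decode("utf-8")         # padding is never read


print(decode_single_string(ENCODED))  # Hello World!
```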

Congrats! Now you know how to read dynamic types.

Decoding Static And Dynamic Parameters

Let's say we're interacting with the following contract:

pragma solidity 0.8.17;
interface Example {
    function transfer(uint256[] calldata ids, address to) external;
}

With the following parameters for transfer:

ids: [1234, 4567, 8910]
to: 0xf8e81D47203A594245E36C48e151709F0C19fBe8

We would generate the calldata: 0x8229ffb60000000000000000000000000000000000000000000000000000000000000040000000000000000000000000f8e81d47203a594245e36c48e151709f0c19fbe8000000000000000000000000000000000000000000000000000000000000000300000000000000000000000000000000000000000000000000000000000004d200000000000000000000000000000000000000000000000000000000000011d700000000000000000000000000000000000000000000000000000000000022ce

We can chop this up into a more readable form:

// prefix we discard
0x
// fn selector we're calling (`transfer(uint256[],address)`)
8229ffb6
// `uint256[] ids` param: offset of the array (0x40 = 64 bytes from the start of this line, i.e. the start of the params)
0000000000000000000000000000000000000000000000000000000000000040
// `address to` param
000000000000000000000000f8e81d47203a594245e36c48e151709f0c19fbe8
// length of `ids` array; 3 inputs
0000000000000000000000000000000000000000000000000000000000000003
// 1st `ids` param
00000000000000000000000000000000000000000000000000000000000004d2
// 2nd `ids` param
00000000000000000000000000000000000000000000000000000000000011d7
// 3rd `ids` param
00000000000000000000000000000000000000000000000000000000000022ce

Notice how the array parameter is represented by an offset to where the array's data begins. Then comes the second param, the address, encoded in place, and only after the static part do we finish off with the array itself: its length followed by its elements.
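The walk above can be written as a short Python sketch. The decoder is hypothetical and hard-codes the (uint256[], address) layout rather than being a general ABI decoder:

```python
# Hypothetical decoder for transfer(uint256[],address) calldata, walking the
# static part (offset + inline address) and then the array data (length + elements).

CALLDATA = (
    "0x8229ffb6"
    "0000000000000000000000000000000000000000000000000000000000000040"
    "000000000000000000000000f8e81d47203a594245e36c48e151709f0c19fbe8"
    "0000000000000000000000000000000000000000000000000000000000000003"
    "00000000000000000000000000000000000000000000000000000000000004d2"
    "00000000000000000000000000000000000000000000000000000000000011d7"
    "00000000000000000000000000000000000000000000000000000000000022ce"
)


def word_at(args: bytes, offset: int) -> int:
    return int.from_bytes(args[offset:offset + 32], "big")


def decode_transfer(calldata_hex: str) -> tuple[str, list[int], str]:
    raw = bytes.fromhex(calldata_hex[2:])
    selector, args = raw[:4].hex(), raw[4:]  # offsets are relative to `args`
    ids_offset = word_at(args, 0)            # word 0: offset of `ids` (0x40)
    to = "0x" + args[44:64].hex()            # word 1: `to`, last 20 bytes of the word
    count = word_at(args, ids_offset)        # array length
    ids = [word_at(args, ids_offset + 32 * (i + 1)) for i in range(count)]
    return selector, ids, to


print(decode_transfer(CALLDATA))
```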

Now that we know how to read both static + dynamic parameters, let’s dissect a more complex example!

Decoding A Multicall’s Calldata

We're going to decode a UniswapV3 multicall's input calldata from this transaction. Here the user calls 3 different functions through a single multicall.

Etherscan is nice enough to give us a simple decoded version:

MethodID: 0xac9650d8
0000000000000000000000000000000000000000000000000000000000000020
0000000000000000000000000000000000000000000000000000000000000003
0000000000000000000000000000000000000000000000000000000000000060
0000000000000000000000000000000000000000000000000000000000000120
00000000000000000000000000000000000000000000000000000000000002c0
0000000000000000000000000000000000000000000000000000000000000084
13ead56200000000000000000000000061fe7a5257b963f231e1ef6e22cb3b4c
6e28c531000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead908
3c756cc200000000000000000000000000000000000000000000000000000000
00002710000000000000000000000000000000000000000000831162ce86bc88
052f80fd00000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000164
8831645600000000000000000000000061fe7a5257b963f231e1ef6e22cb3b4c
6e28c531000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead908
3c756cc200000000000000000000000000000000000000000000000000000000
00002710ffffffffffffffffffffffffffffffffffffffffffffffffffffffff
fffaf17800000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000002e3bdc2534919
6582d720000000000000000000000000000000000000000000000000c249fdd3
2778000000000000000000000000000000000000000000000002e1e525c2ef9d
cec50c53000000000000000000000000000000000000000000000000c1cd7c9a
dfb0d9dc000000000000000000000000ed6c2cb9bf89a2d290e59025837454bf
1f144c5000000000000000000000000000000000000000000000000000000000
635ce8bf00000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000004
12210e8a00000000000000000000000000000000000000000000000000000000

We will modify this a bit and expand on it line by line to make it even more readable. Keep in mind that each value is in hexadecimal, and for quick reference, 0x20 == 32 bytes.

MethodID: 0xac9650d8
// offset of array_1 (starting next line)
0000000000000000000000000000000000000000000000000000000000000020
// length of array_1 (how many elements in array)
0000000000000000000000000000000000000000000000000000000000000003
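// the next 3 words are element offsets, counted in bytes from the start of the first of them (the 0x60 word)
// offset of 1st element in array_1, array_1A (96-bytes / 32 = 3)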
0000000000000000000000000000000000000000000000000000000000000060
// offset of 2nd element in array_1, array_1B (288-bytes / 32 = 9)
0000000000000000000000000000000000000000000000000000000000000120
// offset of 3rd element in array_1, array_1C (704-bytes / 32 = 22)
00000000000000000000000000000000000000000000000000000000000002c0

// length of 1st element of array_1, array_1A (0x84 = 132 bytes, inc. selector)
0000000000000000000000000000000000000000000000000000000000000084

// here we'll read the next 132-bytes
// fn selector; 4 of 132
13ead562
// 1st param; 36 of 132
00000000000000000000000061fe7a5257b963f231e1ef6e22cb3b4c6e28c531
// 2nd param; 68 of 132
000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead9083c756cc2
// 3rd param; 100 of 132
0000000000000000000000000000000000000000000000000000000000002710
// 4th param; 132 of 132
// this marks the end of array_1A
000000000000000000000000000000000000000000831162ce86bc88052f80fd

// 28 bytes of `0` padding: array_1A's 132 bytes are right-padded to
// 160 bytes (5 full words), so the next element starts on a word boundary
00000000000000000000000000000000000000000000000000000000

// length of 2nd element of array_1, array_1B (0x164 = 356 bytes, inc. selector)
// like every length, this is a full 32-byte word
0000000000000000000000000000000000000000000000000000000000000164

// here we'll read the next 356-bytes
// fn selector; 4 of 356
88316456
// 1st param; 36 of 356
00000000000000000000000061fe7a5257b963f231e1ef6e22cb3b4c6e28c531
// 2nd param; 68 of 356
000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead9083c756cc2
// 3rd param; 100 of 356
0000000000000000000000000000000000000000000000000000000000002710
// 4th param; 132 of 356
// the leading `f`s instead of `0`s are sign extension: this is a negative
// signed integer in two's complement, so this param is an `int` type
fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffaf178
// 5th param; 164 of 356
// this word is all `0`, but we're still inside array_1B's 356 bytes,
// so it's a parameter whose value is 0
0000000000000000000000000000000000000000000000000000000000000000
// 6th param; 196 of 356
00000000000000000000000000000000000000000002e3bdc25349196582d720
// 7th param; 228 of 356
000000000000000000000000000000000000000000000000c249fdd327780000
// 8th param; 260 of 356
00000000000000000000000000000000000000000002e1e525c2ef9dcec50c53
// 9th param; 292 of 356
000000000000000000000000000000000000000000000000c1cd7c9adfb0d9dc
// 10th param; 324 of 356
000000000000000000000000ed6c2cb9bf89a2d290e59025837454bf1f144c50
// 11th param; 356 of 356
// this marks the end of array_1B
00000000000000000000000000000000000000000000000000000000635ce8bf

// 28 bytes of `0` padding: array_1B's 356 bytes are right-padded to
// 384 bytes (12 full words)
00000000000000000000000000000000000000000000000000000000

// length of 3rd element of array_1, array_1C (0x4 = 4 bytes)
// 4 bytes only fit a fn selector, so this is a call to a fn with no inputs
0000000000000000000000000000000000000000000000000000000000000004

// fn selector 12210e8a; 4 of 4, followed by 28 bytes of `0` padding
12210e8a00000000000000000000000000000000000000000000000000000000

Now you’re able to read raw embedded dynamic types!
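To double-check the walkthrough, here is a small Python sketch that decodes the same multicall payload from the Etherscan dump above. The helper names are hypothetical, and it assumes the parameter is a single bytes[] array: it follows the array offset, reads each element's offset (relative to the start of the offset list) and length, and leaves the zero padding behind.

```python
# Hypothetical decoder for a multicall(bytes[]) payload: it follows the array
# offset, reads each element's offset and length, and drops the zero padding.

WORDS = [
    "0000000000000000000000000000000000000000000000000000000000000020",
    "0000000000000000000000000000000000000000000000000000000000000003",
    "0000000000000000000000000000000000000000000000000000000000000060",
    "0000000000000000000000000000000000000000000000000000000000000120",
    "00000000000000000000000000000000000000000000000000000000000002c0",
    "0000000000000000000000000000000000000000000000000000000000000084",
    "13ead56200000000000000000000000061fe7a5257b963f231e1ef6e22cb3b4c",
    "6e28c531000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead908",
    "3c756cc200000000000000000000000000000000000000000000000000000000",
    "00002710000000000000000000000000000000000000000000831162ce86bc88",
    "052f80fd00000000000000000000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000000000000000000000000000164",
    "8831645600000000000000000000000061fe7a5257b963f231e1ef6e22cb3b4c",
    "6e28c531000000000000000000000000c02aaa39b223fe8d0a0e5c4f27ead908",
    "3c756cc200000000000000000000000000000000000000000000000000000000",
    "00002710ffffffffffffffffffffffffffffffffffffffffffffffffffffffff",
    "fffaf17800000000000000000000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000000000000000002e3bdc2534919",
    "6582d720000000000000000000000000000000000000000000000000c249fdd3",
    "2778000000000000000000000000000000000000000000000002e1e525c2ef9d",
    "cec50c53000000000000000000000000000000000000000000000000c1cd7c9a",
    "dfb0d9dc000000000000000000000000ed6c2cb9bf89a2d290e59025837454bf",
    "1f144c5000000000000000000000000000000000000000000000000000000000",
    "635ce8bf00000000000000000000000000000000000000000000000000000000",
    "0000000000000000000000000000000000000000000000000000000000000004",
    "12210e8a00000000000000000000000000000000000000000000000000000000",
]
CALLDATA = bytes.fromhex("ac9650d8" + "".join(WORDS))


def u256(buf: bytes, offset: int) -> int:
    return int.from_bytes(buf[offset:offset + 32], "big")


def decode_bytes_array(calldata: bytes) -> list[bytes]:
    args = calldata[4:]                      # skip the multicall selector
    array_start = u256(args, 0)              # 0x20: where the bytes[] encoding begins
    count = u256(args, array_start)          # number of elements (3)
    table = array_start + 32                 # element offsets are relative to this point
    elements = []
    for i in range(count):
        elem = table + u256(args, table + 32 * i)
        length = u256(args, elem)            # byte length of this element
        elements.append(args[elem + 32:elem + 32 + length])  # padding is left behind
    return elements


inner_calls = decode_bytes_array(CALLDATA)
for inner in inner_calls:
    print(inner[:4].hex(), len(inner))  # selector and byte length of each inner call
# 4th parameter of the 2nd inner call, read as a signed integer
print(int.from_bytes(inner_calls[1][100:132], "big", signed=True))
```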

Final

I hope this information has helped you understand how calldata is encoded, decoded, and read. It took me some time to research and experiment with it all in order to learn, but it was worth it. The next step from here is to learn how to read bytecode in order to understand the EVM at its lowest level (Then everything becomes open-source >:D).

I appreciate you for taking the time to read this article. I hope you found value in this, anon!
