A Low-Level Guide To Solidity's Storage Management
Learn how the EVM's storage system works by interacting with it through smart contracts using Solidity's inline assembly/Yul, taking you a step closer to bridging the gap between high and low level programming! We'll walk through each case you will face and learn to deal with it using bitwise operations alongside SLOAD and SSTORE to control the EVM's storage at will.
What Is Storage?
Storage is a persistent mapping of 2^256 slots, each a 32 byte word, for each contract. When you set a state variable’s value it's stored at the assigned slot, where it remains until it's overwritten with another value.
When To Use Storage Vs Memory
You may have seen that it’s better to use memory over storage to optimise your smart contract. If you don’t know exactly why let me explain it to you, anon. Having a deep understanding of this is critical to writing optimised contracts at any language level, whether it’s solidity or assembly.
When we first load a storage slot in a transaction it’s cold, meaning it’s more expensive at 2100 gas. Whenever we access that same slot again it’s a warm storage slot, meaning it’ll cost 100 gas. That's still not as cheap as memory, where a load costs at least 3 gas, though it can go higher if memory expansion occurs!
Let’s see correct use of storage and memory assignments with a contract that is poorly written and one that optimises it.
Unoptimised contract:
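A minimal sketch of what such a contract might look like. The struct S, the state variable s and the check function are hypothetical names chosen for illustration; the only point is that s.b is read from storage in two places.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example: names and logic are assumptions for illustration.
contract Unoptimised {
    struct S {
        uint256 a;
        uint256 b;
    }

    S s;

    function check(uint256 x) external view returns (bool) {
        // First read of s.b: one SLOAD (cold, 2100 gas).
        if (x < s.b) return false;
        // Second read of s.b: another SLOAD (warm, 100 gas).
        if (x > s.b * 2) return false;
        return true;
    }
}
```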
Notice how s.b is loaded twice from storage? We could optimise this by creating a memory variable that assigns s.b to it and then later use that variable.
But why?
Because we only use a storage load (SLOAD) once to copy the value into memory, instead of twice to perform all the checks. That reduces the gas cost, since memory loads (MLOADs) are significantly cheaper than SLOADs.
Knowing this, the optimised contract would look like:
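A minimal sketch under the same assumptions as before (hypothetical struct S, state variable s and check function). The only change is that s.b is cached once with uint256 b = s.b and the cached copy is used for every check.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example: names and logic are assumptions for illustration.
contract Optimised {
    struct S {
        uint256 a;
        uint256 b;
    }

    S s;

    function check(uint256 x) external view returns (bool) {
        // Single SLOAD: cache the storage value in a local variable.
        uint256 b = s.b;
        // Both checks now read the cached copy instead of storage.
        if (x < b) return false;
        if (x > b * 2) return false;
        return true;
    }
}
```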
The trick to remember is that initially loading a storage variable into memory calls SLOAD behind the scenes to copy it into the memory slot. So if you use a variable more than once then it’s better to SLOAD it once and reuse the copy.
With structs and arrays, copying to memory loads each child in the type. Declaring S memory memory_s copies every member, so you have to make sure you’re using each variable more than once. Otherwise you should specifically choose which child to store to memory, like what we did with uint256 b = s.b.
Manually Assigning Storage
The most glorified devs in crypto all have low-level granular expertise when it comes to smart contracts. They are able to manipulate the state at the lowest level, with either inline assembly/Yul or Huff. You need to know this in order to become an actual master of smart contract development, contrary to what most courses say when they teach you the basics without diving into the roots of the language.
So, let's take a look at a new smart contract using structs as the main variable to analyse. I want to teach with structs because they will be the most complex type you need to deal with the majority of the time, and once you understand them the basics follow.
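A sketch of an example contract consistent with the walkthrough in the following sections. The variable and member names (boring, s_struct, s_array, s_map, a, b, c, d) come from the text; the struct name S, the contract name, the uint256 key type of s_map and the declaration order (and therefore slots 0x03 and 0x04) are assumptions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Assumed layout: declaration order decides the slot numbers.
contract Storage {
    struct S {
        uint16 a;  // 2 bytes,  first slot of the struct
        uint24 b;  // 3 bytes,  first slot of the struct
        address c; // 20 bytes, first slot of the struct
        address d; // 20 bytes, second slot of the struct
    }

    uint256 boring;              // slot 0x00
    S s_struct;                  // slots 0x01 and 0x02
    S[] s_array;                 // slot 0x03 (stores the array length)
    mapping(uint256 => S) s_map; // slot 0x04 (the slot itself stays empty)
}
```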
Basic Types
If we wanted to access uint256 boring it’s quite simple. We only need the slot it’s assigned, which in this case is 0x00 since it’s the initial global state variable.
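As a minimal sketch, reading slot 0x00 with SLOAD could look like this. The contract name, the function name view_boring and the initial value 42 are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract BasicRead {
    uint256 boring = 42; // slot 0x00 (42 is an arbitrary example value)

    function view_boring() external view returns (uint256 r) {
        assembly {
            // A uint256 fills the whole slot, so no shifting or masking is needed.
            r := sload(0x00)
        }
    }
}
```

Calling view_boring() returns the same value as reading boring from Solidity directly.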
Bitpack Loading
Let's step it up a notch with a bitpacked struct! Bitpacked means storing multiple variables in a single slot (32 bytes) by ordering the variables so that their combined byte size is less than or equal to 32 bytes. In this case we pack a total of 25 bytes into a single slot at 0x01 using:
- uint16 a (2 bytes).
- uint24 b (3 bytes).
- address c (20 bytes).
s_struct’s slots would be 0x01, which holds a, b and c, and 0x02, which holds d.
s_struct.d isn’t contained in the same slot because it would overflow it: 25 + 20 = 45 bytes but the maximum size is 32, that’s why it’s given its own separate slot.
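We can break up the slot values into a readable format. Reading slot 0x01 from the right (lowest-order bytes) to the left, it holds a (2 bytes), then b (3 bytes), then c (20 bytes), with the remaining 7 bytes unused.
Notice how a, b, c are in the same slot. The way we grab any of these values is by shifting the bits and using masking to grab a specific string of bits in a slot.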
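To see the packing for yourself, a sketch like the following returns the raw 32 byte words of both slots. The contract name, the function name raw_slots and the constructor values are hypothetical, chosen so each member is easy to spot in the hex output.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract RawSlots {
    struct S {
        uint16 a;
        uint24 b;
        address c;
        address d;
    }

    uint256 boring; // slot 0x00
    S s_struct;     // slots 0x01 and 0x02

    constructor() {
        // Arbitrary example values.
        s_struct.a = 0xaaaa;
        s_struct.b = 0xbbbbbb;
        s_struct.c = address(uint160(0xcc));
        s_struct.d = address(uint160(0xdd));
    }

    // slot1 layout, high-order bytes on the left:
    //   [ 7 bytes unused ][ c: 20 bytes ][ b: 3 bytes ][ a: 2 bytes ]
    // slot2 layout:
    //   [ 12 bytes unused ][ d: 20 bytes ]
    function raw_slots() external view returns (bytes32 slot1, bytes32 slot2) {
        assembly {
            slot1 := sload(0x01)
            slot2 := sload(0x02)
        }
    }
}
```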
First, it’s good to know how masking is done. In the next example we use 0xFFFFFF as the mask. When we state 0xFFFFFF we mean 11111111 11111111 11111111 in bits (24 ones, or 3 bytes, the size of a uint24), which we use bitwise operations on. A single byte is made up of 8 bits represented with 1s and/or 0s, and each hex digit covers 4 of them. In a mask, 1s mark the bits we want to keep and 0s mark the bits we want to clear.
Let's go through a practical example to solidify our new knowledge by grabbing s_struct.b:
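A minimal sketch of the shift-and-mask read, assuming the struct sits at slot 0x01 as described above. The contract name, the function name view_b and the constructor values are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract ViewB {
    struct S {
        uint16 a;
        uint24 b;
        address c;
        address d;
    }

    uint256 boring; // slot 0x00
    S s_struct;     // slots 0x01 and 0x02

    constructor() {
        // Arbitrary example values.
        s_struct.a = 1;
        s_struct.b = 2;
        s_struct.c = address(uint160(3));
        s_struct.d = address(uint160(4));
    }

    function view_b() external view returns (uint256 r) {
        assembly {
            // Load the packed slot holding a, b and c.
            let v := sload(0x01)
            // Shift right by 16 bits (2 bytes) to push a out of the way,
            // leaving b in the lowest 24 bits.
            let shifted := shr(16, v)
            // Keep only the lowest 24 bits (3 bytes), dropping c.
            r := and(0xffffff, shifted)
        }
    }
}
```

With the values above, view_b() returns the same number as s_struct.b read from Solidity.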
Bitpack Setting
But what if we wanted to change the value of s_struct.b?
We have a few extra steps to add to viewing it but we can do it like this (keep in mind I haven’t optimised it at all):
Now when we run set_b() it will change s_struct.b to 500!
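One way to sketch the extra steps: clear b's 24 bits in the loaded word, shift the new value into position, merge the two with OR and store the result. The function name set_b and the value 500 come from the text; the contract name, the get_b helper and the exact sequence of operations are assumptions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract SetB {
    struct S {
        uint16 a;
        uint24 b;
        address c;
        address d;
    }

    uint256 boring; // slot 0x00
    S s_struct;     // slots 0x01 and 0x02

    function set_b() external {
        assembly {
            let new_b := 500
            // Load the packed slot holding a, b and c.
            let v := sload(0x01)
            // Build a mask with 0s over b (bits 16..39) and 1s everywhere else.
            let mask := not(shl(16, 0xffffff))
            // Clear the old b while keeping a and c untouched.
            let cleared := and(v, mask)
            // Move the new value into b's position and merge it in.
            let updated := or(cleared, shl(16, new_b))
            sstore(0x01, updated)
        }
    }

    // Hypothetical helper to confirm the result from Solidity's side.
    function get_b() external view returns (uint24) {
        return s_struct.b;
    }
}
```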
Special thanks to vectorized.eth for helping me out with this!
Arrays
A fixed array’s length is known at compile time, so it takes up a predetermined number of slots. A dynamic array's length is unknown and new elements are assigned slots after deployment, so Solidity stores the length in the array's own slot and locates the elements with keccak256 hashing!
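For our example contract, we can access any element in s_array with the formula:
keccak256(array_slot) + (index * slots_per_element) + var_slot
Here slots_per_element is 2 because each struct takes up two slots, and var_slot is the offset of the member inside the struct. For index 0 this reduces to keccak256(array_slot) + var_slot.
Let's say we want to access s_array[0].d: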
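A minimal sketch, assuming s_array sits at slot 0x03 (which follows from declaring it right after s_struct). The contract name, the function name view_array_0_d and the constructor value are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract ArrayRead {
    struct S {
        uint16 a;
        uint24 b;
        address c;
        address d;
    }

    uint256 boring; // slot 0x00
    S s_struct;     // slots 0x01 and 0x02
    S[] s_array;    // slot 0x03 (assumed declaration order)

    constructor() {
        // Create element 0 and give d an arbitrary example value.
        s_array.push();
        s_array[0].d = address(uint160(0xd));
    }

    function view_array_0_d() external view returns (address r) {
        assembly {
            // Write the array slot, padded to 32 bytes, into scratch memory.
            mstore(0x00, 0x03)
            // keccak256(array_slot) is the first slot of s_array[0].
            let first := keccak256(0x00, 0x20)
            // d is in the struct's second slot, so var_slot is 1.
            r := sload(add(first, 1))
        }
    }
}
```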
Mappings
Since mappings are a dynamic type and we will never know the length of the mapping, Solidity hashes the key together with the mapping's slot to find the corresponding value's slot in order to avoid state collisions, similar to arrays.
There are 2 formulas we need to know for mappings:
keccak256(mapping_key . mapping_slot)
keccak256(mapping_key . mapping_slot) + i, for when the struct uses multiple slots we just add the offset i of the slot we need.
The . means concatenating the left and right values together. For value-type keys such as integers and addresses, each side is padded to 32 bytes first.
Let's say we want to access s_map[2].b from our example contract:
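A minimal sketch, assuming s_map is a mapping(uint256 => S) at slot 0x04. The key type, the slot number, the contract name, the function name view_map_2_b and the constructor value are all assumptions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract MapReadB {
    struct S {
        uint16 a;
        uint24 b;
        address c;
        address d;
    }

    uint256 boring;              // slot 0x00
    S s_struct;                  // slots 0x01 and 0x02
    S[] s_array;                 // slot 0x03
    mapping(uint256 => S) s_map; // slot 0x04 (assumed declaration order)

    constructor() {
        s_map[2].b = 7; // arbitrary example value
    }

    function view_map_2_b() external view returns (uint256 r) {
        assembly {
            // Concatenate key (2) and mapping slot (0x04), each padded to 32 bytes.
            mstore(0x00, 2)
            mstore(0x20, 0x04)
            // The hash of those 64 bytes is the first slot of s_map[2].
            let slot := keccak256(0x00, 0x40)
            // b is packed after a, so shift 16 bits and mask 24 bits as before.
            r := and(0xffffff, shr(16, sload(slot)))
        }
    }
}
```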
What about accessing s_map[4].d?
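The same idea applies with the second formula: hash the key and the mapping slot, then add the offset of d's slot, which is 1. As before, the uint256 key type, slot 0x04 and all names other than s_map and d are assumptions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract MapReadD {
    struct S {
        uint16 a;
        uint24 b;
        address c;
        address d;
    }

    uint256 boring;              // slot 0x00
    S s_struct;                  // slots 0x01 and 0x02
    S[] s_array;                 // slot 0x03
    mapping(uint256 => S) s_map; // slot 0x04 (assumed declaration order)

    constructor() {
        s_map[4].d = address(uint160(0xd)); // arbitrary example value
    }

    function view_map_4_d() external view returns (address r) {
        assembly {
            // keccak256(mapping_key . mapping_slot) with key 4 and slot 0x04.
            mstore(0x00, 4)
            mstore(0x20, 0x04)
            let slot := keccak256(0x00, 0x40)
            // d has the struct's second slot to itself, so i is 1 and no mask is needed.
            r := sload(add(slot, 1))
        }
    }
}
```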
Special thanks to 0xKitetsu for breaking this down for me!
Strings & Bytes
string and bytes have identical encoding types that are very annoying to deal with:
- When the length of the string is 31 bytes or less it’s stored in a single slot starting from the left side and length * 2 is stored in the final byte on the right.
- For anything larger than 31 bytes the storage process is similar to an array. The slot of the string stores length * 2 + 1 and the data is stored via keccak256(slot) + i.
This way you can see what type of string it is by checking if the lowest bit (the far right bit of the slot) is set.
You honestly won't encounter the long string type frequently, probably once in a blue moon, but if you do you can check if it’s the short version: it is when and(0x1, <value>) equals zero.
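A minimal sketch of that check, which also recovers the length for both encodings. The contract name, the state variable s, its value and the function name inspect are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract StringSlot {
    string s = "hello"; // slot 0x00 (arbitrary example value, short form)

    function inspect() external view returns (bool is_short_form, uint256 len) {
        assembly {
            let v := sload(0x00)
            switch and(0x1, v)
            case 0 {
                // Short form: the lowest byte holds length * 2,
                // the data sits in the higher-order bytes of the same slot.
                is_short_form := 1
                len := shr(1, and(v, 0xff))
            }
            default {
                // Long form: the slot holds length * 2 + 1,
                // the data starts at keccak256(slot).
                is_short_form := 0
                len := shr(1, v)
            }
        }
    }
}
```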
Final
This knowledge bridges the gap from being a normie solidity dev to becoming a low-level chad. Now that you understand how to interact with storage using assembly and how it functions you’ll be able to write in any low-level smart contract language! I’m proud of you for taking another step closer to writing raw bytecode >:D
I would like to thank everyone who has answered my questions and helped me with learning this mountain of a challenge. Special thank you to the Huff Discord, especially 0xKitetsu, Franfran, Godel, vectorized, jtriley, devtooligan, merkleplant and Sabnock for dealing with my infinite questions :,) (sorry if I forgot you, dm me!).
This article took a while to research and write in a digestible way. For any future article suggestions, please dm me.
I appreciate you for taking the time to read this article and hope you found value in this, anon!