Deconstructing a Solidity Contract, Part I: Introduction

Post originally written by @ajsantander on Medium (Aug 13, 2018)

By Alejandro Santander in collaboration with Leo Arias.

Image from www.szzljy.com

You’re on the road, driving fast in your rare, fully restored 1969 Mustang Mach 1. The sunlight shimmers on the all-original, gorgeous plated rims. It’s just you, the road, the desert, and the never-ending chase of the horizon. Perfection!

In the blink of an eye, your 335 hp beast is engulfed in white smoke, as if transformed into a steam locomotive, and you’re forced to stop on the side of the road. With determination, you pop the hood, only to realize that you have absolutely no idea what you’re looking at. How does this damn machine work? You grab your phone and discover that you have no signal.

Image from www.mustangandfords.com


Could this perhaps be an analogy for your current knowledge of dApp development? In the analogy, the Mustang is your set of smart contracts, and the rims are all those well-thought-out little details and the ❤️ you put into them. The popping of the hood is you looking into your contract’s EVM bytecode and having absolutely no idea what’s going on.

If this sounds familiar, don’t worry! The purpose of this series of articles is to deconstruct a simple Solidity contract, look at its bytecode, and break it apart into identifiable structures, down to the lowest level. We’ll pop the hood on Solidity and demystify the EVM bytecode its compiler produces. By the end of the series, you should feel comfortable looking at or debugging EVM bytecode. It’s really much simpler than it seems.

Note: This series is aimed at developers who are already comfortable developing Solidity contracts but want to understand how things work at a lower level: how the Solidity compiler translates Solidity into EVM bytecode, and how the EVM executes that bytecode. If you aren’t there just yet, I recommend reading this great introduction by Facu Spagnuolo: A Gentle Introduction to Ethereum Programming.

Here’s the contract we’ll deconstruct:

pragma solidity ^0.4.24;

contract BasicToken {
  
  uint256 totalSupply_;
  mapping(address => uint256) balances;
  
  constructor(uint256 _initialSupply) public {
    totalSupply_ = _initialSupply;
    balances[msg.sender] = _initialSupply;
  }

  function totalSupply() public view returns (uint256) {
    return totalSupply_;
  }

  function transfer(address _to, uint256 _value) public returns (bool) {
    require(_to != address(0));
    require(_value <= balances[msg.sender]);
    balances[msg.sender] = balances[msg.sender] - _value;
    balances[_to] = balances[_to] + _value;
    return true;
  }

  function balanceOf(address _owner) public view returns (uint256) {
    return balances[_owner];
  }
}

Note: This contract is susceptible to an overflow attack, but we’re just keeping it simple here for the purpose at hand.
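To make the note concrete: Solidity 0.4.x does not check arithmetic on uint256 values, and the EVM’s ADD opcode computes its result modulo 2**256, so a sum that exceeds the largest uint256 silently wraps around instead of failing. The Python sketch below models that behavior. add_uint256 is a hypothetical helper standing in for an unchecked addition such as balances[_to] + _value.

```python
# Model of unchecked uint256 addition (Solidity < 0.8, EVM ADD opcode).
# add_uint256 is a hypothetical helper used only for illustration.
UINT256_MODULUS = 2**256
UINT256_MAX = UINT256_MODULUS - 1


def add_uint256(a: int, b: int) -> int:
    """Add two uint256 values the way the EVM's ADD opcode does: modulo 2**256."""
    if not (0 <= a <= UINT256_MAX and 0 <= b <= UINT256_MAX):
        raise ValueError("operands must fit in a uint256")
    return (a + b) % UINT256_MODULUS


print(add_uint256(10000, 1))        # 10001
print(add_uint256(UINT256_MAX, 1))  # 0: the sum wrapped around
```

Solidity only made arithmetic checked by default in version 0.8.0.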

Compiling the contract

To compile the contract, we’ll be using Remix. Go ahead and create a new contract by clicking on the + button on the top left, above the file browser area. Set the filename to BasicToken.sol. Now, paste the above code into the editor section.

In the right-hand section, go to the Settings tab and make sure Enable Optimization is selected. Also, verify that the selected version of the Solidity compiler is “version:0.4.24+commit.e67f0147.Emscripten.clang”. Both details are very important; otherwise, the bytecode you see will differ slightly from the one discussed here.

If you go to the Compile tab and click on the Details button, you should see a popup with everything the Solidity compiler generates. One of these items is a JSON object named BYTECODE, whose “object” property holds the compiled code of the contract. It looks like this:

608060405234801561001057600080fd5b5060405160208061021783398101604090815290516000818155338152600160205291909120556101d1806100466000396000f3006080604052600436106100565763ffffffff7c010000000000000000000000000000000000000000000000000000000060003504166318160ddd811461005b57806370a0823114610082578063a9059cbb146100b0575b600080fd5b34801561006757600080fd5b506100706100f5565b60408051918252519081900360200190f35b34801561008e57600080fd5b5061007073ffffffffffffffffffffffffffffffffffffffff600435166100fb565b3480156100bc57600080fd5b506100e173ffffffffffffffffffffffffffffffffffffffff60043516602435610123565b604080519115158252519081900360200190f35b60005490565b73ffffffffffffffffffffffffffffffffffffffff1660009081526001602052604090205490565b600073ffffffffffffffffffffffffffffffffffffffff8316151561014757600080fd5b3360009081526001602052604090205482111561016357600080fd5b503360009081526001602081905260408083208054859003905573ffffffffffffffffffffffffffffffffffffffff85168352909120805483019055929150505600a165627a7a72305820a5d999f4459642872a29be93a490575d345e40fc91a7cccb2cf29c88bcdaf3be0029

Yup. That’s completely unreadable (at least for a normal human being).

Deploying the contract

Next, go to the Run tab in Remix. At the top, make sure you’re using the JavaScript VM: an EVM and network simulated in JavaScript right inside your browser, our ideal Ethereum playground. Make sure BasicToken is selected in the contract ComboBox, enter the number 10000 in the Deploy input box, and click the Deploy button. This should deploy an instance of our BasicToken contract with an initial supply of 10000 tokens, all owned by the account currently selected in the account ComboBox at the top.

Lower in the Run tab, in the Deployed Contracts section, you should see the contract instance we just deployed, with fields for interacting with its three functions: transfer, balanceOf, and totalSupply.

But before that, let’s take a look at exactly what “deploying” the contract means. At the bottom of the page, in the console area, you should see the log “creation of BasicToken pending…”, followed by a transaction entry with various fields: from, to, value, data, logs, and hash. Click on this entry to expand the transaction’s info. Even though it’s abbreviated, you should see that the data/input of the transaction is the same bytecode we presented above, with the constructor argument (10000, encoded as a 32-byte word) appended at the end. This is a contract creation transaction: it has no regular recipient (it’s often described as being sent to the 0x0 address), and as a result, a new contract instance is created, with its own address and code. We’ll examine this process in detail in the next article.
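The data/input of the creation transaction can be sketched in Python as the creation bytecode followed by the constructor argument, ABI-encoded as one 32-byte, big-endian, zero-padded word. This is a minimal sketch assuming a single uint256 argument, as in BasicToken’s constructor. encode_uint256 and deployment_data are hypothetical helpers, and only the first bytes of the BYTECODE object are used to keep it short.

```python
def encode_uint256(value: int) -> str:
    """ABI-encode a uint256 as 64 hex characters: 32 bytes, big-endian, left-padded."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in a uint256")
    return value.to_bytes(32, "big").hex()


def deployment_data(creation_code_hex: str, initial_supply: int) -> str:
    """Hypothetical helper: creation bytecode with the encoded constructor argument appended."""
    return creation_code_hex + encode_uint256(initial_supply)


# Shortened prefix of the BYTECODE object, for illustration only.
creation_prefix = "608060405234801561001057600080fd5b50"
data = deployment_data(creation_prefix, 10000)
print(data[len(creation_prefix):])  # 10000 == 0x2710, so the word ends in ...2710
```

In the disassembly listing further down, the PUSH2 0217, DUP4, CODECOPY sequence at offsets 024 to 028 appears to be the constructor copying this word back out of its own code: 0x217 is 535, the size in bytes of the creation bytecode, so the argument begins right where the bytecode ends.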

Disassembling the bytecode

To the right of the transaction’s data, still in the console, click on the Debug button. This will activate the Debugger tab in Remix’s right-hand area. Let’s take a look at the Instructions section. If you scroll down, you should see the following:

000 PUSH1 80
002 PUSH1 40
004 MSTORE
005 CALLVALUE
006 DUP1
007 ISZERO
008 PUSH2 0010
011 JUMPI
012 PUSH1 00
014 DUP1
015 REVERT
016 JUMPDEST
017 POP
018 PUSH1 40
020 MLOAD
021 PUSH1 20
023 DUP1
024 PUSH2 0217
027 DUP4
028 CODECOPY
029 DUP2
030 ADD
031 PUSH1 40
033 SWAP1
034 DUP2
035 MSTORE
036 SWAP1
037 MLOAD
038 PUSH1 00
040 DUP2
041 DUP2
042 SSTORE
043 CALLER
044 DUP2
045 MSTORE
046 PUSH1 01
048 PUSH1 20
050 MSTORE
051 SWAP2
… (abbreviated)

To make sure that you’re following the same set of opcodes described in this series, please compare what you see in Remix with the bytecode in this gist.

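This is the disassembled bytecode of the contract. Disassembly sounds rather intimidating, but it’s quite simple, really. If you scan the raw bytecode one byte (two hex characters) at a time, each byte is read as an opcode that the EVM associates with a particular action, and the disassembler simply replaces it with a readable name. The one exception is the PUSH family: PUSH1 to PUSH32 are followed by 1 to 32 bytes of data to be pushed, which are read as a value rather than as opcodes. For example: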

0x60 => PUSH1
0x01 => ADD
0x02 => MUL
0x00 => STOP
...

The disassembled code is still very low-level and difficult to read, but as you will see, we can start making sense out of it.
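To make the byte-by-byte scan concrete, here is a minimal disassembler sketch in Python. It names only the opcodes needed for the first few instructions of our bytecode (an assumption made for brevity; a real disassembler covers the full table), derives the PUSH, DUP, and SWAP families from their byte ranges, and skips over PUSH data instead of decoding it. The names NAMED_OPCODES, mnemonic, and disassemble are illustrative, not taken from any library.

```python
# Minimal EVM disassembler sketch; unnamed opcodes print as UNKNOWN_xx.
NAMED_OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x15: "ISZERO",
    0x33: "CALLER", 0x34: "CALLVALUE", 0x39: "CODECOPY",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF3: "RETURN", 0xFD: "REVERT",
}


def mnemonic(op: int) -> str:
    if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32
        return f"PUSH{op - 0x5F}"
    if 0x80 <= op <= 0x8F:  # DUP1..DUP16
        return f"DUP{op - 0x7F}"
    if 0x90 <= op <= 0x9F:  # SWAP1..SWAP16
        return f"SWAP{op - 0x8F}"
    return NAMED_OPCODES.get(op, f"UNKNOWN_{op:02x}")


def disassemble(hex_code: str):
    code = bytes.fromhex(hex_code)
    lines = []
    pc = 0  # byte offset of the current instruction
    while pc < len(code):
        op = code[pc]
        name = mnemonic(op)
        if 0x60 <= op <= 0x7F:
            size = op - 0x5F                     # PUSHn carries n bytes of data
            data = code[pc + 1 : pc + 1 + size]  # immediate data, not opcodes
            lines.append(f"{pc:03d} {name} {data.hex()}")
            pc += 1 + size
        else:
            lines.append(f"{pc:03d} {name}")
            pc += 1
    return lines


for line in disassemble("608060405234801561001057600080fd5b50"):
    print(line)
```

Run on the first 18 bytes of the BYTECODE object, it should print the same lines as offsets 000 to 017 of the Remix listing above, from 000 PUSH1 80 down to 017 POP.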

Opcodes

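Before we get started on our ambitious endeavour of completely deconstructing the bytecode, you’re going to need a basic tool set for understanding individual opcodes such as PUSH, ADD, SWAP, DUP, etc. In the end, almost every opcode just pushes items onto or consumes items from the EVM’s stack, possibly reading or writing the contract’s memory or storage along the way; the rest mostly read information about the current call (such as CALLER or CALLVALUE) or change the flow of execution (such as JUMP). That’s about it.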

To see all the available opcodes that the EVM can process, check out this handy gist from Pyethereum showing a list of the opcodes. To understand what each one does and how it works, Solidity’s assembly documentation is a great reference. Even though it doesn’t map one-to-one to the raw opcodes, it’s pretty close (it’s actually Yul, an intermediate language between Solidity and EVM bytecode). Finally, if you can speak scientician, there’s always the Ethereum Yellow Paper to fall back on.

There’s no point in reading these resources from start to finish right now; just keep them around for reference. We’ll be using them as we go along.
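To get a feel for how opcodes consume and push stack items, here is a toy Python model of a handful of them (PUSH1, ADD, MUL, DUP1, SWAP1, MSTORE). It is only a sketch under simplifying assumptions: programs are written as hypothetical (mnemonic, argument) tuples rather than raw bytes, gas and storage are ignored, and any other opcode raises an error. Running the first three instructions of our bytecode (PUSH1 80, PUSH1 40, MSTORE) stores the value 0x80 in the 32-byte memory word at address 0x40, which Solidity reserves for its free memory pointer.

```python
# Toy model of a few EVM opcodes acting on the stack and memory.
WORD = 2**256  # stack items are 256-bit words; arithmetic wraps modulo 2**256


def run(program):
    stack = []            # the top of the stack is the end of the list
    memory = bytearray()  # byte-addressed, grows in 32-byte words
    for op, *args in program:
        if op == "PUSH1":
            stack.append(args[0])
        elif op == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
        elif op == "MUL":
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % WORD)
        elif op == "DUP1":
            stack.append(stack[-1])
        elif op == "SWAP1":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "MSTORE":
            offset, value = stack.pop(), stack.pop()  # the offset is on top
            end = offset + 32
            if len(memory) < end:
                new_size = (end + 31) // 32 * 32      # expand to whole words
                memory.extend(bytes(new_size - len(memory)))
            memory[offset:end] = value.to_bytes(32, "big")
        else:
            raise ValueError(f"opcode not modeled in this sketch: {op}")
    return stack, memory


stack, memory = run([("PUSH1", 0x80), ("PUSH1", 0x40), ("MSTORE",)])
print(stack, memory[0x40:0x60].hex())
```

Note how MSTORE leaves the stack empty: it consumes both items that the two PUSH1 instructions put there.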

Instructions

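Each line in the disassembled code above is an instruction for the EVM to execute, and each instruction contains an opcode. For example, take instruction 88, which pushes the number 4 to the stack. This particular disassembler displays instructions as follows:

88 PUSH1 0x04
|    |     |
|    |     Value to push (in hex)
|    Opcode
Instruction number

Strictly speaking, the instruction number is the instruction’s byte offset in the code, written in decimal. That’s why the listing above skips numbers after every PUSH: PUSH1 80 at 000 occupies two bytes, so the next instruction starts at 002, and PUSH2 0010 at 008 occupies three, so JUMPI starts at 011. It also makes jump targets easy to follow: PUSH2 0010 pushes 0x10, which is 16, the offset of the JUMPDEST at 016.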

Even though the disassembled code brings us one step closer to understanding what’s going on, it’s still quite intimidating. We’re going to need a strategy for deconstructing the whole thing, which has 596 instructions!

The Strategy

Problems that appear to be overwhelming at first usually succumb to the all-powerful, all-mighty “divide-and-conquer” strategy, and this problem is no exception to the rule. We’ll identify split points in the disassembled code and reduce it bit by bit, until we end up with small, digestible chunks, which we’ll walk through step by step in Remix’s debugger. In the following diagram, we can see the first split we can make on the disassembled code, which we’ll analyze completely in the next article.

You can find the end result of the entire deconstruction in the deconstruction diagram. Don’t worry if you don’t understand the diagram at first. You’re not supposed to. This series will go through it step by step. Keep it around so you can keep track of the big picture as we go along.
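As an illustration of finding split points mechanically, the Python sketch below cuts a disassembly listing into basic blocks: a new block starts at every JUMPDEST (a possible jump target), and a block ends after any instruction that stops or redirects execution (JUMP, JUMPI, STOP, RETURN, REVERT, INVALID). This is a common first pass in bytecode analysis and only an assumption about how one might split the code; it is not necessarily how the deconstruction diagram was produced. split_blocks and TERMINATORS are hypothetical names.

```python
# Split a disassembly listing ("offset MNEMONIC [data]" lines) into basic blocks.
TERMINATORS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT", "INVALID"}


def split_blocks(listing):
    blocks, current = [], []
    for line in listing:
        offset, name = line.split()[:2]
        if name == "JUMPDEST" and current:  # a jump target opens a new block
            blocks.append(current)
            current = []
        current.append((int(offset), name))
        if name in TERMINATORS:             # control flow leaves this block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# The first 13 instructions of the Remix listing.
listing = """\
000 PUSH1 80
002 PUSH1 40
004 MSTORE
005 CALLVALUE
006 DUP1
007 ISZERO
008 PUSH2 0010
011 JUMPI
012 PUSH1 00
014 DUP1
015 REVERT
016 JUMPDEST
017 POP""".splitlines()

for block in split_blocks(listing):
    print([f"{offset:03d} {name}" for offset, name in block])
```

On these instructions it yields three blocks: 000 to 011, 012 to 015, and 016 to 017. The middle one is the path taken when ether is sent along with the deployment: CALLVALUE is nonzero, so ISZERO yields 0, JUMPI does not jump to 0x10, and execution falls through to REVERT, since the constructor isn’t payable.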

The series is divided into the following set of articles. If you’re up for the challenge, get started with the actual deconstruction in Part II. See you there!
