How would your life be IF you start learning assembly?

andresbach · August 1, 2019, 3:42pm

NOTE: This article is part of the series of posts about coding in assembly; please check out the entire series for more fun.

Captain's log day 0

You wake up one day in your blockchain dev life and you think:

– Huh, what's that assembly that everyone is talking about? What should I do?

Start by reading, right now, the amazing post series from Ale, in which you will learn how the EVM interprets and executes Solidity code. Only after that, you may RETURN.

– Ok, I read it! you may say...
Great then! JUMPI(DAY1,1)

yep, that's a wormhole

Captain's log day 1

If now you wonder:

– Huh, how should I work with conditionals in assembly?
Don't leave because THAT is what we will be dealing with today.

Let's start...

Writing the code, the hello-world of `if` cases

Our first goal is to tackle the most used if conditional statement. In assembly it's used like the regular Solidity one. The following is a trivial use of the conditional in Solidity:

function solidityIf() public pure returns (uint256 output)  {
    if (true) { output=0x1; }
}

It's written in assembly as:

function assemblyIf() public pure returns (uint256 output)  {
    assembly {
        if 0x1 { output:=0x1 }
    }
}

See? You don't have to be afraid of it =)
Just let me point out that calling solidityIf consumes 323 gas while calling assemblyIf only needs 310. In certain situations, like this one, coding in assembly can reduce the overall gas consumption. Yeey! Nevertheless, as much as you would like to squeeze every wei, it's better to favor readable contracts over a minor gas optimization; that's a task for the compiler. So don't base your decision to use assembly on that; use assembly only when a higher-level language - like Solidity - can't achieve certain functionality. You have been warned.

Writing the code, a true `if` situation (see the irony?)

In a real situation we wouldn't be hardcoding a true value inside the conditional, otherwise we would be wasting gas. So let's make things "a bit more complex" and try to output a one when the input is 26, a number that is, strangely, also the index of the letter Z in the English alphabet. Both functions should look like:

function solidityIf(uint256 input) public pure returns (uint256 output)  {
    if (input == 0x1a) { output=0x1; }
}

function assemblyIf(uint256 input) public pure returns (uint256 output)  {
    assembly {
        if eq(0x1a, input) { output:=0x1 }
    }
}

Okay, we have a new instruction: eq. The instruction eq(x, y) takes two parameters and outputs a one when x and y are equal, and a zero in all other cases.

In our example, when the input value is 26 (0x1a in hexadecimal) the output is assigned as a one, but in all other cases, the output will be the default value: zero.
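To see the raw value that eq leaves behind, here is a minimal sketch. The contract name `EqDemo` and the function `rawEq` are made up for illustration and are not part of the original examples; the pragma range is an assumption too.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.5.0 <0.9.0;

// Hypothetical helper contract, only meant to poke at eq(x, y).
contract EqDemo {
    // Returns the word produced by eq: 1 when x and y are equal, 0 otherwise.
    function rawEq(uint256 x, uint256 y) public pure returns (uint256 result) {
        assembly {
            result := eq(x, y)
        }
    }
}
```

Calling `rawEq(26, 0x1a)` should give back a one, since 26 and 0x1a are the same number, while `rawEq(25, 0x1a)` should give back a zero.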

Now we realize that we wanted to do the opposite; we want to output a one in all cases EXCEPT when we introduce a 26. For that, the code changes to:

function solidityIf(uint256 input) public pure returns (uint256 output)  {
    if (input != 0x1a) { output=0x1; }
}

function assemblyIf(uint256 input) public pure returns (uint256 output)  {
    assembly {
        if eq(eq(0x1a, input), 0x0) { output:=0x1 }
    }
}

With assembly it's better to go slow because it can get complicated easily, so let's re-analyze that line. The conditional if statement will come to life if, and only if, you have a true statement. In the previous case, eq(0x1a, input) would return a one when input == 0x1a and a zero in all other cases. Taking advantage of that, you can take that result and compare it with a zero using eq again, reversing the output.

You may say: Why don't we use the not instruction for that?
Because we can't, that's why. The not(x) instruction flips all the bits of the word at the same time; it acts as a bitwise complement or inverter. E.g. not(0x0) - where 0x0 is a 32-byte word full of zeros - becomes a 32-byte word full of ones - the maximum number that we can represent with 256 bits -, but not(0x1) doesn't become a zero; it becomes not(0x0)-0x1, where all the bits are ones except the last one - which is a zero.
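A minimal sketch of that trap, assuming a throwaway contract named `NotDemo` (the contract and its function names are hypothetical, not from the original code). Since `if` in assembly fires on any nonzero value, using not here makes the branch run for every input:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.5.0 <0.9.0;

// Hypothetical contract showing why not(x) is not a logical negation.
contract NotDemo {
    // All 256 bits set: 2**256 - 1.
    function notOfZero() public pure returns (uint256 r) {
        assembly { r := not(0x0) }
    }

    // All bits set except the last one: 2**256 - 2, which is still nonzero.
    function notOfOne() public pure returns (uint256 r) {
        assembly { r := not(0x1) }
    }

    // Intentionally wrong: both not(0x0) and not(0x1) are nonzero,
    // so the body runs for every input, including 26.
    function brokenIf(uint256 input) public pure returns (uint256 output) {
        assembly {
            if not(eq(0x1a, input)) { output := 0x1 }
        }
    }
}
```

With this sketch, `brokenIf(26)` and `brokenIf(7)` should both return a one, which is not what we wanted.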

What we could have used instead of the second eq is another instruction called iszero(x), which returns a one when x == 0x0 and a zero otherwise. That line would have been:

if iszero(eq(0x1a, input)) { output:=0x1 }
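Put back into a full function, that line could look like the sketch below. The wrapping contract name `IsZeroIf` and the pragma range are assumptions; the function itself is the same assemblyIf as before with only the condition swapped.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.5.0 <0.9.0;

// Hypothetical wrapper contract for the iszero variant.
contract IsZeroIf {
    function assemblyIf(uint256 input) public pure returns (uint256 output) {
        assembly {
            // eq gives 1 only for 26; iszero turns that into 0, so the body is skipped.
            if iszero(eq(0x1a, input)) { output := 0x1 }
        }
    }
}
```

Here `assemblyIf(26)` should return a zero and any other input should return a one.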

STOP! OPCODE TIME

Now that we are back on track, let's make use of what we learned from the post I mentioned in the beginning and try to see what the bytecode does.

For the assemblyIf function we see that after the input is already placed in the stack, the important instructions are:

#	Opcode	Value
376	PUSH1	00
378	DUP1
379	DUP3
380	PUSH1	1A
382	EQ
383	EQ
384	ISZERO
385	PUSH2	0189
388	JUMPI
389	PUSH1	01
391	SWAP1
392	POP
393	JUMPDEST
394	SWAP2
395	SWAP1
396	POP
397	JUMP

Let's try to not get a nightmare from this.

Remember, we have ~~magically~~ the input in the stack. The first thing the bytecode does with the PUSH1 00, DUP1, DUP3, and PUSH1 1A is to set up the values for both comparisons (remember, the first one checks if the input is 26 and then the other EQ checks if the result was a zero). The two EQ instructions are then executed in 382 and 383.

Pay extra attention here because it's where the rest of the magic comes in. After 383 we should have a result in the stack that depends on the input: a 1 for a non-26-ish value and a 0 for the 26. When the ISZERO instruction is executed, that value is inverted. After that, the value 0x0189 is pushed, which is line 393 written in hex, and the JUMPI instruction is called: a conditional JUMP that only jumps if its second parameter is nonzero (here, a one). Here we have a crossroads: if the input value was not 26, the second parameter would be a zero (so it wouldn't jump), but if the input was 26, the parameter would be a one and the JUMPI in 388 would take us to line 393.

Okay, we skipped 4 lines, what gives? - you may wonder. In those 4 lines (389-392) we swap a zero for the value that will be output - in our case, a one - :o

After that, the JUMP in 397 takes us to another place of the code where it finds an empty place in memory to save the final output, saves it, and returns it using the RETURN instruction. In the case that we jumped in 388, the output would be 0 because we didn't swap it for the desired value, but if we didn't jump - when input is not 26 - the output is 1.

Pretty cool, huh?
Could it be better? Yeah, sure! See that the ISZERO instruction in 384 mirrors the result from the second EQ? That means the bytecode is mirroring the result twice, so the final result is the same value that goes into the second EQ. Those two instructions may not be needed to get the same functionality.
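The double mirroring can be checked with a small sketch. The contract `DoubleNegation` and both function names are hypothetical; they only reproduce the value that reaches JUMPI, first as the listing computes it and then without the second EQ and the ISZERO.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.5.0 <0.9.0;

// Hypothetical contract comparing the two ways of building the jump condition.
contract DoubleNegation {
    // Mirrors the listing: EQ, EQ against zero, then ISZERO.
    function conditionAsCompiled(uint256 input) public pure returns (uint256 c) {
        assembly {
            c := iszero(eq(eq(0x1a, input), 0x0))
        }
    }

    // Same value with the two mirroring steps removed.
    function conditionSimplified(uint256 input) public pure returns (uint256 c) {
        assembly {
            c := eq(0x1a, input)
        }
    }
}
```

Because the first eq can only produce a zero or a one, both functions should agree for every input: a one for 26 and a zero for anything else.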

Conclusion

As we saw, coding in assembly shouldn't be as scary as it sounds. Yes, you should know what you are doing in assembly because its usage increases the likelihood of errors, but also sometimes it reduces the gas costs - as we saw in this case.

As you may know, this is the first article of a series of posts that we will be doing to have fun with assembly. In the next chapter we will continue having fun with another conditional statement that can be used in assembly but not in straight Solidity. ~~Can you imagine which one it is?~~

ylv-io · August 5, 2019, 3:12pm

Love the style and the puns!

balajipachai · August 12, 2019, 4:49am

It is a good starting point to get into assembly.
However, I noticed there is no line number 390 in the instruction set.

andresbach · August 12, 2019, 7:49pm

Hey @balajipachai, thanks for the note and for reaching out!

As you may see, there are more places where one line - or more - is missing, like the one you mentioned. The others are 377, 381, and 386-387.

Those lines correspond to the data or value that the instruction uses, and they still take up a position in the bytecode. For instance, 377 corresponds to the pushed value 00, 381 to the 1A, and so on. For the PUSHi instructions, the i lines after the instruction hold the value being pushed. That’s why the PUSH2 of line 385 skips 2 lines after the instruction, while each PUSH1 only takes one more line with it.
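As a rough sketch of that rule, the helper below computes where the next instruction starts. The contract `PushWidth` and the function `nextOffset` are hypothetical names; the only outside fact used is that PUSH1 to PUSH32 are the opcodes 0x60 to 0x7f.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.5.0 <0.9.0;

// Hypothetical helper: given an instruction's position and opcode byte,
// return the position of the instruction that follows it.
contract PushWidth {
    function nextOffset(uint256 offset, uint8 opcode) public pure returns (uint256) {
        // PUSH1..PUSH32 (0x60..0x7f) carry 1..32 bytes of data after the opcode.
        if (opcode >= 0x60 && opcode <= 0x7f) {
            uint256 dataBytes = uint256(opcode) - 0x5f;
            return offset + 1 + dataBytes;
        }
        // Every other opcode takes a single position.
        return offset + 1;
    }
}
```

For the listing above, `nextOffset(376, 0x60)` (the PUSH1 in 376) should give 378, and `nextOffset(385, 0x61)` (the PUSH2 in 385) should give 388, matching the numbers in the first column.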

balajipachai · August 13, 2019, 2:42am

Thanks, the explanation helps, cheers!!!
