NOTE: This article is part of the series of posts about coding in assembly; please check out the entire series for more fun.
We always recommend not using assembly. But there are a few cases where there is no other option, and it's better to be prepared. So, I'm crashing my teammate @bachi's series to write short posts about assembly and what we are learning these days. In this one we will play a little with dynamic bytes arrays in memory.
Before I start deciphering assembly I need some courage. For this, I always turn to the series of posts written by @ajsantander about Deconstructing a Solidity Contract. And I always come out of it feeling stronger.
 But anyway, let's start slowly.
First, let's paste this simple contract into the remix editor:
pragma solidity ^0.5.10;
contract AssemblyArrays {
  
  bytes testArray;
  
  function getLength() public view returns (uint256) {
      return testArray.length;
  }
  
  function getElement(uint256 index) public view returns (bytes1) {
      return testArray[index];
  }
  
  function pushElement(bytes1 value) public {
      testArray.push(value);
  }
  
  function updateElement(bytes1 value, uint256 index) public {
      testArray[index] = value;
  }
}
Familiarize yourself with the editor. Select the compiler, compile the contract, deploy it, execute a few functions, and debug them. If you have problems with any of this, you can ask for help opening a new topic on this forum.
Now, let's modify the getLength function to write our first lines of assembly:
function getLength() public view returns (uint256) {
  bytes memory memoryTestArray = testArray;
  uint256 result;
  assembly {
    result := mload(memoryTestArray)
  }
  return result;
}
There are a lot of things going on in these few lines. That's the thing with assembly: lots of code to do very simple things. We'd better get used to it.
We are copying the testArray from storage to memory, because that's the focus of this post. We can talk another day about storage slots.
Before digging into the assembly block, let's note that the assembly instructions operate on words of 32 bytes. So, the mload instruction (or memory load) takes a memory position from the top of the stack and pushes onto the stack the 32 bytes stored starting at that position. Here, that position is the one pointed to by memoryTestArray.
Let's debug this. In Remix you can put a breakpoint by clicking on the line number. Let's put a breakpoint on line #11, so it looks like this:

Make sure you have compiled and deployed the contract again after updating the getLength function. Now let's insert the byte 0x05 into the array by calling the pushElement function, and then call the getLength function, which should return 1. Remember we are in this together: if you get lost, leave a comment and we'll find the way out together.
So, after calling getLength we can debug it. Hit the "Debug" button on the last call of the bottom panel, which will open the debugger on the left sidebar. There is a fast-forward button that jumps to the next breakpoint. Let's click that one. If you have a different compiler or different settings things might not be exactly the same for you, but the core of it will be the same. The idea is to get the debugger right before the mload is executed, which in my environment is instruction #0871.

While we are here, let's take a look at the Stack section of the debugger sidebar:
On top of the stack, on position 0 we have 0x0...80. This is the position in memory of the memoryTestArray, which will be the argument for the mload instruction.
Now, let's take a look at the Memory section of the debugger sidebar, starting at position 0x0...80:

What we have here are 31 bytes of 0x00, followed by 1 byte 0x01. Then 1 byte 0x05, followed by 31 bytes of 0x00. This might be a little confusing, so let's step back for a moment. 1 byte (8 bits) is represented by 2 hexadecimal digits (1 hexadecimal digit represents 4 bits), and 0x10 in hexadecimal is equal to 16 in decimal. So in memory, position 0x80 holds 16 bytes, position 0x90 (which is 0x80 + 0x10) holds the following 16 bytes, then position 0xa0 (which is 0x90 + 0x10) holds the following 16 bytes, and position 0xb0 holds the final 16 bytes. Because mload operates on 32 bytes, calling mload(0x80) puts onto the stack the 32 bytes in memory starting at position 0x80, that is, the contents of positions 0x80 and 0x90 together.
Let's see this in action. Let's execute the mload instruction by clicking the "step into" button in the debugger, which is an arrow pointing down. Now, take a look at the top of the stack:
The mload instruction took what was on top of the stack, 0x0...80, and pushed in its place the 32 bytes saved at that location in memory: 0x0...1. This is the most important thing to know about bytes arrays in memory: the first 32 bytes store the length.
Try inserting the element 0x06 into the array by calling the pushElement function. Then call getLength and debug it again. Again, mload will load 32 bytes of memory starting at position 0x80, but this time the contents of that memory will be 0x0...2. When we pushed the new element, Solidity updated the size of the array for us.
The other thing that changed in memory is that now at position 0xa0 we have 0x050600...00. So, in memory, a bytes array variable stores the length on the first 32 bytes, and then it starts storing the elements. First we pushed 0x05, and now we have just pushed 0x06.
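To make this layout a bit more tangible, here is a minimal sketch that reads both words from any bytes array in memory. The contract name MemoryLayoutSketch and the function peek are made up for illustration and are not part of the contract above. It assumes the same compiler version as the rest of the post.

```solidity
pragma solidity ^0.5.10;

// Hypothetical helper contract, only for poking at the memory layout.
contract MemoryLayoutSketch {
    // Returns the length word and the first 32 bytes of elements
    // of a bytes array that lives in memory.
    function peek(bytes memory data) public pure returns (uint256 len, bytes32 firstWord) {
        assembly {
            // The variable `data` holds the memory position of the array.
            // The first word at that position is the length.
            len := mload(data)
            // The elements start 32 bytes (0x20) later.
            firstWord := mload(add(data, 32))
        }
    }
}
```

Only the first len bytes of firstWord belong to the array. Whatever comes after them is just what happens to be in memory, so we shouldn't rely on it.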

Try pushing a few more elements, call getLength and debug it, to see the new bytes in memory. This will become clearer if we translate getElement to assembly:
function getElement(uint256 index) public view returns (bytes1) {
    uint256 length = getLength();
    require(index < length);
    bytes memory memoryTestArray = testArray;
    bytes1 result;
    assembly {
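      // Index of the 32-byte word that contains the element.
      let wordIndex := div(index, 32)
      // Skip the first word, which stores the length.
      let initialElement := add(memoryTestArray, 32)
      // Load the whole word that contains the element.
      let resultWord := mload(add(initialElement, mul(wordIndex, 32)))
      // Position of the element inside that word.
      let indexInWord := mod(index, 32)
      // Shift left by whole bytes (8 bits each) to bring the element to the front.
      result := shl(mul(indexInWord, 8), resultWord)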
    }
    return result;
}
Okay, this got scary all of a sudden! But I said things will be clearer. I promise they will be clearer. Just, as usual, let's go slowly and carefully.
The first super important thing is that we added the require statement to check that the index is not out of bounds. This is crucial when calling mload, because mload does no bounds checking of its own: it will happily read whatever is at the position we give it. We need to make sure that we are loading the right position in memory, otherwise we might be leaking information that the caller is not supposed to access, and this could open the door to critical attacks on our contract.
Next, let's look at the assembly block. Because mload reads 32 bytes at a time, it is not easy to read only 1 byte. If we divide the index by 32 and take the integer part, this will tell us the index of the 32-byte word that contains the element we are looking for. Let's run a few scenarios in our mind:
div(0, 32) = 0
div(18, 32) = 0
div(32, 32) = 1
div(65, 32) = 2
Looks good. But remember that the first word at the location in memory pointed by memoryTestArray stores the length. So we need to add 32 bytes to find the initial element. Taking all of that into account, we are ready to load the 32 bytes that contain the 1 byte we need:
It is in the memory position of memoryTestArray, plus 32 bytes to skip the length, plus the wordIndex multiplied by 32 because each word has 32 bytes.
But we are not done yet. Now we need to extract 1 byte from that word. To do that, we need to find the index of that byte inside the word. This is the remainder of the index divided by 32, which we get with the mod instruction. Again, let's run a few scenarios in our mind:
mod(0, 32) = 0
mod(18, 32) = 18
mod(32, 32) = 0
mod(65, 32) = 1
Nice. Let's do one last thing to extract that byte. We shift the word to the left by as many bits as needed to leave our byte at the front. The shl instruction takes the shift amount in bits, so we have to multiply the indexInWord by 8 to shift whole bytes.
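Once we assign this 32-byte word that starts with our byte to the result variable, Solidity will drop all the other bytes when the value is returned, because we declared it as bytes1. The assignment inside the assembly block does not truncate anything by itself.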
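If the shift feels a bit too magical, there is another way to get the same byte: the byte instruction, which picks one byte out of a word. This is a different technique from the one used above, shown only as a sketch. The contract ByteExtractionSketch and the function elementAt are hypothetical names, and the array is passed in as a memory argument instead of being copied from storage.

```solidity
pragma solidity ^0.5.10;

// Hypothetical helper contract, not part of the original example.
contract ByteExtractionSketch {
    function elementAt(bytes memory data, uint256 index) public pure returns (bytes1 result) {
        // Same bounds check as before: never mload outside the array.
        require(index < data.length, "index out of bounds");
        assembly {
            // Skip the length word, then jump to the word that holds the element.
            let word := mload(add(add(data, 32), mul(div(index, 32), 32)))
            // byte(n, x) returns the n-th byte of x, counting from the most
            // significant byte, as a small number at the low end of the word.
            let b := byte(mod(index, 32), word)
            // bytes1 values are left-aligned, so move the byte to the front.
            result := shl(248, b)
        }
    }
}
```

Here the result already contains only our byte, so there are no leftover bytes to drop.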
A similar exercise to clarify the internals of arrays in memory and to play more with assembly would be to translate the updateElement function. I'll leave that as homework for you. If somebody wants to share their assembly, we can review it here together.
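If you want a starting point, here is one possible sketch of the memory part only. It uses the mstore8 instruction, which writes a single byte to memory. The contract UpdateInMemorySketch and the function withElement are hypothetical names, and the function works on an array received as a memory argument. Copying testArray from storage and saving it back is still up to you.

```solidity
pragma solidity ^0.5.10;

// Hypothetical helper contract, only a sketch of the memory write.
contract UpdateInMemorySketch {
    function withElement(bytes memory data, uint256 index, bytes1 value)
        public
        pure
        returns (bytes memory)
    {
        // Never write outside the array.
        require(index < data.length, "index out of bounds");
        assembly {
            // Position of the element: skip the length word, then add the index.
            let position := add(add(data, 32), index)
            // bytes1 values are left-aligned in the 32-byte word, and mstore8
            // writes the lowest byte, so move our byte to the low end first.
            mstore8(position, shr(248, value))
        }
        return data;
    }
}
```

Because mstore8 touches exactly one byte, we don't need the div and mod arithmetic here.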
The case with pushElement is a little more complicated, though. If we have an array of 0 elements and want to push one, we need to make sure that there is free memory contiguous to the position in which our array is stored. The same will happen at each memory boundary. This could be a good opportunity to talk about the free memory pointer and the msize instruction, but this post is already big and the idea is to avoid multiple headaches all at once. Let's push it to the future.
Quite a journey, right? The final question is: in which cases should we handle arrays directly in assembly? I feel a very strong need to say "never", and leave this just as an exercise in which we learned about Solidity internals the hard way. What's really important are the friends we made along the way, not the gas we might save.
If you think you need to use assembly, I invite you to share your use case here and we will try our best to convince you otherwise. If we fail to convince you, then go ahead and use assembly. But we encourage you to request an audit afterwards.



