How linear is the relationship of Solidity code to compiled bytecode?

scripples · September 8, 2021, 3:48am

Hello all, amateur Solidity programmer and Python guy here. I've been thinking about ways of applying some of my machine learning experience to blockchain applications, and I decided to see whether some of the recent advances in transformers might be useful for transforming bytecode/decompiled bytecode into more human-readable functions.

Unfortunately, I don't have the raw compsci knowledge to answer a pretty fundamental question about whether this will work at all. The question is how closely compiled bytecode mirrors the smart contract it was compiled from, in terms of the order in which variables and functions appear, if the two are simply compared as raw strings.

This is important because transformers are generally designed to match encodings that are near each other, sequentially speaking. If, say, the fallback() function is declared at the beginning or in the middle of the Solidity code but is always placed at the end of the bytecode, this would introduce noise, especially in longer contracts.

So how linear are they, really? Does the compiler like to move stuff around, or are they fairly close to 1:1? If the latter, there's a good chance that a transformer-based solution could provide a very useful tool for reverse-engineering bytecode.
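
One way to put a number on "how linear" would be something like the rough Python sketch below. It walks the runtime bytecode instruction by instruction, skipping the immediate data of PUSH1..PUSH32 (opcodes 0x60..0x7f), and collects every PUSH4 operand as a candidate function selector. It then measures what fraction of function pairs keep the same relative order in the bytecode as in the source. The function names here are made up for illustration, and the source-order selector list is assumed to be obtained separately. Not every PUSH4 operand is necessarily a selector, and trailing metadata bytes may be misread as instructions, so treat the output as a noisy estimate.

```python
from itertools import combinations


def iter_opcodes(bytecode_hex):
    """Yield (offset, opcode, operand_bytes) for each EVM instruction."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    i = 0
    while i < len(code):
        op = code[i]
        # PUSH1..PUSH32 (0x60..0x7f) carry 1..32 bytes of immediate data.
        n = op - 0x5F if 0x60 <= op <= 0x7F else 0
        yield i, op, code[i + 1 : i + 1 + n]
        i += 1 + n


def push4_operands(bytecode_hex):
    """Return PUSH4 (0x63) operands in bytecode order, as hex strings."""
    return [
        operand.hex()
        for _, op, operand in iter_opcodes(bytecode_hex)
        if op == 0x63 and len(operand) == 4
    ]


def order_agreement(source_order, bytecode_order):
    """Fraction of item pairs whose relative order matches in both lists.

    1.0 means identical ordering, 0.0 means fully reversed.
    Returns None if fewer than two items are shared.
    """
    pos = {}
    for i, item in enumerate(bytecode_order):
        pos.setdefault(item, i)  # keep the first occurrence only
    shared = [s for s in source_order if s in pos]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return None
    concordant = sum(pos[a] < pos[b] for a, b in pairs)
    return concordant / len(pairs)
```

Calling `order_agreement(selectors_in_source_order, push4_operands(runtime_hex))` over a corpus of contracts would give a rough distribution of how much reordering happens, which could be checked before committing to a sequence model.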
