How to find a function's body in the bytecode of a contract?

Hi all, I’m trying to figure out how to dynamically look up a function’s body in the bytecode of a compiled contract.

My project is a web app where users enter the address of a token contract on Ethereum Mainnet that they want to interact with, and the app tells them whether the token looks safe or not.
The idea is to help less technical users avoid scam tokens. There are many of them around during this bull market, and I dislike seeing community members get hurt.

The tool will pull the bytecode at the given address and attempt to validate specific functions of the token.

For example: look up transfer(to, amount) and check that it doesn’t charge the user a hidden fee.

I read through the entirety of the extremely helpful series from OpenZeppelin here: https://blog.openzeppelin.com/deconstructing-a-solidity-contract-part-v-function-bodies-2d19d4bef8be/ and it answered most of my outstanding questions. Unfortunately, when putting it into practice I got a bit lost on the last step of the process.

I’m able to locate the function selector switch statement, in this case by searching for the first 4 bytes (8 hex characters) of the keccak256 hash of transfer(address,uint256). Then I find the jump destination that leads to the function wrapper. That part seems to work, but now I have trouble finding where the wrapper JUMPs to reach the function body.
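To make that first step concrete, here is a minimal sketch of the dispatcher scan in Python. It assumes the common pattern where the compiler emits `PUSH4 <selector>`, then a comparison, then `PUSHn <dest> JUMPI`; the helper names `disassemble` and `find_wrapper_offset` and the `window` parameter are hypothetical, and other compiler versions or optimizer settings may lay the dispatcher out differently.

```python
# Well-known selector: first 4 bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")

PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F
JUMPI, JUMPDEST = 0x57, 0x5B


def disassemble(code: bytes):
    """Yield (offset, opcode, immediate_bytes), stepping over PUSH data."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        size = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        yield pc, op, code[pc + 1 : pc + 1 + size]
        pc += 1 + size


def find_wrapper_offset(code: bytes, selector: bytes = TRANSFER_SELECTOR, window: int = 4):
    """Return the jump target of the dispatcher branch for `selector`, or None."""
    ops = list(disassemble(code))
    opcode_at = {pc: op for pc, op, _ in ops}
    for i, (_, op, data) in enumerate(ops):
        if op != PUSH4 or data != selector:
            continue
        # Assumption: `PUSHn <dest> JUMPI` follows within a few instructions.
        for j in range(i + 1, min(i + 1 + window, len(ops) - 1)):
            _, push_op, dest = ops[j]
            if PUSH1 <= push_op <= PUSH32 and ops[j + 1][1] == JUMPI:
                target = int.from_bytes(dest, "big")
                # A valid target must be a JUMPDEST on an instruction boundary.
                if opcode_at.get(target) == JUMPDEST:
                    return target
    return None
```

On the hand-built bytes `8063a9059cbb1461000c57005b00` (DUP1, PUSH4 selector, EQ, PUSH2 0x000c, JUMPI, STOP, JUMPDEST, STOP), `find_wrapper_offset` returns 12. On real contracts the returned offset is only the start of the wrapper, and the same kind of scan would have to continue from there to find the jump into the body.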

The examples in the articles shared above are simple enough to follow, but in practice the function wrapper seems to change across different [valid] ERC20 contracts. I thought it would remain constant, since the parameters of the function and its modifier (i.e. non-payable) remain the same. Naturally, I’d expect the function body to change, but not the function wrapper.

Overall, I may be over-complicating this. Instead of dynamically reading the contract to find the function body, I could check the entire contract bytecode, as a string, for a match against the bytecode of my validated transfer(...) function. Then I think my issue is simply knowing how to transform a transfer(...) implementation into the equivalent bytecode/opcodes, which I currently don’t know how to do.
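A rough sketch of what that whole-bytecode match could look like in Python is below. It relies on one documented detail: solc appends a CBOR-encoded metadata map to the runtime bytecode, and the last two bytes give its length. The helpers `strip_metadata` and `contains_fragment` are hypothetical names, and the reference fragment is assumed to come from compiling a known-good transfer(...) with the same compiler version and settings as the target, otherwise the bytes are unlikely to line up.

```python
def strip_metadata(runtime: bytes) -> bytes:
    """Drop the CBOR metadata trailer that solc appends, if one seems present."""
    if len(runtime) < 3:
        return runtime
    cbor_len = int.from_bytes(runtime[-2:], "big")
    if cbor_len == 0 or cbor_len + 2 > len(runtime):
        return runtime  # length field is implausible; leave the code untouched
    start = len(runtime) - cbor_len - 2
    # The metadata is a CBOR map, so its first byte has major type 5 (0xa0-0xbf).
    if runtime[start] >> 5 != 5:
        return runtime
    return runtime[:start]


def contains_fragment(runtime_hex: str, fragment_hex: str) -> bool:
    """Check whether a reference byte sequence occurs in the code, metadata excluded."""
    runtime = bytes.fromhex(runtime_hex.removeprefix("0x"))
    fragment = bytes.fromhex(fragment_hex.removeprefix("0x"))
    return fragment in strip_metadata(runtime)
```

This is a plain byte search, so it can also match inside PUSH data or across instruction boundaries, and embedded jump offsets shift whenever the rest of the contract changes. It is probably best treated as a quick pre-filter rather than a proof that two implementations are the same.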

Please let me know your thoughts - I think this would be a cool user security tool!

Hi @Dyno,

You could also have a look at: OpenZeppelin's online ERC20 verifier: behind the scenes

Hi @abcoathup, thanks for the link. I did see that. Unfortunately it only checks function signatures and not function bodies, which is the added step I want to verify.

I may reach out to the original OpenZeppelin team that wrote the series and see if they have any tips.

@elopio @ajsantander sorry to tag you directly, but I noticed you helped author the original series I’m basing my knowledge on. Any chance you have suggestions on how to implement such a tool?

I appreciate any help, thanks!

I’m sorry, I’ve been far from assembly for a while, so I can’t help much with the technical details as the compiler keeps changing.

You can take a look at how disassemblers work: https://github.com/Arachnid/evmdis
Disassemblers face the same problem: they have to stay in sync with the compiler, and this one hasn’t been updated in years.

Here chriseth points out some relevant details when trying to make the bytecode readable: https://ethereum.stackexchange.com/a/238
The compiled code will change depending on the compiler version and its flags.

So the route you are taking would mean dealing with bytecode that is very hard to read.

Have you considered the alternative of analyzing the verified source code uploaded to Etherscan? The way I see it, if a contract doesn’t upload its source code there these days, it’s hard to trust it anyway.
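As a starting point for the source-code route, here is a minimal Python sketch that pulls the body of a named function out of Solidity source text by matching braces. It assumes the verified source has already been downloaded as a string (fetching it from Etherscan is not shown), `extract_function_body` is a hypothetical helper, and braces inside comments or string literals would confuse it.

```python
import re


def extract_function_body(source: str, name: str):
    """Return the text between the braces of the first `function <name>(...)` that has a body."""
    pattern = r"\bfunction\s+" + re.escape(name) + r"\s*\("
    for match in re.finditer(pattern, source):
        brace = source.find("{", match.end())
        semicolon = source.find(";", match.end())
        if brace == -1:
            return None
        if semicolon != -1 and semicolon < brace:
            continue  # interface or abstract declaration without a body
        depth = 0
        for i in range(brace, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    return source[brace + 1 : i].strip()
    return None
```

The extracted text could then be compared against a known-good transfer implementation or inspected for extra balance arithmetic. Since transfer often delegates to an internal helper such as _transfer, a real tool would likely need to follow those calls too, which is where a proper Solidity parser would be a better fit than this brace matching.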