How to find a function's body in the bytecode of a contract?

Hi all, I’m trying to figure out how to dynamically look up a function’s body in the bytecode of a compiled contract.

My project is a web app for the community: a user enters the address of a token contract on Ethereum Mainnet that they want to interact with, and the app tells them whether the token looks safe.
The idea is to help less technical users avoid scam tokens - there are many around during this bull market and I dislike seeing community members get hurt.

The tool will pull the bytecode at the given address and attempt to validate specific functions of the token.

Ex. Look up transfer(to, amount) and validate it doesn’t have a hidden fee on the user.

I read through the entirety of the extremely helpful series from OpenZeppelin here: https://blog.openzeppelin.com/deconstructing-a-solidity-contract-part-v-function-bodies-2d19d4bef8be/ and it answered most of my outstanding questions. Unfortunately, when putting it into practice I got a bit lost on the last step of the process.

I’m able to locate the function selector switch statement and the entry for transfer, i.e. the first 4 bytes (8 hex characters) of the keccak256 hash of transfer(address,uint256). From there I find the JUMP destination that leads to the function wrapper. That part seems to work - but I have trouble finding where to JUMP next to get to the function body.
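For anyone following along, here is a minimal Python sketch of the dispatcher lookup described above. It assumes the dispatcher entry has the shape PUSH4 <selector>, EQ, PUSHn <dest>, JUMPI, which solc commonly emits but which is not guaranteed across compiler versions or optimizer settings. The names disassemble and find_dispatch_target are made up for illustration, and 0xa9059cbb is the well-known selector of transfer(address,uint256). It only finds the wrapper's entry point, not the function body.

```python
PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F
EQ, JUMPI, JUMPDEST = 0x14, 0x57, 0x5B

# Selector of transfer(address,uint256): first 4 bytes of its keccak256 hash.
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def disassemble(code: bytes):
    """Yield (offset, opcode, immediate) and skip over PUSH data bytes."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        size = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        yield pc, op, code[pc + 1 : pc + 1 + size]
        pc += 1 + size


def find_dispatch_target(code: bytes, selector: bytes):
    """Return the jump destination paired with `selector`, or None.

    Assumption: the entry looks like PUSH4 sel, EQ, PUSHn dest, JUMPI.
    Other layouts (e.g. a reordered comparison) will not be matched.
    """
    ins = list(disassemble(code))
    for i, (_, op, imm) in enumerate(ins):
        if op != PUSH4 or imm != selector:
            continue
        window = ins[i + 1 : i + 4]
        if len(window) < 3:
            continue
        (_, op1, _), (_, op2, dest), (_, op3, _) = window
        if op1 == EQ and PUSH1 <= op2 <= PUSH32 and op3 == JUMPI:
            return int.from_bytes(dest, "big")
    return None


# Hand-built toy bytecode, not taken from a real contract:
# DUP1, PUSH4 a9059cbb, EQ, PUSH2 0x0010, JUMPI, padding, JUMPDEST at 0x10.
sample = bytes.fromhex("8063a9059cbb1461001057" + "00" * 5 + "5b")
target = find_dispatch_target(sample, TRANSFER_SELECTOR)
```

Skipping PUSH data in disassemble matters: a naive byte search for a9059cbb could match bytes that are really part of a constant or of the metadata, not an instruction.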

The examples in the articles shared above are simple enough to follow, but in practice the function wrapper seems to differ across different [valid] ERC20 contracts. I thought it would stay constant, since the function’s parameters and modifiers (i.e. non-payable) are the same. Naturally, I’d expect the function body to change, but not the function wrapper.

Overall, I may be over-complicating this and instead of trying to dynamically read the contract to find the function body I could just check the entirety of the contract bytecode for a match against my validated transfer(...) function bytecode as a string? Then I think my issue is simply knowing how to transform a transfer(...) implementation into the equivalent bytecode/opcodes - which I currently don’t know how to do.
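One caveat if comparing bytecode as a string: solc normally appends a CBOR-encoded metadata blob to the runtime bytecode, and the last two bytes give that blob's length, so two builds of the same source can differ only in that trailer. A rough sketch of stripping it before comparing, in Python; strip_metadata and same_code are hypothetical names, and the check for a CBOR map byte is only a heuristic, not a proper CBOR parse:

```python
def strip_metadata(runtime: bytes) -> bytes:
    """Drop the trailing CBOR metadata that solc usually appends.

    Assumption: the last two bytes are the big-endian length of the
    metadata, and the metadata starts with a CBOR map byte (0xa0-0xbf).
    If that does not hold, the input is returned unchanged.
    """
    if len(runtime) < 2:
        return runtime
    meta_len = int.from_bytes(runtime[-2:], "big")
    total = meta_len + 2
    if meta_len == 0 or total > len(runtime):
        return runtime
    if runtime[-total] & 0xE0 != 0xA0:  # not a CBOR map, leave as is
        return runtime
    return runtime[:-total]


def same_code(a: bytes, b: bytes) -> bool:
    """Compare two runtime bytecodes while ignoring the metadata trailer."""
    return strip_metadata(a) == strip_metadata(b)


# Toy data, not real contract bytecode: same body, different metadata.
body = bytes.fromhex("6001600201")
meta_a = b"\xa1" + b"A" * 10
meta_b = b"\xa1" + b"B" * 10
runtime_a = body + meta_a + len(meta_a).to_bytes(2, "big")
runtime_b = body + meta_b + len(meta_b).to_bytes(2, "big")
```

Even with the trailer removed, a different compiler version or optimizer setting can change the remaining bytes, so an exact match is only a positive signal; a mismatch proves nothing.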

Please let me know your thoughts - I think this would be a cool user security tool!

Hi @Dyno,

You could also have a look at: OpenZeppelin's online ERC20 verifier: behind the scenes

Hi @abcoathup thanks for the link. I did see that - unfortunately it only checks function signatures, and not function bodies which is the added step I want to verify.

I may reach out to the original OpenZeppelin team that wrote the series and see if they have any tips.

@elopio @ajsantander sorry to tag you directly but noticed you did help author the original series I’m basing my knowledge on. Any chance you have any suggestions on how to implement such a tool?

I appreciate any help, thanks!

I’m sorry, I’ve been far from assembly for a while, so I can’t help much with the technical details as the compiler keeps changing.

You can take a look at how disassemblers work: https://github.com/Arachnid/evmdis
Disassemblers face the same problem: they have to stay in sync with the compiler, and this one hasn’t been updated in years.

Here chriseth points out some relevant details when trying to make the bytecode readable: https://ethereum.stackexchange.com/a/238
The compiled code will change depending on the compiler version and its flags.

So going down this path means dealing with bytecode that is very hard to read.

Have you considered the alternative of analyzing the verified source code uploaded to Etherscan? The way I see it, if a contract doesn’t upload its source code there these days, it’s hard to trust it anyway.

@Dyno have you made any progress on this? I'm currently trying to figure out whether ERC20 tokens are safe as well. Specifically, I want to know if there are any transfer fees, and if so, how much the fee is as a percentage. Have you seen honeypot.is? They have something similar and I'm trying to reverse engineer what they are doing. I've been able to find a contract they have on Binance, here: 0x2bf75fd2fab5fc635a4c6073864c708dfc8396fc. I've looked at the decompiled code and it seems like they're calling a contract within the contract, which in turn calls getAmountsOut and swapExactTokensForETHSupportingFeeOnTransferTokens on the PancakeSwap router address, in order to figure out the difference in tokens received. Does anyone know what the benefit of having a contract inside a contract would be for this purpose, or have any resources that would help figure this out?
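To make the "difference in tokens received" idea concrete, the arithmetic could look like the Python sketch below. This is only a guess at the calculation, based on the description above, not what honeypot.is actually runs. It assumes expected_out is the router's quote (e.g. from getAmountsOut, which already accounts for the pool's own swap fee) and actual_received is the balance change observed after the swap; transfer_fee_percent is a made-up name.

```python
from fractions import Fraction


def transfer_fee_percent(expected_out: int, actual_received: int) -> float:
    """Implied token fee as a percentage of the quoted output.

    Both amounts are raw integer token units. Fraction keeps the
    division exact until the final conversion to float.
    """
    if expected_out <= 0:
        raise ValueError("expected_out must be positive")
    shortfall = expected_out - actual_received
    return float(Fraction(shortfall, expected_out) * 100)


# Illustrative numbers only: quoted 1000 units, received 950.
fee = transfer_fee_percent(1000, 950)
```

A result near zero suggests no fee on that path, while a result near 100 would be consistent with a token that cannot realistically be sold.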