ERC20 Contract with UTF-8 Characters in a String

I need to write a contract that uses text from a Latin-based language, which includes special characters such as é, ç, ã, ô, etc.

I found an example that also uses OpenZeppelin:

See lines 520 and 532:
https://etherscan.io/address/0xd15db6e8203fb3f9ce70714b306a1a2b03ebfe45#code

When I try to do something similar in my contract, I get the compiler error "Invalid character in string":

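```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import "@openzeppelin/contracts/token/ERC20/ERC20.sol";
import "@openzeppelin/contracts/access/Ownable.sol";

contract Ouro is ERC20, Ownable {
    // This line is the problem: on 0.8.x it fails with "Invalid character in string",
    // because plain string literals may only contain ASCII characters.
    string public constant tokenMinuta = "Céu, Carraço";

    constructor() ERC20("Ouro", "ORO") {
        _mint(msg.sender, 1000000000 * 10**decimals());
    }

    function mint(address to, uint256 amount) public onlyOwner {
        _mint(to, amount);
    }
}
```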

I then replaced those characters with their UTF-8-encoded escape sequences, hoping the contract would compile and verify correctly on either BscScan or Etherscan.

For example:

```solidity
string public constant tokenMinuta = "C\xC3\xA9u, Carra\xC3\xA7o";
```

This compiles as expected locally in my terminal.

However, it does not work the way I hoped when I submit the source code to BscScan or Etherscan:

https://testnet.bscscan.com/address/0x0C2cF99Fa25429B95C2916025997d91bdB0fEcE5#code

You can see where I added the UTF-8 escapes. I was hoping the verified source would show the characters in a human-readable way rather than as escape sequences.

My question is: how can I write those special characters in the body of the contract without getting compilation errors, so that I can verify the contract on BscScan or Etherscan with the characters shown in a human-readable way, just like in the first example I provided above?

@abcoathup @Skyge Do you have any experience with working around this issue?
Thanks in advance!

Hi, I think it is OK to write special characters. Try using a lower compiler version; I have deployed an example on Kovan: Storage | 0x9f234e8aeb06aabe3f2bb353b73babac5f0e4458 (etherscan.io)

You can have a look.


@Skyge Thank you for getting back to me. If I use your recommended compiler version (pragma solidity ^0.5.11), how can I still use the OpenZeppelin imports?

```solidity
import "@openzeppelin/contracts/token/ERC20/ERC20.sol";
import "@openzeppelin/contracts/access/Ownable.sol";
```

This fails because the imported files require compiler version ^0.8.0:

Source file requires different compiler version (current compiler is 0.8.4+commit.c7e474f2.Emscripten.clang) - note that nightly builds are considered to be strictly less than the released version


Hey! I would advise against using an older Solidity version, since newer compiler versions include bug fixes that you want in your project. Besides, the check for non-ASCII characters was introduced because of a critical vulnerability.

@d3vonks26 I’d say the important thing here is that the contract returns the correct string when queried, regardless of how it looks in the source code. Any developer reviewing the source code should be familiar with the UTF-8 escape notation. I understand it’s not the answer you were looking for, but IMO it’s the cleanest solution.
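To illustrate that point, here is a minimal sketch, assuming Solidity ^0.8.0; the contract name `EscapedMinuta` and the `byteLength` function are hypothetical and only for illustration. The escapes \xC3\xA9 and \xC3\xA7 are the UTF-8 encodings of é and ç, so the stored bytes are exactly those of "Céu, Carraço": 12 characters, but 14 bytes, because each accented character takes two bytes.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract EscapedMinuta {
    // "Céu, Carraço" written with ASCII-only source:
    // é = \xC3\xA9 and ç = \xC3\xA7 in UTF-8.
    string public constant tokenMinuta = "C\xC3\xA9u, Carra\xC3\xA7o";

    // Length in bytes (not characters) of the stored string.
    function byteLength() external pure returns (uint256) {
        return bytes(tokenMinuta).length;
    }
}
```

A client that decodes the returned `tokenMinuta` value as UTF-8 will display "Céu, Carraço", even though the source only contains ASCII.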


@spalladino I see your point, and from a development perspective it makes total sense: the source code should be the source code. I was initially curious whether, and how, it was possible, based on the first example in my question. It looks like I can’t have the best of both worlds here. Thanks for the advice and the link to the docs.


If you want to paste arbitrary UTF-8 characters, you can use a Unicode literal:

```solidity
string public constant tokenMinuta = unicode"Céu, Carraço";
```

This was not necessary in older Solidity versions, but as @spalladino pointed out, it was deemed a vulnerability and had to be changed in newer ones. Unicode is very complex and includes many characters that change the way text is displayed; for example, the right-to-left override character (U+202E) reverses the display order of the text that follows it. Such characters can easily be misused to trick editors into displaying strings so that they look like part of the executable code. Sanitizing them is very error-prone, so the solution chosen in the compiler was to allow only plain ASCII characters in normal string literals. You have to use escape sequences like \xC3 to get anything more exotic.

Unicode literals give you the old behavior, but their prefix at least tells you that the string might not be what it seems, and seeing one should really be a red flag.
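Both forms end up as the same bytes in the compiled contract. A minimal sketch to check that, assuming Solidity ^0.8.0 and a source file saved as UTF-8; the contract `MinutaLiterals` and its `sameBytes` function are hypothetical, for illustration only:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract comparing two ways of writing the same text.
contract MinutaLiterals {
    // Unicode literal: non-ASCII characters allowed, flagged by the prefix.
    string public constant withUnicode = unicode"Céu, Carraço";

    // ASCII-only source using hex escapes for the UTF-8 bytes.
    string public constant withEscapes = "C\xC3\xA9u, Carra\xC3\xA7o";

    // True when both literals compile to identical byte sequences.
    function sameBytes() external pure returns (bool) {
        return keccak256(bytes(withUnicode)) == keccak256(bytes(withEscapes));
    }
}
```

The choice between them only affects how the verified source reads, not what the contract returns.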


@cameel That makes sense. It’s good to know this option exists, and also what the side effects of introducing such patterns are. Thank you!