ERC721 Development: Should you write metadata for tokens before or after minting?

If you're building an ERC721 collectible series, should you write metadata (to IPFS for off-chain metadata, and directly to the contract for on-chain metadata) before a user mints a token, or should you write the metadata after a user mints a token?

If the answer is before, then doesn't that effectively hurt the "randomness" of the minting process, since users can inspect IPFS or the on-chain metadata to see which tokens they would mint? If yes, does it matter?

If the answer is after, should you implement this by having a server listen for a "minted" event from the smart contract and then generate and write the metadata?


Yes, in fact Meebits was exploited in this way. Whether it matters depends on the project. In this case, the attacker was able to mint a rarer NFT which presumably would be worth more, by abusing the sale.

In this space, the ideal is generally to minimize trust placed in other parties. So a centralized server, in my opinion, would generally be against the ethos.

I know I'm not giving concrete answers, sorry. I think it's possible to come up with a good design with nice properties, but it will depend on the specifics of each project.


To be fair, the Meebits exploit happened due to the way in which randomness was "achieved". Had they used VRF, I don't believe that would have happened.

I think the larger question (outside of how one gets their randomness) is the sequencing of generating a random NFT.

Let's say I have an ERC721 with 100 tokens, and there are 3 different rarities - bronze, silver, and gold. Furthermore, let's say as part of the metadata I would like to associate a given contextual attribute with the metadata as well - let's say what device minted it - iPhone, Android, etc.

I cannot seem to find a workflow that supports this. You can't write all metadata before, because you are missing a fundamental attribute. However, you can't do any lazy minting, because then theoretically you could game the randomness.

What would you call best practice to avoid this dilemma?

Are you aware of NFT projects using VRFs? That sounds interesting.


One thing you can do is store that "contextual" data on-chain at the time of purchase. You can later use this stored information to generate the token URI. Some options (a sketch of the second one follows the list):

  • Include a query parameter in the URL:
    tokenURI(id) = baseURI() + "/" + id + "?device=" + device(id)
    Though this will not be embedded in the metadata file itself if your baseURI is an IPFS directory.
  • Have separate baseURI per device:
    tokenURI(id) = baseURI(device(id)) + "/" + id
    Each can be a different IPFS directory.
  • Return a data URI, something like:
    tokenURI(id) = "data:application/json;base64," + Base64.encode(metadataJson(id, device(id))
    See Uniswap V3 for a reference implementation of this approach.
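
As a minimal sketch of the second option (the contract, the mapping names, and the uint8 device codes are illustrative assumptions, not a reference implementation):

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

import "@openzeppelin/contracts/token/ERC721/ERC721.sol";
import "@openzeppelin/contracts/utils/Strings.sol";

contract DeviceURI721 is ERC721 {
    // device code recorded on-chain at purchase time
    mapping(uint256 => uint8) public deviceOf;
    // one base URI (e.g. an IPFS directory) per device code
    mapping(uint8 => string) public baseURIOf;

    constructor() ERC721("Example", "EXM") {}

    function mint(uint256 id, uint8 device) external {
        deviceOf[id] = device;
        _mint(msg.sender, id);
    }

    function tokenURI(uint256 id) public view override returns (string memory) {
        // the base URI depends on the device stored at mint time
        return string(abi.encodePacked(baseURIOf[deviceOf[id]], "/", Strings.toString(id)));
    }
}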

In my upcoming project we'll be using VRF to generate a random id to mint! I'll keep you posted 🙂
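
For reference, the request/fulfill flow with Chainlink's VRF consumer (v1-style API) looks roughly like this; the coordinator address, keyHash and fee are network-specific placeholders, and the actual mint logic is left as a stub:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

import "@chainlink/contracts/src/v0.8/VRFConsumerBase.sol";

contract VRFMintSketch is VRFConsumerBase {
    bytes32 internal keyHash; // network-specific, set at deployment
    uint256 internal fee;     // LINK fee, also network-specific

    // remember who asked for which random draw
    mapping(bytes32 => address) public requestToMinter;

    constructor(address vrfCoordinator, address link, bytes32 _keyHash, uint256 _fee)
        VRFConsumerBase(vrfCoordinator, link)
    {
        keyHash = _keyHash;
        fee = _fee;
    }

    function requestMint() external returns (bytes32 requestId) {
        requestId = requestRandomness(keyHash, fee);
        requestToMinter[requestId] = msg.sender;
    }

    // called back by the VRF coordinator with a verifiable random number
    function fulfillRandomness(bytes32 requestId, uint256 randomness) internal override {
        // use `randomness` to pick the token id for requestToMinter[requestId]
    }
}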

On the three ideas:

  1. I like this idea, but there is something so tangible about having the attribute in the JSON itself that I would pass on this approach.
  2. This entails (through no fault of your own - it comes from my own faulty example) that we know all device_ids a priori, which unfortunately, for my use case, we don't.
  3. I found this approach to be very interesting. It does entail holding all metadata on-chain (besides, theoretically, the image/media itself, which could still be held on Arweave with just the id held on-chain). I wonder about the cost of this.

Let's say we had metadata of something like:

metadata = [
   { value: nftA, frequency: 2, arweaveLink: 0xA },
   { value: nftB, frequency: 3, arweaveLink: 0xB },
   { value: nftC, frequency: 5, arweaveLink: 0xC },
]

I'm thinking we then could do something (off chain) like:

distribution = []
for (let i = 0; i < metadata.length; i++) {
    metadata[i].id = i;
    for (let j = 0; j < metadata[i].frequency; j++) {
        distribution.push({ id: i, customValue: 0 });
    }
}

We then can have a contract like:

contract My721 is ERC721 {
  /* STRUCTS */
  
  struct TokenStruct {
      uint256 id;
      uint256 customValue;
  }

  struct FamilyNFT {
    uint256 id;
    uint256 frequency;
    bytes32 value;
    bytes32 arweaveLink;
  }

  /* STATE VARIABLES */
  TokenStruct[] public tokens;

  mapping(uint256 => bool) public minted;

  // NOTE: I suppose this could be a mapping,
  // but I didn't want to loop through the `_differentNfts` on-chain.
  // The correct data structure I suppose depends on the size, i.e. number
  // of "uniqueNfts" we have?
  FamilyNFT[] public differentNfts;

  // these probably are memory/calldata or something :)
  constructor(TokenStruct[] _tokens, FamilyNFT[] _differentNfts) ERC721("NFT", "My721") {
    tokens = _tokens;
    differentNfts = _differentNfts;
  }

  // Of course we wouldn't use an unbounded loop or this naive of an implementation,
  // but just to get the point across...
  function mint(uint256 _customValue) public {
    // this represents something that theoretically is only known at mint time
    // but we want in the metadata
    require(_customValue != 0, "mint:: customValue must be non zero!");

    while (true) {
      // getRandomNumber() is a placeholder for whatever randomness source we use
      uint256 randomNumber = getRandomNumber() % tokens.length;
      // customValue == 0 indicates not yet minted
      if (tokens[randomNumber].customValue == 0) {
        tokens[randomNumber].customValue = _customValue;
        minted[randomNumber] = true;
        _mint(msg.sender, randomNumber);
        break;
      }
    }
  }

  function tokenURI(uint256 tokenId) public view override returns (string memory) {
    // First get this token
    TokenStruct memory token = tokens[tokenId];
    // Then get the family it belongs to
    FamilyNFT memory nftData = differentNfts[token.id];
    return
      string(
        abi.encodePacked(
          "data:application/json;base64,",
          Base64.encode(
            bytes(
              abi.encodePacked(
                '{"value":"',
                nftData.value,
                '", "tokenId":"',
                tokenId,
                '", "customValue":"',
                token.customValue,
                '", "imageLink": "',
                nftData.arweaveLink,
                '"}'
              )
            )
          )
        )
      );
  }
}

Thoughts?


Looks good, except that for numbers encodePacked is not going to work. It will include the bits that represent the number, and you want to concatenate its ASCII representation, so you should use Strings.toString(uint256).
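
For instance (purely illustrative):

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

import "@openzeppelin/contracts/utils/Strings.sol";

contract ToStringExample {
    function rawBits(uint256 tokenId) external pure returns (bytes memory) {
        return abi.encodePacked(tokenId); // 32 raw bytes, e.g. 0x00...2a for 42
    }

    function asciiDigits(uint256 tokenId) external pure returns (string memory) {
        return Strings.toString(tokenId); // the string "42"
    }
}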


Thanks so much for the review!

Wondering at what point that amount of on-chain storage becomes so expensive as to not be feasible? Any ideas on how to test this / pitfalls to look out for?

In any event, I do think it may be the only way to achieve what I'm looking for, so there will be tradeoffs :slight_smile:

Well this is a function that is really only ever invoked off-chain via eth_call. So the fact that it is expensive isn't so much of a concern. You could look at the gas cost of Uniswap V3's tokenURI to have as a reference value, because it looks expensive and it can serve as a safe upper bound.

Sorry, I was referring to the idea of keeping that amount of storage on-chain. You would turn tokenURI into a view function anyway, so it wouldn't cost any gas.

It's basically the question of how much is too much when it comes to on-chain storage.

I don't know how much more data you plan to store, but from what you've shown so far it sounds very reasonable to me.

You can use smaller data types (like uint32) to fit more values into a single storage slot. It will be cheaper.
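
For example, these four illustrative fields pack into a single 32-byte storage slot instead of four:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

contract PackedExample {
    // 4 + 4 + 8 + 16 = 32 bytes, so the whole struct fits in one slot
    struct PackedNFT {
        uint32 id;
        uint32 frequency;
        uint64 customValue;
        uint128 arweaveRef; // e.g. an index into a table of full links
    }

    PackedNFT[] public tokens;
}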

One concern is that we cannot copy calldata or memory to storage, which means we would have to iterate over _tokens and _differentNfts.

I worry this is prohibitively expensive gas-wise. For example, I just ran the following with eth-gas-reporter and ids.length == 5000, and exceeded the gas limit by quite a bit:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

contract Contract {
    struct Struct {
        uint id;
        bytes32 imageLink;
    }

    mapping(uint => Struct) structs;
    function initialize(uint[] calldata ids, bytes32[] calldata links) public {
        for (uint i; i < ids.length; i++) 
            structs[i] = Struct(ids[i], links[i]);
    }
}

InvalidInputError: Transaction gas limit is 13694632 and exceeds block gas limit of 12450000.

This drastically limits the feasibility of this approach. I hate to dip into the world of assembly, and I'm not sure whether that would help or not.

Is there anything you can think of to avoid this?

Using assembly will not help. Writing to storage is expensive in any way you do it.

This is not something you should do in a single transaction. I imagined you were going to store the data each time an NFT is minted. This should be ok in terms of gas.
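
As a minimal sketch of that per-mint pattern (reusing the hypothetical Struct from above):

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

contract PerMintStorage {
    struct Struct {
        uint id;
        bytes32 imageLink;
    }

    mapping(uint => Struct) public structs;

    // each mint pays only for its own entry (two storage slots),
    // instead of one giant setup transaction writing 5000 entries
    function mint(uint id, bytes32 link) public {
        structs[id] = Struct(id, link);
        // ... then _mint(msg.sender, id), etc.
    }
}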

If you know all of this data at the time of deployment, you should use IPFS!

I would like to share my idea on solving this problem (I've been thinking a lot lately about fair random minting). Please check the evolution of my ideas. The first two assume all metadata is uploaded to IPFS prior to minting, for transparency and fairness. In this case we need to ensure randomness in assigning the tokens:

  1. My first idea was kind of what @swixx showed above in their code snippet (generating a random ID and checking it against the list of minted token IDs), but it becomes progressively more expensive as the pool of non-minted tokens depletes, because you have to read a lot from expensive storage, as mentioned by @frangio - not an option (especially for tokens with average-to-big supply). Not to mention that on-chain RNG is prone to attacks like the Meebits exploit mentioned above.

  2. Getting the RNG seed off-chain (e.g. Chainlink VRF) - solves the RNG problem, but does not solve the gas problem with iterating - still not an option.

Then eventually I came to option 3 - sequential token IDs with batched metadata uploads. In this case, the contract creator would upload metadata to IPFS periodically, e.g. every N mints. With total supply T = M x N, M would be the number of such batches.

Now you might say this defeats the purpose of on-chain fairness - but I have a solution for that. The contract creator can hard-code an M-length array of MD5 digests, one per metadata batch, and then unveil each batch's JSON on their website as it is uploaded to IPFS. This guarantees that all the metadata was pre-generated in advance and was not tampered with afterwards (you can easily calculate the MD5 digest of a string and compare it with what's in the contract).
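
As a minimal sketch of that commitment scheme - using keccak256 in place of MD5, since the EVM can compute keccak256 natively and it serves the same commitment role:

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.7;

contract BatchedReveal {
    // one digest per metadata batch, fixed at deployment
    bytes32[] public batchDigests;

    constructor(bytes32[] memory digests) {
        batchDigests = digests;
    }

    // anyone can check a revealed batch's JSON against its pre-committed digest
    function verifyBatch(uint256 batchIndex, bytes calldata batchJson) external view returns (bool) {
        return keccak256(batchJson) == batchDigests[batchIndex];
    }
}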

The only risk with this approach is that the contract creator may disappear and not upload the remaining metadata. In that case it would affect a certain number of buyers in the last non-revealed batch, and afterwards the project would go bust anyway. But I think this problem is much wider than a few buyers not getting their token metadata, so for serious projects this shouldn't be a concern.

I am happy to hear back your thoughts on this.
