Are NFT projects doing starting index randomization and provenance wrong or is it just me?

guyo13 · August 18, 2021, 2:48pm

Hi,

There is a pattern in many PFP (profile picture) projects where the tokenURI function does something like:

function tokenURI(uint256 tokenId) public view virtual override returns (string memory) {
    require(_exists(tokenId), "ERC721Metadata: URI query for nonexistent token");

    string memory baseURI = _baseURI();
    return bytes(baseURI).length > 0 ? string(abi.encodePacked(baseURI, tokenId.toString())) : "";
}

Here, baseURI points to an IPFS directory.

This is the standard implementation from OpenZeppelin v4 ERC721.sol and it's fine.

However, relying on this implementation together with random startingIndex generation and (inevitably) a setBaseURI method negates the entire point of a fair and random distribution:

Since the startingIndex is not known beforehand, it is not possible to generate the mapping between the tokenId and the initialSequenceId.

This leads the devs to upload the metadata only after the startingIndex has been set.

The mapping formula, for reference:
F: {0, ..., MAX_TOKENS - 1} -> {0, ..., MAX_TOKENS - 1}
F(tokenId) = initialSequenceId = (tokenId + startingIndex) % MAX_TOKENS
(F depends on startingIndex, which is constant once set.)

This presents us with some problems:

Primo, the devs and team are free to determine (acting as a man-in-the-middle) the order and allocation of the metadata (the items in the collection), so the so-called "Original"/"Initial" sequence ID is meaningless.

Secundo, the metadata of the entire collection is not known before the sale ends, which usually means the art itself will not be visible unless the team exposes it separately, which requires additional and unnecessary effort (see the solution below). Note that it is OK to expose the collection before and during the sale; it only becomes a "metadata leak" if the order in relation to the tokenId (i.e. the mapping) is known as well.

Tertio, the devs are forced to implement a setBaseURI method, which allows them to de facto change the allocation of items in the collection unless a locking mechanism is implemented.
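For the third point, a setBaseURI "locking mechanism" boils down to a one-way flag. Below is a minimal Python model of that state machine rather than Solidity; the class and method names (LockableBaseURI, set_base_uri, lock) are hypothetical. It shows that the owner can repoint the metadata, and thereby reshuffle the allocation, right up until lock() is called, so holders still have to trust the team until the lock actually happens.

```python
class LockableBaseURI:
    """Hypothetical model of a contract's baseURI storage with a one-way lock."""

    def __init__(self, base_uri: str, owner: str) -> None:
        self.base_uri = base_uri
        self.owner = owner
        self.locked = False

    def _only_owner(self, caller: str) -> None:
        if caller != self.owner:
            raise PermissionError("caller is not the owner")

    def set_base_uri(self, caller: str, new_uri: str) -> None:
        self._only_owner(caller)
        if self.locked:
            raise RuntimeError("baseURI is locked")
        # Before locking, this single write can silently re-map every token.
        self.base_uri = new_uri

    def lock(self, caller: str) -> None:
        self._only_owner(caller)
        self.locked = True  # one-way: there is deliberately no unlock()

    def token_uri(self, sequence_id: str) -> str:
        return self.base_uri + sequence_id
```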

Proposed Solution:

Follow these steps:

1. Generate the metadata (and the art, if this is a generative art project).
2. Establish the collection order using a computer RNG and enumerate the metadata (and art) files from 0 to MAX_TOKENS - 1. If you want to do a late reveal and display a mystery image, create an additional metadata file named -1 that points to an IPFS link for a mystery image/video/file.
3. Create the provenanceHash (the sha256 string of the concatenation of the sha256 strings of the ordered art files/images).
4. Upload the art directory to IPFS and update the metadata files to point to the correct IPFS directory link (or Arweave path manifest, whatever you use).
5. Upload the metadata directory to IPFS.
6. Deploy your ERC721 contract with the baseURI variable set to the IPFS directory of the metadata and the provenanceHash variable set to the calculated sha256 string (you can use a constant in Solidity or pass a constructor argument).
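Steps 2 and 3 happen off-chain, so here is a minimal Python sketch of them. It assumes the art files are already loaded as bytes, uses secrets.SystemRandom as the "computer RNG", and takes the description of the provenanceHash literally: lowercase hex sha256 digests of each file, concatenated in sequence order with no separator, then hashed again. Projects differ in these encoding details, so treat the exact format as an assumption and publish whichever convention you use alongside the hash. The helper names are hypothetical.

```python
import hashlib
import secrets
from typing import List, Sequence


def shuffle_collection(files: Sequence[bytes]) -> List[bytes]:
    """Step 2: fix a random collection order; list position = initialSequenceId."""
    ordered = list(files)
    secrets.SystemRandom().shuffle(ordered)  # OS-backed CSPRNG
    return ordered


def provenance_hash(ordered_files: Sequence[bytes]) -> str:
    """Step 3: sha256 over the concatenated sha256 hex digests, in sequence order."""
    concatenated = "".join(hashlib.sha256(data).hexdigest() for data in ordered_files)
    return hashlib.sha256(concatenated.encode("ascii")).hexdigest()


art = [b"<image 0 bytes>", b"<image 1 bytes>", b"<image 2 bytes>"]  # placeholder contents
ordered = shuffle_collection(art)
print(provenance_hash(ordered))  # 64 hex characters
```

Because the hash covers the files in sequence order, anyone can later recompute it from the uploaded art and check that neither the art nor its order was changed after deployment.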

In your contract, implement tokenURI as follows:

function tokenURI(uint256 tokenId) public view virtual override(ERC721) returns (string memory) {
    require(_exists(tokenId), "ERC721Metadata: URI query for nonexistent token");

    string memory baseURI = _baseURI();
    string memory sequenceId;

    if (startingIndex > 0) {
        sequenceId = ( (tokenId + startingIndex) % MAX_TOKENS ).toString();
    } else {
        sequenceId = "-1";
    }
    return string(abi.encodePacked(baseURI, sequenceId));
}

Make sure your startingIndex cannot be set to 0!

In this manner, both the original sequence order and the provenance hash are fixed and publicly known at contract deployment time (enforced by IPFS/Arweave immutability), so they carry actual meaning, and there is no need to implement setBaseURI (although you could if you want).
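To check the reveal behaviour of the proposed tokenURI without deploying anything, here is a small Python model of it. The function name and parameters are hypothetical, "ipfs://<metadata-CID>/" is a placeholder, and it assumes tokens are minted sequentially from 0, so token existence is modeled as 0 <= tokenId < totalMinted.

```python
def token_uri(token_id: int, base_uri: str, starting_index: int,
              max_tokens: int, total_minted: int) -> str:
    """Python model of the proposed tokenURI (hypothetical name and parameters)."""
    # Assumes sequential minting from 0, so existence is a simple range check.
    if not 0 <= token_id < total_minted:
        raise ValueError("ERC721Metadata: URI query for nonexistent token")
    if starting_index > 0:
        # Revealed: rotate tokenId into the pre-committed initial sequence.
        sequence_id = str((token_id + starting_index) % max_tokens)
    else:
        # Not revealed yet: every token points at the mystery metadata file "-1".
        sequence_id = "-1"
    return base_uri + sequence_id


print(token_uri(5, "ipfs://<metadata-CID>/", 0, 10_000, 10_000))     # ipfs://<metadata-CID>/-1
print(token_uri(5, "ipfs://<metadata-CID>/", 1000, 10_000, 10_000))  # ipfs://<metadata-CID>/1005
```

Since startingIndex is never 0 once set, the "-1" branch doubles as the "not yet revealed" flag, which is why the post insists that startingIndex cannot be 0.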

Or am I wrong?

frangio · August 18, 2021, 8:42pm

This seems interesting but the post lacks some background information and context that I don't have. I don't understand what startingIndex or initialSequenceId are. Can you share an example project and code that have done this?

guyo13 · August 20, 2021, 10:52am

You're right, I didn't provide enough background.

The kind of projects I am referring to are NFT profile pictures, which are typically limited to somewhere between 500 and 12,000 tokens.

For an example, see the Bored Ape Yacht Club contract.
The creators implemented the same mechanism but suffer from the problem I described.

Back to startingIndex and initialSequenceId.

Imagine you have a collection of these profile pictures (or any other art for that matter).
You would like to set up a way for people to mint items from the collection without them knowing which items they are going to get at mint time.

The reveal happens when the entire collection has been sold or a certain amount of time has elapsed (Reveal Time).

The rationale is that some items are rarer than others or that you want to create an element of surprise and suspense.

In order to do that, you first assign a fixed order to the images in your collection. This is the Initial Sequence, so each piece in the collection has an initialSequenceId. The sequence is zero-indexed.

Then, after all items have been minted or the set time has elapsed, a function call in the contract sets the startingIndex:

function setStartingIndex() public {
    require(startingIndex == 0, "Starting index is already set");
    require(startingIndexBlock != 0, "Starting index block must be set");

    startingIndex = uint(blockhash(startingIndexBlock)) % MAX_SUPPLY;
    // Just a sanity case in the worst case if this function is called late (EVM only stores last 256 block hashes)
    if (block.number.sub(startingIndexBlock) > 255) {
        startingIndex = uint(blockhash(block.number - 1)) % MAX_SUPPLY;
    }
    // Prevent default sequence
    if (startingIndex == 0) {
        startingIndex = startingIndex.add(1);
    }
}

The hash of startingIndexBlock serves as a sort of "salt" for generating a number between 1 and MAX_SUPPLY - 1.
The startingIndexBlock is set to the block.number of the last mint transaction or the first mint operation following Reveal Time. The owner can also set it prior to that by invoking:

function emergencySetStartingIndexBlock() public onlyOwner {
    require(startingIndex == 0, "Starting index is already set");

    startingIndexBlock = block.number;
}
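The blockhash logic in setStartingIndex is easy to get wrong, so here is a Python model of it. The dict blockhashes is a hypothetical stand-in for the EVM's blockhash(), which only returns hashes for the 256 most recent blocks and 0 for anything else (including the current block). The model mirrors the contract: fall back to the previous block's hash when called more than 255 blocks late, and bump 0 to 1 so the default sequence is never used.

```python
from typing import Dict


def compute_starting_index(start_block: int, current_block: int,
                           blockhashes: Dict[int, int], max_supply: int) -> int:
    """Python model of setStartingIndex(); blockhashes stands in for blockhash()."""
    if start_block == 0:
        raise ValueError("Starting index block must be set")

    def blockhash(n: int) -> int:
        # Only the 256 most recent blocks (excluding the current one) are available.
        if current_block - 256 <= n < current_block:
            return blockhashes[n]
        return 0

    index = blockhash(start_block) % max_supply
    # Called late: use the previous block's hash instead.
    if current_block - start_block > 255:
        index = blockhash(current_block - 1) % max_supply
    # Prevent the default (identity) sequence.
    if index == 0:
        index = 1
    return index
```

One consequence the model makes visible: if setStartingIndex runs in the same block in which startingIndexBlock was recorded, blockhash returns 0 and the result is always 1.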

As we've seen in my previous post, the startingIndex determines the mapping between on-contract tokenIds (integers from 0 to MAX_SUPPLY - 1) and initialSequenceIds as given by the formula:

initialSequenceId = (tokenId + startingIndex) % MAX_SUPPLY
That mapping is a bijection.

For example, given a 10,000-item collection and startingIndex = 1000, the resulting mapping is:
{ <0, 1000>, <1, 1001>, ..., <8999, 9999>, <9000, 0>, <9001, 1>, ..., <9999, 999> }
where the first component is the tokenId and the second component is the initialSequenceId.
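The mapping and the example above can be checked with a few lines of Python; the function names are hypothetical. The inverse relies on Python's %, which returns a non-negative result for a positive modulus. In Solidity you would write (sequenceId + MAX_SUPPLY - startingIndex) % MAX_SUPPLY instead to avoid underflow.

```python
def initial_sequence_id(token_id: int, starting_index: int, max_supply: int) -> int:
    """tokenId -> initialSequenceId: a rotation of the collection by startingIndex."""
    return (token_id + starting_index) % max_supply


def token_id_for(sequence_id: int, starting_index: int, max_supply: int) -> int:
    """Inverse mapping: which tokenId received a given initialSequenceId."""
    return (sequence_id - starting_index) % max_supply


N, k = 10_000, 1000
print([(t, initial_sequence_id(t, k, N)) for t in (0, 1, 8999, 9000, 9001, 9999)])
# [(0, 1000), (1, 1001), (8999, 9999), (9000, 0), (9001, 1), (9999, 999)]

# Every initialSequenceId is hit exactly once, so the mapping is a bijection.
assert sorted(initial_sequence_id(t, k, N) for t in range(N)) == list(range(N))
assert all(token_id_for(initial_sequence_id(t, k, N), k, N) == t for t in range(N))
```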

You can see real examples from Bored Ape Yacht Club as well as Hashmasks (search for their "Provenance" pages; I can't add more than 2 links).

Hope that provides enough context.

frangio · August 20, 2021, 1:56pm

Ok I see what you're talking about now.

Yeah I think the approach you propose is a good one. Basically setting a random offset k at the end of the sale so that token i corresponds to image (i+k)%N.

One downside of this is that the token id doesn't match the image id. I don't know how much of a problem that is.

mhbln · August 24, 2021, 6:35am

It is a good approach, but it will be weird for users to own tokenId 0 and use functions like tokenOfOwnerByIndex, which tell them they own token 0, while OpenSea or a similar site says they own, for example, Avatar #2355. You could just not give the piece a name in the metadata, which would solve the issue on OpenSea at least. Still, it might be a bit confusing.

guyo13 · August 24, 2021, 6:21pm

You and @frangio are correct:

the tokenId and the image id will not match, and that may be confusing.
It comes down to preference, I suppose.
Naming each piece in the collection, instead of using the sequence id as its name, is a good solution.

swixx · August 24, 2021, 7:21pm

@guyo13 Great post exposing a potential issue. Is there any way to use this pattern without the "wait until all pieces are minted" part?

That is to say, have a randomized set of metadata where users don't know what they will receive until the time of mint, while still maintaining the provenance?

I suppose you could use a VRF to generate a random number on mint, and then try to find a non-gas-intensive way to track whether that random number had already been minted. Would that work?

mhbln · August 25, 2021, 8:11am

@swixx You could do it with Chainlink, but with 10,000 pieces it might actually cost you a fair bit. You could do it in the contract, but then people could cancel their transaction and retry until they get a number they want. That gets expensive for the attacker because of gas, though, so if you drop a cheap collection it's unlikely anybody will do it; it would cost them too much.

Chainlink would probably cost you 2 LINK per call on mainnet (at least that is what the website says), so with 10k pieces that's 20,000 LINK, i.e. at least 520k USD, and you obviously need more, because more people might call the function, cancel it and so on, and you need to fund your contract with LINK beforehand. Very expensive.

swixx · August 25, 2021, 5:32pm

Agreed. It seems like it may be unavoidable if that's the desired infrastructure, however. I see many projects with asymmetric information: they "hide" certain metadata attributes, but that information is available to the team itself, creating inequities in the community. For that reason, I've been looking into how to solve this problem.

I came across this VRF code yesterday; I haven't vetted it yet, but it could help in this instance.

I also came across this Chainlink announcement yesterday about an NFT gaming platform using VRF (to be fair, I'm not sure in what capacity).

I wonder if @PatrickAlphaC could shed some light on the potentially high barrier to entry that using VRF for a 10,000-piece collection could create?

PatrickAlphaC · August 25, 2021, 6:13pm

Thanks for tagging me here.

For sure, a collection of 10,000 NFTs wouldn't be much of a problem with Chainlink VRF. You could use the randomness-expansion best practice to get all 10,000 random numbers from 1 randomness call. This means that for 10k pieces, it will still only cost you 2 LINK. The same holds even for 100,000 pieces (so long as it fits in 1 transaction). You'd then only have to pay for the ETH gas.

I've made some repos that showcase this functionality if you're looking for examples.

I think this solves the issue you're running into. Let me know if I didn't make sense here!
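The randomness-expansion idea Patrick describes can be sketched as follows: hash the single random seed together with a counter to derive as many values as you need, all inside the transaction that receives the seed. This Python model is illustrative only. On-chain you would use keccak256(abi.encode(seed, i)); Python's hashlib has no keccak256 (hashlib.sha3_256 is the standardized SHA-3 and produces different digests), so sha3_256 stands in here purely to show the structure. The function name expand and the "% 100 trait roll" are assumptions for illustration.

```python
import hashlib
import secrets
from typing import List


def expand(seed: int, n: int) -> List[int]:
    """Derive n 256-bit values from one random seed by hashing (seed, i).

    Stand-in hash: sha3_256, NOT Ethereum's keccak256, so values won't match on-chain ones.
    """
    values = []
    for i in range(n):
        # Mimic abi.encode(uint256, uint256): two 32-byte big-endian words.
        packed = seed.to_bytes(32, "big") + i.to_bytes(32, "big")
        values.append(int.from_bytes(hashlib.sha3_256(packed).digest(), "big"))
    return values


seed = secrets.randbits(256)                      # in the real flow: the VRF response
traits = [v % 100 for v in expand(seed, 10_000)]  # hypothetical 0-99 trait roll per token
```

As noted in the thread, this only helps if the expansion happens in the same transaction that delivers the seed; once the seed is public, anyone can compute every derived value in advance.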

mhbln · August 26, 2021, 3:04pm

@PatrickAlphaC I looked at the contracts, and it makes sense for getting multiple numbers for, let's say, different traits, but to assign a random number on each mint, they would need to call the function each time?

If you did it in the constructor at launch, i.e. beforehand, so that index 0 means #34 and so on, then everybody would know the mapping in advance, and bots could still take advantage of that: snoop out the metadata and then just mint at the right time.

Or, if you do it afterwards by setting some sort of offset, as described in this thread, you don't necessarily need Chainlink for that offset number.

But how would you do it on mint? I don't get how you would pull 10,000 random numbers beforehand without anybody snooping on the results.

Imagine this is the mint function:

for (uint256 i = 0; i < _numberOflol; i++) {
    uint256 mintIndex = totalSupply();
    if (totalSupply() < MAX_lol) {
        _safeMint(msg.sender, mintIndex);
    }
}

I don't see how you would be able to do that with one call for all 10,000.

PatrickAlphaC · August 26, 2021, 3:22pm

If you want to make 10,000 NFTs, each with random traits, in 1 call, you could use that randomness expansion I linked. People won't be able to snoop on it because they are all processed in the same transaction, the exact same call in which Chainlink returns the random number. And it's still random (so muuucchhh better than just offsetting some pseudo-random number), since no one can guess the "seed" randomness. If you expand this number after the original transaction, then yes, that is a "bad" way to get randomness.

However, if you want random numbers for NFTs created across separate transactions, then you'd have to make a Chainlink VRF request for each one.

Ideally, you'd mint on a Layer 2 and everything would be much cheaper. You can see all the different fees across networks here.

Does that make sense?

mhbln · August 26, 2021, 3:46pm

@PatrickAlphaC yeah, layer 2s are nice and cool, I love Polygon and Optimism, but it's too complicated for the average Joe. There have been projects that minted on L2, and as I recall it did not go so well; everything happens directly on ETH, and nobody wants to bridge their stuff. For the average Joe, MetaMask UX is already not great, and telling my art friends they have to bridge something because the old ETH system is too expensive is not really the 2021 user experience you want to have.

Anyway, back to the point. Yeah, I would need a random number on mint, as in the following code:

function _mint(address _to) internal returns (uint) {
    uint id = randomIndex();
    numTokens = numTokens + 1;
    _mint(_to, id);  // the two-argument ERC721 _mint
    return id;
}

function randomIndex() internal returns (uint) {
    uint totalSize = TOKEN_LIMIT - numTokens;
    // Pseudo-random pick among the slots that are still unassigned
    uint index = uint(keccak256(abi.encodePacked(nonce, msg.sender, block.difficulty, block.timestamp))) % totalSize;
    uint value = 0;
    if (indices[index] != 0) {
        value = indices[index];
    } else {
        value = index;
    }

    // Move the value of the last still-available slot into the slot just used
    if (indices[totalSize - 1] == 0) {
        indices[index] = totalSize - 1;
    } else {
        indices[index] = indices[totalSize - 1];
    }
    nonce++;
    return value.add(1);
}

Except that the result of this randomIndex function could be snooped, and users could then cancel their transaction if they don't like what they got. To do this with Chainlink, it would need to be called on mint, which means it needs to be called every time, right?
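The index bookkeeping in randomIndex() is a lazy Fisher-Yates shuffle (swap-and-pop), which also answers the earlier question about cheaply tracking which random ids are already taken: each draw does a constant amount of work, with no loops over minted ids. Below is a Python model of just that bookkeeping. The class name RandomIdPool is hypothetical, and the random input rand is passed in by the caller, standing in for the keccak256(nonce, msg.sender, block.difficulty, block.timestamp) value, which, as discussed, is predictable and therefore snoopable.

```python
from typing import Dict


class RandomIdPool:
    """Model of randomIndex(): hands out each id in 1..limit exactly once.

    `indices` is sparse like the Solidity mapping: a missing key means
    "this slot still holds its own position".
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.num_tokens = 0
        self.indices: Dict[int, int] = {}

    def draw(self, rand: int) -> int:
        total_size = self.limit - self.num_tokens
        if total_size == 0:
            raise RuntimeError("all ids assigned")
        index = rand % total_size
        value = self.indices.get(index, index)
        # Move the last still-available slot's value into the slot just consumed.
        last = total_size - 1
        self.indices[index] = self.indices.get(last, last)
        self.num_tokens += 1
        return value + 1  # ids are 1-based, like value.add(1)
```

Whatever values rand takes, every id is returned exactly once; the randomness source only decides the order, which is exactly the part that can be gamed when it is computed on-chain from predictable inputs.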

PatrickAlphaC · August 26, 2021, 4:47pm

Ah I see, glad you recognize this could be snooped!

So yes, if you want 1 mint == 1 NFT, you'd have to call Chainlink VRF each time.

However, if you wanted 1 mint == 10,000 NFTs, then you'd only need to call Chainlink VRF once, and each would still be random and un-snoopable.

In both of these cases, after the original transaction finishes, you can no longer use the randomness-expansion technique.

Does this make sense?

mhbln · August 26, 2021, 5:42pm

@PatrickAlphaC Completely, Patrick, but unfortunately I cannot use Chainlink for my use case, because people will mint themselves; that leaves me with the offset approach.

swixx · August 26, 2021, 6:34pm

@mhbln I'm interested to hear how you avoid the snooping issue, then. Obviously cheating requires a certain technical talent and a willingness to spend gas, but we've seen from Meebits that this exists. So is the best option to go with the known gameable solution?

@PatrickAlphaC I know this is the problem that Chainlink aims to solve, but a 2 LINK/call fee may squeeze some members of the community out of using it. Any thoughts?

mhbln · August 26, 2021, 9:24pm

@swixx It depends. If you have a cheap drop, I highly doubt people will spend that much gas.
Well, all the solutions kind of suck at this point. The smartest and most "decentral" way to do it is probably the idea proposed here: upload the metadata beforehand and then set the offset at the end. You can do that either with a simple block.number solution or with Chainlink, though I think you don't need Chainlink, because at that point everything is already minted.

Let's face it. People actually don't give two fucks about it. Look at all these contracts: Pudgy Penguins and Cool Cats point to metadata on their own servers. They could just take down the server and your NFT is gone, or they could simply replace it with another. It's ridiculous. I think in this field most people think they care but actually don't, or have no idea what they are investing in. So don't overthink it, I would say.

PatrickAlphaC · August 26, 2021, 10:28pm

You should code your project however you deem best. However, understanding the risks is really important. If you do deploy as you say, it will either:

Cap how big your project can be
Give hackers an easy exploit angle

If you gain any value on your project, you're setting yourself up to be hacked.

The 2 LINK per call covers the cost of true randomness: it costs the Chainlink node operators gas to verify the randomness, and this fee applies only to mainnet Ethereum. The added security of true randomness also gives your project a level of validity. It's the difference between randomness being worthless and meaningful. I've seen a number of projects be concerned about the added cost, then realize that the validity of true randomness makes the project truly valuable, and go on to see great success.

There is a reason that projects like Axie Infinity, Ethercards, Aavegotchi, and many other successful NFT projects that need randomness use Chainlink VRF.

mhbln · August 27, 2021, 6:37am

I get it: Chainlink is a great product.
If I were building a serious game like Axie, I would use it too, but for these generative avatar projects, where randomness is only needed at mint, it's too much overhead; they usually sell for 0.04 ETH. Nobody wants to spend 10,000 x 2 LINK on that if it's only needed for the mint. Also, no hacking will happen after that, so the approach described here is completely fine for this use case. And yes, for other use cases it probably wouldn't be enough, and you should use Chainlink instead.

jalil · August 27, 2021, 11:07am

What I don't like about the OpenZeppelin base implementations is the idea of making the baseURI editable. I think common practice should be to deploy the contract WITH the IPFS CID as the provenance hash and have it unchangeable.

This way, devs can't move around metadata files after sale is complete.

Note: I don't necessarily mean publishing/pinning the metadata before launch, but publishing its eventual hash before launch.

I think this solves the biggest attack vector / scam opportunity with current NFT practices.

I built a little extension that handles this here: https://github.com/1001-digital/erc721-extensions/blob/main/contracts/WithIPFSMetaData.sol

That, paired with semi-random assignment (sorry @PatrickAlphaC), should be enough to make 99% of NFT projects SCAM-resistant (including against the dev team). A simple (semi-)random assignment extension is here: https://github.com/1001-digital/erc721-extensions/blob/main/contracts/RandomlyAssigned.sol
