Solana NFT Database - Retrieving On-chain Metadata
Goal
The goal of this document is to present a way to automatically retrieve an NFT’s on-chain metadata given its mint address.
After retrieving the metadata for a single NFT, we present an efficient method for batch-downloading on-chain metadata.
Introduction
Previous documents presented a method for scraping Metaplex NFTs from the Solana blockchain, along with a reasonably efficient Node.js implementation. The result was a file named NFTs.pkey, which contains a newline-delimited list of (almost) all Metaplex NFT mint addresses in Base58 encoding.
Metaplex NFTs are characterized by an associated on-chain metadata account, which contains additional information about the NFT. Retrieving this data is non-trivial: one first has to calculate the PDA of the metadata account, then retrieve its data, and finally decode it. It is also not known in advance how fast the metadata download will be, since retrieving the list of all NFT mint addresses turned out to be a not-so-quick task.
Mapping Metaplex NFT Mint address to Metadata
Step 1: Derive PDAs for the NFT accounts
The first step is to derive a Program Derived Address (PDA) for each NFT mint address. This PDA is the address of the NFT's on-chain Metaplex metadata account, so it is needed to fetch the metadata. The best way to see how to generate a PDA is in the Metaplex source, which you can find here:
To generate the PDA you need an array of "seeds," which includes:
- The Metaplex seed constant: metadata
- The metadata program public key: metaqbxxUerdq28cj1RbAWkYQm3ybzjb6a8bt518x1s
- The NFT's mint address
With those seeds, you can perform the findProgramAddress operation against the metadata program, which results in the PDA. Items 2 and 3 are written as Base58 strings and must be decoded into their 32-byte public keys before being used as seeds.
Step 2: Call getAccountInfo for each PDA account
For each PDA account address, you then need to call the getAccountInfo JSON RPC method:
https://docs.solana.com/developing/clients/jsonrpc-api#getaccountinfo
Step 3: Borsh deserialize the Metaplex metadata
For each account info object returned by the previous call, you need to Borsh-deserialize the data property of the resulting JSON.
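To make Step 3 a bit more concrete, here is a minimal TypeScript sketch that reads only the leading fields of a metadata account by hand. It assumes the account starts with a one-byte key, followed by the 32-byte update authority, the 32-byte mint, and then the name, symbol and URI as Borsh strings (a little-endian u32 length followed by UTF-8 bytes, padded with NUL bytes). That layout is an assumption based on my understanding of the Metaplex metadata account and is not taken from this document. BorshReader and decodeMetadataPrefix are hypothetical names, and the sketch assumes the base64 data string has already been decoded into a Uint8Array.

```typescript
// Leading fields of a Metaplex metadata account (assumed layout, see the text above).
interface MetadataPrefix {
  key: number;
  updateAuthority: Uint8Array; // 32-byte public key
  mint: Uint8Array; // 32-byte public key
  name: string;
  symbol: string;
  uri: string;
}

// Minimal cursor over a byte buffer for the few Borsh primitives used here.
class BorshReader {
  private offset = 0;
  private readonly bytes: Uint8Array;
  private readonly view: DataView;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  }

  readU8(): number {
    const value = this.view.getUint8(this.offset);
    this.offset += 1;
    return value;
  }

  readU32(): number {
    const value = this.view.getUint32(this.offset, true); // Borsh integers are little-endian
    this.offset += 4;
    return value;
  }

  readFixedBytes(length: number): Uint8Array {
    const slice = this.bytes.slice(this.offset, this.offset + length);
    this.offset += length;
    return slice;
  }

  // Borsh string: u32 length prefix, then that many UTF-8 bytes.
  // Trailing NUL padding (an assumption about Metaplex data) is stripped.
  readString(): string {
    const length = this.readU32();
    const raw = this.readFixedBytes(length);
    return new TextDecoder().decode(raw).replace(/\0+$/, "");
  }
}

// Decode the fields in the order they are assumed to be serialized.
function decodeMetadataPrefix(data: Uint8Array): MetadataPrefix {
  const reader = new BorshReader(data);
  return {
    key: reader.readU8(),
    updateAuthority: reader.readFixedBytes(32),
    mint: reader.readFixedBytes(32),
    name: reader.readString(),
    symbol: reader.readString(),
    uri: reader.readString(),
  };
}
```

A real implementation would more likely use a Borsh library with the full schema, as the script mentioned below does; the point here is only to show what "Borsh deserialize" means at the byte level.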
Implementing the mapper
Fortunately, someone has already implemented these steps and uploaded the code to GitHub. It’s a TypeScript implementation that does just this: it retrieves the Metaplex metadata JSON from the NFT’s mint address. I won’t go through the details of the implementation here. It’s a simple script that implements the above steps and works correctly, but I encourage you to go through the steps yourself and figure out what’s doing what; it might be insightful 🙂
Batch-downloading the Metadata
We could run the above script for every mint address in the NFTs.pkey file, but sending that many HTTP requests would be terribly inefficient. A much more efficient way is to send batch RPC requests.
To do that, we first need to calculate the batch size, since the payload size of an RPC request should be less than 51000 bytes. The request payload for a single RPC request looks like this:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "getAccountInfo",
  "params": [
    "C6Le1Rew1Qx6Bh481KpMiRJr31qEScV8JDBpp8z5eoK7",
    {"encoding": "base64"}
  ]
}
The only part that changes is the address, the first element of the params array. Its length can vary from 32 to 44 characters. Since the example above uses an address of maximum length, the maximum payload size is obtained by minifying the JSON above (130 bytes) and adding one byte for the comma that follows every request in a batch. So the maximum size is 131 bytes. It follows that the number of batched requests should be at most ⌊51000 / 131⌋ = 389.
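The arithmetic can be double-checked with a few lines of TypeScript. This is only a sketch: requestBytes and maxBatchSize are hypothetical helper names, and the 51000-byte limit is the figure quoted above. One assumption worth flagging is that the 131-byte figure holds for a one-digit id; if every request in a batch gets its own id, ids of up to three digits add up to two bytes per request, so the sketch also computes a more conservative batch size.

```typescript
// Upper bound on the request body, as quoted in the text.
const MAX_PAYLOAD_BYTES = 51000;

// An address of maximum length (44 Base58 characters), taken from the example above.
const LONGEST_ADDRESS = "C6Le1Rew1Qx6Bh481KpMiRJr31qEScV8JDBpp8z5eoK7";

// Size in bytes of one minified getAccountInfo request plus its trailing comma.
// JSON.stringify without an indent argument emits minified JSON, and the
// payload is pure ASCII, so the string length equals the byte length.
function requestBytes(address: string, id: number): number {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "getAccountInfo",
    params: [address, { encoding: "base64" }],
  };
  return JSON.stringify(request).length + 1;
}

// Largest number of requests whose combined size stays within the limit.
function maxBatchSize(bytesPerRequest: number): number {
  return Math.floor(MAX_PAYLOAD_BYTES / bytesPerRequest);
}

// The figures from the text: one-digit id.
const perRequest = requestBytes(LONGEST_ADDRESS, 1); // 131
const batchSize = maxBatchSize(perRequest); // 389

// Conservative variant (assumption: distinct ids of up to three digits).
const perRequestWithLongId = requestBytes(LONGEST_ADDRESS, 999); // 133
const conservativeBatchSize = maxBatchSize(perRequestWithLongId); // 383

console.log({ perRequest, batchSize, perRequestWithLongId, conservativeBatchSize });
```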
After that, we modify the script to read the mint addresses from the NFTs.pkey file and to send batch requests. The modified script can be found here.
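The modified script itself is not reproduced in this document. As a rough sketch of what the batching part could look like (not the actual script), the TypeScript below parses a newline-delimited address list, splits it into chunks, and builds one JSON-RPC batch body per chunk. A batch is simply a JSON array of request objects. The names parseAddressList, chunk and buildBatchBody are hypothetical, and using the position in the batch as the request id is my own choice for matching responses back to addresses.

```typescript
// One getAccountInfo request, shaped like the example payload above.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "getAccountInfo";
  params: [string, { encoding: "base64" }];
}

// Parse the newline-delimited contents of a file such as NFTs.pkey.
function parseAddressList(text: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build the body of one batch request: a minified JSON array of requests.
// Each request uses its position in the batch as its id.
function buildBatchBody(addresses: string[]): string {
  const requests = addresses.map(
    (address, index): RpcRequest => ({
      jsonrpc: "2.0",
      id: index,
      method: "getAccountInfo",
      params: [address, { encoding: "base64" }],
    })
  );
  return JSON.stringify(requests);
}
```

Each body would then be sent as a single HTTP POST with a JSON content type to the RPC endpoint, and the responses matched back to addresses by id. In the actual flow, the addresses passed in would be the metadata PDAs from Step 1 rather than the raw mint addresses.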
Performance
The script was tested on a previously generated NFTs.pkey file of size 7.4 MB, which contained 171132 NFT mint addresses. The file is the result of running the scraper for ~2.5 hrs.
The script finished in ~5 mins. That’s 30 times faster than the time it took the scraper to scrape the NFTs. The metadata was about 11.9 times larger than the public keys, so for all the NFTs we currently know about, the metadata size would be 5.72 GB. Not all of the NFTs had an associated metadata account: of all the NFT mint addresses in the test file, ~1200 didn’t have metadata, which is around 0.7%. And since the full NFTs.pkey file is around 480 MB right now, it would take around 5.4 hrs to download all the on-chain metadata. Not bad, if you ask me, not bad at all 😎
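The extrapolation is plain proportional scaling, which assumes that download time and metadata size grow linearly with the size of the NFTs.pkey file. As a quick sanity check, the sketch below recomputes the figures from the rounded numbers quoted in this section, so the results are approximate (the metadata estimate comes out at about 5.7 GB). All variable names are made up for illustration.

```typescript
// Measurements on the test file (rounded figures from the text).
const testFileMB = 7.4;
const testMintCount = 171132;
const testDownloadMinutes = 5;
const scraperMinutes = 2.5 * 60;
const mintsWithoutMetadata = 1200;
const metadataToKeyRatio = 11.9;

// Size of the full mint list at the time of writing.
const fullFileMB = 480;

// Round to one decimal place.
const round1 = (x: number): number => Math.round(x * 10) / 10;

// How much faster the metadata download was than the scraper.
const speedup = scraperMinutes / testDownloadMinutes;

// Share of mint addresses without a metadata account.
const missingPercent = (mintsWithoutMetadata / testMintCount) * 100;

// Linear extrapolation from the test file to the full file.
const sizeScale = fullFileMB / testFileMB;
const fullDownloadHours = (testDownloadMinutes * sizeScale) / 60;
const fullMetadataGB = (fullFileMB * metadataToKeyRatio) / 1000;

console.log(speedup); // 30
console.log(round1(missingPercent)); // 0.7
console.log(round1(fullDownloadHours)); // 5.4
console.log(round1(fullMetadataGB)); // 5.7
```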