Discussion on the Principles and Technical Details of the Ordinal Inscription Protocol

Over the past two weeks, while researching the BTC ecosystem and various inscription projects, I found that very few articles clearly explain the underlying principles and technical details. For example: how are transactions initiated when an inscription is minted, how are the sats in a UTXO tracked, where is the inscribed content placed in the script, and why does a BRC20 transfer require two operations? Without understanding these details, it is difficult to grasp the differences between protocols such as BRC20, BRC420, atomicals, stamps, and runes. This article covers the basics of the BTC blockchain and attempts to answer the questions above.

BTC Block Structure

The essence of blockchain is a multi-user accounting technology, which, in computer science terms, is a distributed database. Records (accounts) for a specific period form a block, which is then expanded chronologically.

We created a table in Excel to illustrate how blockchain works. An Excel file represents a blockchain, and each sheet in it represents a block, ordered chronologically from 560331 to the latest, 560336. Block 560336 packages the most recent transactions. The main part of the block resembles the double-entry bookkeeping familiar from accounting: one side records the debit addresses (inputs from), and the other side records the credit addresses (outputs to). Value is the BTC amount for the respective address. The total value of the Inputs is greater than the total value of the Outputs; the difference is the transaction fee paid by the user, which is also the fee earned by miners (the accountants).

The block header records the hash of the previous block, the creation time of the current block (timestamp), and a random number (the nonce), among other fields. (The height of a block is its position in the chain; it is not stored in the header itself.) So, in a decentralized accounting system, who gets to claim the accounting rights for the next block? It depends on this random number and the corresponding hash value. Miners repeatedly hash the header of their candidate block while changing the random number, and the first miner to obtain a hash that meets the network's difficulty target wins the right to add that block, along with the block reward and transaction fees.

Finally, there is the script area, which can be used for extended applications; for example, an OP_RETURN script can serve as a memo field. Note that in actual blocks the script area is not a separate region: scripts are attached to the inputs and outputs. The script attached to an input is the unlocking script (ScriptSig), which supplies a signature made with the private key of the spending address to authorize the transfer, while the script attached to an output is the locking script (ScriptPubKey), which sets the conditions for spending that BTC (generally, "only the person with the corresponding private key can spend").
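The search for a valid random number can be sketched in a few lines. This is a toy illustration, not Bitcoin's real header format: the field layout, the `mine` function, and the `zero_bits` difficulty parameter are all assumptions made for the example, and the target comparison is simplified.

```python
import hashlib
import struct
from typing import Tuple

def double_sha256(data: bytes) -> bytes:
    # Bitcoin hashes block headers with SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(prev_hash: bytes, tx_digest: bytes, timestamp: int, zero_bits: int) -> Tuple[int, bytes]:
    """Try nonces until the header hash falls below a toy target.

    The header layout here is hypothetical and much simpler than the
    real 80-byte Bitcoin header.
    """
    target = 1 << (256 - zero_bits)
    nonce = 0
    while True:
        header = prev_hash + tx_digest + struct.pack("<II", timestamp, nonce)
        digest = double_sha256(header)
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
        nonce += 1

prev_hash = bytes(32)  # placeholder for the previous block's hash
tx_digest = hashlib.sha256(b"A pays B 1 BTC").digest()  # stand-in for the transaction data
nonce, digest = mine(prev_hash, tx_digest, 1706500000, zero_bits=12)
print(nonce, digest.hex())
```

Raising `zero_bits` by one doubles the expected number of attempts, which is the sense in which the accounting right goes to whoever contributes the most computation.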

Snip20240129_3

Snip20240129_4

The two images above show the raw data structures of an input and an output. At the execution level, a script travels with the transaction as an accompanying parameter. The signature data that the unlocking script (ScriptSig) supplies to prove private key authorization is what is referred to as "witness data."

Segregated Witness and Taproot

Although the Bitcoin network has run for more than 10 years without a major failure, transaction fees have on several occasions soared to impractical levels. Bitcoin developers have therefore long debated how best to scale the network to handle growing transaction volume.

In 2017, this debate reached a climax, splitting the Bitcoin development community into two factions: one supporting the implementation of a feature called SegWit through a soft fork, and the other advocating for direct block size increases, known as the "big block" faction.

We mentioned earlier that the unlocking script requires private key authorization to generate "witness data." So, can we separate this witness data from the block, thereby indirectly increasing the number of transactions each block can accommodate? Segregated Witness (SegWit) was officially activated in August 2017. Its implementation divides all transaction data into two parts: one part is the basic transaction information (Transaction Data), and the other part is the signature information (Witness Data), storing the signature information in a new data structure called "witness," which is transmitted separately from the original transaction.

Snip20240129_5

Technically, SegWit moves signature (witness) data out of the part of a transaction that counts against the original 1 MB block size limit and into a separate witness structure. Witness data can carry arbitrary bytes and is discounted when "block weight" is calculated: each non-witness byte counts as 4 weight units, while each witness byte counts as only 1. Because older nodes still see blocks that fit within 1 MB, the change could be deployed without a hard fork. As a result, the effective capacity of a block increased, while the fee cost of signature data decreased. Before SegWit, a block was limited to 1 MB. After SegWit, the non-witness transaction data is still limited to about 1 MB, but a block is capped at 4 million weight units, so a block made up almost entirely of witness data can approach 4 MB.
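The witness discount is easiest to see as arithmetic. The sketch below follows the BIP141 weight rule (non-witness bytes count four times, witness bytes once); the function names and the example byte counts are illustrative assumptions, not figures from a real transaction.

```python
MAX_BLOCK_WEIGHT = 4_000_000  # consensus limit in weight units (BIP141)

def tx_weight(non_witness_bytes: int, witness_bytes: int) -> int:
    # Each non-witness byte costs 4 weight units, each witness byte costs 1.
    return 4 * non_witness_bytes + witness_bytes

def virtual_size(weight: int) -> int:
    # Fees are usually quoted per virtual byte: weight / 4, rounded up.
    return (weight + 3) // 4

# A plain 250-byte transfer with no witness data (hypothetical sizes).
plain = tx_weight(250, 0)

# An inscription-style transaction: 200 bytes of ordinary data plus
# 100,000 bytes of witness data (hypothetical sizes).
inscription = tx_weight(200, 100_000)

print(plain, virtual_size(plain))              # 1000 250
print(inscription, virtual_size(inscription))  # 100800 25200
print(MAX_BLOCK_WEIGHT // inscription)         # 39
```

The second transaction occupies about 100 KB on disk but is charged as if it were roughly 25 KB, which is why large inscriptions are placed in witness data.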

Taproot was activated in November 2021 and consists of three Bitcoin Improvement Proposals (BIPs): Taproot, Tapscript, and a new digital signature scheme called "Schnorr signatures." Taproot brings several benefits to Bitcoin users, such as better transaction privacy and lower transaction fees. It also enables Bitcoin to execute more complex transactions, thereby broadening application scenarios (Tapscript adds some new opcodes).

These updates are key drivers for Ordinals NFTs, which store NFT data in Taproot script-path spend scripts (in the witness data space). The upgrade made it easier to structure and store arbitrary witness data, laying the foundation for the "ord" standard. With the data requirements relaxed, a single transaction can in principle fill almost an entire block with its transaction and witness data, approaching the 4 MB limit implied by block weight, which greatly expands the types of media that can be placed on-chain.

Some may ask: since we can put arbitrary strings in the script, are there no restrictions on them? What if the script is executed? If we place content at random, could it produce errors that cause the transaction or block to be rejected? This brings us to the OP_FALSE instruction. OP_FALSE (also written as "0" in Bitcoin scripts) pushes a false value onto the stack, so the OP_IF that follows it never enters its branch, and everything up to the matching OP_ENDIF remains unexecuted. The content wrapped this way behaves much like a "comment" in a high-level language: it is present in the script but never run.
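To make the "unexecuted branch" concrete, here is a minimal sketch that assembles such an envelope as raw script bytes. The layout (an "ord" marker, a content-type field tagged with 1, an empty push, then the body in chunks of at most 520 bytes) follows the envelope format described in the Ordinals documentation as I understand it; the helper names `push` and `envelope` are hypothetical, and a real reveal script would also contain a public key and OP_CHECKSIG before the envelope.

```python
OP_FALSE = b"\x00"   # pushes an empty (false) value
OP_IF = b"\x63"
OP_ENDIF = b"\x68"
MAX_PUSH = 520       # largest single data push allowed in Bitcoin script

def push(data: bytes) -> bytes:
    """Encode one data push with the shortest suitable length prefix."""
    n = len(data)
    if n <= 75:
        return bytes([n]) + data
    if n <= 255:
        return b"\x4c" + bytes([n]) + data  # OP_PUSHDATA1
    if n > MAX_PUSH:
        raise ValueError("push larger than 520 bytes")
    return b"\x4d" + n.to_bytes(2, "little") + data  # OP_PUSHDATA2

def envelope(content_type: str, body: bytes) -> bytes:
    script = OP_FALSE + OP_IF + push(b"ord")
    script += push(b"\x01") + push(content_type.encode())  # tag 1 = content type
    script += OP_FALSE  # empty push separating the header fields from the body
    for i in range(0, len(body), MAX_PUSH):
        script += push(body[i:i + MAX_PUSH])
    return script + OP_ENDIF

script = envelope("text/plain;charset=utf-8", b"Hello, world!")
print(script.hex())
```

Because the branch opened by OP_IF is never taken, none of the pushes inside it is executed, so the body can hold arbitrary bytes without affecting whether the script succeeds.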

OP_FALSE

UTXO Transfer Model

The above discussion has approached the basic principles of BTC from a computer data structure perspective. Now, let's discuss the UTXO model from a financial model perspective.

UTXO stands for Unspent Transaction Outputs, which can be understood as the funds remaining unspent during a transfer. So why does Bitcoin use this concept? This relates to the accounting methods of account transaction models and account balance models.

Having been in a centralized system for too long, we are very accustomed to the account balance model of bookkeeping. When User A transfers 100 yuan to User B, the bank first checks if User A has 100 yuan in their account. If so, it deducts 100 yuan from User A's account and adds 100 yuan to User B's account, completing the transfer.

However, Bitcoin's accounting algorithm does not have the concept of balance. The distributed ledger on the blockchain only records individual transactions and does not directly record the current balance of an account (recording balances generally requires dedicated server nodes, which would centralize it). Suppose User A currently has a balance of 1000 yuan. If User A transfers 100 yuan to User B, this transfer will be recorded as:

Transaction 1: User A transfers 100 yuan to User B

Transaction 2: User A transfers 900 yuan to themselves (UTXO)

Snip20240129_6

Although Transaction 2 is recorded like a transfer, functionally it represents the account balance: after the 100 yuan transfer, User A still has 900 yuan left. (In Bitcoin, these two entries would be two outputs of a single transaction.)

So why create such a UTXO? Because the BTC blockchain can only record transactions and cannot record account balances. Without this UTXO, calculating the balance would require summing all incoming and outgoing transactions for an account, which is very time-consuming and resource-intensive. The emergence of UTXO cleverly avoids the pain point of having to backtrack through all transactions when calculating balances.

UTXO has a characteristic: like coins, it cannot be split. So how do we gather enough input amounts during transactions, and how do we provide change? We can use coins as an analogy (in fact, every time you see the word UTXO, it’s better to automatically translate it as "coin").

Xiaoming transfers 1 Bitcoin to Xiaogang. The entire process is as follows: Xiaoming needs to collect enough inputs. For example, among the previous transactions paying to Xiaoming's address, he finds a UTXO with a value of 0.9. This is not enough for 1 Bitcoin, but since a transaction may have multiple inputs, Xiaoming also finds a UTXO with a value of 0.2. Thus, this transfer transaction has two inputs. It also has two outputs: one pointing to Xiaogang's address with a value of 1 Bitcoin, and the other pointing back to Xiaoming's address with a value of 0.1 Bitcoin, which is the change (this example ignores transaction fees).

In other words, Xiaoming has two coins in his pocket, one worth 0.9 and the other worth 0.2. At this point, if Xiaoming needs to pay a coin worth 1, he must hand both coins to Xiaogang, who will then give Xiaoming 0.1 as change. Thus, the essence of this accounting model is to avoid "calculating balances" through the action of "giving change."
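The same "collect coins, pay, take change" procedure can be sketched as follows. Amounts are in satoshis to avoid floating-point rounding; `build_payment` and its largest-first selection rule are assumptions for illustration, since real wallets use more elaborate coin selection.

```python
from typing import List, Tuple

def build_payment(utxos: List[int], amount: int, fee: int = 0) -> Tuple[List[int], List[int]]:
    """Pick whole UTXOs until they cover amount + fee, then add a change output."""
    selected: List[int] = []
    total = 0
    for value in sorted(utxos, reverse=True):  # largest coins first (illustrative rule)
        if total >= amount + fee:
            break
        selected.append(value)
        total += value
    if total < amount + fee:
        raise ValueError("insufficient funds")
    outputs = [amount]
    change = total - amount - fee
    if change > 0:
        outputs.append(change)  # change goes back to the sender
    return selected, outputs

# Xiaoming holds coins of 0.9 and 0.2 BTC and pays 1 BTC to Xiaogang.
inputs, outputs = build_payment([90_000_000, 20_000_000], 100_000_000)
print(inputs)   # [90000000, 20000000]
print(outputs)  # [100000000, 10000000]
```

Note that no balance is ever stored: the sender's remaining funds exist only as the change output, which becomes a new UTXO.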

Ordinal Protocol's Ordering System

The Ordinal protocol can be said to be the source of the recent explosion in the BTC ecosystem, breaking down homogeneous BTC into the smallest unit, sat, and then assigning a serial number to each sat. How is this done?

We know that the total amount of BTC is 21 million coins, and one BTC can be split into 100 million parts (sat), so the smallest unit of BTC is sat. Whether BTC or the smallest unit sat, they are typical homogeneous tokens (FT). We will now try to assign a serial number (ordinal) to these sats.

Earlier, when discussing the block data structure, we mentioned that a transaction must specify its input addresses and amounts as well as its output addresses and amounts. Each block contains two kinds of transactions: the one that pays out the block reward, and ordinary transfers (which pay fees). Ordinary transfers must have inputs and outputs, but the block reward is BTC generated out of thin air, with no input address, so its "input from" field is blank; this is known as the "coinbase transaction." All 21 million BTC originate from coinbase transactions, and the coinbase transaction is always the first in a block's list of transactions.

The Ordinal protocol stipulates the following:

Numbering: Each sat is numbered in the order it was mined.
Transfer: Sats move from transaction inputs to outputs according to the first-in-first-out rule.

The first rule is relatively simple; it determines that numbering can only be generated from the coinbase transactions of mining rewards. For example, if the mining reward for the first block is 50 BTC, then the first block will allocate sats in the range of [0;1;2;...;4,999,999,999]; if the second block also has a mining reward of 50 BTC, then the second block will allocate sats in the range of [5,000,000,000;5,000,000,001;...;9,999,999,999].
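The numbering rule can be written as a small function that returns the range of sats created by a block's reward. The sketch assumes Bitcoin's standard schedule (a 50 BTC reward that halves every 210,000 blocks), which the two-block example above does not need but which matters for later blocks; the function names are hypothetical.

```python
from typing import Tuple

COIN = 100_000_000          # sats per BTC
HALVING_INTERVAL = 210_000  # blocks between reward halvings

def subsidy(height: int) -> int:
    """Block reward in sats at a given height."""
    halvings = height // HALVING_INTERVAL
    return 0 if halvings >= 64 else (50 * COIN) >> halvings

def first_sat(height: int) -> int:
    """Number of the first sat created by the block at this height."""
    full_epochs, remainder = divmod(height, HALVING_INTERVAL)
    total = 0
    for epoch in range(full_epochs):
        total += HALVING_INTERVAL * subsidy(epoch * HALVING_INTERVAL)
    return total + remainder * subsidy(height)

def block_range(height: int) -> Tuple[int, int]:
    """Half-open range [start, end) of sat numbers created by this block."""
    start = first_sat(height)
    return start, start + subsidy(height)

print(block_range(0))  # (0, 5000000000)
print(block_range(1))  # (5000000000, 10000000000)
```

The ranges are half-open, so block 0 covers sats 0 through 4,999,999,999, matching the example in the text.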

Snip20240129_7

The more difficult part to understand is that since UTXO actually contains many satoshis, each sat in this UTXO looks the same. How do we sort them? This is actually determined by the second rule. Let’s use a simple example:

Assuming that the smallest divisible unit of BTC is 1, a total of 10 blocks are produced, with each block's mining reward being 10 BTC, resulting in a total of 100 BTC. We can directly assign a serial number (0-99) to these 100 BTC. If there are no transactions, we only know that the first block's 10 BTC are numbered (0-9), the second block's 10 BTC are numbered (10-19), and so on, until the tenth block's 10 BTC are numbered (90-99). Since there are no expenditures, there are no outputs, so we can only assign a range of numbers to every 10 BTC.

Suppose in the second block, two expenditures (outputs) are added: one is 3 BTC, and the other is the "change" of 7 BTC, corresponding to transferring 3 BTC to someone else and giving 7 BTC back to oneself. At this point, in the transaction list of the block, suppose the 7 BTC given back to oneself ranks first (corresponding to the numbers 10-16), and the 3 BTC to others ranks second (corresponding to the numbers 17-19). This confirms the ordered set of sats contained in a certain UTXO through the transfer of outputs.
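The first-in-first-out rule can be sketched as a queue: sat ranges from the inputs are lined up in order, and each output takes as many sats as its value from the front. The function name and the half-open range representation are assumptions for the example.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open range [start, end) of sat numbers

def assign_fifo(input_ranges: List[Range], output_values: List[int]) -> List[List[Range]]:
    """Distribute the sats of the inputs over the outputs in first-in-first-out order."""
    queue = list(input_ranges)
    result: List[List[Range]] = []
    for value in output_values:
        ranges: List[Range] = []
        while value > 0:
            if not queue:
                raise ValueError("outputs exceed inputs")
            start, end = queue.pop(0)
            take = min(value, end - start)
            ranges.append((start, start + take))
            if start + take < end:
                queue.insert(0, (start + take, end))  # put back the unused remainder
            value -= take
        result.append(ranges)
    return result  # any sats still in the queue would go to the miner as fees

# The toy example from the text: coins 10-19, a 7-coin output listed first, then 3 coins.
print(assign_fifo([(10, 20)], [7, 3]))  # [[(10, 17)], [(17, 20)]]
```

With several inputs, an output can end up holding more than one range, which is why a UTXO is best thought of as a collection of sat ranges rather than a single interval.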

Note that a sat is not a UTXO! A UTXO is the smallest indivisible unit of a transaction, so sats can only exist inside UTXOs, and each UTXO contains a certain range of sats. The only way to divide or regroup those numbered sats is to spend the UTXO and generate new outputs.

As for how to express this "numbering," Ordinals supports several notations, such as the aforementioned integer notation, as well as decimal, degree, percentile, and pure-letter name notations.

Snip20240129_8

Once sats have a unified serial number, we can consider inscriptions. As mentioned earlier, a file of any type, whether text, image, or video, can be placed in the witness data area, as long as the transaction fits within the roughly 4 MB block limit. The file is stored as raw bytes (displayed as hexadecimal by block explorers) in the Taproot script area. Thus, one UTXO corresponds to one Taproot script area, and this UTXO simultaneously contains many sats (it is a collection of sat sequences, with a minimum of 546 satoshis commonly kept in a single UTXO so that it stays above the dust limit). The Ordinal protocol specifies that "the first sat number in this sequence collection is used to represent the binding relationship" (the original wording in the Ordinals documentation is the number of the first sat in the first output). For example, a UTXO containing sats numbered 17-19 uses 17 to represent this collection and its binding to the inscribed content.

Minting and Transferring Ordinal Assets

Ordinal NFTs, then, involve uploading a file into a script in the segregated witness area and binding it to a collection of sats, thereby issuing NFT assets on the BTC chain. However, there is another question: since a transaction carries both unlocking scripts on its inputs and locking scripts on its outputs, where is the content placed? The answer is both, in a sense: an output commits to the content, and an input later carries it. Here, we must mention the commit-reveal mechanism in blockchain technology.

The commit-reveal mechanism in blockchain is a protocol used to ensure fair and transparent handling of information. This mechanism is typically used in scenarios where hidden information (such as votes or bids) needs to be submitted and then revealed at a later time. The commit-reveal mechanism consists of two phases: the commit phase and the reveal phase.

Commit Phase: In this phase, users submit their information (such as voting choices or bid prices), but in hidden form. Typically, users compute a hash of this information (i.e., a fixed-length digest of it) and send this hash to the blockchain. A cryptographic hash function is one-way and collision-resistant: the original information cannot feasibly be inferred from the hash value, and it is infeasible to find different information that produces the same hash. This keeps the information confidential at the time of submission while still binding the user to it.
Reveal Phase: At a later time, users reveal their original information and prove that it matches the previously submitted hash value. This is usually done by submitting the original information along with any additional data used to generate the hash value (such as a random number or "salt"). The network then checks whether the hash of the revealed information matches the previously submitted hash value. If they match, the original information is accepted as valid.
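The two phases above can be sketched with an ordinary hash function. This is a generic illustration of commit-reveal, not the Taproot construction itself; the function names and the bid example are assumptions.

```python
import hashlib
import secrets

def commit(message: bytes, salt: bytes) -> str:
    """Commit phase: publish only the hash of salt + message."""
    return hashlib.sha256(salt + message).hexdigest()

def reveal_ok(commitment: str, message: bytes, salt: bytes) -> bool:
    """Reveal phase: anyone can recompute the hash and compare it."""
    return commit(message, salt) == commitment

salt = secrets.token_bytes(16)  # random salt prevents guessing short messages
bid = b"bid: 42"
commitment = commit(bid, salt)  # this value is what goes on-chain first

print(reveal_ok(commitment, bid, salt))         # True
print(reveal_ok(commitment, b"bid: 43", salt))  # False
```

Inscriptions use the same idea with a different purpose: the goal is less about secrecy and more about letting a small commitment stand in for a large piece of content until it is revealed.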

We previously discussed that the content of the inscription needs to be bound to the collection of sats contained in a UTXO. Since a UTXO is an output, the binding must be attached to the locking script of that output. However, BTC full nodes need to maintain the entire network's UTXO set locally. Imagine if 10,000 video files of 4 MB each were placed directly in the locking scripts of 10,000 UTXOs; all full nodes would need extremely large storage and high bandwidth, which could cripple the entire chain. Therefore, the practical solution is to place the content in the unlocking script of an input and then have this content "point" to another output.

Thus, the minting of Ordinal assets is divided into two steps (wallets combine these two steps; when constructing transactions, they build the commit-reveal parent-child pair at the same time, so the user experience feels like a single step and transaction fees are saved).

During the minting phase, users first place a hash that commits to the file in the locking script of a UTXO in the commit transaction (transferring from their address A to their address B). Because it is only a hash value, it does not take up much space in the full nodes' UTXO database. Next, users construct a new transaction (transferring from their address B back to their address A), called the reveal transaction. Its input must spend the UTXO from the commit transaction that contains the file hash, and the unlocking script of this input must include the original inscribed file. To paraphrase the Ordinals documentation: first, in the commit transaction, a taproot output committing to a script containing the inscription content is created; second, in the reveal transaction, the output created by the commit transaction is spent, revealing the inscription content on chain.

In the transfer phase, Ordinal NFTs differ slightly from BRC20. An Ordinal NFT is transferred by sending the UTXO it is bound to directly to the recipient, just like a regular BTC transfer. A BRC20 transfer, however, involves a custom amount and is split into two steps: the first step inscribes a "transfer" inscription (Inscribe "TRANSFER"), and the second step sends that inscription (Transfer "TRANSFER"). The first step is similar to minting an Ordinal NFT and implicitly contains a commit-reveal parent-child transaction pair. The second step is similar to a regular transfer of an Ordinal NFT: the BRC20 asset bound to a certain UTXO is sent directly to the recipient. Some wallets construct all three transactions (parent, child, and grandchild) at once to save time and transaction fees.

Snip20240130_9

In summary, the commit transaction is used to bind the inscribed content (the hash value of the original content) to the numbered sats (UTXO), while the reveal transaction is used to display the content (the original content). This parent-child transaction pair jointly completes the minting of the NFT.

P2TR and an Example

The technical discussion about minting is not over yet, as some may wonder how the reveal transaction verifies the inscription information in the commit transaction. Why is it necessary to transfer between one's own addresses A and B when constructing the transaction? There is no need to prepare two wallets when inscribing. This brings us to one of the significant upgrades of Taproot, P2TR.

P2TR (Pay-to-Taproot) is a new type of Bitcoin output introduced by the Taproot upgrade. P2TR allows users to spend Bitcoin using either a single public key or more complex scripts (such as multi-signature wallets or smart contracts), achieving higher privacy and flexibility. This is accomplished through the use of Merkleized Abstract Syntax Trees (MAST) and Schnorr signatures, which enable the efficient encoding of multiple spending conditions within a single output.

Creating Spending Conditions
To create a P2TR transaction, users first define a spending condition, such as a single public key or a more complex script, specifying the requirements for spending Bitcoin (e.g., multi-signature wallets or smart contracts).
Generating Taproot Output
Next, users generate a Taproot output that includes a single public key (representing the spending condition). This public key is derived from a combination of the user's public key and the hash of the script using a process called "tweaking." This ensures that the output looks like a standard public key, making it difficult to distinguish from other transactions on the blockchain.
Spending Bitcoin
When users want to spend Bitcoin, they can use their single public key (if the spending condition is met) or reveal the original script and provide the necessary signatures or data to meet the spending condition. This is accomplished using Tapscript, which allows for more efficient and flexible execution of spending conditions.
Verifying Transactions
Miners and nodes then verify the transaction by checking the provided Schnorr signatures and data against the spending conditions. If the conditions are met, the transaction is considered valid, and the Bitcoin can be spent.
Enhanced Privacy and Flexibility
Because P2TR transactions only reveal the necessary spending conditions when spending Bitcoin, they maintain a high level of privacy. Additionally, the use of MAST and Schnorr signatures allows for the efficient encoding of multiple spending conditions, enabling more complex and flexible transactions without increasing the overall size of the transaction.
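The "tweaking" step can be sketched in pure Python. The code follows my reading of BIP340 and BIP341 for a tree with a single script leaf: the leaf is hashed with the "TapLeaf" tag, the tweak is the "TapTweak" hash of the internal key and that leaf hash, and the output key is the internal key plus the tweak times the generator. The internal key (the public key of private key 1) and the tiny placeholder script are assumptions chosen only so the example runs; this is not production cryptography.

```python
import hashlib

# secp256k1 parameters: field prime, group order, generator point.
P = 2**256 - 2**32 - 977
N = int("FFFFFFFF" "FFFFFFFF" "FFFFFFFF" "FFFFFFFE"
        "BAAEDCE6" "AF48A03B" "BFD25E8C" "D0364141", 16)
G = (int("79BE667E" "F9DCBBAC" "55A06295" "CE870B07"
         "029BFCDB" "2DCE28D9" "59F2815B" "16F81798", 16),
     int("483ADA77" "26A3C465" "5DA4FBFC" "0E1108A8"
         "FD17B448" "A6855419" "9C47D08F" "FB10D4B8", 16))

def point_add(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and a[1] != b[1]:
        return None  # a + (-a) is the point at infinity
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], P - 2, P) % P
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], P - 2, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def point_mul(pt, k):
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result

def lift_x(x):
    """Return the curve point with this x coordinate and an even y."""
    y_sq = (pow(x, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)
    if pow(y, 2, P) != y_sq:
        raise ValueError("x is not on the curve")
    return (x, y if y % 2 == 0 else P - y)

def tagged_hash(tag: str, msg: bytes) -> bytes:
    t = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(t + t + msg).digest()

def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    return b"\xfe" + n.to_bytes(4, "little")

def taproot_output_key(internal_x: bytes, script: bytes):
    """Tweak the internal key with the hash of a single script leaf."""
    leaf = tagged_hash("TapLeaf", b"\xc0" + compact_size(len(script)) + script)
    t = int.from_bytes(tagged_hash("TapTweak", internal_x + leaf), "big")
    if t >= N:
        raise ValueError("tweak out of range")
    q = point_add(lift_x(int.from_bytes(internal_x, "big")), point_mul(G, t))
    if q is None:
        raise ValueError("tweaked key is the point at infinity")
    return q[0].to_bytes(32, "big"), q[1] % 2  # x-only key and its parity bit

def reveal_matches(output_x: bytes, internal_x: bytes, script: bytes) -> bool:
    """What a verifier does at reveal time: recompute the key and compare."""
    return taproot_output_key(internal_x, script)[0] == output_x

internal_x = G[0].to_bytes(32, "big")    # illustrative internal key only
reveal_script = b"\x00\x63\x03ord\x68"   # placeholder script, not a full inscription
output_x, parity = taproot_output_key(internal_x, reveal_script)
print(output_x.hex(), parity)
print(reveal_matches(output_x, internal_x, reveal_script))
```

The output key is 32 bytes no matter how large the committed script is, so a commit output looks like any other Taproot output until the script is revealed in a later input.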

This is how the commit-reveal mechanism is applied in P2TR, and we will illustrate it with a practical example.

Using the blockchain explorer https://www.blockchain.com/, we will examine the minting process of an Ordinal image NFT, including the previous commit-reveal two phases.

First, we see that the hash ID of the commit transaction is (2ddf90ddf7c929c8038888fc2b7591fb999c3ba3c3c7b49d54d01f8db4af585c). Note that this transaction's output does not contain the inscription data itself (it actually contains only a commitment to, i.e. a hash of, the hexadecimal image data), and there is no related inscription information on the webpage. The output address (bc1p4mtc.....) is actually a temporary address generated through the "tweaking" process (representing the public key of the script unlocking condition), sharing a private key with the taproot main address (bc1pg2mp...). The second UTXO in this transaction is the returned "change." Thus, the binding of the inscription content to the sats contained in the first UTXO is achieved.

Snip20240131_12

Next, we check the records of the reveal transaction, whose hash ID is (e7454db518ca3910d2f17f41c7b215d6cba00f29bd186ae77d4fcd7f0ba7c0e1). Here, we can see the information of the Ordinals inscription. The input address of this transaction is the temporary output address generated from the previous transaction (bc1p4mtc.....), and the unlocking script of the input contains the original image's hexadecimal file, while the output of 0.00000546 BTC (546 satoshis) sends this NFT to the user's taproot main address (bc1pg2mp...). Based on the First in First Out principle and the "binding is the first output's first sat number," although the number of sats contained in the two UTXOs changes, the bound sat number remains unchanged. Therefore, we can find the sat where this inscription resides (sat 1893640468329373).

(https://ordinals.com/sat/1893640468329373)

Snip20240131_13

These two transactions (a parent-child pair) are submitted to the memory pool together by the wallet during minting, so the user pays for them in a single step, and there is a high probability that miners will include them in the same block (the two transactions in the above example are indeed both in block 790468). Miners and nodes verify the reveal transaction by checking the Schnorr signature provided in the input and by checking that the script revealed in the input (which contains the hexadecimal image data) hashes to the commitment in the output locking script of the commit transaction. If they match, the transaction is considered valid and the UTXO can be spent; both transactions are then permanently recorded in the BTC blockchain database, and the NFT image is naturally preserved and displayed. If the hashes differ, the reveal transaction is rejected and the inscription fails.

BRC20 Protocol and Indexers

For the Ordinal protocol, inscribing a piece of text results in a text NFT (corresponding to Loot on Ethereum), inscribing an image results in an image NFT (corresponding to PFP on Ethereum), and inscribing a piece of music results in an audio NFT. But what if we inscribe a piece of code, and this code is for "issuing FT homogeneous tokens"?

BRC20 uses the Ordinal protocol, setting the inscription content to JSON data in order to deploy, mint, and transfer tokens. The JSON contains fields describing the attributes of the token, such as its supply, the maximum amount per mint, and its unique ticker. In the previous article, we discussed that BRC20 tokens are essentially semi-homogeneous tokens (SFT), meaning that in some cases they are traded as NFTs, while in others as FTs. How is this control over "different situations" achieved? The answer lies in the indexer.

The indexer is essentially an accountant that categorizes and records the received information in a database. In the Ordinal protocol, the indexer determines the changes of ordered sats across different addresses by tracking inputs and outputs. In the BRC-20 protocol, the indexer has an additional function: recording the changes in token balances in different addresses from the inscriptions.
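A toy indexer makes the accountant's role concrete. The JSON fields (`p`, `op`, `tick`, `max`, `lim`, `amt`) follow the BRC-20 format as I understand it; the class, its method names, the ticker "demo", and the simplified rules (integer amounts, well-formed fields, no partial mints) are assumptions for illustration and do not reproduce any real indexer.

```python
import json
from collections import defaultdict

class Brc20Indexer:
    """Derives token balances purely from the inscriptions it is shown."""

    def __init__(self):
        self.tokens = {}                   # tick -> {"max", "lim", "minted"}
        self.available = defaultdict(int)  # (tick, address) -> spendable balance
        self.pending = {}                  # inscription id -> (tick, amount)

    def inscribe(self, inscription_id: str, owner: str, content: str) -> None:
        try:
            op = json.loads(content)
        except ValueError:
            return  # not JSON: an ordinary inscription, ignored by this layer
        if not isinstance(op, dict) or op.get("p") != "brc-20":
            return
        tick, kind = op.get("tick"), op.get("op")
        if kind == "deploy" and tick not in self.tokens:
            self.tokens[tick] = {"max": int(op["max"]), "lim": int(op["lim"]), "minted": 0}
        elif kind == "mint" and tick in self.tokens:
            token, amt = self.tokens[tick], int(op["amt"])
            if 0 < amt <= token["lim"] and token["minted"] + amt <= token["max"]:
                token["minted"] += amt
                self.available[(tick, owner)] += amt
        elif kind == "transfer" and tick in self.tokens:
            amt = int(op["amt"])
            if 0 < amt <= self.available[(tick, owner)]:
                # Step 1: the amount is locked into the transfer inscription.
                self.available[(tick, owner)] -= amt
                self.pending[inscription_id] = (tick, amt)

    def send(self, inscription_id: str, recipient: str) -> None:
        # Step 2: moving the inscription's UTXO credits the recipient.
        if inscription_id in self.pending:
            tick, amt = self.pending.pop(inscription_id)
            self.available[(tick, recipient)] += amt

idx = Brc20Indexer()
idx.inscribe("i0", "alice", '{"p":"brc-20","op":"deploy","tick":"demo","max":"21000","lim":"1000"}')
idx.inscribe("i1", "alice", '{"p":"brc-20","op":"mint","tick":"demo","amt":"1000"}')
idx.inscribe("i2", "alice", '{"p":"brc-20","op":"transfer","tick":"demo","amt":"300"}')
idx.send("i2", "bob")
print(idx.available[("demo", "alice")], idx.available[("demo", "bob")])  # 700 300
```

The Bitcoin network itself only sees inscriptions moving between outputs; the balances exist solely in the indexer's database, which is why disagreements between indexers matter so much.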

Thus, we can view the different forms of token existence from the accountant's perspective: BRC20 protocol tokens actually exist in a three-layer database. The first layer (Layer 1) has Bitcoin miners as accountants, with a "chain database" type, producing BTC as FT assets. The second layer (Layer 2) has Ordinal indexers as accountants, with a "relational database" type, producing numbered sats as NFT assets. The third layer (Layer 3) has BRC20 indexers as accountants, with a "relational database" type, producing BRC20 assets as FT assets. When we count BRC20 by the "sheet" (one inscription at a time), the perspective is that of the Ordinal indexer (recorded by that indexer), making it naturally an NFT; when we count BRC20 by the "unit" (individual tokens, especially after being deposited into centralized exchanges), the perspective is that of the BRC20 indexer (recorded by that indexer or the centralized exchange's server), making it naturally an FT. Thus, we can conclude that semi-homogeneous tokens (SFT) exist because accountants at different levels are involved.

Isn't blockchain a distributed database? Hence, there is a group of miners as accountants to jointly maintain this "chain database" (because only a chain database can achieve true decentralization). But in the end, we still return to the old path of centralized "relational databases." This is also the essential reason why the initiators of the Ordinal protocol, the BRC20 protocol, and the Unisat wallet have been in heated debates over whether to upgrade the indexer—disagreement among accountants.

However, after more than a decade of industry development, a considerable amount of "decentralization" experience has been accumulated. Can the indexer replace the relational database with a "chain database"? Can fraud proofs or ZKP be used to ensure security and decentralization? Will the DA demand of the Bitcoin ecosystem overflow into other DAs, thereby promoting the prosperity and integration of a multi-chain ecosystem? I seem to see more possibilities.

This article is authored by @hicaptainz

References

https://www.aixinzhijie.com/books/261/master_bitcoin/_book/

https://learnblockchain.cn/article/5717

https://zhuanlan.zhihu.com/p/361854961

https://www.odaily.news/post/5187233

https://learnblockchain.cn/article/5376

https://www.panewslab.com/zh/articledetails/1301r1ibp79c.html

https://docs.ordinals.com/inscriptions.html

https://thebitcoinmanual.com/articles/pay-to-taproot-p2tr/