Bag of cells
Arbitrary data are represented in TON Blockchain by trees of cells.
Such a tree of cells is transformed into a DAG of cells by identifying cells in the tree that have the same hash.
After that, each of the references of each cell might be replaced by the 32-byte representation hash of the cell referred to. Thus a bag of cells (BoC) is obtained.
šµ Roots: [0x1111]
0x1111
Refs: [0x2222, 0x3333]
0x2222
Refs: [0x4444]
0x3333
Refs: [0x4444]
0x4444
Refs: []In general, a BoC can be obtained from several trees of cells, thus forming a forest.
By convention, the roots of the original trees of cells are marked elements of the resulting bag of cells, so that anybody receiving this bag of cells and knowing the marked elements can reconstruct the original forest. However, this BoC needs to be serialized into a file, suitable for disk storage or network transfer.
šµ Roots: [0x1111, 0x2222]
0x1111
Refs: [0x3333]
0x2222
Refs: [0x3333]
0x3333
Refs: []There may be many different ways to serialize such a data structure, each of which has its own goals and is convenient for specific cases. This page provides a general serialization algorithm and specification of the corresponding TL-B schemes, followed by the example and specific implementation used in the TON Blockchain.
Even though the syntax looks very much like TL-B, it cannot be used in most of the TL-B tooling. Unlike in real TL-B, these schemas serialize to a bitstring with no 1023 bit length limit, and without any refs.
General scheme
Internal references, absent cells, and complete BoCs
For an arbitrary cell c in a given BoC, references to it can be either:
- internal if the cell corresponding to the reference is also represented in BoC,
- external if it's not in BoC. Such cell
cis called absent from this BoC.
A BoC is called complete if it does not contain any external references. Most real-world BoCs are complete.
Outline of serialization process
This paragraphs provide a textual description of the BoC serialization process. The specific implementation of the serialization and TL-B schemes is left to the choice of developers.
For a specific example of TL-B schema and pseudocode of related cell serialization, see TL-B schema.
The serialization process of a BoC B consisting of n cells can be outlined as follows.
- List the cells from B in a topological order:
c1, ..., cn(withc1, ..., ckas root cells, ifBis a forest). - Choose the smallest number of bytes
sthat can contain the binary representation ofn. Serialize each cellciin a way similar to standard representation algorithm, with exceptions:d1 = r + 8s + 16h + 32lwhereh = 1if the cell's hashes are explicitly included into the serialization; otherwise,h = 0(whenr = 7,hmust be1);- if
h = 1, after bytesb1andb2the serialization is continued byl + 132-byte higher hashes ofc; - unsigned big-endian s-bit integer
jused instead of hashHash(cj)to represent internal references to cellcj.
- Concatenate the representations of cells
cithus obtained in the increasing order ofi. - Optionally, an index can be constructed that consists of
nt-bytes integer entriesL_{1}, \ldots, L_{n}where:L_{i}is the total length in bytes of the representations of cellscjwithj ⤠i;tis the smallest number of bytes that can contain the binary representation ofL_{n}.
- An optional CRC32C may be appended to the serialization for integrity verification purposes.
If the index is included, any cell ci the serialized bag of cells may be easily accessed by its index i without deserializing all other cells, or even without loading the entire serialized bag of cells in memory.
A final serialization of the bag of cells must include a magic number indicating the precise format of the serialization, followed by integers s, t, n, an optional index consisting of n * t bytes, L_n bytes with the cell representations, and an optional CRC32C checksum. Each specific implementation of the serialization process must comply with these fields and their order but, for example, may take into account number of roots, number of absent cells, and so one.
Serialization
Beware this is not an actual TL-B schema. TL-B describes serialization to cells, i.e. bits and refs, with the limit of 1023 bits per cell. This serialization describes serialization into a bitstring of arbitrary length without any refs, even though it uses syntax similar to TL-B.
Only one serialization scheme of BoCs is used in TON Blockchain (there are also two outdated BoC serialization schemes in the file):
serialized_boc#b5ee9c72 has_idx:(## 1) has_crc32c:(## 1)
has_cache_bits:(## 1) flags:(## 2) { flags = 0 }
size:(## 3) { size <= 4 }
off_bytes:(## 8) { off_bytes <= 8 }
cells:(##(size * 8))
roots:(##(size * 8)) { roots >= 1 }
absent:(##(size * 8)) { roots + absent <= cells }
tot_cells_size:(##(off_bytes * 8))
root_list:(roots * ##(size * 8))
index:has_idx?(cells * ##(off_bytes * 8))
cell_data:(tot_cells_size * [ uint8 ])
crc32c:has_crc32c?uint32
= BagOfCells;Fields size is s, off_bytes is t, cells is n, tot_cells_size is L_n (the total size of the serialization of all cells in bytes), index is the optional index L_{1}, \ldots, L_{n}, cell_data is the concatenation of the cells representations, and crc32c is the optional 4-bytes CRC32C checksum.
This schema additionally includes:
- the 1-bit
has_idxflag that indicates whether the index is included in the serialization; - the 1-bit
has_crc32cflag that indicates whether the CRC32C checksum is included in the serialization; - the 1-bit
has_cache_bitsand 2-bitflagsfields that are reserved for future use (flagsmust be zero); - the
rootsfield that indicates the number of root cells in the BoC; - the
absentfield that indicates the number of absent cells in the BoC; - the
root_listfield that is an indices sequence of the root cells in the BoC.
Example: manual
Consider the following example of a tree of cells:
01
0aaaaa
fe
0aaaaaSo, there is a 2-bit root cell that references two other cells:
- The first is a 24-bit cell.
- The second is a 8-bit cell that itself references a 24-bit cell.
After identifying of unique cells, we have the following:
01
0aaaaa
feNext, the unique cells are arranged in a topological order:
01 -> index 0 (root cell)
fe -> index 1
0aaaaa -> index 2Now, let's calculate the descriptor bytes b1 and b2 for each of the three unique cells.
So, we obtain:
01 -> 0201
fe -> 0102
0aaaaa -> 0006Then the data bits are serialized as \lceil\frac{b}{8}\rceil bytes. Remember, if b is not a multiple of eight, a binary 1 and up to six binary 0s are appended to the data bits. After that, the data is split into \lceil\frac{b}{8}\rceil 8-bit groups.
01 -> 01100000 = 0x60
fe -> do not change (full groups)
0aaaaa -> do not change (full groups)Next come the depths for the refs in two bytes:
01 -> 0002
fe -> 0001
0aaaaa -> 0000Now specify which cells each one references:
0: 01 -> 0201: refers to 2 cells with such indexes
1: fe -> 02: refers to cells with index 2
2: 0aaaaa -> no refsFor each cell we have its hexadecimal representation:
01 -> 02016000020201
fe -> 0102fe000102
0aaaaa -> 00060aaaaa0000Finally, we concatenate all parts into a single hexadecimal array:
0x020160000202010102fe00010200060aaaaa0000.
Now that we've serialized our cells into a flat 20-byte array, it's time to pack them into a complete BoC format.
0xb5ee9c72 -> TL-B id of the BoC structure
0b1 -> has indexes
0b0 -> does not have CRC32C
0b0 -> does not have cache bits
0b00 -> flags are 0
0b001 -> the number of bytes needed to store the number of cells is 1
0b00000001 -> the number of bytes used to represent offset of a serialization is 1
0b00000011 -> the number of cells is 3
0b00000001 -> the number of roots is 1
0b00000000 -> the number of absent cells is 0
0b00010100 -> tot_cells_size is 20 bytes
0b00000000 -> the root list; we have one root with number 0 after the topological sort
0b000001110000111000010100 -> the three 8-bits group of indexes for cells accorging to the topological sort
0x020160000202010102fe00010200060aaaaa0000 -> the cells data
_ -> CRC32C is not serializedBy combining everything into a single bit string, we get the result of serialization.
Example: TypeScript
According to the TL-B scheme above there is the SDK for serialization and parsing BoC.
Only serialization of BoCs with one root and no absent cells is supported. There are two main functions:
serializeBocfor serialization. It has two parameters:rootand options object with two boolean flags:idxandcrc32. They indicate whether indexes and CRC32C will be included in serialization. The output is a Buffer with serialization.deserializeBocfor parsing. It has one parameter:src, a Buffer that contains a serialized BoC. The output is a roots list of a given BoC.
import { beginCell, serializeBoc, deserializeBoc } from "@ton/core";
const innerCell = beginCell()
.storeUint(456, 16)
.endCell();
const rootCell = beginCell()
.storeUint(0, 64)
.storeRef(innerCell)
.endCell();
const boc = serializeBoc(
rootCell,
{
// do not include index
idx: false,
// do not include CRC-32
crc32: false,
},
);
const decodedRootCell = deserializeBoc(boc)[0];Alternatively, use methods of Cell:
import { Cell } from "@ton/core";
const hex = 'b5ee9c72410106010082000114ff00f4a413f4bcf2c80b01...';
// deserialze
const cell = Cell.fromBoc(Buffer.from(hex, 'hex'))[0];
// serialize
const hexBack = cell.toBoc().toString('hex');Last updated on