Cardano blockchain data on BigQuery

About BigQuery

According to its official description, BigQuery is a fully managed enterprise data warehouse that helps to manage and analyze data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery's serverless architecture allows using SQL queries to get the data you are looking for with zero infrastructure management.

Motivation

Cardano’s on-chain data has considerably grown over the last few months. This means that the time to sync the whole history of the blockchain increases accordingly. Running a node and a DB Sync process (mapping the on-chain data to a relational database) now requires more time and a more robust software instance.

Google BigQuery makes it easier to look up data without the need to run specialized software. Using Google Data Studio, you can also seamlessly create advanced visualizations and dashboards based on the BigQuery data.

Cardano data on BigQuery is organized by epoch numbers. This allows limiting queries to one or several epochs worth of data, which results in a lower cost per query.

There are several things you should note when working with BigQuery:

  1. The data is fetched (updated) from DB Sync every two hours.
  2. The data is only updated to approximately the last 20 blocks before the current block height in DB Sync. This is essential to prevent rollbacks of blocks in the case of chain forks.

Getting started with BigQuery

To get started, go to the IOG data analytics dataset.

Note that to start querying data, you need to have a Google project. In case you don’t have a Google project:

  1. Login to the Google Developer Console
  2. Create a new project and activate the BigQuery API.

Querying the data

You are now all set to work with the dataset:

  1. Open the dataset
  2. Select the table you would like to retrieve data from
  3. Click the ‘query’ option:

You can find the example queries in the query table schemas section below.

Analyzing the data

BigQuery provides seamless integration with Google Data Studio.

To import the data into the Data Studio:

  1. Create a new report
  2. Select the BigQuery option:

You can now create various charts using the Cardano mainnet dataset:

See more examples in this Data Studio report.

Costs

The cost for querying data from BigQuery is paid by an individual querying it. BigQuery charges an amount analogous to the amount of the data being queried.

Usually, the cost is $5 per terabyte (TB) of the data queried. Find more information by visiting the BigQuery pricing page.

Note that Cardano data tables are divided by epoch numbers so that queries can be made according to the epoch of choice, incurring less costs.

Query table schemas

The following tables are available in the dataset:

  1. Table: block_hash
  2. Table: block
  3. Table: epoch_param
  4. Table: tx
  5. Table: tx_hash
  6. Table: tx_in_out
  7. Table: rel_addr_txout
  8. Table: rel_stake_txout

1. Table: block_hash

The block_hash table is clustered by field epoch_no and relates slot_no to block_hash. If the slot_no is NULL, then the block is from the genesis.

Field nameData typeOptions
epoch_noINTEGERREQUIRED
slot_noINTEGERNULLABLE
block_hashSTRINGREQUIRED

An example of searching a block by its hash:

SELECT block.* FROM `iog-data-analytics.cardano_mainnet.block_hash` AS bh
JOIN `iog-data-analytics.cardano_mainnet.block` AS block
ON block.slot_no = bh.slot_no
AND block.epoch_no = bh.epoch_no
WHERE bh.block_hash = '00b6c4bd4024cbf66050fe978f880a7129149be7a1a01310b3e3055053d563b2'
AND bh.epoch_no = 329;

2. Table: block

The block table is clustered by field epoch_no and is usually accessed using epoch_no and slot_no. See the Table: block_hash example above.

Field nameData typeOptions
epoch_noINTEGERREQUIRED
slot_noINTEGERREQUIRED
block_timeDATETIMEREQUIRED
block_sizeINTEGERREQUIRED
tx_countINTEGERREQUIRED
sum_tx_feeINTEGERREQUIRED
script_countINTEGERREQUIRED
sum_script_sizeINTEGERREQUIRED

3. Table: epoch_param

The epoch_param table is very small and can be accessed using epoch_no.

Field nameData typeOptions
epoch_noINTEGERREQUIRED
min_fee_aINTEGERREQUIRED
min_fee_bINTEGERREQUIRED
max_block_sizeINTEGERREQUIRED
max_tx_sizeINTEGERREQUIRED
max_bh_sizeINTEGERREQUIRED
key_depositINTEGERREQUIRED
pool_depositINTEGERREQUIRED
max_epochINTEGERREQUIRED
optimal_pool_countINTEGERREQUIRED
influenceNUMERICREQUIRED
monetary_expand_rateNUMERICREQUIRED
treasury_growth_rateNUMERICREQUIRED
decentralisationNUMERICREQUIRED
entropySTRINGNULLABLE
protocol_majorINTEGERREQUIRED
protocol_minorINTEGERREQUIRED
min_utxo_valueINTEGERREQUIRED
min_pool_costINTEGERREQUIRED
nonceSTRINGNULLABLE
coins_per_utxo_wordINTEGERNULLABLE
cost_model_idINTEGERNULLABLE
price_memNUMERICNULLABLE
price_stepNUMERICNULLABLE
max_tx_ex_memINTEGERNULLABLE
max_tx_ex_stepsINTEGERNULLABLE
max_block_ex_memINTEGERNULLABLE
max_block_ex_stepsINTEGERNULLABLE
max_val_sizeINTEGERNULLABLE
collateral_percentINTEGERNULLABLE
max_collateral_inputsINTEGERNULLABLE

4. Table: tx

The tx table is clustered by field epoch_no and slot_no, and can efficiently be accessed using epoch_no or slot_no.

A transaction is identified by the pair (slot_no, txidx), identifying the block of the transaction and transaction’s index in the block.

Field nameData typeOptions
epoch_noINTEGERREQUIRED
tx_hashSTRINGREQUIRED
block_timeDATETIMEREQUIRED
slot_noINTEGERREQUIRED
txidxINTEGERREQUIRED
out_sumNUMERICREQUIRED
feeNUMERICREQUIRED
depositNUMERICREQUIRED
sizeINTEGERREQUIRED
invalid_beforeDECIMALNULLABLE
invalid_afterDECIMALNULLABLE
valid_scriptBOOLREQUIRED
script_sizeINTEGERREQUIRED
count_inputsINTEGERREQUIRED
count_outputsINTEGERREQUIRED

5. Table: tx_hash

The tx_hash table allows to quickly find the slot number and the transaction index in the block given that the hash has been incorporated. The table is clustered by epoch_no.

Field nameData typeOptions
epoch_noINTEGERREQUIRED
slot_noINTEGERREQUIRED
txidxINTEGERREQUIRED
tx_hashSTRINGREQUIRED

6. Table: tx_in_out

The tx_in_out table is clustered by field epoch_no and slot_no, and can efficiently be accessed using epoch_no or slot_no.

This table contains a list of inputs and outputs to a transaction, which is identified by the pair (slot_no, txidx).

Field nameData typeOptions
epoch_noINTEGERREQUIRED
slot_noINTEGERREQUIRED
txidxINTEGERREQUIRED
inputsSTRINGNULLABLE
outputsSTRINGNULLABLE

Example

See an example of transaction details here.

epoch_no: 329
slot_no: 56785711
txidx: 0

This transaction consumes two inputs:

[
{
"in_slot_no": 56750312,
"in_txidx": 13,
"in_idx": 2
},
{
"in_slot_no": 56609945,
"in_txidx": 1,
"in_idx": 0
}
]

and creates four outputs of which only one remains unspent (as of the time this example has been documented):

[
{
"out_idx": 0,
"out_address": "addr1q9cwvremt6n320s2e3agq0jyq82yhrk3htsu0w426xnz5us70z4w0jgvcdkkynmm8wmds66jd9kusnjfpu6raw5fqp0sr07p5w",
"out_value": 1000000,
"out_ma": null,
"consumed_in_tx": [
{
"in_slot_no": 56865611,
"in_txidx": 6
}
]
},
{
"out_idx": 1,
"out_address": "addr1q894g0fpzjh8aj7mulfpca2wgwz0xznyzvc89y2yly8em75cxj5k9gax0h6lymuvnmf7qhqwu0y0hyxqk2xcfmnva3gqx9fpj7",
"out_value": 2500000,
"out_ma": null,
"consumed_in_tx": null
},
{
"out_idx": 2,
"out_address": "addr1qy66guz87dm2dfy8efdx7gm43e3f5f369avgq9thrm5hsjcphuva29yz3chdrakhlds6r24rvyu2qtqjf4s0v2hx5essdgyk9j",
"out_value": 46500000,
"out_ma": null,
"consumed_in_tx": [
{
"in_slot_no": 56860931,
"in_txidx": 11
}
]
},
{
"out_idx": 3,
"out_address": "addr1q9r7sqenl529jk7ylvht5t4k40t59h32f26k4annkdcnpj4shnq3sq2p4mr0940zqdfmfvgyr7p8zr720zdd3xkuygcqehzxum",
"out_value": 740462105,
"out_ma": [
{
"fingerprint": "asset18ry832pk9yye9d5fszehrd5yh3glkeclu40w23",
"quantity": 1
}
],
"consumed_in_tx": [
{
"in_slot_no": 56871829,
"in_txidx": 35
}
]
}
]

See the output of the transaction on Cardano Explorer.

7. Table: rel_addr_txout

This table is clustered by epoch_no and relates address to outputs in an epoch.

The list of outputs are triples of slot_no, txidx, index, which identify the transaction by the pair of slot_no and txidx and then the output at index in the transaction.

Field nameData typeOptions
epoch_noINTEGERREQUIRED
addressSTRINGREQUIRED
outputsSTRINGREQUIRED

8. Table: rel_stake_txout

This table is clustered by epoch_no and relates staking address to outputs in an epoch.

The list of outputs are triples of slot_no, txidx, and index, which identify the transaction by the pair of slot_no and txidx and then the output at index in the transaction.

Field nameData typeOptions
epoch_noINTEGERREQUIRED
addressSTRINGREQUIRED
outputsSTRINGREQUIRED

Further development

The current release does not include all the blockchain data, particularly the staking and staking rewards details. In its future iterations, this dataset on BigQuery will cover all the Cardano data, and we will be updating this section with more details.

To raise an issue or share your feedback, please reach out to data-analytics@iohk.io.

© IOHK 2015 - 2022


Cardano Logo

Cardano is an open-source project.

Cardano is a software platform ONLY and does not conduct any independent diligence on, or substantive review of, any blockchain asset, digital currency, cryptocurrency or associated funds. You are fully and solely responsible for evaluating your investments, for determining whether you will exchange blockchain assets based on your own judgement, and for all your decisions as to whether to exchange blockchain assets with Cardano. In many cases, blockchain assets you exchange on the basis of your research may not increase in value, and may decrease in value. Similarly, blockchain assets you exchange on the basis of your research may fall or rise in value after your exchange.

Past performance is not indicative of future results. Any investment in blockchain assets involves the risk of loss of part or all of your investment. The value of the blockchain assets you exchange is subject to market and other investment risks