Cardano blockchain data on BigQuery
Cardano on-chain data on BigQuery
A dedicated site with BigQuery documentation and example queries can be found here.
According to its official description, BigQuery is a fully managed enterprise data warehouse that helps to manage and analyze data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery's serverless architecture allows using SQL queries to get the data you are looking for with zero infrastructure management.
Cardano’s on-chain data has considerably grown over the last few months. This means that the time to sync the whole history of the blockchain increases accordingly. Running a node and a DB Sync process (mapping the on-chain data to a relational database) now requires more time and a more robust software instance.
Google BigQuery makes it easier to look up data without the need to run specialized software. Using Google Data Studio, you can also seamlessly create advanced visualizations and dashboards based on the BigQuery data.
Cardano data on BigQuery is organized by epoch numbers. This allows limiting queries to one or several epochs worth of data, which results in a lower cost per query.
There are several things you should note when working with BigQuery:
- The data is fetched (updated) from DB Sync every two hours.
- The data is only updated to approximately the last 20 blocks before the current block height in DB Sync. This is essential to prevent rollbacks of blocks in the case of chain forks.
Getting started with BigQuery
To get started, go to the IOG data analytics dataset.
Note that to start querying data, you need to have a Google project. In case you don’t have a Google project:
- Login to the Google Developer Console
- Create a new project and activate the BigQuery API.
NB. If you don't query the dataset from your own project you'll be getting the error: "Access Denied: Project iog-data-analytics: User does not have bigquery.jobs.create permission in project iog-data-analytics."
Querying the data
You are now all set to work with the dataset:
- Open the dataset
- Select the table you would like to retrieve data from
- Click the ‘query’ option:
You can find the example queries in the query table schemas section below.
Analyzing the data
BigQuery provides seamless integration with Google Data Studio.
To import the data into the Data Studio:
- Create a new report
- Select the BigQuery option:
You can now create various charts using the Cardano mainnet dataset:
See more examples in this Data Studio report.
The cost for querying data from BigQuery is paid by an individual querying it. BigQuery charges an amount analogous to the amount of the data being queried.
Usually, the cost is \$5 per terabyte (TB) of the data queried. Find more information by visiting the BigQuery pricing page.
Note that Cardano data tables are divided by epoch numbers so that queries can be made according to the epoch of choice, incurring less costs.
Data update and validation process
The data in the BigQuery ‘cardano-mainnet’ project is currently updated every two hours including the newest ‘db-sync’ blocks. The overview of block heights in BigQuery vs. PostgreSQL is also updated every two hours. It is expected that the data in BigQuery is not at the same block height as in PostgreSQL from ‘db-sync’, however, the data should not deviate too much.
The data of the past full epoch between ‘db-sync’ and BigQuery is validated on every epoch boundary. For this, we compute the data hashes in canonical order in both databases and compare them.
This report shows the results of data validation for all tables.
Query table schemas
The wiki also contains more information about table schemas.
The current release includes almost all of the blockchain data from db_sync version 13. In its future iterations, this dataset on BigQuery will cover even more Cardano data, and we will be updating this section with more details.
To raise an issue or share your feedback, please reach out to email@example.com.