
Blockchain Data in Cloud

Bitquery provides ready-to-use blockchain data dumps in Parquet format, delivered through popular cloud platforms such as AWS S3, Google Cloud Storage, Snowflake, and BigQuery.
You can plug these datasets directly into your existing analytics stack and build custom data pipelines without running your own blockchain infrastructure or maintaining complex indexing systems.

Overview

Our cloud data export service delivers production-ready blockchain datasets optimized for large-scale analytics, historical backfills, and data lake integrations. All data is provided in Apache Parquet format, ensuring optimal compression, columnar storage, and compatibility with modern analytics engines. We can also provide other file formats if required.

Key Benefits

  • No Infrastructure Management – Skip running blockchain nodes, indexers, or data processing infrastructure
  • Production-Ready Format – Parquet files optimized for analytics workloads
  • Cloud-Native – Direct integration with AWS S3, Google Cloud Storage, Snowflake, and BigQuery
  • Historical Coverage – Complete blockchain history from genesis blocks
  • Multi-Chain Support – Access data from major blockchain networks
  • Cost-Effective – Pay only for the data you need, when you need it
  • Scalable – Handle petabytes of blockchain data with ease

Available Blockchain Data Dumps

Bitquery provides comprehensive cloud data dumps for the following blockchains:

EVM Chains Data Export

Export blockchain data for Ethereum, BSC, Base, Polygon/Matic, Optimism, Arbitrum, and other EVM-compatible chains. Includes:

  • Blocks – Block-level metadata and timestamps
  • Transactions – Full transaction-level data with gas information
  • Transfers – Native token and ERC-20 token transfers
  • Balance Updates – Account balance changes per block
  • DEX Trades – Decentralized exchange trading data
  • DEX Pools – Liquidity pool metadata and state
  • Smart Contract Calls – Function calls and interactions
  • Events – Smart contract event logs
  • Miner Rewards – Block rewards and transaction fees
  • Uncle Blocks – Ethereum uncle block data

Use Cases: DeFi analytics, NFT tracking, smart contract analysis, token holder analysis, DEX volume analysis, cross-chain analytics.

Solana Blockchain Data Export

Export Solana blockchain data including slot-level blocks, transactions, transfers, and DEX activity:

  • Blocks – Slot-level block metadata
  • Transactions – Full transaction-level data with signatures
  • Transfers – Native SOL and SPL token transfers
  • Balance Updates – Account balance changes per slot
  • DEX Pools – Decentralized exchange pool metadata
  • DEX Orders – Order-level DEX activity and fills
  • DEX Trades – Executed trades on Solana DEXs
  • Rewards – Validator and staking rewards

Use Cases: Solana DeFi analytics, NFT marketplace analysis, token transfer tracking, DEX volume analysis, validator performance monitoring.

Tron Blockchain Data Export

Export Tron blockchain data for comprehensive network analysis:

  • Blocks – Block-level metadata
  • Transactions – Full transaction-level data
  • Transfers – Native TRX and TRC-20 token transfers
  • Balance Updates – Account balance changes per block
  • DEX Trades – Executed trades on Tron DEXs

Use Cases: Tron DeFi analytics, TRC-20 token tracking, DEX volume analysis, account balance monitoring, transaction flow analysis.

Bitcoin Blockchain Data Export

Export Bitcoin blockchain data including transaction inputs, outputs, and OMNI Layer protocol data:

  • Blocks – Block-level metadata
  • Transactions – Full transaction-level data
  • Inputs – Transaction input data and UTXO references
  • Outputs – Transaction output data and addresses
  • OMNI Transactions – OMNI Layer protocol transactions
  • OMNI Transfers – OMNI Layer token transfers

Use Cases: Bitcoin transaction analysis, UTXO tracking, address clustering, OMNI token analysis, blockchain forensics, historical price analysis.

Data Format and Structure

Blockchain data is provided by default in Apache Parquet format, a columnar storage file format optimized for analytics workloads. We can also provide data in other file formats (CSV, JSON, Avro, etc.) based on your requirements. Parquet offers:

  • High Compression – Reduces storage costs by up to 90%
  • Columnar Storage – Enables efficient column pruning and predicate pushdown
  • Schema Evolution – Supports schema changes over time
  • Universal Compatibility – Works with all major analytics engines
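
As a quick sanity check, a few lines of Python with pyarrow are enough to confirm these properties on any downloaded file; the file name here follows the block-range naming described in the next section.

import pyarrow.parquet as pq

# Any downloaded sample file works here; the name encodes its block range.
pf = pq.ParquetFile("24053500_24053549.parquet")

print(pf.schema_arrow)                      # column names and types
print("rows:", pf.metadata.num_rows)
print("row groups:", pf.metadata.num_row_groups)

# Read only the columns you need -- column pruning is the point of Parquet.
table = pf.read(columns=pf.schema_arrow.names[:3])
print(table.slice(0, 5).to_pandas())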

File Organization

Data is organized by blockchain and topic, with files named using block/slot ranges:

bitquery-blockchain-dataset/
├── ethereum/
│   ├── blocks/
│   ├── transactions/
│   ├── transfers/
│   ├── balance_updates/
│   ├── dex_trades/
│   └── ...
├── solana/
│   ├── blocks/
│   ├── transactions/
│   ├── transfers/
│   ├── dex_trades/
│   └── ...
├── bitcoin/
│   ├── blocks/
│   ├── transactions/
│   ├── inputs/
│   ├── outputs/
│   └── ...
└── tron/
    ├── blocks/
    ├── transactions/
    ├── transfers/
    └── ...
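
For example, a short boto3 script can enumerate the files under a topic prefix; this sketch assumes your AWS credentials have been granted access to the bucket.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(
    Bucket="bitquery-blockchain-dataset",
    Prefix="ethereum/balance_updates/",
):
    for obj in page.get("Contents", []):
        # Keys look like ethereum/balance_updates/24053500_24053549.parquet;
        # the file name encodes the block range the file covers.
        print(obj["Key"], obj["Size"])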

Sample Parquet Data

To quickly explore the structure of the data and test your tooling, you can use our public sample datasets on GitHub. Each sample file (per data point or topic) includes the exact S3 URL in a comment, so you can:

  • Point test pipelines to the same path
  • Easily request more files from the same bucket/prefix if you need additional data
  • Validate schemas before production integration

Example: Ethereum Balance Updates

https://bitquery-blockchain-dataset.s3.us-east-1.amazonaws.com/ethereum/balance_updates/24053500_24053549.parquet

bitquery-blockchain-dataset/
└── ethereum/
    └── balance_updates/
        ├── 24053500_24053549.parquet
        ├── 24053550_24053599.parquet
        ├── 24053600_24053649.parquet
        ├── 24053650_24053699.parquet
        ├── 24053700_24053749.parquet
        ├── 24053750_24053799.parquet
        ├── 24053800_24053849.parquet
        ├── 24053850_24053899.parquet
        ├── 24053900_24053949.parquet
        └── 24053950_24053999.parquet
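
Because the sample file is publicly reachable over HTTPS, you can query it in place with DuckDB (via its httpfs extension) without downloading anything first:

import duckdb

url = ("https://bitquery-blockchain-dataset.s3.us-east-1.amazonaws.com/"
       "ethereum/balance_updates/24053500_24053549.parquet")

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

# Inspect the schema, then pull a handful of rows.
print(con.execute(f"DESCRIBE SELECT * FROM read_parquet('{url}')").df())
print(con.execute(f"SELECT * FROM read_parquet('{url}') LIMIT 5").df())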

Use Sample Data To:

  • Validate ETL Pipelines – Test your data processing workflows against realistic blockchain data
  • Inspect Schemas – Review column names, types, and data structures before production
  • Benchmark Performance – Measure query performance on realistic data sizes
  • Develop Analytics – Build and test analytics queries before full dataset access
  • Validate Tooling – Ensure compatibility with your analytics stack

Cloud Platform Integration

AWS S3 Integration

Store blockchain data in Amazon S3 and query with:

  • Amazon Athena – Serverless SQL queries on S3 data
  • Amazon Redshift – Data warehouse with S3 integration
  • AWS Glue – ETL jobs and data catalog
  • Amazon EMR – Spark-based analytics on S3
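
A minimal sketch of the Athena route with boto3 is below; the database, table, and results bucket are placeholders, and the table itself must first be defined over your S3 prefix (for example with a Glue crawler or a CREATE EXTERNAL TABLE statement).

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

qid = athena.start_query_execution(
    QueryString="SELECT COUNT(*) AS transfers FROM ethereum_transfers",  # placeholder table
    QueryExecutionContext={"Database": "blockchain"},                    # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},   # placeholder bucket
)["QueryExecutionId"]

# Poll until the query reaches a terminal state, then fetch results.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print(row)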

Google Cloud Platform Integration

Store blockchain data in Google Cloud Storage and analyze with:

  • BigQuery – Serverless data warehouse with native Parquet support
  • Dataproc – Managed Spark and Hadoop clusters
  • Dataflow – Stream and batch data processing
  • BigQuery ML – Machine learning on blockchain data
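
For BigQuery, loading is a one-step job because Parquet carries its own schema; the bucket, project, dataset, and table names in this sketch are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load every Parquet file under the prefix into one table.
load_job = client.load_table_from_uri(
    "gs://my-blockchain-bucket/ethereum/transfers/*.parquet",  # placeholder bucket
    "my_project.blockchain.ethereum_transfers",                # placeholder table
    job_config=job_config,
)
load_job.result()  # block until the load completes

print(client.get_table("my_project.blockchain.ethereum_transfers").num_rows, "rows loaded")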

Snowflake Integration

Load blockchain data into Snowflake for:

  • Data Warehousing – Centralized blockchain data storage
  • SQL Analytics – Complex queries across multiple chains
  • Data Sharing – Share blockchain datasets across teams
  • Snowpark – Python, Java, and Scala analytics
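
A typical pattern is an external stage over the S3/GCS prefix plus a COPY INTO statement; the connection parameters, stage, and table names below are placeholders.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # placeholder connection details
    user="my_user",
    password="...",
    warehouse="ANALYTICS_WH",
    database="BLOCKCHAIN",
    schema="PUBLIC",
)

# The stage is assumed to point at the prefix holding the Parquet files,
# and the target table to already exist with matching column names.
conn.cursor().execute("""
    COPY INTO ethereum_transfers
    FROM @blockchain_stage/ethereum/transfers/
    FILE_FORMAT = (TYPE = 'PARQUET')
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")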

Other Platforms

Our Parquet datasets are compatible with:

  • Databricks – Unified analytics platform
  • Apache Spark – Distributed data processing
  • Presto/Trino – Distributed SQL query engine
  • Apache Drill – Schema-free SQL queries
  • DuckDB – In-process analytical database
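
With Spark, for instance, a whole topic directory can be read as one DataFrame (assuming the hadoop-aws/s3a connector is on the classpath); the column name in the aggregation is illustrative, so check printSchema() output first.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bitquery-parquet").getOrCreate()

# Spark discovers and parallelizes over every Parquet file under the prefix.
df = spark.read.parquet("s3a://bitquery-blockchain-dataset/ethereum/transfers/")

df.printSchema()
# "block_time" is an assumed column name -- verify it against the schema.
df.groupBy(F.to_date("block_time").alias("day")).count().orderBy("day").show()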

Building Scalable Real-Time Solutions

Bitquery enables you to build enterprise-grade, low-latency solutions in the cloud that scale to millions of transactions and events per second. Our cloud-native architecture supports both batch and streaming data pipelines for comprehensive blockchain analytics.

Real-Time Streaming Architecture

Build low-latency applications with sub-second data delivery using:

  • Kafka Streams – High-throughput, low-latency blockchain data streams

    • Mempool Data – Access pending transactions before block confirmation
    • Committed Data – Real-time confirmed transaction streams
    • Multi-Chain Support – Stream data from Ethereum, Solana, Bitcoin, Tron, and more
    • Protobuf Format – Efficient binary serialization for optimal performance
  • GraphQL Subscriptions – WebSocket-based real-time data subscriptions

    • Live Queries – Subscribe to specific blockchain events and transactions
    • Custom Filters – Filter data streams by address, token, contract, or event
    • Low Latency – Sub-100ms data delivery for time-sensitive applications
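
As a sketch of the Kafka side, a plain confluent-kafka consumer is enough to start receiving messages; the broker address, topic name, and any authentication settings are placeholders supplied during onboarding, and payloads are Protobuf bytes that need the matching generated classes to decode.

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka.example.com:9092",  # placeholder broker
    "group.id": "my-blockchain-consumer",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["ethereum.transactions"])       # placeholder topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        # msg.value() is Protobuf-encoded; parse it with the generated
        # message classes for the stream you subscribed to.
        print(f"offset={msg.offset()} payload={len(msg.value())} bytes")
finally:
    consumer.close()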

Scalable Cloud Solutions

Design and deploy horizontally scalable solutions that can handle:

  • High Throughput – Process millions of transactions per day
  • Concurrent Users – Support thousands of simultaneous connections
  • Data Volume – Handle petabytes of historical and real-time data
  • Global Scale – Deploy across multiple cloud regions for low latency

Cloud-Native Architecture Patterns

Lambda Functions and Serverless

Build serverless solutions using:

  • AWS Lambda – Process blockchain events in real-time
  • Google Cloud Functions – Serverless event-driven processing
  • Azure Functions – Scalable serverless compute
  • Use Cases: Real-time alerts, automated trading, compliance monitoring
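
An illustrative Lambda handler for the real-time-alerts case is shown below; the event payload shape is an assumption and depends on what you wire in front of the function (an SQS queue, a Kafka trigger, etc.).

import json
import os

# Alert when a single transfer exceeds the configured threshold (default 100 ETH).
ALERT_THRESHOLD_WEI = int(os.environ.get("ALERT_THRESHOLD_WEI", 10**20))

def handler(event, context):
    alerts = []
    for record in event.get("transfers", []):        # illustrative payload shape
        if int(record["amount_wei"]) >= ALERT_THRESHOLD_WEI:
            alerts.append(record["tx_hash"])
    # Forward alerts to SNS, a webhook, or a notification service here.
    return {"statusCode": 200, "body": json.dumps({"alerts": alerts})}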

Containerized Microservices

Deploy scalable microservices with:

  • Kubernetes – Container orchestration for auto-scaling
  • Docker – Containerized blockchain data processors
  • ECS/EKS – Managed container services on AWS
  • Use Cases: Multi-tenant analytics platforms, API services, data aggregators

Event-Driven Architecture

Build reactive systems with:

  • Event Sourcing – Store blockchain events as immutable logs
  • CQRS – Separate read and write models for optimal performance
  • Message Queues – Use Kafka, SQS, or Pub/Sub for event distribution
  • Use Cases: Real-time dashboards, notification systems, trading platforms
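
The event-sourcing/CQRS pairing boils down to a few moving parts, sketched here with an illustrative transfer event: an append-only write side, and a read model that can be rebuilt at any time by replaying the log.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class TransferEvent:          # illustrative event type
    block: int
    sender: str
    receiver: str
    amount: int

@dataclass
class BalanceReadModel:       # read side: derived, disposable state
    balances: dict = field(default_factory=dict)

    def apply(self, ev: TransferEvent) -> None:
        self.balances[ev.sender] = self.balances.get(ev.sender, 0) - ev.amount
        self.balances[ev.receiver] = self.balances.get(ev.receiver, 0) + ev.amount

event_log: list = []          # write side: immutable, append-only log

def record(ev: TransferEvent, model: BalanceReadModel) -> None:
    event_log.append(ev)      # events are never mutated or deleted
    model.apply(ev)

model = BalanceReadModel()
record(TransferEvent(1, "0xabc", "0xdef", 100), model)
print(model.balances)         # rebuildable by replaying event_log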

Low-Latency Optimization

Achieve millisecond-level latency for time-sensitive applications:

  • Edge Computing – Deploy processing close to data sources
  • In-Memory Caching – Use Redis, Memcached, or ElastiCache
  • CDN Integration – Distribute data globally for reduced latency
  • Connection Pooling – Optimize database and API connections
  • Use Cases: Trading bots, arbitrage detection, flash loan monitoring
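
In-memory caching in particular takes very little code; this read-through sketch with redis-py uses a placeholder key scheme and a stubbed slow path.

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_balance_from_store(address: str) -> dict:
    # Stub for the slow path: a warehouse query, API call, or Parquet scan.
    return {"address": address, "balance": 0}

def get_balance(address: str, ttl_seconds: int = 5) -> dict:
    key = f"balance:{address}"            # placeholder key scheme
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)         # cache hit: served from memory
    value = fetch_balance_from_store(address)
    r.setex(key, ttl_seconds, json.dumps(value))  # short TTL keeps data fresh
    return value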

Hybrid Batch + Streaming Solutions

Combine batch and streaming for comprehensive analytics:

  • Lambda Architecture – Process both real-time and historical data
  • Kappa Architecture – Unified streaming pipeline for all data
  • Data Lake + Stream – Store historical data in S3/GCS, stream real-time updates
  • Use Cases: Portfolio trackers, analytics dashboards, compliance systems

Example Real-Time Use Cases

  • Trading Bots – Execute trades based on real-time blockchain events
  • Arbitrage Detection – Identify price discrepancies across DEXs instantly
  • Flash Loan Monitoring – Detect and analyze flash loan attacks in real-time
  • MEV Detection – Monitor maximal extractable value (MEV) opportunities
  • Compliance Alerts – Real-time monitoring of suspicious transactions
  • Portfolio Tracking – Live updates of multi-chain portfolio values
  • DEX Analytics – Real-time DEX volume and liquidity tracking
  • NFT Floor Price Alerts – Instant notifications on price changes

Performance Benchmarks

Our cloud solutions support:

  • Latency: Sub-100ms for real-time streams, sub-second for batch queries
  • Throughput: Millions of transactions per second processing capability
  • Scalability: Auto-scaling from zero to thousands of concurrent connections
  • Availability: 99.9% uptime SLA with multi-region redundancy
  • Data Freshness: Real-time data with < 1 second delay from blockchain

Use Cases

DeFi Analytics

  • DEX Volume Analysis – Track trading volumes across decentralized exchanges
  • Liquidity Pool Analytics – Monitor pool sizes, fees, and impermanent loss
  • Yield Farming Analysis – Analyze yield opportunities and risks
  • Token Flow Tracking – Monitor token movements between addresses
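
As a concrete starting point for DEX volume analysis, the DuckDB query below aggregates daily volume straight from the dex_trades Parquet files; the column names (block_time, amount_usd) are assumptions, so inspect the real schema first.

import duckdb

con = duckdb.connect()
print(con.execute("""
    SELECT date_trunc('day', block_time) AS day,        -- assumed column
           SUM(amount_usd)               AS volume_usd, -- assumed column
           COUNT(*)                      AS trades
    FROM read_parquet('ethereum/dex_trades/*.parquet')
    GROUP BY 1
    ORDER BY 1
""").df())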

NFT Analytics

  • Collection Analysis – Track NFT sales, floor prices, and market trends
  • Marketplace Analytics – Compare performance across NFT marketplaces
  • Holder Analysis – Identify whale wallets and distribution patterns
  • Rarity Analysis – Calculate and track NFT rarity metrics

Blockchain Forensics

  • Transaction Tracing – Follow funds through complex transaction paths
  • Address Clustering – Identify related addresses and entities
  • Compliance Monitoring – Track suspicious transactions and patterns
  • Risk Assessment – Evaluate transaction risks and anomalies

Data Science and Machine Learning

  • Price Prediction – Build models using historical transaction data
  • Anomaly Detection – Identify unusual patterns in blockchain activity
  • Network Analysis – Analyze blockchain network topology
  • Sentiment Analysis – Correlate on-chain activity with market sentiment

Business Intelligence

  • Portfolio Tracking – Monitor multi-chain portfolio performance
  • Revenue Analytics – Track protocol revenues and fees
  • User Analytics – Analyze user behavior and engagement
  • Market Research – Study market trends and the competitive landscape

Real-Time vs Batch Data Access

Cloud data dumps are optimized for batch analytics and historical workloads. They provide:

  • Complete Historical Data – Access to full blockchain history
  • Cost-Effective Storage – Optimized compression reduces costs
  • Batch Processing – Ideal for ETL pipelines and scheduled analytics
  • Data Warehousing – Perfect for building comprehensive data lakes

If you require low-latency or streaming blockchain data, Bitquery also provides:

  • Kafka Streams – Real-time blockchain data streams via Apache Kafka
  • GraphQL Subscriptions – Live data subscriptions for real-time applications

Getting Started

  1. Explore Sample Data – Review our GitHub repository to understand data structures
  2. Choose Your Blockchain – Select from EVM, Solana, Tron, or Bitcoin data exports
  3. Set Up Cloud Storage – Configure AWS S3, Google Cloud Storage, or your preferred storage solution
  4. Integrate Analytics Engine – Connect Snowflake, BigQuery, Athena, or your analytics platform
  5. Build Your Pipeline – Create ETL jobs to process and transform blockchain data