Blockchain Data in Cloud
Bitquery provides ready-to-use blockchain data dumps in Parquet format through popular cloud storage and analytics platforms such as AWS S3, Google Cloud Storage, Snowflake, and BigQuery.
You can plug these datasets directly into your existing analytics stack and build custom data pipelines without running your own blockchain infrastructure or maintaining complex indexing systems.
Overview
Our cloud data export service delivers production-ready blockchain datasets optimized for large-scale analytics, historical backfills, and data lake integrations. All data is provided in Apache Parquet format, ensuring optimal compression, columnar storage, and compatibility with modern analytics engines. We can also provide other file formats if required.
Key Benefits
- No Infrastructure Management – Skip running blockchain nodes, indexers, or data processing infrastructure
- Production-Ready Format – Parquet files optimized for analytics workloads
- Cloud-Native – Direct integration with AWS S3, Google Cloud Storage, Snowflake, and BigQuery
- Historical Coverage – Complete blockchain history, starting from each chain's genesis block
- Multi-Chain Support – Access data from major blockchain networks
- Cost-Effective – Pay only for the data you need, when you need it
- Scalable – Handle petabytes of blockchain data with ease
Available Blockchain Data Dumps
Bitquery provides comprehensive cloud data dumps for the following blockchains:
EVM Chains Data Export
Export blockchain data for Ethereum, BSC, Base, Polygon/Matic, Optimism, Arbitrum, and other EVM-compatible chains. Includes:
- Blocks – Block-level metadata and timestamps
- Transactions – Full transaction-level data with gas information
- Transfers – Native token and ERC-20 token transfers
- Balance Updates – Account balance changes per block
- DEX Trades – Decentralized exchange trading data
- DEX Pools – Liquidity pool metadata and state
- Smart Contract Calls – Function calls and interactions
- Events – Ethereum event logs and emissions
- Miner Rewards – Block rewards and transaction fees
- Uncle Blocks – Ethereum uncle block data
Use Cases: DeFi analytics, NFT tracking, smart contract analysis, token holder analysis, DEX volume analysis, cross-chain analytics.
Solana Blockchain Data Export
Export Solana blockchain data including slot-level blocks, transactions, transfers, and DEX activity:
- Blocks – Slot-level block metadata
- Transactions – Full transaction-level data with signatures
- Transfers – Native SOL and SPL token transfers
- Balance Updates – Account balance changes per slot
- DEX Pools – Decentralized exchange pool metadata
- DEX Orders – Order-level DEX activity and fills
- DEX Trades – Executed trades on Solana DEXs
- Rewards – Validator and staking rewards
Use Cases: Solana DeFi analytics, NFT marketplace analysis, token transfer tracking, DEX volume analysis, validator performance monitoring.
Tron Blockchain Data Export
Export Tron blockchain data for comprehensive network analysis:
- Blocks – Block-level metadata
- Transactions – Full transaction-level data
- Transfers – Native TRX and TRC-20 token transfers
- Balance Updates – Account balance changes per block
- DEX Trades – Executed trades on Tron DEXs
Use Cases: Tron DeFi analytics, TRC-20 token tracking, DEX volume analysis, account balance monitoring, transaction flow analysis.
Bitcoin Blockchain Data Export
Export Bitcoin blockchain data including transaction inputs, outputs, and OMNI Layer protocol data:
- Blocks – Block-level metadata
- Transactions – Full transaction-level data
- Inputs – Transaction input data and UTXO references
- Outputs – Transaction output data and addresses
- OMNI Transactions – OMNI Layer protocol transactions
- OMNI Transfers – OMNI Layer token transfers
Use Cases: Bitcoin transaction analysis, UTXO tracking, address clustering, OMNI token analysis, blockchain forensics, historical price analysis.
BSC (BNB Chain) Data Export
Export BSC (BNB Chain) blockchain data for comprehensive EVM-compatible chain analysis:
- Blocks – Block-level metadata and timestamps
- Transactions – Full transaction-level data with gas information
- Transfers – Native BNB and BEP-20 token transfers
- Balance Updates – Account balance changes per block
- DEX Trades – Executed trades on BSC DEXs (PancakeSwap, etc.)
- DEX Pools – Liquidity pool metadata and state
- Smart Contract Calls – Function calls and contract interactions
- Events – BSC event logs and emissions
- Miner Rewards – Block rewards and transaction fees
Use Cases: BSC DeFi analytics, PancakeSwap analysis, BEP-20 token tracking, DEX volume analysis, smart contract monitoring, yield farming analytics, NFT marketplace data.
Data Format and Structure
Blockchain data is provided by default in Apache Parquet format, a columnar storage file format optimized for analytics workloads. We can also provide data in other file formats (CSV, JSON, Avro, etc.) based on your requirements. Parquet offers:
- High Compression – Reduces storage costs by up to 90%
- Columnar Storage – Enables efficient column pruning and predicate pushdown
- Schema Evolution – Supports schema changes over time
- Universal Compatibility – Works with all major analytics engines
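To make column pruning and predicate pushdown concrete, the sketch below mimics them in plain Python: whole row groups are skipped using per-column min/max statistics, and only the requested columns are read from the groups that remain. This is a toy model and not the Parquet format or any reader's real API. The `RowGroup` class and the `block_number`, `address`, and `balance` column names are hypothetical; check the sample schemas for the real column names.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group: column name -> list of values."""
    columns: dict

    def stats(self, column):
        # Real Parquet files keep min/max statistics in file metadata;
        # this toy version simply computes them on demand.
        values = self.columns[column]
        return min(values), max(values)


def scan(row_groups, wanted_columns, filter_column, lo, hi):
    """Return rows with lo <= filter_column <= hi, reading only wanted_columns.

    Also returns the number of row groups skipped without touching their rows.
    """
    rows, skipped = [], 0
    for group in row_groups:
        group_min, group_max = group.stats(filter_column)
        # Predicate pushdown: the value ranges cannot overlap, so skip the group.
        if group_max < lo or group_min > hi:
            skipped += 1
            continue
        for i, value in enumerate(group.columns[filter_column]):
            if lo <= value <= hi:
                # Column pruning: touch only the columns the caller asked for.
                rows.append({name: group.columns[name][i] for name in wanted_columns})
    return rows, skipped


# Hypothetical data: two row groups covering different block ranges.
groups = [
    RowGroup({"block_number": [100, 101], "address": ["0xa", "0xb"], "balance": [5, 7]}),
    RowGroup({"block_number": [200, 201], "address": ["0xc", "0xd"], "balance": [1, 9]}),
]
rows, skipped = scan(groups, ["address", "balance"], "block_number", 200, 250)
```

Here the first row group is skipped entirely because its highest `block_number` is below the requested range, which is the same reason a block-range filter on a Parquet file can avoid reading most of the data.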
File Organization
Data is organized by blockchain and topic, with files named using block/slot ranges:
bitquery-blockchain-dataset/
├── ethereum/
│ ├── blocks/
│ ├── transactions/
│ ├── transfers/
│ ├── balance_updates/
│ ├── dex_trades/
│ └── ...
├── solana/
│ ├── blocks/
│ ├── transactions/
│ ├── transfers/
│ ├── dex_trades/
│ └── ...
├── bitcoin/
│ ├── blocks/
│ ├── transactions/
│ ├── inputs/
│ ├── outputs/
│ └── ...
└── tron/
  ├── blocks/
  ├── transactions/
  ├── transfers/
  └── ...
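The layout above can be handled with a pair of small helpers. This is a minimal sketch that assumes every object key has the shape `<chain>/<topic>/<start>_<end>.parquet`, as in the public sample files, and that the end of the range is inclusive. The names `DumpFile`, `build_key`, and `parse_key` are illustrative and not part of any Bitquery library.

```python
import re
from typing import NamedTuple

# Key shape inferred from the sample files: <chain>/<topic>/<start>_<end>.parquet
KEY_PATTERN = re.compile(
    r"^(?P<chain>[a-z0-9_]+)/(?P<topic>[a-z0-9_]+)/(?P<start>\d+)_(?P<end>\d+)\.parquet$"
)


class DumpFile(NamedTuple):
    chain: str
    topic: str
    start: int  # first block (or slot) in the file
    end: int    # last block (or slot); assumed to be inclusive


def build_key(chain: str, topic: str, start: int, end: int) -> str:
    """Build the object key for one file of a chain/topic dataset."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chain}/{topic}/{start}_{end}.parquet"


def parse_key(key: str) -> DumpFile:
    """Split an object key back into chain, topic, and block/slot range."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unrecognized object key: {key!r}")
    return DumpFile(match["chain"], match["topic"], int(match["start"]), int(match["end"]))
```

Parsing keys into a structured record makes it easier to select files by block range before downloading anything.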
Sample Parquet Data
To quickly explore the structure of the data and test your tooling, you can use our public sample datasets:
- GitHub repository with sample Parquet dumps and schemas: https://github.com/bitquery/blockchain-cloud-data-dump-sample/tree/main
In the GitHub repo, each sample file (per data point or topic) includes the exact S3 URL in a comment, so you can:
- Point test pipelines to the same path
- Easily request more files from the same bucket/prefix if you need additional data
- Validate schemas before production integration
Example: Ethereum Balance Updates
https://bitquery-blockchain-dataset.s3.us-east-1.amazonaws.com/ethereum/balance_updates/24053500_24053549.parquet
bitquery-blockchain-dataset/
└── ethereum/
  └── balance_updates/
    ├── 24053500_24053549.parquet
    ├── 24053550_24053599.parquet
    ├── 24053600_24053649.parquet
    ├── 24053650_24053699.parquet
    ├── 24053700_24053749.parquet
    ├── 24053750_24053799.parquet
    ├── 24053800_24053849.parquet
    ├── 24053850_24053899.parquet
    ├── 24053900_24053949.parquet
    └── 24053950_24053999.parquet
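Given a block number, the file that should contain it can be computed directly. The sketch below assumes that every file covers a fixed window of 50 blocks aligned to a multiple of 50, which is what the sample listing suggests but is not a documented guarantee. Other chains or topics may use a different window, so `BLOCKS_PER_FILE` and the helper name `url_for_block` are assumptions.

```python
BASE_URL = "https://bitquery-blockchain-dataset.s3.us-east-1.amazonaws.com"
BLOCKS_PER_FILE = 50  # assumption inferred from the sample listing


def url_for_block(chain, topic, block, blocks_per_file=BLOCKS_PER_FILE):
    """Return the URL of the file expected to contain `block`."""
    # Round down to the start of the aligned window that holds the block.
    start = (block // blocks_per_file) * blocks_per_file
    end = start + blocks_per_file - 1  # file names use an inclusive end
    return f"{BASE_URL}/{chain}/{topic}/{start}_{end}.parquet"


sample_url = url_for_block("ethereum", "balance_updates", 24053512)
```

For block 24053512 this resolves to the `24053500_24053549.parquet` sample file shown above.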
Use Sample Data To:
- Validate ETL Pipelines – Test your data processing workflows against realistic blockchain data
- Inspect Schemas – Review column names, types, and data structures before production
- Benchmark Performance – Measure query performance on realistic data sizes
- Develop Analytics – Build and test analytics queries before full dataset access
- Validate Tooling – Ensure compatibility with your analytics stack
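One simple pipeline check is to confirm that a set of files covers a block range with no gaps or overlaps before loading it. The following is a minimal sketch that assumes file names of the form `<start>_<end>.parquet` with inclusive ends; `coverage_problems` is an illustrative helper, not part of any Bitquery tooling.

```python
from pathlib import PurePosixPath


def coverage_problems(file_names):
    """Report gaps and overlaps in a list of '<start>_<end>.parquet' names."""
    ranges = []
    for name in file_names:
        # PurePosixPath(...).stem drops any directory prefix and the extension.
        start, end = (int(part) for part in PurePosixPath(name).stem.split("_"))
        ranges.append((start, end, name))

    problems = []
    covered_end, covered_by = None, None
    for start, end, name in sorted(ranges):
        if covered_end is not None:
            if start > covered_end + 1:
                problems.append(
                    f"gap: blocks {covered_end + 1}-{start - 1} missing before {name}"
                )
            elif start <= covered_end:
                problems.append(
                    f"overlap: {name} starts at {start}, but {covered_by} ends at {covered_end}"
                )
        # Track the furthest block covered so far and the file that reaches it.
        if covered_end is None or end > covered_end:
            covered_end, covered_by = end, name
    return problems
```

An empty result means the files form one contiguous range; anything else is worth resolving before running a backfill.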
Cloud Platform Integration
AWS S3 Integration
Store blockchain data in Amazon S3 and query with:
- Amazon Athena – Serverless SQL queries on S3 data
- Amazon Redshift – Data warehouse with S3 integration
- AWS Glue – ETL jobs and data catalog
- Amazon EMR – Spark-based analytics on S3
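As an illustration of the Athena route, Parquet files under an S3 prefix can be exposed as an external table. The sketch below only renders the DDL string and does not call AWS. The database, table, and column names and types are hypothetical placeholders (take the real schema from the sample files), and the location should point at the bucket and prefix where your copy of the data lives.

```python
def athena_create_table(database, table, columns, location):
    """Render a CREATE EXTERNAL TABLE statement for Parquet files under `location`."""
    if not location.startswith("s3://") or not location.endswith("/"):
        raise ValueError("location must be an s3:// prefix ending in '/'")
    column_sql = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_sql}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema, for illustration only.
ddl = athena_create_table(
    database="blockchain",
    table="ethereum_balance_updates",
    columns=[("block_number", "bigint"), ("address", "string"), ("balance", "string")],
    location="s3://bitquery-blockchain-dataset/ethereum/balance_updates/",
)
```

The rendered statement can be pasted into the Athena query editor or submitted through whichever client you already use.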
Google Cloud Platform Integration
Store blockchain data in Google Cloud Storage and analyze with:
- BigQuery – Serverless data warehouse with native Parquet support
- Dataproc – Managed Spark and Hadoop clusters
- Dataflow – Stream and batch data processing
- BigQuery ML – Machine learning on blockchain data
Snowflake Integration
Load blockchain data into Snowflake for:
- Data Warehousing – Centralized blockchain data storage
- SQL Analytics – Complex queries across multiple chains
- Data Sharing – Share blockchain datasets across teams
- Snowpark – Python, Java, and Scala analytics
Other Platforms
Our Parquet datasets are compatible with:
- Databricks – Unified analytics platform
- Apache Spark – Distributed data processing
- Presto/Trino – Distributed SQL query engine
- Apache Drill – Schema-free SQL queries
- DuckDB – In-process analytical database
Building Scalable Real-Time Solutions
Bitquery enables you to build enterprise-grade, scalable, real-time, low-latency solutions in the cloud that can handle millions of transactions and events per second. Our cloud-native architecture supports both batch and streaming data pipelines for comprehensive blockchain analytics.
Real-Time Streaming Architecture
Build low-latency applications with sub-second data delivery using:
- Kafka Streams – High-throughput, low-latency blockchain data streams
  - Mempool Data – Access pending transactions before block confirmation
  - Committed Data – Real-time confirmed transaction streams
  - Multi-Chain Support – Stream data from Ethereum, Solana, Bitcoin, Tron, and more
  - Protobuf Format – Efficient binary serialization for optimal performance
- GraphQL Subscriptions – WebSocket-based real-time data subscriptions
  - Live Queries – Subscribe to specific blockchain events and transactions
  - Custom Filters – Filter data streams by address, token, contract, or event
  - Low Latency – Sub-100ms data delivery for time-sensitive applications
Scalable Cloud Solutions
Design and deploy horizontally scalable solutions that can handle:
- High Throughput – Process millions of transactions per day
- Concurrent Users – Support thousands of simultaneous connections
- Data Volume – Handle petabytes of historical and real-time data
- Global Scale – Deploy across multiple cloud regions for low latency
Performance Benchmarks
Our cloud solutions support:
- Latency: Sub-100ms for real-time streams, sub-second for batch queries
- Throughput: Millions of transactions per second processing capability
- Scalability: Auto-scaling from zero to thousands of concurrent connections
- Availability: 99.9% uptime SLA with multi-region redundancy
- Data Freshness: Real-time data with <1 second delay from blockchain
Use Cases
DeFi Analytics
- DEX Volume Analysis – Track trading volumes across decentralized exchanges
- Liquidity Pool Analytics – Monitor pool sizes, fees, and impermanent loss
- Yield Farming Analysis – Analyze yield opportunities and risks
- Token Flow Tracking – Monitor token movements between addresses
NFT Analytics
- Collection Analysis – Track NFT sales, floor prices, and market trends
- Marketplace Analytics – Compare performance across NFT marketplaces
- Holder Analysis – Identify whale wallets and distribution patterns
- Rarity Analysis – Calculate and track NFT rarity metrics
Blockchain Forensics
- Transaction Tracing – Follow funds through complex transaction paths
- Address Clustering – Identify related addresses and entities
- Compliance Monitoring – Track suspicious transactions and patterns
- Risk Assessment – Evaluate transaction risks and anomalies
Data Science and Machine Learning
- Price Prediction – Build models using historical transaction data
- Anomaly Detection – Identify unusual patterns in blockchain activity
- Network Analysis – Analyze blockchain network topology
- Sentiment Analysis – Correlate on-chain activity with market sentiment
Business Intelligence
- Portfolio Tracking – Monitor multi-chain portfolio performance
- Revenue Analytics – Track protocol revenues and fees
- User Analytics – Analyze user behavior and engagement
- Market Research – Study market trends and competitive analysis
Real-Time vs Batch Data Access
Cloud data dumps are optimized for batch analytics and historical workloads. They provide:
- Complete Historical Data – Access to full blockchain history
- Cost-Effective Storage – Optimized compression reduces costs
- Batch Processing – Ideal for ETL pipelines and scheduled analytics
- Data Warehousing – Perfect for building comprehensive data lakes
If you require low-latency or streaming blockchain data, Bitquery also provides:
- Kafka Streams – Real-time blockchain data streams via Apache Kafka
- GraphQL Subscriptions – Live data subscriptions for real-time applications
Getting Started
- Explore Sample Data – Review our GitHub repository to understand data structures
- Choose Your Blockchain – Select from EVM, Solana, Tron, Bitcoin, or BSC data exports
- Set Up Cloud Storage – Configure AWS S3, Google Cloud Storage, or your preferred storage solution
- Integrate Analytics Engine – Connect Snowflake, BigQuery, Athena, or your analytics platform
- Build Your Pipeline – Create ETL jobs to process and transform blockchain data
Related Documentation
- EVM Data Export – Ethereum, Polygon, and other EVM chains
- BSC Data Export – BNB Chain blockchain data dumps
- Solana Data Export – Solana blockchain data dumps
- Tron Data Export – Tron blockchain data dumps
- Bitcoin Data Export – Bitcoin blockchain and OMNI data
- Kafka Streaming Concepts – Real-time blockchain data streams