
Blockchain Data in Cloud

Bitquery provides ready-to-use blockchain data dumps in Parquet format, delivered through popular cloud platforms such as AWS S3, Google Cloud Storage, Snowflake, and BigQuery.
You can plug these datasets directly into your existing analytics stack and build custom data pipelines without running your own blockchain infrastructure or maintaining complex indexing systems.

Overview

Our cloud data export service delivers production-ready blockchain datasets optimized for large-scale analytics, historical backfills, and data lake integrations. All data is provided in Apache Parquet format, ensuring optimal compression, columnar storage, and compatibility with modern analytics engines. We can also provide other file formats if required.

Key Benefits

  • No Infrastructure Management – Skip running blockchain nodes, indexers, or data processing infrastructure
  • Production-Ready Format – Parquet files optimized for analytics workloads
  • Cloud-Native – Direct integration with AWS S3, Google Cloud Storage, Snowflake, and BigQuery
  • Historical Coverage – Complete blockchain history from genesis blocks
  • Multi-Chain Support – Access data from major blockchain networks
  • Cost-Effective – Pay only for the data you need, when you need it
  • Scalable – Handle petabytes of blockchain data with ease

Available Blockchain Data Dumps

Bitquery provides comprehensive cloud data dumps for the following blockchains:

EVM Chains Data Export

Export blockchain data for Ethereum, BSC, Base, Polygon/Matic, Optimism, Arbitrum, and other EVM-compatible chains. Includes:

  • Blocks – Block-level metadata and timestamps
  • Transactions – Full transaction-level data with gas information
  • Transfers – Native token and ERC-20 token transfers
  • Balance Updates – Account balance changes per block
  • DEX Trades – Decentralized exchange trading data
  • DEX Pools – Liquidity pool metadata and state
  • Smart Contract Calls – Function calls and interactions
  • Events – Smart contract event logs
  • Miner Rewards – Block rewards and transaction fees
  • Uncle Blocks – Ethereum uncle block data

Use Cases: DeFi analytics, NFT tracking, smart contract analysis, token holder analysis, DEX volume analysis, cross-chain analytics.

Solana Blockchain Data Export

Export Solana blockchain data including slot-level blocks, transactions, transfers, and DEX activity:

  • Blocks – Slot-level block metadata
  • Transactions – Full transaction-level data with signatures
  • Transfers – Native SOL and SPL token transfers
  • Balance Updates – Account balance changes per slot
  • DEX Pools – Decentralized exchange pool metadata
  • DEX Orders – Order-level DEX activity and fills
  • DEX Trades – Executed trades on Solana DEXs
  • Rewards – Validator and staking rewards

Use Cases: Solana DeFi analytics, NFT marketplace analysis, token transfer tracking, DEX volume analysis, validator performance monitoring.

Tron Blockchain Data Export

Export Tron blockchain data for comprehensive network analysis:

  • Blocks – Block-level metadata
  • Transactions – Full transaction-level data
  • Transfers – Native TRX and TRC-20 token transfers
  • Balance Updates – Account balance changes per block
  • DEX Trades – Executed trades on Tron DEXs

Use Cases: Tron DeFi analytics, TRC-20 token tracking, DEX volume analysis, account balance monitoring, transaction flow analysis.

Bitcoin Blockchain Data Export

Export Bitcoin blockchain data including transaction inputs, outputs, and OMNI Layer protocol data:

  • Blocks – Block-level metadata
  • Transactions – Full transaction-level data
  • Inputs – Transaction input data and UTXO references
  • Outputs – Transaction output data and addresses
  • OMNI Transactions – OMNI Layer protocol transactions
  • OMNI Transfers – OMNI Layer token transfers

Use Cases: Bitcoin transaction analysis, UTXO tracking, address clustering, OMNI token analysis, blockchain forensics, historical price analysis.

Data Format and Structure

Blockchain data is provided by default in Apache Parquet format, a columnar storage file format optimized for analytics workloads. We can also provide data in other file formats (CSV, JSON, Avro, etc.) based on your requirements. Parquet offers:

  • High Compression – Reduces storage costs by up to 90%
  • Columnar Storage – Enables efficient column pruning and predicate pushdown
  • Schema Evolution – Supports schema changes over time
  • Universal Compatibility – Works with all major analytics engines
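
As a quick sanity check, a few lines of Python with pyarrow are enough to confirm these properties on any downloaded file; the file name here follows the block-range naming described in the next section.

import pyarrow.parquet as pq

# Any downloaded sample file works here; the name encodes its block range.
pf = pq.ParquetFile("24053500_24053549.parquet")

print(pf.schema_arrow)                      # column names and types
print("rows:", pf.metadata.num_rows)
print("row groups:", pf.metadata.num_row_groups)

# Read only the columns you need -- column pruning is the point of Parquet.
table = pf.read(columns=pf.schema_arrow.names[:3])
print(table.slice(0, 5).to_pandas())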

File Organization

Data is organized by blockchain and topic, with files named using block/slot ranges:

bitquery-blockchain-dataset/
├── ethereum/
│   ├── blocks/
│   ├── transactions/
│   ├── transfers/
│   ├── balance_updates/
│   ├── dex_trades/
│   └── ...
├── solana/
│   ├── blocks/
│   ├── transactions/
│   ├── transfers/
│   ├── dex_trades/
│   └── ...
├── bitcoin/
│   ├── blocks/
│   ├── transactions/
│   ├── inputs/
│   ├── outputs/
│   └── ...
└── tron/
    ├── blocks/
    ├── transactions/
    ├── transfers/
    └── ...
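
For example, a short boto3 script can enumerate the files under a topic prefix; this sketch assumes your AWS credentials have been granted access to the bucket.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(
    Bucket="bitquery-blockchain-dataset",
    Prefix="ethereum/balance_updates/",
):
    for obj in page.get("Contents", []):
        # Keys look like ethereum/balance_updates/24053500_24053549.parquet;
        # the file name encodes the block range the file covers.
        print(obj["Key"], obj["Size"])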

Sample Parquet Data

To quickly explore the structure of the data and test your tooling, you can use our public sample datasets on GitHub. Each sample file (per data point or topic) includes the exact S3 URL in a comment, so you can:

  • Point test pipelines to the same path
  • Easily request more files from the same bucket/prefix if you need additional data
  • Validate schemas before production integration

Example: Ethereum Balance Updates

https://bitquery-blockchain-dataset.s3.us-east-1.amazonaws.com/ethereum/balance_updates/24053500_24053549.parquet

bitquery-blockchain-dataset/
└── ethereum/
    └── balance_updates/
        ├── 24053500_24053549.parquet
        ├── 24053550_24053599.parquet
        ├── 24053600_24053649.parquet
        ├── 24053650_24053699.parquet
        ├── 24053700_24053749.parquet
        ├── 24053750_24053799.parquet
        ├── 24053800_24053849.parquet
        ├── 24053850_24053899.parquet
        ├── 24053900_24053949.parquet
        └── 24053950_24053999.parquet
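
Because the sample file is publicly reachable over HTTPS, you can query it in place with DuckDB (via its httpfs extension) without downloading anything first:

import duckdb

url = ("https://bitquery-blockchain-dataset.s3.us-east-1.amazonaws.com/"
       "ethereum/balance_updates/24053500_24053549.parquet")

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

# Inspect the schema, then pull a handful of rows.
print(con.execute(f"DESCRIBE SELECT * FROM read_parquet('{url}')").df())
print(con.execute(f"SELECT * FROM read_parquet('{url}') LIMIT 5").df())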

Use Sample Data To:

  • Validate ETL Pipelines – Test your data processing workflows against realistic blockchain data
  • Inspect Schemas – Review column names, types, and data structures before production
  • Benchmark Performance – Measure query performance on realistic data sizes
  • Develop Analytics – Build and test analytics queries before full dataset access
  • Validate Tooling – Ensure compatibility with your analytics stack

Cloud Platform Integration

AWS S3 Integration

Store blockchain data in Amazon S3 and query with:

  • Amazon Athena – Serverless SQL queries on S3 data
  • Amazon Redshift – Data warehouse with S3 integration
  • AWS Glue – ETL jobs and data catalog
  • Amazon EMR – Spark-based analytics on S3
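
A minimal sketch of the Athena route with boto3 is below; the database, table, and results bucket are placeholders, and the table itself must first be defined over your S3 prefix (for example with a Glue crawler or a CREATE EXTERNAL TABLE statement).

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

qid = athena.start_query_execution(
    QueryString="SELECT COUNT(*) AS transfers FROM ethereum_transfers",  # placeholder table
    QueryExecutionContext={"Database": "blockchain"},                    # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},   # placeholder bucket
)["QueryExecutionId"]

# Poll until the query reaches a terminal state, then fetch results.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print(row)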

Google Cloud Platform Integration

Store blockchain data in Google Cloud Storage and analyze with:

  • BigQuery – Serverless data warehouse with native Parquet support
  • Dataproc – Managed Spark and Hadoop clusters
  • Dataflow – Stream and batch data processing
  • BigQuery ML – Machine learning on blockchain data
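
For BigQuery, loading is a one-step job because Parquet carries its own schema; the bucket, project, dataset, and table names in this sketch are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load every Parquet file under the prefix into one table.
load_job = client.load_table_from_uri(
    "gs://my-blockchain-bucket/ethereum/transfers/*.parquet",  # placeholder bucket
    "my_project.blockchain.ethereum_transfers",                # placeholder table
    job_config=job_config,
)
load_job.result()  # block until the load completes

print(client.get_table("my_project.blockchain.ethereum_transfers").num_rows, "rows loaded")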

Snowflake Integration

Load blockchain data into Snowflake for:

  • Data Warehousing – Centralized blockchain data storage
  • SQL Analytics – Complex queries across multiple chains
  • Data Sharing – Share blockchain datasets across teams
  • Snowpark – Python, Java, and Scala analytics
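
A typical pattern is an external stage over the S3/GCS prefix plus a COPY INTO statement; the connection parameters, stage, and table names below are placeholders.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # placeholder connection details
    user="my_user",
    password="...",
    warehouse="ANALYTICS_WH",
    database="BLOCKCHAIN",
    schema="PUBLIC",
)

# The stage is assumed to point at the prefix holding the Parquet files,
# and the target table to already exist with matching column names.
conn.cursor().execute("""
    COPY INTO ethereum_transfers
    FROM @blockchain_stage/ethereum/transfers/
    FILE_FORMAT = (TYPE = 'PARQUET')
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")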

Other Platforms

Our Parquet datasets are compatible with:

  • Databricks – Unified analytics platform
  • Apache Spark – Distributed data processing
  • Presto/Trino – Distributed SQL query engine
  • Apache Drill – Schema-free SQL queries
  • DuckDB – In-process analytical database
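
With Spark, for instance, a whole topic directory can be read as one DataFrame (assuming the hadoop-aws/s3a connector is on the classpath); the column name in the aggregation is illustrative, so check printSchema() output first.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bitquery-parquet").getOrCreate()

# Spark discovers and parallelizes over every Parquet file under the prefix.
df = spark.read.parquet("s3a://bitquery-blockchain-dataset/ethereum/transfers/")

df.printSchema()
# "block_time" is an assumed column name -- verify it against the schema.
df.groupBy(F.to_date("block_time").alias("day")).count().orderBy("day").show()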

Building Scalable Real-Time Solutions

Bitquery enables you to build enterprise-grade, low-latency solutions in the cloud that scale to millions of transactions and events per second. Our cloud-native architecture supports both batch and streaming data pipelines for comprehensive blockchain analytics.

Real-Time Streaming Architecture

Build low-latency applications with sub-second data delivery using:

  • Kafka Streams – High-throughput, low-latency blockchain data streams

    • Mempool Data – Access pending transactions before block confirmation
    • Committed Data – Real-time confirmed transaction streams
    • Multi-Chain Support – Stream data from Ethereum, Solana, Bitcoin, Tron, and more
    • Protobuf Format – Efficient binary serialization for optimal performance
  • GraphQL Subscriptions – WebSocket-based real-time data subscriptions

    • Live Queries – Subscribe to specific blockchain events and transactions
    • Custom Filters – Filter data streams by address, token, contract, or event
    • Low Latency – Sub-100ms data delivery for time-sensitive applications
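
As a sketch of the Kafka side, a plain confluent-kafka consumer is enough to start receiving messages; the broker address, topic name, and any authentication settings are placeholders supplied during onboarding, and payloads are Protobuf bytes that need the matching generated classes to decode.

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka.example.com:9092",  # placeholder broker
    "group.id": "my-blockchain-consumer",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["ethereum.transactions"])       # placeholder topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        # msg.value() is Protobuf-encoded; parse it with the generated
        # message classes for the stream you subscribed to.
        print(f"offset={msg.offset()} payload={len(msg.value())} bytes")
finally:
    consumer.close()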

Scalable Cloud Solutions

Design and deploy horizontally scalable solutions that can handle:

  • High Throughput – Process millions of transactions per day
  • Concurrent Users – Support thousands of simultaneous connections
  • Data Volume – Handle petabytes of historical and real-time data
  • Global Scale – Deploy across multiple cloud regions for low latency

Cloud-Native Architecture Patterns

Lambda Functions and Serverless

Build serverless solutions using:

  • AWS Lambda – Process blockchain events in real-time
  • Google Cloud Functions – Serverless event-driven processing
  • Azure Functions – Scalable serverless compute
  • Use Cases: Real-time alerts, automated trading, compliance monitoring
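
An illustrative Lambda handler for the real-time-alerts case is shown below; the event payload shape is an assumption and depends on what you wire in front of the function (an SQS queue, a Kafka trigger, etc.).

import json
import os

# Alert when a single transfer exceeds the configured threshold (default 100 ETH).
ALERT_THRESHOLD_WEI = int(os.environ.get("ALERT_THRESHOLD_WEI", 10**20))

def handler(event, context):
    alerts = []
    for record in event.get("transfers", []):        # illustrative payload shape
        if int(record["amount_wei"]) >= ALERT_THRESHOLD_WEI:
            alerts.append(record["tx_hash"])
    # Forward alerts to SNS, a webhook, or a notification service here.
    return {"statusCode": 200, "body": json.dumps({"alerts": alerts})}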

Containerized Microservices

Deploy scalable microservices with:

  • Kubernetes – Container orchestration for auto-scaling
  • Docker – Containerized blockchain data processors
  • ECS/EKS – Managed container services on AWS
  • Use Cases: Multi-tenant analytics platforms, API services, data aggregators

Event-Driven Architecture

Build reactive systems with:

  • Event Sourcing – Store blockchain events as immutable logs
  • CQRS – Separate read and write models for optimal performance
  • Message Queues – Use Kafka, SQS, or Pub/Sub for event distribution
  • Use Cases: Real-time dashboards, notification systems, trading platforms
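
The event-sourcing/CQRS pairing boils down to a few moving parts, sketched here with an illustrative transfer event: an append-only write side, and a read model that can be rebuilt at any time by replaying the log.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class TransferEvent:          # illustrative event type
    block: int
    sender: str
    receiver: str
    amount: int

@dataclass
class BalanceReadModel:       # read side: derived, disposable state
    balances: dict = field(default_factory=dict)

    def apply(self, ev: TransferEvent) -> None:
        self.balances[ev.sender] = self.balances.get(ev.sender, 0) - ev.amount
        self.balances[ev.receiver] = self.balances.get(ev.receiver, 0) + ev.amount

event_log: list = []          # write side: immutable, append-only log

def record(ev: TransferEvent, model: BalanceReadModel) -> None:
    event_log.append(ev)      # events are never mutated or deleted
    model.apply(ev)

model = BalanceReadModel()
record(TransferEvent(1, "0xabc", "0xdef", 100), model)
print(model.balances)         # rebuildable by replaying event_log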

Low-Latency Optimization

Achieve millisecond-level latency for time-sensitive applications:

  • Edge Computing – Deploy processing close to data sources
  • In-Memory Caching – Use Redis, Memcached, or ElastiCache
  • CDN Integration – Distribute data globally for reduced latency
  • Connection Pooling – Optimize database and API connections
  • Use Cases: Trading bots, arbitrage detection, flash loan monitoring
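
In-memory caching in particular takes very little code; this read-through sketch with redis-py uses a placeholder key scheme and a stubbed slow path.

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_balance_from_store(address: str) -> dict:
    # Stub for the slow path: a warehouse query, API call, or Parquet scan.
    return {"address": address, "balance": 0}

def get_balance(address: str, ttl_seconds: int = 5) -> dict:
    key = f"balance:{address}"            # placeholder key scheme
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)         # cache hit: served from memory
    value = fetch_balance_from_store(address)
    r.setex(key, ttl_seconds, json.dumps(value))  # short TTL keeps data fresh
    return value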

Hybrid Batch + Streaming Solutions

Combine batch and streaming for comprehensive analytics:

  • Lambda Architecture – Process both real-time and historical data
  • Kappa Architecture – Unified streaming pipeline for all data
  • Data Lake + Stream – Store historical data in S3/GCS, stream real-time updates
  • Use Cases: Portfolio trackers, analytics dashboards, compliance systems

Example Real-Time Use Cases

  • Trading Bots – Execute trades based on real-time blockchain events
  • Arbitrage Detection – Identify price discrepancies across DEXs instantly
  • Flash Loan Monitoring – Detect and analyze flash loan attacks in real-time
  • MEV Detection – Monitor maximal extractable value (MEV) opportunities
  • Compliance Alerts – Real-time monitoring of suspicious transactions
  • Portfolio Tracking – Live updates of multi-chain portfolio values
  • DEX Analytics – Real-time DEX volume and liquidity tracking
  • NFT Floor Price Alerts – Instant notifications on price changes

Performance Benchmarks

Our cloud solutions support:

  • Latency: Sub-100ms for real-time streams, sub-second for batch queries
  • Throughput: Millions of transactions per second processing capability
  • Scalability: Auto-scaling from zero to thousands of concurrent connections
  • Availability: 99.9% uptime SLA with multi-region redundancy
  • Data Freshness: Real-time data with < 1 second delay from blockchain

Use Cases

DeFi Analytics

  • DEX Volume Analysis – Track trading volumes across decentralized exchanges
  • Liquidity Pool Analytics – Monitor pool sizes, fees, and impermanent loss
  • Yield Farming Analysis – Analyze yield opportunities and risks
  • Token Flow Tracking – Monitor token movements between addresses
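
As a concrete starting point for DEX volume analysis, the DuckDB query below aggregates daily volume straight from the dex_trades Parquet files; the column names (block_time, amount_usd) are assumptions, so inspect the real schema first.

import duckdb

con = duckdb.connect()
print(con.execute("""
    SELECT date_trunc('day', block_time) AS day,        -- assumed column
           SUM(amount_usd)               AS volume_usd, -- assumed column
           COUNT(*)                      AS trades
    FROM read_parquet('ethereum/dex_trades/*.parquet')
    GROUP BY 1
    ORDER BY 1
""").df())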

NFT Analytics

  • Collection Analysis – Track NFT sales, floor prices, and market trends
  • Marketplace Analytics – Compare performance across NFT marketplaces
  • Holder Analysis – Identify whale wallets and distribution patterns
  • Rarity Analysis – Calculate and track NFT rarity metrics

Blockchain Forensics

  • Transaction Tracing – Follow funds through complex transaction paths
  • Address Clustering – Identify related addresses and entities
  • Compliance Monitoring – Track suspicious transactions and patterns
  • Risk Assessment – Evaluate transaction risks and anomalies

Data Science and Machine Learning

  • Price Prediction – Build models using historical transaction data
  • Anomaly Detection – Identify unusual patterns in blockchain activity
  • Network Analysis – Analyze blockchain network topology
  • Sentiment Analysis – Correlate on-chain activity with market sentiment

Business Intelligence

  • Portfolio Tracking – Monitor multi-chain portfolio performance
  • Revenue Analytics – Track protocol revenues and fees
  • User Analytics – Analyze user behavior and engagement
  • Market Research – Study market trends and the competitive landscape

Real-Time vs Batch Data Access

Cloud data dumps are optimized for batch analytics and historical workloads. They provide:

  • Complete Historical Data – Access to full blockchain history
  • Cost-Effective Storage – Optimized compression reduces costs
  • Batch Processing – Ideal for ETL pipelines and scheduled analytics
  • Data Warehousing – Perfect for building comprehensive data lakes

If you require low-latency or streaming blockchain data, Bitquery also provides:

  • Kafka Streams – Real-time blockchain data streams via Apache Kafka
  • GraphQL Subscriptions – Live data subscriptions for real-time applications

Getting Started

  1. Explore Sample Data – Review our GitHub repository to understand data structures
  2. Choose Your Blockchain – Select from EVM, Solana, Tron, or Bitcoin data exports
  3. Set Up Cloud Storage – Configure AWS S3, Google Cloud Storage, or your preferred storage solution
  4. Integrate Analytics Engine – Connect Snowflake, BigQuery, Athena, or your analytics platform
  5. Build Your Pipeline – Create ETL jobs to process and transform blockchain data