Getting Ethereum Data from an AWS S3 Bucket

We are now offering raw blockchain data on Amazon S3, taking a step towards providing a faster way to build dApps.

Data on S3 uses a Protobuf schema, which allows for more efficient data transfer and storage. To make things easier, I'll walk you through how to access the storage and parse the data in Python.

You can access the product on the AWS Marketplace here: https://aws.amazon.com/marketplace/pp/prodview-oi4sbdu6zro3i

For this tutorial, we'll be using readily available sample datasets. So, let's get started!


What does this code do?

This code downloads files from an Amazon Web Services (AWS) S3 bucket, decompresses them, and extracts information from them using Protocol Buffers. The complete repo processes two files (block headers and DEX trades); this walkthrough focuses on the block file.

You can access the complete Git repo here: https://github.com/bitquery/S3-Sample-Parse-Tutorial/tree/main

Step-by-Step Implementation

Import the necessary libraries


```python
import boto3
import lz4.frame
import block_message_pb2  # generated Protobuf module describing the data structure
import json
from google.protobuf.json_format import MessageToJson
```
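Note that boto3, lz4, and protobuf are third-party packages (the google.protobuf helpers ship with protobuf), so if any of these imports fail, an install along the lines of `pip install boto3 lz4 protobuf` should cover them. block_message_pb2 is generated from the Protobuf schema and comes from the Git repo linked above, not from pip.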

Access the S3 bucket

For this step, you need to get your AWS Access Keys.

You can find the details of the demo buckets here: https://docs.bitquery.io/docs/cloud/s3/demo/

The first part of the code uses the boto3 library to connect to the S3 bucket with the specified credentials. It then defines an object key, block_object_key, which corresponds to the S3 object key of the file to be downloaded, and a local file path, blocks_local_path, where the downloaded file will be saved on the local system. (The complete repo defines a second key and path, dextrades_local_path, for the DEX trades file, which is handled the same way.)


```python
# Replace 'YOUR ID' and 'YOUR KEY' with your AWS access keys
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR ID',
    aws_secret_access_key='YOUR KEY',
    region_name='us-east-1',
)

bucket_name = 'demo-streaming-eth'

# S3 object key of the LZ4-compressed block file to download
block_object_key = 'eth.blocks.s3/000016780000/000016780000_0xf127ae770b9b73af1be93e5a7ac19be5e3bac41673b2685c6b4619fb09af09f0_41452bd33251301d32c606c704120d027de580505d611e4fb1c5ff3ef51d0cb7.block.lz4'

# Local path where the downloaded file will be saved
blocks_local_path = 'PATH TO YOUR FILE/s3downloadblocks.lz4'
```
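The object key above is long because it embeds the block range and hashes. If you'd rather discover keys programmatically than copy them from the docs, you can list the bucket contents; here's a minimal sketch (the prefix eth.blocks.s3/ is inferred from the sample key above):

```python
# List a handful of keys under the blocks prefix to see what's available
response = s3.list_objects_v2(Bucket=bucket_name, Prefix='eth.blocks.s3/', MaxKeys=10)
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])
```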

Download and Parse the File

In this step we use block_message_pb2, the generated Protocol Buffers module that describes the structure of the data.

We use the s3.download_file() method to download the file specified by block_object_key from the S3 bucket and save it to the local path specified by blocks_local_path. The downloaded file is LZ4-compressed, so the code then uses lz4.frame.decompress() from the lz4 library to decompress the data.


```python
# Download the compressed block file from S3 to the local path
s3.download_file(bucket_name, block_object_key, blocks_local_path)

# Read the LZ4-compressed bytes and decompress them
with open(blocks_local_path, 'rb') as f:
    compressed_data = f.read()
decompressed_data = lz4.frame.decompress(compressed_data)

# Parse the decompressed bytes into a BlockHeader protobuf message
block_headers = block_message_pb2.BlockHeader()
block_headers.ParseFromString(decompressed_data)
```
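As an aside, lz4.frame.open() can read and decompress in one step, which saves you from holding the compressed bytes in memory separately; an equivalent variant:

```python
# Read and decompress in one step via a file-like LZ4 wrapper
with lz4.frame.open(blocks_local_path, mode='rb') as f:
    decompressed_data = f.read()
```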

Write to a JSON file

With the BlockHeader message populated by ParseFromString() above, the code converts it to a JSON string using the MessageToJson() function from google.protobuf.json_format and saves it to a file named block_headers.json.


```python
# Serialize the protobuf message to JSON and write it to disk
with open('block_headers.json', 'w') as f:
    json_string = MessageToJson(block_headers)
    f.write(json_string)
```
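If you'd rather work with the data in Python than (or in addition to) writing it to disk, MessageToDict from the same google.protobuf.json_format module returns a plain dictionary; a quick sketch:

```python
from google.protobuf.json_format import MessageToDict

# Convert the protobuf message to a Python dict for in-memory processing
block_dict = MessageToDict(block_headers)

# Pretty-print a short preview using the json module imported earlier
print(json.dumps(block_dict, indent=2)[:500])
```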