S3
Amazon Simple Storage Service S3 is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, which allows uses like storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage.
Bruin supports S3 as a source for Ingestr assets, and you can use it to ingest data from S3 into your data warehouse.
In order to set up the S3 connection, you need to add a configuration item in the .bruin.yml
file and in the asset
file. You will need the access_key_id
and secret_access_key
. For details on how to obtain these credentials, please refer here.
Follow the steps below to correctly set up S3 as a data source and run ingestion.
Step 1: Add a connection to .bruin.yml file
To connect to S3, you need to add a configuration item to the connections section of the .bruin.yml
file. This configuration must comply with the following schema:
connections:
s3:
- name: "my-s3"
access_key_id: "AKI_123"
secret_access_key: "L6L_123"
access_key_id
andsecret_access_key
: Used for accessing S3 bucket.
Step 2: Create an asset file for data ingestion
To ingest data from S3, you need to create an asset configuration file. This file defines the data flow from the source to the destination. Create a YAML file (e.g., s3_ingestion.yml) inside the assets folder and add the following content:
name: public.s3
type: ingestr
connection: postgres
parameters:
source_connection: my-s3
source_table: 'mybucket/students/students_details.csv'
destination: postgres
name
: The name of the asset.type
: Specifies the type of the asset. Set this to ingestr to use the ingestr data pipeline.connection
: This is the destination connection, which defines where the data should be stored. For example:postgres
indicates that the ingested data will be stored in a Postgres database.source_connection
: The name of the S3 connection defined in .bruin.yml.source_table
: the bucket name and file path (or file glob) separated by a forward slash (/
).
Step 3: Run asset to ingest data
bruin run assets/s3_ingestion.yml
As a result of this command, Bruin will ingest data from the given S3 table into your Postgres database.