
S3

Amazon Simple Storage Service (S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, which supports use cases such as storage for Internet applications, backups, disaster recovery, data archives, data lakes for analytics, and hybrid cloud storage.

Bruin supports S3 as a source for Ingestr assets, and you can use it to ingest data from S3 into your data warehouse.

To set up the S3 connection, you need to add a configuration item to the .bruin.yml file and to the asset file. You will need the access_key_id and secret_access_key. For details on how to obtain these credentials, please refer to the AWS documentation on managing access keys.

Follow the steps below to correctly set up S3 as a data source and run ingestion.

Step 1: Add a connection to .bruin.yml file

To connect to S3, you need to add a configuration item to the connections section of the .bruin.yml file. This configuration must comply with the following schema:

```yaml
connections:
  s3:
    - name: "my-s3"
      access_key_id: "AKI_123"
      secret_access_key: "L6L_123"
```

  • access_key_id and secret_access_key: the AWS credentials used to access the S3 bucket.
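Since the asset file in the next step names a Postgres destination connection, your .bruin.yml will typically hold both connections side by side. The sketch below is illustrative: the Postgres connection name and credential values (`username`, `password`, `host`, `port`, `database`) are placeholders, not values prescribed by this guide:

```yaml
connections:
  s3:
    - name: "my-s3"
      access_key_id: "AKI_123"
      secret_access_key: "L6L_123"
  postgres:
    - name: "postgres"           # referenced as `connection` in the asset file
      username: "bruin"          # placeholder credentials
      password: "secret"
      host: "localhost"
      port: 5432
      database: "analytics"
```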

Step 2: Create an asset file for data ingestion

To ingest data from S3, you need to create an asset configuration file. This file defines the data flow from the source to the destination. Create a YAML file (e.g., s3_ingestion.yml) inside the assets folder and add the following content:

```yaml
name: public.s3
type: ingestr
connection: postgres

parameters:
  source_connection: my-s3
  source_table: 'mybucket/students/students_details.csv'

  destination: postgres
```

  • name: The name of the asset.
  • type: Specifies the type of the asset. Set this to ingestr to use the ingestr data pipeline.
  • connection: The destination connection, which defines where the data should be stored. For example, postgres indicates that the ingested data will be stored in a Postgres database.
  • source_connection: The name of the S3 connection defined in .bruin.yml.
  • source_table: The bucket name and file path (or file glob) separated by a forward slash (/).
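The source_table format above can be illustrated with a small sketch. The helper below is hypothetical (not part of Bruin or ingestr); it only demonstrates how the value splits into a bucket and an object path:

```python
def split_source_table(source_table: str) -> tuple[str, str]:
    """Split a Bruin S3 source_table value into (bucket, path).

    The bucket is everything before the first '/'; the remainder is the
    object path, which may contain further slashes or a glob pattern.
    """
    bucket, _, path = source_table.partition("/")
    return bucket, path

# A plain file path:
print(split_source_table("mybucket/students/students_details.csv"))
# → ('mybucket', 'students/students_details.csv')

# A glob matching every CSV under the prefix:
print(split_source_table("mybucket/students/*.csv"))
# → ('mybucket', 'students/*.csv')
```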

Step 3: Run asset to ingest data

```shell
bruin run assets/s3_ingestion.yml
```

As a result of this command, Bruin will ingest the data from the given S3 path into your Postgres database.
