S3

Amazon S3 (Simple Storage Service) is a cloud-based object storage service provided by AWS. Data is stored in buckets, which allow users to store and retrieve objects at any time from anywhere on the web.

Bruin supports S3 as a source for ingestr assets, and you can use it to ingest data from S3 into your data warehouse.

To set up the S3 connection, you need to add a configuration item to the .bruin.yml file and to the asset file. You will need the bucket_name, path_to_file, access_key_id, and secret_access_key. For details on how to obtain these credentials, please refer here.

Follow the steps below to correctly set up S3 as a data source and run ingestion.

Step 1: Add a connection to .bruin.yml file

To connect to S3, you need to add a configuration item to the connections section of the .bruin.yml file. This configuration must comply with the following schema:

```yaml
connections:
  s3:
    - name: "my-s3"
      bucket_name: "my_bucket"
      path_to_file: "users.csv"
      access_key_id: "AKI_123"
      secret_access_key: "L6L_123"
```
  • bucket_name: The name of the S3 bucket.
  • path_to_file: The path to the file, relative to the root of the bucket. You can find this from the S3 URI: for example, if your S3 URI is s3://mybucket/students/students_details.csv, then your bucket name is mybucket and your path_to_file is students/students_details.csv (see the sketch after this list).
  • access_key_id and secret_access_key: The AWS credentials used to access the bucket.
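As an illustrative sketch of the mapping above, the s3://mybucket/students/students_details.csv URI from the example would translate into the following connection entry. The connection name my-s3-students and the credential values are placeholders, not values from this guide:

```yaml
connections:
  s3:
    - name: "my-s3-students"                          # placeholder connection name
      bucket_name: "mybucket"                         # taken from the URI: s3://mybucket/...
      path_to_file: "students/students_details.csv"   # the remainder of the URI after the bucket name
      access_key_id: "AKI_123"                        # placeholder credential
      secret_access_key: "L6L_123"                    # placeholder credential
```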

Step 2: Create an asset file for data ingestion

To ingest data from S3, you need to create an asset configuration file. This file defines the data flow from the source to the destination. Create a YAML file (e.g., s3_ingestion.yml) inside the assets folder and add the following content:

```yaml
name: public.s3
type: ingestr
connection: postgres

parameters:
  source_connection: my-s3
  source_table: 'students_details'
  destination: postgres
```
  • name: The name of the asset.
  • type: Specifies the type of the asset. Set this to ingestr to use the ingestr data pipeline.
  • connection: The destination connection, which defines where the data should be stored. For example, postgres indicates that the ingested data will be stored in a Postgres database (a sample connection entry is sketched after this list).
  • source_connection: The name of the S3 connection defined in .bruin.yml.
  • source_table: The name of the file or folder in S3 that you want to ingest.
  • destination: The destination platform for the ingested data, matching the destination connection (here, postgres).
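The postgres destination connection referenced above must also be defined in the connections section of .bruin.yml. The snippet below is a minimal sketch: the host, credentials, and database values are placeholders, and the exact field names should be checked against the Bruin Postgres connection documentation.

```yaml
connections:
  postgres:
    - name: "postgres"        # must match the `connection` field in the asset file
      username: "db_user"     # placeholder
      password: "db_password" # placeholder
      host: "localhost"       # placeholder
      port: 5432
      database: "mydb"        # placeholder
```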

Step 3: Run asset to ingest data

```bash
bruin run assets/s3_ingestion.yml
```

As a result of this command, Bruin will ingest data from the given S3 file into your Postgres database.
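To sanity-check the result, you can query the destination table. The sketch below assumes the asset name public.s3 maps to the public.s3 table in the destination Postgres database; the host, user, and database values are placeholders matching the earlier sketch:

```bash
# Count the rows ingested into the destination table (placeholder connection details).
psql -h localhost -U db_user -d mydb -c "SELECT COUNT(*) FROM public.s3;"
```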
