Skip to content

S3

S3 is a bucket for storing data in Amazon's Simple Storage Service, a cloud-based storage solution provided by AWS. S3 buckets allow users to store and retrieve data at any time from anywhere on the web.

ingestr supports S3 as a source.

URI format

The URI format for S3 is as follows:

plaintext
s3://<bucket_name>/<path_to_file>?access_key_id=<access_key_id>&secret_access_key=<secret_access_key>

URI parameters:

  • bucket_name: The name of the bucket
  • path_to_files: The relative path from the root of the bucket. You can find this from the S3 URI. For example, if your S3 URI is s3://mybucket/students/students_details.csv, then your bucket name is mybucket and path_to_files is students/students_details.csv.
  • access_key_id and secret_access_key : Used for accessing S3 bucket.

Setting up a S3 Integration

S3 requires an access_key_id and a secret_access_key to access the bucket. Please follow the guide on dltHub to obtain credentials. Once you've completed the guide, you should have an access_key_id and secret_access_key. From the S3 URI, you can extract the bucket_name and path_to_files

For example, if your access_key_id is AKC3YOW7E, secret_access_key is XCtkpL5B, bucket name is my_bucket, and path_to_files is students/students_details.csv, here's a sample command that will copy the data from the S3 bucket into a DuckDB database:

sh
ingestr ingest --source-uri 's3://my_bucket/students/students_details.csv?access_key_id=AKC3YOW7E&secret_access_key=XCtkpL5B' --source-table 'students_details' --dest-uri duckdb:///s3.duckdb --dest-table 'dest.students_details'

The result of this command will be a table in the DuckDB database in the path s3.duckdb.

Below are some examples of path patterns, each path pattern is a reference from the root of the bucket:

  • **/*.csv: Retrieves all the CSV files, regardless of how deep they are within the folder structure.
  • *.csv: Retrieves all the CSV files from the first level of a folder.
  • myFolder/**/*.jsonl: Retrieves all the JSONL files from anywhere under myFolder.
  • myFolder/mySubFolder/users.parquet: Retrieves the users.parquet file from mySubFolder.
  • employees.jsonl: Retrieves the employees.jsonl file from the root level of the bucket.