S3
S3 is a bucket for storing data in Amazon's Simple Storage Service, a cloud-based storage solution provided by AWS. S3 buckets allow users to store and retrieve data at any time from anywhere on the web.
ingestr supports S3 as a source.
URI format
The URI format for S3 is as follows:
s3://<bucket_name>/<path_to_file>?access_key_id=<access_key_id>&secret_access_key=<secret_access_key>
URI parameters:
bucket_name
: The name of the bucketpath_to_files
: The relative path from the root of the bucket. You can find this from the S3 URI. For example, if your S3 URI iss3://mybucket/students/students_details.csv
, then your bucket name ismybucket
andpath_to_files
isstudents/students_details.csv
.access_key_id
andsecret_access_key
: Used for accessing S3 bucket.
Setting up a S3 Integration
S3 requires an access_key_id
and a secret_access_key
to access the bucket. Please follow the guide on dltHub to obtain credentials. Once you've completed the guide, you should have an access_key_id
and secret_access_key
. From the S3 URI, you can extract the bucket_name
and path_to_files
For example, if your access_key_id
is AKC3YOW7E
, secret_access_key
is XCtkpL5B
, bucket name is my_bucket
, and path_to_files
is students/students_details.csv
, here's a sample command that will copy the data from the S3 bucket into a DuckDB database:
ingestr ingest --source-uri 's3://my_bucket/students/students_details.csv?access_key_id=AKC3YOW7E&secret_access_key=XCtkpL5B' --source-table 'students_details' --dest-uri duckdb:///s3.duckdb --dest-table 'dest.students_details'
The result of this command will be a table in the DuckDB database in the path s3.duckdb
.
Below are some examples of path patterns, each path pattern is a reference from the root of the bucket:
**/*.csv
: Retrieves all the CSV files, regardless of how deep they are within the folder structure.*.csv
: Retrieves all the CSV files from the first level of a folder.myFolder/**/*.jsonl
: Retrieves all the JSONL files from anywhere undermyFolder
.myFolder/mySubFolder/users.parquet
: Retrieves theusers.parquet
file frommySubFolder
.employees.jsonl
: Retrieves theemployees.jsonl
file from the root level of the bucket.