S3 Platform
Bruin supports Amazon S3 (Simple Storage Service) both for data ingestion and through sensors that monitor object availability.
S3 Sensors
S3 sensors allow you to monitor for the existence of specific objects in S3 buckets. The sensor waits for a file or object to become available before allowing downstream assets to proceed.
Connection Configuration
Add an AWS connection to your bruin.yml file:
connections:
  aws:
    - name: "aws-default"
      access_key: "your-access-key"
      secret_key: "your-secret-key"
      region: "us-east-1" # Optional - will be auto-discovered from bucket if not provided
Sensor Configuration
Create a sensor asset in your pipeline:
name: "wait_for_s3_file"
type: s3.sensor.key_sensor
connection: aws-default
parameters:
  bucket_name: "my-data-bucket"
  bucket_key: "path/to/expected/file.csv"
Parameters
- bucket_name (required): The name of the S3 bucket to monitor
- bucket_key (required): The key/path of the object to wait for
Sensor Modes
The sensor supports different modes, controlled via the --sensor-mode flag when running:
- once (default): Check once and fail if the object doesn't exist
- wait: Continuously poll until the object is found (24-hour timeout)
- skip: Skip sensor execution entirely
Running the Sensor
Execute the sensor using the bruin run command:
bruin run path/to/your/sensor.asset.yml --sensor-mode wait
Behavior
- If the region is not specified in the connection, it will be auto-discovered from the bucket
- In wait mode, the sensor polls every few seconds (configurable via the poke_interval parameter)
- Maximum timeout is 24 hours for continuous polling
- Returns an error if the object is not found in once mode
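The three modes and the polling behavior described above can be illustrated with a small conceptual sketch. This is not Bruin's implementation: the function name run_sensor, the object_exists callback, the 30-second default interval, and the exception types are all assumptions made for illustration. Only the mode names, the poke_interval name, and the 24-hour cap come from this page.

```python
import time
from typing import Callable

# 24-hour cap on wait mode, as stated in the Behavior list.
MAX_WAIT_SECONDS = 24 * 60 * 60


def run_sensor(
    object_exists: Callable[[], bool],  # hypothetical check, e.g. a HEAD request on the key
    mode: str = "once",
    poke_interval: float = 30.0,        # assumed default; the docs only say "every few seconds"
    timeout: float = MAX_WAIT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Illustrative model of the once / wait / skip sensor modes."""
    if mode == "skip":
        # Skip sensor execution entirely: the object is never checked.
        return "skipped"

    if mode == "once":
        # Check a single time and fail if the object is missing.
        if object_exists():
            return "found"
        raise FileNotFoundError("object not found (once mode)")

    if mode == "wait":
        # Poll until the object appears or the timeout elapses.
        deadline = clock() + timeout
        while True:
            if object_exists():
                return "found"
            if clock() >= deadline:
                raise TimeoutError("object not found before timeout (wait mode)")
            sleep(poke_interval)

    raise ValueError(f"unknown sensor mode: {mode!r}")
```

Injecting sleep and clock keeps the sketch easy to exercise without real delays; in practice the object_exists callback would wrap whatever S3 existence check you use.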
S3 for Data Ingestion
Bruin also supports S3 as a data source and destination for ingestion workflows. For comprehensive documentation on using S3 for data ingestion, including reading from and writing to S3 buckets, see the S3 Ingestion Guide.