S3 Platform

Bruin supports Amazon S3 (Simple Storage Service) in two ways: as a source and destination for data ingestion, and as a sensor target for monitoring object availability.

S3 Sensors

S3 sensors allow you to monitor for the existence of specific objects in S3 buckets. The sensor waits for a file or object to become available before allowing downstream assets to proceed.

Connection Configuration

Add an AWS connection to your bruin.yml file:

yaml
connections:
  aws:
    - name: "aws-default"
      access_key: "your-access-key"
      secret_key: "your-secret-key"
      region: "us-east-1"  # Optional - will be auto-discovered from bucket if not provided

Sensor Configuration

Create a sensor asset in your pipeline:

yaml
name: "wait_for_s3_file"
type: s3.sensor.key_sensor
connection: aws-default
parameters:
  bucket_name: "my-data-bucket"
  bucket_key: "path/to/expected/file.csv"

Parameters

  • bucket_name (required): The name of the S3 bucket to monitor
  • bucket_key (required): The key/path of the object to wait for

Sensor Modes

The sensor supports three modes, selected with the --sensor-mode flag of bruin run:

  • once (default): Check once and fail if object doesn't exist
  • wait: Continuously poll until object is found (24-hour timeout)
  • skip: Skip sensor execution entirely

Running the Sensor

Execute the sensor using the bruin run command:

bash
bruin run path/to/your/sensor.asset.yml --sensor-mode wait

Behavior

  • If the region is not specified in the connection, it will be auto-discovered from the bucket
  • In wait mode, the sensor polls every few seconds; the interval is configurable via the poke_interval parameter
  • In wait mode, polling stops with a timeout after at most 24 hours
  • In once mode, the sensor returns an error if the object is not found
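
The mode semantics described above can be sketched in a few lines of Python. This is an illustrative model only, not Bruin's implementation: the function run_sensor, its object_exists callback, the 30-second default interval, and the exception types are all assumptions made for the example. Only the three mode names, the poke_interval name, and the 24-hour cap come from this page.

```python
import time
from typing import Callable

MAX_WAIT_SECONDS = 24 * 60 * 60  # 24-hour cap described on this page


def run_sensor(
    mode: str,
    object_exists: Callable[[], bool],  # hypothetical check, e.g. a HEAD request on the key
    poke_interval: float = 30.0,        # assumed default, not taken from Bruin
    timeout: float = MAX_WAIT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return "skipped" or "found"; raise if the object never appears."""
    if mode == "skip":
        return "skipped"  # the existence check is never performed
    if mode == "once":
        if object_exists():
            return "found"
        raise FileNotFoundError("object not found (once mode)")
    if mode == "wait":
        deadline = clock() + timeout
        while True:
            if object_exists():
                return "found"
            if clock() >= deadline:
                raise TimeoutError("object not found before timeout (wait mode)")
            sleep(poke_interval)  # pause between checks
    raise ValueError(f"unknown sensor mode: {mode!r}")
```

Passing the existence check, sleep function, and clock as arguments keeps the sketch independent of any AWS client and makes the polling behavior easy to exercise without waiting in real time.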

S3 for Data Ingestion

Bruin also supports S3 as a data source and destination for ingestion workflows. For comprehensive documentation on using S3 for data ingestion, including reading from and writing to S3 buckets, see the S3 Ingestion Guide.