Trino
Bruin supports Trino as a distributed SQL query engine.
Connection
To set up a Trino connection, add a configuration item under `connections` in the `.bruin.yml` file.
```yaml
connections:
  trino:
    - name: "connection_name"
      username: "trino_user"
      password: "XXXXXXXXXX" # Optional
      host: "trino-coordinator.example.com"
      port: 8080
      catalog: "default" # Optional
      schema: "schema_name" # Optional
```
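Once the connection is defined, assets can point at it by name. As a minimal sketch (assuming the asset-level `connection` key is available and using a hypothetical asset and table name), a Trino asset could reference this connection like so:

```sql
/* @bruin
name: analytics.daily_orders
type: trino.sql
# assumes the asset-level connection override; 'connection_name' matches .bruin.yml above
connection: connection_name
@bruin */

SELECT order_id, customer_id, total
FROM hive.sales.orders
```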
Trino Assets

trino.sql
Runs a materialized Trino asset or a Trino script. For detailed parameters, see the Definition Schema page. For information about materialization strategies, see the Materialization page.
IMPORTANT
Use a single SQL statement per trino.sql asset. Multi-statement queries are not supported by Trino.
Example: Create a table using table materialization
```sql
/* @bruin
name: hive.events.install
type: trino.sql
materialization:
  type: table
@bruin */

SELECT user_id, event_name, ts
FROM hive.analytics.events
WHERE event_name = 'install'
```

Example: Run a Trino script
```sql
/* @bruin
name: hive.events.install
type: trino.sql
@bruin */

CREATE TABLE IF NOT EXISTS hive.events.install AS
SELECT user_id, event_name, ts
FROM hive.analytics.events
WHERE event_name = 'install'
```
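The Materialization page also covers incremental strategies. As a rough sketch only (it assumes the `delete+insert` strategy with an `incremental_key` is available for Trino assets, and that the run's `{{ start_timestamp }}`/`{{ end_timestamp }}` variables are used for filtering; check the Materialization page for what is actually supported), an incremental version of the example above could look like this:

```sql
/* @bruin
name: hive.events.install
type: trino.sql
materialization:
  type: table
  # strategy and incremental_key are illustrative; verify against the Materialization page
  strategy: delete+insert
  incremental_key: ts
@bruin */

SELECT user_id, event_name, ts
FROM hive.analytics.events
WHERE event_name = 'install'
  AND ts BETWEEN TIMESTAMP '{{ start_timestamp }}' AND TIMESTAMP '{{ end_timestamp }}'
```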
trino.sensor.query

Checks whether a query returns any results in Trino. The sensor re-runs the query every 30 seconds (by default) until it returns at least one row.
```yaml
name: string
type: string
parameters:
  query: string
  poke_interval: int (optional)
```

Parameters:

- `query`: the query you expect to return results
- `poke_interval`: the interval between retries in seconds (default 30 seconds)
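For instance, a sensor that polls every five minutes instead of the default could look like this (asset and table names are placeholders):

```yaml
name: analytics_123456789.events_ready   # placeholder asset name
type: trino.sensor.query
parameters:
  query: select exists(select 1 from upstream_table where dt = '{{ end_date }}')
  poke_interval: 300   # seconds between retries
```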
Example: Partitioned upstream table
Checks whether data is available in the upstream table for the end date of the run.
```yaml
name: analytics_123456789.events
type: trino.sensor.query
parameters:
  query: select exists(select 1 from upstream_table where dt = '{{ end_date }}')
```

Example: Streaming upstream table
Checks whether there is any data after the end timestamp, assuming that older data is not appended to the table.
```yaml
name: analytics_123456789.events
type: trino.sensor.query
parameters:
  query: select exists(select 1 from upstream_table where inserted_at > '{{ end_timestamp }}')
```

Lakehouse Support
Trino lakehouse integration is configured in Trino itself (catalog/connector settings). The Bruin Trino connection format stays the same.
Supported Lakehouse Formats
Iceberg Format
| Catalog \ Storage | S3 |
|---|---|
| Glue | ✅ |
| Nessie | ✅ |
Prerequisites
- Docker and Docker Compose are installed.
- AWS credentials are available (exported) in your shell (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, optional `AWS_SESSION_TOKEN`, `AWS_REGION`); see the snippet after this list.
- You have an S3 warehouse prefix, for example `s3://example-lakehouse/warehouse`.
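For example, the credentials can be exported for the current shell session like this (the values are placeholders; the session token line is only needed for temporary credentials):

```bash
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_SESSION_TOKEN="your-session-token"   # only for temporary credentials
export AWS_REGION="us-east-1"
```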
Local config file structure:
```
├── trino
│   └── etc
│       ├── catalog
│       │   └── analytics_catalog.properties
│       ├── config.properties
│       ├── jvm.config
│       ├── log.properties
│       └── node.properties
└── docker-compose.yml
```

trino/etc/node.properties:
```properties
node.environment=dev
node.id=00000000-0000-0000-0000-000000000000
node.data-dir=/data/trino
```

- `node.environment` must be the same on all Trino nodes in a cluster.
- `node.id` must be unique per node and stable across restarts/upgrades.
- `node.data-dir` must be writable by Trino.
Guide: Iceberg + Glue + S3
Use this when you want AWS Glue as the Iceberg catalog. The Docker setup below is an example for local testing purposes.
docker-compose.yml:
```yaml
services:
  trino:
    image: trinodb/trino:latest
    ports:
      - "8080:8080"
    environment:
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      AWS_SESSION_TOKEN: ${AWS_SESSION_TOKEN}
      AWS_REGION: ${AWS_REGION}
    volumes:
      - ./trino/etc:/etc/trino
```

trino/etc/catalog/analytics_catalog.properties:
This file configures a Trino catalog to use the Iceberg connector with AWS Glue as the metadata catalog and S3 as the storage location for table data.
```properties
connector.name=iceberg
iceberg.catalog.type=glue
hive.metastore.glue.region=us-east-1
hive.metastore.glue.default-warehouse-dir=s3://example-lakehouse/warehouse/
# Optional in some account setups:
# hive.metastore.glue.catalogid=<aws-account-id>
fs.native-s3.enabled=true
s3.region=us-east-1
```
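With these files in place, start the container with `docker compose up -d`. As a quick sanity check (assuming the Trino catalog is named `analytics_catalog` as above; the schema name is only an example), you can create a schema in the Glue-backed catalog from any Trino client:

```sql
-- 'analytics' is an illustrative schema name; the S3 location is the warehouse prefix from the prerequisites
CREATE SCHEMA IF NOT EXISTS analytics_catalog.analytics
WITH (location = 's3://example-lakehouse/warehouse/analytics/')
```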
Guide: Iceberg + Nessie (In-Memory) + S3

Use this for local testing with ephemeral Nessie metadata. The Docker setup below is an example for local testing and documentation purposes.

docker-compose.yml:
```yaml
services:
  nessie:
    image: ghcr.io/projectnessie/nessie:latest
    container_name: nessie
    ports:
      - "19120:19120"
    environment:
      NESSIE_VERSION_STORE_TYPE: IN_MEMORY
      QUARKUS_HTTP_PORT: 19120
  trino:
    image: trinodb/trino:latest
    container_name: trino
    ports:
      - "8080:8080"
    environment:
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      AWS_SESSION_TOKEN: ${AWS_SESSION_TOKEN}
      AWS_REGION: ${AWS_REGION}
    volumes:
      - ./trino/etc:/etc/trino
    depends_on:
      - nessie
```

Note that `IN_MEMORY` does not persist Nessie metadata across restarts.
trino/etc/catalog/analytics_catalog.properties:
This file tells Trino to use Iceberg tables with Nessie as the metadata catalog and S3 as the data warehouse.
```properties
connector.name=iceberg
iceberg.catalog.type=nessie
iceberg.nessie-catalog.uri=http://nessie:19120/api/v1
iceberg.nessie-catalog.ref=main
iceberg.nessie-catalog.default-warehouse-dir=s3://example-lakehouse/warehouse
fs.native-s3.enabled=true
s3.region=us-east-1
```

`iceberg.nessie-catalog.uri` points to the Nessie API, `iceberg.nessie-catalog.ref` selects the active branch/ref, and `iceberg.nessie-catalog.default-warehouse-dir` sets where Iceberg data files are written in S3.
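Once the catalog file and docker-compose.yml are in place, bring the stack up before running anything with Bruin:

```bash
docker compose up -d
docker compose ps   # both the nessie and trino containers should be running
```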
Validate With Bruin
Bruin connection config stays unchanged; point to the Trino catalog and schema you configured.
pipeline.yml:
```yaml
name: trino-iceberg-smoke

default_connections:
  trino: trino_lakehouse
```

.bruin.yml:
```yaml
default_environment: default
environments:
  default:
    connections:
      trino:
        - name: trino_lakehouse
          host: localhost
          port: 8080
          username: trino_user
          catalog: analytics_catalog
          # Trino catalog name from /etc/trino/catalog/analytics_catalog.properties (not the Iceberg catalog type like glue/nessie)
          schema: analytics
```

assets/smoke_test.sql:
```sql
/* @bruin
name: analytics.smoke_test
type: trino.sql
@bruin */

SELECT * FROM sample_users ORDER BY id;
```
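The smoke test assumes a `sample_users` table already exists in the `analytics` schema. If you need something to query just for this check, a minimal seed asset could look like this (a hypothetical asset, not part of the original setup):

```sql
/* @bruin
name: analytics.sample_users
type: trino.sql
materialization:
  type: table
@bruin */

-- two throwaway rows so the smoke test has something to select
SELECT * FROM (
  VALUES (1, 'alice'), (2, 'bob')
) AS t (id, username)
```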
Run:

```bash
bruin run my-pipeline
```

Troubleshooting
- `Catalog 'analytics_catalog' does not exist`: ensure `analytics_catalog.properties` is mounted under `/etc/trino/catalog/`.
- S3 permission errors: verify AWS credentials and region in the container environment.
- Glue errors: verify IAM permissions and `hive.metastore.glue.default-warehouse-dir`.
- Nessie errors: verify `http://nessie:19120/api/v1` and the Nessie mode (`IN_MEMORY` vs a persistent backend).
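When debugging the first item, it helps to check what Trino actually sees. From any Trino client (for example the `trino` CLI pointed at `localhost:8080`), these statements confirm that the catalog and schema are visible:

```sql
SHOW CATALOGS;
SHOW SCHEMAS FROM analytics_catalog;
```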