Asset Definition
Assets are defined in YAML, in the same file as the asset code. Keeping the metadata right next to the code reduces friction when things change and keeps all the relevant details in a single file. The definition covers every detail of an asset, from its name to the quality checks that will be executed.
Here's an example asset definition:
```sql
/* @bruin
name: dashboard.hello_bq
type: bq.sql
depends:
  - hello_python
materialization:
  type: table
tags:
  - dashboard
  - team:xyz
columns:
  - name: one
    type: integer
    description: "Just a number"
    checks:
      - name: unique
      - name: not_null
      - name: positive
      - name: accepted_values
        value: [1, 2]
@bruin */

select 1 as one
union all
select 2 as one
```
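Since the definition sits in a comment inside the asset file, reading it only requires locating the text between the /* @bruin and @bruin */ markers. The Go sketch below illustrates that idea; extractDefinition is a hypothetical helper, not Bruin's actual parser, and it returns the YAML unparsed.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Markers that delimit the embedded definition, as used in the example above.
const (
	startMarker = "/* @bruin"
	endMarker   = "@bruin */"
)

// extractDefinition is a hypothetical helper (not Bruin's API) that returns the
// YAML text between the markers and the asset code that follows the block.
func extractDefinition(file string) (yamlText, code string, err error) {
	start := strings.Index(file, startMarker)
	if start < 0 {
		return "", "", errors.New("no @bruin block found")
	}
	rest := file[start+len(startMarker):]
	end := strings.Index(rest, endMarker)
	if end < 0 {
		return "", "", errors.New("unterminated @bruin block")
	}
	yamlText = strings.TrimSpace(rest[:end])
	code = strings.TrimSpace(rest[end+len(endMarker):])
	return yamlText, code, nil
}

func main() {
	file := "/* @bruin\nname: dashboard.hello_bq\ntype: bq.sql\n@bruin */\n\nselect 1 as one\n"
	def, code, err := extractDefinition(file)
	if err != nil {
		panic(err)
	}
	fmt.Println(def)  // the YAML definition, still unparsed
	fmt.Println(code) // the SQL query that follows the block
}
```

A real tool would then hand the extracted text to a YAML parser; that step is left out here to keep the sketch dependency-free.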
INFO
Bruin has an open-source Visual Studio Code extension that provides syntax highlighting for the definition syntax, among other features.
name
The name of the asset, used for many things including dependencies and materialization. It corresponds to the schema.table convention and must consist of letters and the dot (.) character.
- Type: String
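A name such as dashboard.hello_bq splits into a schema part and a table part at the dot. The following minimal Go sketch shows that split; splitAssetName is a hypothetical helper, and it simply reports false for names without a dot, such as hello_python.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAssetName is a hypothetical helper that splits an asset name written in
// the schema.table convention at its first dot. It reports false when the
// name contains no dot.
func splitAssetName(name string) (schema, table string, ok bool) {
	return strings.Cut(name, ".")
}

func main() {
	schema, table, ok := splitAssetName("dashboard.hello_bq")
	fmt.Println(schema, table, ok) // dashboard hello_bq true
}
```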
type
The type of the asset, which determines how the asset will be executed. Must be one of the supported asset types.
- Type: String
depends
The list of assets this asset depends on. This list determines the execution order: the asset will be executed only after all of the assets in its depends list have succeeded.
- Type: String[]
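The ordering rule described above is a topological sort over the depends lists. Below is a minimal sketch using Kahn's algorithm; executionOrder is a hypothetical function, not Bruin's scheduler, and it only computes an order without running anything or tracking failures.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// executionOrder is a hypothetical sketch (not Bruin's scheduler) that orders
// assets so each one comes after everything in its depends list, using Kahn's
// topological sort. deps maps an asset name to the names it depends on.
func executionOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}          // number of unfinished upstreams per asset
	dependents := map[string][]string{}   // upstream -> assets that depend on it
	for asset, upstreams := range deps {
		if _, seen := indegree[asset]; !seen {
			indegree[asset] = 0
		}
		for _, up := range upstreams {
			if _, seen := indegree[up]; !seen {
				indegree[up] = 0
			}
			indegree[asset]++
			dependents[up] = append(dependents[up], asset)
		}
	}

	// Start with assets that have no upstreams; sort for a deterministic order.
	var ready []string
	for asset, n := range indegree {
		if n == 0 {
			ready = append(ready, asset)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		current := ready[0]
		ready = ready[1:]
		order = append(order, current)
		next := dependents[current]
		sort.Strings(next)
		for _, d := range next {
			indegree[d]--
			if indegree[d] == 0 { // all upstreams of d are now done
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := executionOrder(map[string][]string{
		"dashboard.hello_bq": {"hello_python"},
		"hello_python":       nil,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // [hello_python dashboard.hello_bq]
}
```

Assets with no pending upstreams become runnable first, and a cycle in the depends lists leaves some assets permanently blocked, which the sketch reports as an error.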
materialization
This option determines how the asset will be materialized. Bruin supports various materialization strategies; refer to the docs on materialization for more details.
- Type: Object
- Keys: type, strategy, partition_by, cluster_by, incremental_key
Here's an example materialization definition:
```yaml
materialization:
  type: table
  strategy: delete+insert
  incremental_key: dt
  partition_by: dt
  cluster_by:
    - dt
    - user_id
```
The following is the underlying data structure that this configuration is parsed into:
```go
type Materialization struct {
	Type           string
	Strategy       string
	PartitionBy    string
	ClusterBy      []string
	IncrementalKey string
}
```
materialization > type
The type of the materialization. Can be either table or view.
- Type: String
- Default: none
materialization > strategy
The strategy used for the materialization, can be one of the following:
- create+replace: overwrite the existing table with the new version.
- delete+insert: incrementally update the table by only refreshing a certain partition.
- append: only append the new data to the table, never overwrite.
- merge: merge the existing records with the new records, requires a primary key to be set.
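The allowed values above lend themselves to a simple validation step. The sketch below checks a Materialization value against the listed types and strategies and enforces that merge needs a primary key. validateMaterialization and its hasPrimaryKey parameter are hypothetical, and treating an empty type as "not materialized" is an assumption based on the none default.

```go
package main

import (
	"errors"
	"fmt"
)

// Materialization mirrors the struct shown earlier in this section.
type Materialization struct {
	Type           string
	Strategy       string
	PartitionBy    string
	ClusterBy      []string
	IncrementalKey string
}

// Allowed values, taken from the lists in this section.
var (
	validTypes      = map[string]bool{"table": true, "view": true}
	validStrategies = map[string]bool{
		"create+replace": true,
		"delete+insert":  true,
		"append":         true,
		"merge":          true,
	}
)

// validateMaterialization is a hypothetical check, not Bruin's own validator.
// hasPrimaryKey should report whether any column of the asset is a primary key.
func validateMaterialization(m Materialization, hasPrimaryKey bool) error {
	if m.Type == "" {
		return nil // assumption: an empty type means the asset is not materialized
	}
	if !validTypes[m.Type] {
		return fmt.Errorf("unknown materialization type %q", m.Type)
	}
	if m.Strategy != "" && !validStrategies[m.Strategy] {
		return fmt.Errorf("unknown materialization strategy %q", m.Strategy)
	}
	if m.Strategy == "merge" && !hasPrimaryKey {
		return errors.New("merge strategy requires a primary key column")
	}
	return nil
}

func main() {
	m := Materialization{Type: "table", Strategy: "merge"}
	fmt.Println(validateMaterialization(m, false)) // merge strategy requires a primary key column
	fmt.Println(validateMaterialization(m, true))  // <nil>
}
```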
materialization > partition_by
Defines the column used to partition the resulting table. It instructs the data warehouse to use this column as the partition key.
- Type: String
- Default: none
materialization > cluster_by
Defines the columns used to cluster the resulting table. It instructs the data warehouse to use these columns for clustering.
- Type: String[]
- Default: []
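To make partition_by and cluster_by concrete, here is a hypothetical Go helper that renders them into BigQuery's CREATE TABLE ... AS SELECT syntax, matching the bq.sql asset type from the first example. It is an illustration only, not the SQL Bruin generates, and it assumes dt is a DATE column, which BigQuery can partition on directly.

```go
package main

import (
	"fmt"
	"strings"
)

// createTableDDL is a hypothetical illustration of how partition_by and
// cluster_by could map onto BigQuery's CREATE TABLE ... AS SELECT syntax.
// It is not the SQL that Bruin itself generates.
func createTableDDL(table, partitionBy string, clusterBy []string, query string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "CREATE OR REPLACE TABLE %s", table)
	if partitionBy != "" { // omitted when no partition column is set
		fmt.Fprintf(&b, "\nPARTITION BY %s", partitionBy)
	}
	if len(clusterBy) > 0 { // omitted when cluster_by is empty
		fmt.Fprintf(&b, "\nCLUSTER BY %s", strings.Join(clusterBy, ", "))
	}
	fmt.Fprintf(&b, "\nAS %s", query)
	return b.String()
}

func main() {
	// source_table is a hypothetical placeholder.
	fmt.Println(createTableDDL("dashboard.hello_bq", "dt", []string{"dt", "user_id"},
		"SELECT dt, user_id FROM source_table"))
}
```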
materialization > incremental_key
The column of the table that will be used for incremental updates.
- Type: String
- Default: none
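The delete+insert strategy described earlier is the typical consumer of an incremental key. The sketch below renders the general pattern: delete the target rows whose key value appears in the freshly computed data, then insert the fresh rows. deleteInsertSQL and the staging table name are hypothetical, and this is not Bruin's generated SQL.

```go
package main

import "fmt"

// deleteInsertSQL is a hypothetical sketch of the general delete+insert pattern
// keyed on incremental_key: rows in the target whose key value appears in the
// freshly computed data are deleted, then the fresh rows are inserted.
// Table names are placeholders; this is not Bruin's generated SQL.
func deleteInsertSQL(target, staging, incrementalKey string) []string {
	return []string{
		fmt.Sprintf("DELETE FROM %s WHERE %s IN (SELECT DISTINCT %s FROM %s)",
			target, incrementalKey, incrementalKey, staging),
		fmt.Sprintf("INSERT INTO %s SELECT * FROM %s", target, staging),
	}
}

func main() {
	for _, stmt := range deleteInsertSQL("dashboard.hello_bq", "dashboard.hello_bq_tmp", "dt") {
		fmt.Println(stmt + ";")
	}
}
```

In practice, running both statements inside one transaction keeps readers from observing a partially refreshed table; the sketch leaves that to the caller.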
tags
A list of tags used to categorize the asset. Tags are useful for searching and filtering assets when running.
- Type: String[]
- Default: []
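Filtering by tag amounts to keeping the assets whose tag list contains the requested value. The following sketch uses a hypothetical, trimmed-down Asset type; tags like team:xyz are treated as opaque strings, so matching is exact.

```go
package main

import "fmt"

// Asset is a hypothetical, trimmed-down asset record holding only the fields
// needed for this example.
type Asset struct {
	Name string
	Tags []string
}

// filterByTag returns the names of the assets that carry the given tag
// (exact string match), preserving their input order.
func filterByTag(assets []Asset, tag string) []string {
	var names []string
	for _, a := range assets {
		for _, t := range a.Tags {
			if t == tag {
				names = append(names, a.Name)
				break // one match is enough for this asset
			}
		}
	}
	return names
}

func main() {
	assets := []Asset{
		{Name: "dashboard.hello_bq", Tags: []string{"dashboard", "team:xyz"}},
		{Name: "hello_python", Tags: nil},
	}
	fmt.Println(filterByTag(assets, "team:xyz")) // [dashboard.hello_bq]
}
```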
columns
The list of columns defined for the asset, along with their quality checks and other metadata.
Here's an example column definition:
```yaml
columns:
  - name: one
    type: integer
    description: "Just a number"
    checks:
      - name: unique
      - name: not_null
      - name: positive
      - name: accepted_values
        value: [1, 2]
```
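The sketch below illustrates what each check in the example asserts by evaluating the checks over an in-memory slice, where nil stands for NULL. It is not how Bruin executes checks; checkColumn is hypothetical, positive is assumed to mean strictly greater than zero, and NULLs are assumed to be skipped by every check except not_null.

```go
package main

import "fmt"

// checkColumn is a hypothetical illustration of the four checks from the
// example, evaluated over in-memory values. A nil pointer represents NULL.
func checkColumn(values []*int64, accepted []int64) map[string]bool {
	results := map[string]bool{
		"unique":          true,
		"not_null":        true,
		"positive":        true,
		"accepted_values": true,
	}
	seen := map[int64]bool{}
	allowed := map[int64]bool{}
	for _, a := range accepted {
		allowed[a] = true
	}
	for _, v := range values {
		if v == nil {
			results["not_null"] = false
			continue // assumption: NULLs are skipped by the remaining checks
		}
		if seen[*v] {
			results["unique"] = false
		}
		seen[*v] = true
		if *v <= 0 { // assumption: positive means strictly greater than zero
			results["positive"] = false
		}
		if !allowed[*v] {
			results["accepted_values"] = false
		}
	}
	return results
}

// ptr returns a pointer to a copy of v, for building test inputs.
func ptr(v int64) *int64 { return &v }

func main() {
	// The query in the asset example returns the rows 1 and 2.
	fmt.Println(checkColumn([]*int64{ptr(1), ptr(2)}, []int64{1, 2}))
	// map[accepted_values:true not_null:true positive:true unique:true]
}
```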
Here's the underlying data type that represents the columns:
```go
type Column struct {
	Name          string        `json:"name"`
	Type          string        `json:"type"`
	Description   string        `json:"description"`
	Checks        []ColumnCheck `json:"checks"`
	PrimaryKey    bool          `json:"primary_key"`
	UpdateOnMerge bool          `json:"update_on_merge"`
}
```