Policies
Bruin supports policies to verify that data transformation jobs follow best practices and organisation-wide conventions. In addition to built-in lint rules, Bruin also allows users to define custom lint rules using a policy.yml file.
This document explains how to define, configure, and use custom linting policies.
NOTE
For the purpose of this document, a resource means either an asset or a pipeline.
Quick Start
- Create a policy.yml file in your project root.
- Define custom rules under custom_rules (optional if only using built-in rules).
- Group rules into rulesets, specifying which resources they should apply to using selectors.
Example:
rulesets:
  - name: ruleset-1
    selector:
      - path: .*/foo/.*
    rules:
      - asset-has-owner
      - asset-name-is-lowercase
      - asset-has-description
🚀 That's it! Bruin will now lint your assets according to these policies.
To verify that your assets satisfy your policies, you can run:
$ bruin validate /path/to/pipelines
TIP
bruin run normally runs lint before pipeline execution, so you can rest assured that any non-compliant resources will be stopped in their tracks.
Rulesets
A ruleset groups one or more rules together and specifies which resources they apply to, based on selectors.
Each ruleset is defined by the following fields:
- name: A unique name for the ruleset.
- selector (optional): One or more predicates to select the applicable resources.
- rules: List of rule names (built-in or custom) to apply.
If a selector is not specified, the ruleset applies to all resources.
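As a minimal sketch of the difference, the fragment below puts a ruleset without a selector next to one with a selector. The ruleset names `baseline` and `finance-only` and the `finance` path are hypothetical and chosen only for illustration; the rule names are built-in rules listed later in this document.

```yaml
rulesets:
  # No selector: this ruleset applies to all resources.
  - name: baseline
    rules:
      - asset-has-owner

  # With a selector: this ruleset applies only to resources
  # whose path matches the regex (hypothetical path).
  - name: finance-only
    selector:
      - path: .*/finance/.*
    rules:
      - asset-has-description
```

Keeping a broad baseline ruleset and adding narrower rulesets on top is one possible layout; nothing in the format requires it.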
NOTE
Names must be alphanumeric and may contain dashes (-). This applies to both rulesets and rules.
Selector Predicates
Selectors determine which resources a ruleset should apply to. Supported predicates are:
Predicate | Target | Description |
---|---|---|
path | asset, pipeline | path of the asset/pipeline |
pipeline | asset, pipeline | name of the pipeline |
asset | asset | name of the asset |
tag | asset | asset tags |
Each predicate is a regex string.
INFO
If multiple selectors are specified within a ruleset, all selectors must match for the ruleset to apply.
If no selectors are defined for a ruleset, the ruleset applies to all resources. Some selectors only work with certain rule targets. For instance, the tag selector only works for rules that target assets; pipeline-level rules simply ignore it.
TIP
If your ruleset only contains asset selectors, but uses pipeline rules, then those pipeline rules will apply to all pipelines. Make sure to define a pipeline or path selector if you don't intend for that to happen.
Example
rulesets:
  - name: production
    selector:
      - path: .*/prod/.*
      - tag: critical
    rules:
      - asset-has-owner
      - asset-name-is-lowercase
In this example, the production ruleset applies only to resources that both:
- match the path regex .*/prod/.*
- have a tag matching critical
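The tip above about asset-only selectors can be sketched as follows. One way to avoid pipeline rules unintentionally applying to every pipeline is to keep asset rules and pipeline rules in separate rulesets, each with a selector that its target understands. The ruleset names and the `payments` pipeline pattern are hypothetical.

```yaml
rulesets:
  # Asset rules, narrowed by tag (a selector that only targets assets).
  - name: critical-assets
    selector:
      - tag: critical
    rules:
      - asset-has-owner

  # Pipeline rules, narrowed by a pipeline selector instead.
  # A tag selector here would be ignored by the pipeline rule,
  # so the rule would apply to all pipelines.
  - name: critical-pipelines
    selector:
      - pipeline: payments.*   # hypothetical pipeline name pattern
    rules:
      - pipeline-has-retries
```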
Custom Rules
Custom lint rules are defined inside the custom_rules section of policy.yml.
Each rule must include:
- name: A unique name for the rule.
- description: A human-readable description of the rule.
- criteria: An expr boolean expression. If the expression evaluates to true, then the resource passes validation.
Example
custom_rules:
  - name: asset-has-owner
    description: every asset should have an owner
    criteria: asset.Owner != ""
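A couple of further criteria may help show the shape of these expressions. This is a sketch that assumes criteria are evaluated with the expr expression language (so that operators such as `contains` and `not` and the `len` function are available) and that uses only the `asset.Name` field seen elsewhere in this document. The rule names, the ruleset name, and the 64-character limit are hypothetical.

```yaml
custom_rules:
  # Passes when the asset name contains no space character.
  - name: asset-name-has-no-spaces
    description: asset names must not contain spaces
    criteria: not (asset.Name contains " ")

  # Passes when the asset name is at most 64 characters long (arbitrary limit).
  - name: asset-name-is-short
    description: asset names should be at most 64 characters
    criteria: len(asset.Name) <= 64

rulesets:
  # Custom rules take effect only once a ruleset references them.
  - name: naming
    rules:
      - asset-name-has-no-spaces
      - asset-name-is-short
```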
Targets
Custom rules can have an optional target attribute that defines what resource the rule acts on. Valid values are:
- asset (default)
- pipeline
Example
custom_rules:
  - name: pipeline-must-have-prefix-acme
    description: Pipeline names must start with the prefix 'acme'
    criteria: pipeline.Name startsWith 'acme'
    target: pipeline
  - name: asset-name-must-be-layer-dot-schema-dot-table
    description: Asset names must be of the form {layer}.{schema}.{table}
    criteria: len(split(asset.Name, '.')) == 3
    target: asset # optional

rulesets:
  - name: std
    rules:
      - pipeline-must-have-prefix-acme
      - asset-name-must-be-layer-dot-schema-dot-table
Variables
criteria has the following variables available for use in your expressions:
Name | Target |
---|---|
asset | asset |
pipeline | asset, pipeline |
WARNING
The variables exposed here are direct Go structs, so it is recommended to check the latest version of these structs before relying on a particular field.
In the future we will create dedicated schemas for custom rules with standards around them.
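Since the table above lists pipeline as available to asset-targeted rules, a single criteria can compare the two. The sketch below uses only the `asset.Name` and `pipeline.Name` fields that appear in this document; the rule name is hypothetical, and any other field should be checked against the Go structs first.

```yaml
custom_rules:
  # Asset-targeted rule that also reads the pipeline variable:
  # passes when the asset name begins with the name of its pipeline.
  - name: asset-name-starts-with-pipeline-name
    description: asset names must be prefixed with the name of their pipeline
    criteria: asset.Name startsWith pipeline.Name
    target: asset
```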
Built-in Rules
Bruin provides a set of built-in lint rules that are ready to use without requiring a definition.
Rule | Target | Description |
---|---|---|
asset-name-is-lowercase | asset | Asset names must be in lowercase. |
asset-name-is-schema-dot-table | asset | Asset names must follow the format schema.table. |
asset-has-description | asset | Assets must have a description. |
asset-has-owner | asset | Assets must have an owner assigned. |
asset-has-columns | asset | Assets must define their columns. |
asset-has-primary-key | asset | Assets must define at least one column as a primary key. |
asset-has-checks | asset | Assets must have at least one check (column or custom_checks). |
asset-has-tags | asset | Assets must have at least one tag. |
column-has-description | asset | All columns declared by the asset must have a description. |
column-name-is-snake-case | asset | Column names must be in snake_case. |
column-name-is-camel-case | asset | Column names must be in camelCase. |
column-type-is-valid-for-platform | asset | Column types declared by the asset must be valid types in the relevant platform (BigQuery and Snowflake only). |
description-must-not-be-placeholder | asset | Asset and column descriptions must not contain placeholder strings. |
asset-has-no-cross-pipeline-dependencies | asset | Assets must not depend on assets in other pipelines. |
pipeline-has-notifications | pipeline | Pipelines must declare at least one notification channel. |
pipeline-has-retries | pipeline | Pipelines must have retries > 0. |
pipeline-has-start-date | pipeline | Pipelines must have a `start_date`. |
pipeline-has-metadata-push | pipeline | Pipelines must push their metadata. |
You can directly reference these rules in rulesets[*].rules.
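For instance, a ruleset built only from built-in rules needs no custom_rules section at all. The sketch below groups the pipeline-level rules from the table; the ruleset name and the `prod-` pipeline pattern are hypothetical.

```yaml
rulesets:
  # Only built-in rules are referenced, so no custom_rules section is needed.
  - name: pipeline-hygiene
    selector:
      - pipeline: prod-.*   # hypothetical pipeline name pattern
    rules:
      - pipeline-has-notifications
      - pipeline-has-retries
      - pipeline-has-start-date
```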
Full Example
custom_rules:
  - name: asset-has-owner
    description: every asset should have an owner
    criteria: asset.Owner != ""

rulesets:
  - name: production
    selector:
      - path: .*/production/.*
      - tag: critical
    rules:
      - asset-has-owner
      - asset-name-is-lowercase
      - asset-has-description
  - name: staging
    selector:
      - asset: stage.*
      - pipeline: staging
    rules:
      - asset-name-is-lowercase