AWS Redshift
Bruin supports AWS Redshift as a data platform, which means you can use Bruin to build tables and views in your Redshift data warehouse.
Connection
In order to have set up a Redshift connection, you need to add a configuration item to connections
in the .bruin.yml
file complying with the following schema Mind that, despite the connection being at all effects a Postgres connection, the default port
field of Amazon Redshift is 5439
.
connections:
redshift:
- name: "connection_name"
username: "awsuser"
password: "XXXXXXXXXX"
host: "redshift-cluster-1.xxxxxxxxx.eu-north-1.redshift.amazonaws.com"
port: 5439
database: "dev"
ssl_mode: "allow"
NOTE
ssl_mode
should be one of the modes describe in the PostgreSQL documentation.
Making Redshift publicly accessible
Before the connection works properly, you need to ensure that the Redshift cluster can be access from the outside. In order to do that you must mark the configuration option in your redshift cluster
In addition to this, you must configure the inbound rules of the security group your redshift cluster belongs to, to accept inbound connections. In the example below we enabled access for all origins but you can set more restrictive rules for this.
If you have trouble setting this up you can check AWS documentation on the topic
AWS Redshift Assets
rs.sql
Runs a materialized AWS Redshift asset or an SQL script. For detailed parameters, you can check Definition Schema page.
Example: Create a table for product reviews
/* @bruin
name: product_reviews.table
type: rs.sql
materialization:
type: table
@bruin */
create table product_reviews (
review_id bigint identity(1,1),
product_id bigint,
user_id bigint,
rating int,
review_text varchar(500),
review_date timestamp
);
Example: Run an AWS Redshift script to clean up old data
/* @bruin
name: clean_old_data
type: rs.sql
@bruin */
begin transaction;
delete from user_activity
where activity_date < dateadd(year, -2, current_date);
delete from order_history
where order_date < dateadd(year, -5, current_date);
commit transaction;
rs.seed
rs.seed
are a special type of assets that are used to represent are CSV-files that contain data that is prepared outside of your pipeline that will be loaded into your redshift database. Bruin supports seed assets natively, allowing you to simply drop a CSV file in your pipeline and ensuring the data is loaded to the redshift database.
You can define seed assets in a file ending with .yaml
:
name: dashboard.hello
type: rs.seed
parameters:
path: seed.csv
Parameters:
path
: Thepath
parameter is the path to the CSV file that will be loaded into the data platform. path is relative to the asset definition file.
Examples: Load csv into a Redshift database
The examples below show how load a csv into a redshift database.
name: dashboard.hello
type: rs.seed
parameters:
path: seed.csv
Example CSV:
name,networking_through,position,contact_date
Y,LinkedIn,SDE,2024-01-01
B,LinkedIn,SDE 2,2024-01-01