Ingestr Assets
Ingestr is a Python package that allows you to easily move data between platforms. Bruin supports ingestr
natively as an asset type.
Using Ingestr, you can move data from:
- your production databases, such as:
  - MSSQL
  - MySQL
  - Oracle
- your daily tools, such as:
  - Notion
  - Google Sheets
  - Airtable
- other platforms, such as:
  - Hubspot
  - Salesforce
  - Google Analytics
  - Facebook Ads
  - Google Ads
to your data warehouses:
- Google BigQuery
- Snowflake
- AWS Redshift
- Azure Synapse
- Postgres
INFO
You can read more about the capabilities of ingestr in its documentation.
Template
yaml
name: string
type: ingestr
connection: string # optional, defaults to the default connection for the destination platform in pipeline.yml
parameters:
  source: string # optional, used when the source cannot be inferred from the connection alone, e.g. GCP connection + GSheets source
  source_connection: string
  source_table: string
  destination: bigquery | snowflake | redshift | synapse
  # optional
  incremental_strategy: replace | append | merge | delete+insert
  incremental_key: string
  sql_backend: pyarrow | sqlalchemy
  loader_file_format: jsonl | csv | parquet
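The template can be sanity-checked before a pipeline run. The following is a minimal sketch, not part of Bruin or ingestr: `validate_ingestr_asset` is a hypothetical helper, the allowed values are copied from the template above, and treating `source_connection`, `source_table`, and `destination` as required is an assumption based on the `# optional` comments. It takes an asset definition that has already been loaded into a Python dict.

```python
# Hypothetical helper: checks a dict shaped like the template above.
# Allowed values mirror the template only; the set of required keys is
# an assumption inferred from the "# optional" comments.

ALLOWED = {
    "destination": {"bigquery", "snowflake", "redshift", "synapse"},
    "incremental_strategy": {"replace", "append", "merge", "delete+insert"},
    "sql_backend": {"pyarrow", "sqlalchemy"},
    "loader_file_format": {"jsonl", "csv", "parquet"},
}
REQUIRED = ("source_connection", "source_table", "destination")


def validate_ingestr_asset(asset: dict) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if not asset.get("name"):
        problems.append("missing name")
    if asset.get("type") != "ingestr":
        problems.append("type must be 'ingestr'")
    params = asset.get("parameters") or {}
    # Keys the template does not mark as optional.
    for key in REQUIRED:
        if not params.get(key):
            problems.append(f"missing parameters.{key}")
    # Enumerated keys must use one of the values listed in the template.
    for key, allowed in ALLOWED.items():
        value = params.get(key)
        if value is not None and value not in allowed:
            problems.append(f"parameters.{key} has unsupported value {value!r}")
    return problems


asset = {
    "name": "raw.transactions",
    "type": "ingestr",
    "parameters": {
        "source_connection": "mysql_prod",
        "source_table": "public.transactions",
        "destination": "bigquery",
    },
}
print(validate_ingestr_asset(asset))  # prints []
```

An empty list means the definition matches the shape of the template; it says nothing about whether the named connections actually exist in your pipeline.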
INFO
Ingestr assets require Docker to be installed on your machine. If you are using Bruin Cloud, you don't need to worry about this.
Examples
The examples below show how to use the ingestr asset type in your pipeline. Feel free to adapt them to your needs.
Copy a table from MySQL to BigQuery
yaml
name: raw.transactions
type: ingestr
parameters:
  source_connection: mysql_prod
  source_table: public.transactions
  destination: bigquery
Copy a table from Microsoft SQL Server to Snowflake incrementally
This example shows how to use the updated_at column to incrementally load data from Microsoft SQL Server to Snowflake.
yaml
name: raw.transactions
type: ingestr
parameters:
  source_connection: mssql_prod
  source_table: dbo.transactions
  destination: snowflake
  incremental_strategy: append
  incremental_key: updated_at
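To make the idea behind an append strategy with an incremental key more concrete, here is a conceptual sketch in plain Python. It is not how ingestr implements incremental loading: `append_new_rows` is a hypothetical function, tables are modelled as lists of dicts, and the strict "greater than the highest value already loaded" comparison is an assumption made for illustration.

```python
# Conceptual sketch only: not ingestr's implementation.
from datetime import datetime


def append_new_rows(destination: list[dict], source: list[dict], key: str) -> list[dict]:
    """Append source rows whose `key` is newer than anything already loaded."""
    # Highest key value loaded so far; None on the first run.
    high_water_mark = max((row[key] for row in destination), default=None)
    new_rows = [
        row for row in source
        if high_water_mark is None or row[key] > high_water_mark
    ]
    # Existing rows are left untouched; new rows are added at the end.
    return destination + new_rows


destination = [{"id": 1, "updated_at": datetime(2024, 1, 1)}]
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 1, "updated_at": datetime(2024, 1, 3)},
]
result = append_new_rows(destination, source, "updated_at")
print([(r["id"], r["updated_at"].day) for r in result])  # prints [(1, 1), (2, 2), (1, 3)]
```

In this sketch the row with id 1 ends up in the destination twice, because appending never rewrites rows that were loaded earlier; an updated source row arrives as a second copy.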
Copy data from Google Sheets to Snowflake
This example shows how to copy data from Google Sheets into your Snowflake database.
yaml
name: raw.manual_orders
type: ingestr
parameters:
  source: gsheets
  source_connection: gcp-default
  source_table: <mysheetid>.<sheetname>
  destination: snowflake