Data Ingestion

Bruin has built-in data ingestion capabilities thanks to ingestr. The basic idea is simple:

- you have data sources
- each source may have one or more tables/streams
  - e.g. for Shopify, you have customers, orders, and products, each a separate table.
- you want to load these into a destination data platform

Ingestr abstracts all of this into three concepts: sources, destinations, and tables.

Using Bruin, you can load data from any supported source into your data platform as a regular asset.

Definition Schema

Ingestr assets are defined in a simple YAML file:

yaml

name: raw.customers
type: ingestr
parameters:
  source_connection: <source-connection-name>
  source_table: customers
  destination: bigquery

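The interesting part is the parameters list: source_connection is the name of the connection to read from, source_table is the table or stream to load from that source, and destination is the data platform to load into (bigquery in the example above).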

Effectively, this asset will run ingestr in the background and load the data to your data warehouse.

Many combinations of sources and destinations are possible. Below are a few examples for common scenarios: Postgres to BigQuery, Shopify to Snowflake, and Kafka to BigQuery.

yaml

name: raw.customers
type: ingestr
parameters:
  source_connection: my-postgres
  source_table: raw.customers
  destination: bigquery

yaml

name: raw.orders
type: ingestr
parameters:
  source_connection: my-shopify
  source_table: orders
  destination: snowflake

yaml

name: raw.topic1
type: ingestr
parameters:
  source_connection: my-kafka
  source_table: topic1
  destination: bigquery
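
Because each table or stream in a source maps to its own asset, loading several tables from one source means defining one asset per table. The sketch below extends the Shopify example to the customers and products tables mentioned earlier. It reuses the my-shopify connection name from the example above; the file names in the comments are hypothetical, and the `---` separator is only there to show two definitions side by side, on the assumption that each asset lives in its own file.

```yaml
# Sketch only: each YAML document below is assumed to live in its own asset file.
# The file names are illustrative, not prescribed by Bruin.

# hypothetical file: shopify_customers.asset.yml
name: raw.customers
type: ingestr
parameters:
  source_connection: my-shopify   # same connection as the orders example
  source_table: customers         # one Shopify table per asset
  destination: snowflake
---
# hypothetical file: shopify_products.asset.yml
name: raw.products
type: ingestr
parameters:
  source_connection: my-shopify
  source_table: products
  destination: snowflake
```

Only name and source_table change between the two definitions; the connection and destination stay the same.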