Google BigQuery
BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.
ingestr supports BigQuery as both a source and destination.
URI format
The URI format for BigQuery is as follows:
bigquery://<project-name>?credentials_path=/path/to/service/account.json&location=<location>
URI parameters:
- project-name: the name of the project in which the dataset resides
- credentials_path: optional, the path to the service account JSON file. If not provided, ingestr will use Application Default Credentials
- credentials_base64: optional, base64-encoded service account JSON credentials
- location: optional, the location of the dataset
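The URI parameters described here can be assembled programmatically. The following is a minimal sketch, not part of ingestr: the helper name `build_bigquery_uri` and its arguments are hypothetical, and it assumes that ingestr decodes the query string with standard percent-decoding and expects standard (not URL-safe) base64 in `credentials_base64`.

```python
import base64
import json
from urllib.parse import urlencode


def build_bigquery_uri(project, credentials_path=None, credentials_info=None, location=None):
    """Build a bigquery:// URI from the documented parameters.

    Hypothetical helper for illustration only.
    credentials_info is a dict holding the service account JSON content.
    """
    params = {}
    if credentials_path is not None:
        params["credentials_path"] = credentials_path
    if credentials_info is not None:
        # Assumption: standard base64 of the raw JSON bytes is what is expected.
        raw = json.dumps(credentials_info).encode("utf-8")
        params["credentials_base64"] = base64.b64encode(raw).decode("ascii")
    if location is not None:
        params["location"] = location

    uri = f"bigquery://{project}"
    if params:
        # Keep "/" readable in paths; "+" and "=" are still percent-encoded
        # so they survive query-string parsing.
        uri += "?" + urlencode(params, safe="/")
    return uri


if __name__ == "__main__":
    print(build_bigquery_uri("my-project", credentials_path="/path/to/service-account.json"))
```

Percent-encoding matters mainly for `credentials_base64`, because base64 output can contain `+` and `=`, which have special meaning in a query string.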
Authentication
ingestr supports multiple authentication methods for BigQuery:
Explicit credentials (via credentials_path or credentials_base64 in the URI):
bigquery://my-project?credentials_path=/path/to/service-account.json
Application Default Credentials (recommended for local development and GCP environments):
bigquery://my-project
When no credentials are provided in the URI, ingestr uses the Google authentication library, which automatically discovers credentials from:
- The GOOGLE_APPLICATION_CREDENTIALS environment variable
- User credentials set via gcloud auth application-default login
- Service account credentials when running on Google Cloud (Compute Engine, App Engine, Cloud Run, etc.)
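To see which of these sources is likely to apply on a given machine, a small diagnostic can mirror the discovery order listed above. This is a rough sketch under stated assumptions, not the Google authentication library's actual implementation: the function name `guess_adc_source` is hypothetical, and it assumes that `gcloud auth application-default login` writes a file named `application_default_credentials.json` into the gcloud configuration directory (typically `~/.config/gcloud` on Linux and macOS).

```python
import os
from pathlib import Path

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"
ADC_FILENAME = "application_default_credentials.json"


def guess_adc_source(environ, gcloud_config_dir):
    """Return a label for the first credential source that appears to be available.

    Hypothetical helper: it only inspects the environment and the local disk,
    and it cannot detect credentials attached to a Google Cloud runtime.
    """
    # 1. An explicitly configured credentials file takes precedence.
    if environ.get(ENV_VAR):
        return "environment variable"
    # 2. User credentials created by `gcloud auth application-default login`.
    if (Path(gcloud_config_dir) / ADC_FILENAME).is_file():
        return "gcloud user credentials"
    # 3. Otherwise only an attached service account (on Google Cloud) could apply.
    return "none found locally"


if __name__ == "__main__":
    default_dir = Path.home() / ".config" / "gcloud"
    print(guess_adc_source(os.environ, default_dir))
```

If the result is "none found locally" and the command is not running on Google Cloud, passing explicit credentials in the URI is the remaining option.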
The same URI structure can be used for both sources and destinations. For more details, see the documentation for SQLAlchemy's BigQuery dialect.
Using GCS as a staging area
ingestr can use Google Cloud Storage (GCS) as a staging area for BigQuery. To enable this, pass the --staging-bucket flag when running the command:
ingestr ingest \
    --source-uri "$SOURCE_URI" \
    --dest-uri "$BIGQUERY_URI" \
    --source-table raw.input \
    --dest-table raw.output \
    --staging-bucket "gs://your-bucket-name"
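When the same command is launched from a script, building the argument list in code avoids shell quoting problems with URIs that contain `?` and `&`. The sketch below is illustrative only: `build_ingest_command` is a hypothetical helper, the `gs://` prefix check is an assumption based on the example above, and actually running the command assumes the `ingestr` executable is on the PATH.

```python
import shlex


def build_ingest_command(source_uri, dest_uri, source_table, dest_table, staging_bucket=None):
    """Return the ingestr command as an argument list suitable for subprocess.run.

    Hypothetical helper; it uses only the flags shown in the command above.
    """
    cmd = [
        "ingestr", "ingest",
        "--source-uri", source_uri,
        "--dest-uri", dest_uri,
        "--source-table", source_table,
        "--dest-table", dest_table,
    ]
    if staging_bucket is not None:
        # Assumption: the bucket is given as a gs:// URL, as in the example.
        if not staging_bucket.startswith("gs://"):
            raise ValueError("staging bucket should look like gs://your-bucket-name")
        cmd += ["--staging-bucket", staging_bucket]
    return cmd


if __name__ == "__main__":
    command = build_ingest_command(
        "postgresql://user:pass@host/db",  # placeholder source URI
        "bigquery://my-project",
        "raw.input",
        "raw.output",
        staging_bucket="gs://your-bucket-name",
    )
    # Print a copy-pasteable shell form; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Passing the list directly to `subprocess.run(command, check=True)` skips the shell entirely, so no escaping of the URIs is needed.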