DynamoDB
Amazon DynamoDB is a managed NoSQL database service provided by Amazon Web Services (AWS). It supports key-value and document data structures and is designed to handle a wide range of applications requiring scalability and performance.
URI format
The URI format for DynamoDB is as follows:
dynamodb://dynamodb.<region>.amazonaws.com?access_key_id=<aws_access_key_id>&secret_access_key=<aws_secret_access_key>
URI parameters:
access_key_id
: Identifes an IAM account.secret_access_key
: Password for the IAM account.
Setting up a DynamoDB integration
Prerequisites
- AWS IAM access key pair.
- A DynamoDB Table that you will to load data from
To obtain the access keys, use the IAM console on AWS. See IAM Documentation for more information.
Configuring Permissions
To use DynamoDB source, the user account must have the following IAM permissions:
dynamodb:DescribeTable
dynamodb:Scan
Following AWS Best practices, you can create an IAM policy that you can assign to the user account you wish to use with ingestr
. Below is a sample policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Statement1",
"Effect": "Allow",
"Action": [
"dynamodb:DescribeTable",
"dynamodb:Scan"
],
"Resource": [
"<TABLE_ARN>"
]
}
]
}
Replace TABLE_ARN
with the DynamoDB Amazon Resource Name of your Table. You can find the ARN for your table in the DynamoDB console. You can add as many tables you want in the Resource
field. Alternatively, if you'd like to give access to all tables that you own, you can set Resource
to ["*"]
.
Example: Simple load
For this example, we'll assume the value of access_key_id
and secret_access_key
are user
and pass
respectively.
Say you have a table called absolute-armadillo
in the region ap-south-1
and you want to load this data to a duckdb database called animal.db
.
You run the following to achieve this:
ingestr ingest \
--source-uri "dynamodb://dynamodb.ap-south-1.amazonaws.com?access_key_id=user&secret_access_key=pass" \
--source-table "absolute-armadillo" \
--dest-uri "duckdb://./animal.db" \
--dest-table "public.armadillo"
Example: Incremental load
ingestr
supports incremental loading. Incremental loading is a technique whereby only rows or fields that are changed are fetched. This reduces load times of subsequent runs and improves efficiency of your pipelines.
Assuming the same setup from Simple Load, we can run incremental load with:
ingestr ingest \
--source-uri "dynamodb://dynamodb.ap-south-1.amazonaws.com?access_key_id=user&secret_access_key=pass" \
--source-table "absolute-armadillo" \
--dest-uri "duckdb://./animal.db" \
--dest-table "public.armadillo" \
--incremental-strategy "replace" \
--incremental-key "updated_at"
Assuming that absolute-armadillo
table has a datetime field called updated_at
, whenever you run this command, only rows with value greater than MAX(updated_at)
from previous load will be fetched from DynamoDB.
WARNING
DynamoDB doesn't support indexed range scans. Whenever you run ingestr ingest
, the whole table is scanned. Although ingestr
does specify a filter critiera, DynamoDB only applies this after running the table scan.