In my previous article, I showed how to use a command-line interface (CLI) application to load JSON data from a local machine into DynamoDB. A command-line application works well for ad-hoc load use cases.
In this article, I will show you how to build an automated ingestion pipeline around the same core application with the help of an AWS Lambda function and an S3 bucket. The setup takes less than 10 minutes to complete and runs automatically.
Solution Design
The high-level illustration below captures what we are aiming to build with this solution.
Simply put, dropping a JSON document into the S3 bucket triggers an invocation of the Lambda function, a Node.js application. The function reads the event data, i.e. the bucket name and file name, and calls the S3 API to fetch the JSON document from the bucket. It then prepares a batch request and calls the DynamoDB API to run a batch write (BatchWriteItem) operation.
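To make the flow concrete, here is a minimal sketch of how the bucket name and object key can be read from the S3 event record (the full handler follows in the implementation steps; only the standard S3 event structure is assumed):

```javascript
// Minimal sketch: read the bucket name and object key from the S3 event
// that triggered the function. Object keys arrive URL-encoded.
exports.handler = async (event) => {
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  console.log(`New object: s3://${bucket}/${key}`);
  // ...fetch the object from S3 and batch-write it to DynamoDB (see Step 1 below)
};
```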
Note: The Lambda function and the DynamoDB table must be in the same AWS Region.
Implementation Steps
Set up an IAM Role
The Lambda function needs access to the S3 bucket, the target DynamoDB table, and Amazon CloudWatch Logs. Create an IAM role and attach policies that grant access to these three services. A sample role and its attached policies can be seen in the following screenshot:
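The exact policies depend on your setup; as an illustration only, an inline policy covering the three permission sets might look like the following (the bucket name, table name, Region, and account ID are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-ingestion-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:BatchWriteItem", "dynamodb:PutItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/my-target-table"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```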
Create an S3 bucket to drop JSON documents into. This bucket is the source of the ingestion pipeline and also acts as the trigger for the Lambda function invocation.
Log in to the AWS console and create an S3 bucket. There is no restriction on the name or other settings for this implementation. You can also reuse an existing S3 bucket; however, a dedicated bucket is a better idea as it keeps the application coherent and self-contained.
Create a DynamoDB table in an AWS Region of your choice. Remember to create the Lambda function in the same Region as your DynamoDB table.
Note: Make sure to choose a 'Partition Key' (primary key) name that exists in your JSON data. For example, if every record in your JSON file has an 'id' attribute, 'id' is a natural choice for the partition key.
Make a note of the name of the newly created DynamoDB table. It will be used in the Lambda function's call to DynamoDB.
Create a Lambda function with the Node.js runtime. AWS generates a default 'index.js' file with a handler function that is invoked when the Lambda is triggered.
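The generated stub varies slightly between runtime versions, but for a Node.js runtime it looks roughly like this:

```javascript
// Default handler stub generated by the Lambda console for the Node.js runtime
// (newer runtimes generate an ES module variant: export const handler = ...)
exports.handler = async (event) => {
  // TODO implement
  const response = {
    statusCode: 200,
    body: JSON.stringify('Hello from Lambda!'),
  };
  return response;
};
```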
We will take the implementation from the CLI version of this solution and paste it into the index.js file.
Assign the role created in the first step to this function and create the function.
Next, we will perform the following 2 steps:
Step 1: Update the index.js
Use the following gist document to update the default index.js created by Lambda. You will also have to update the Region and table name of the target DynamoDB table.
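The gist itself is not reproduced here, but a minimal sketch of such a handler, assuming the AWS SDK v2 bundled with the older Node.js Lambda runtimes and a JSON file containing an array of items, could look like the following. REGION and TABLE_NAME are the values to update for your setup:

```javascript
// Minimal sketch of index.js: fetch the dropped JSON file from S3 and
// batch-write its items into DynamoDB. Assumes AWS SDK v2 and a JSON file
// containing an array of objects keyed by the table's partition key.
const AWS = require('aws-sdk');

const REGION = 'us-east-1';           // update: Region of your DynamoDB table
const TABLE_NAME = 'my-target-table'; // update: name of your DynamoDB table

const s3 = new AWS.S3();
const docClient = new AWS.DynamoDB.DocumentClient({ region: REGION });

exports.handler = async (event) => {
  // Bucket and key of the object that triggered this invocation
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

  // Read and parse the JSON document from S3
  const object = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  const items = JSON.parse(object.Body.toString('utf-8'));

  // BatchWriteItem accepts at most 25 items per request, so write in chunks
  for (let i = 0; i < items.length; i += 25) {
    const chunk = items.slice(i, i + 25);
    await docClient.batchWrite({
      RequestItems: {
        [TABLE_NAME]: chunk.map((item) => ({ PutRequest: { Item: item } })),
      },
    }).promise();
  }

  console.log(`Loaded ${items.length} items from s3://${bucket}/${key}`);
  return { statusCode: 200, body: `Loaded ${items.length} items` };
};
```

A production version would also retry any UnprocessedItems returned by the batch write; that is omitted here for brevity.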
Step 2: Add an S3 trigger on the lambda function
Use the Add trigger button to add a new trigger to the Lambda function.
This opens a dialog box, shown below, where you configure a trigger of your choice. We will create an S3 trigger and configure it to fire when a JSON file is dropped into the bucket (for example, by setting the suffix filter to '.json').
Once completed, the Lambda function overview will show the trigger added to the function.
Invocation and monitoring of the Lambda function are fully automated with the S3 trigger and CloudWatch monitoring.
To trigger the invocation and load the JSON data, drop a JSON file in your S3 bucket.
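For example, assuming the handler expects an array of items and the table's partition key is 'id' (both illustrative choices rather than requirements of the original script), a file like the following would be loaded as three items:

```json
[
  { "id": "101", "name": "Alice", "city": "Seattle" },
  { "id": "102", "name": "Bob", "city": "Denver" },
  { "id": "103", "name": "Carol", "city": "Austin" }
]
```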
To monitor the invocations, click on the Monitoring tab and access the CloudWatch logs. This opens CloudWatch in another browser tab, where you can see detailed logs for each Lambda invocation.
Summary
AWS Lambda functions are one of the most powerful and efficient ways to automate a wide range of use cases. Here we took an ad-hoc load script and added automation that makes it far more manageable and efficient.
Next, we will explore the AWS Serverless Application Model (SAM) to further automate complete infrastructure provisioning and deployment from the command-line interface.