Learn how to make AWS Lambda work for you: Activate a Lambda Function Whenever a File is Uploaded to an S3 Bucket.
Summary: We’re going to create a trigger for a Lambda function that will be activated whenever a CSV file is uploaded to an Amazon S3 bucket. The Lambda function will then read the contents of the CSV file, perform some calculations, and print the results to the CloudWatch log.
Amazon Web Services (AWS) offers a plethora of services that seamlessly integrate with each other to provide scalable and efficient solutions for various use cases. One such powerful combination is the integration between Amazon Simple Storage Service (S3) and AWS Lambda. By utilizing this integration, we can automate tasks and execute code in response to events such as file uploads to an S3 bucket. In this blog post, we’ll explore how to trigger a Lambda function when a file is uploaded to an S3 bucket.
By the end of this tutorial, we will have accomplished the following:
- Create an IAM Role in AWS
- Create the S3 Bucket
- Create the AWS Lambda Function with the relevant S3 Triggers.
- Deploy the Python Lambda code along with the necessary libraries. We will utilize the boto3 library to interface with S3 from Lambda.
- Add layers in Lambda for the external libraries we will use, namely pandas, to read the CSV and perform some operations.
At the beginning of each step, I’ll give a quick rundown of what we’re aiming to achieve in that step. This way, if you’re already familiar with the task, you can skip ahead to the next part of the tutorial.
Step 1: Creating the IAM Role in AWS
TLDR: Create an IAM role with “AmazonS3ReadOnlyAccess” and “AWSLambdaBasicExecutionRole” permissions.
1. Log in to the AWS Management Console and navigate to IAM. You can use Unified Search (Alt+S on Windows or Option+S on Mac) and search for IAM.
2. Go to Roles (under Access Management in the left sidebar) and click on “Create Role”.
IAM Console
3. In the role creation window, select “AWS Service” and then “Lambda” in the common use cases. Click “Next” to proceed.
Create Role
4. In the “Add Permissions” window, search for the following permissions and select them:
- AmazonS3ReadOnlyAccess
- AWSLambdaBasicExecutionRole
AmazonS3ReadOnlyAccess
AWSLambdaBasicExecutionRole
5. Leave Set a permissions boundary as is. Click “Next” to continue.
6. Enter the role name and provide a meaningful description. I’ll call it lambda-s3-trigger-role. You can review the permissions and add tags if you want. Click “Create Role” to finish the IAM role creation.
Create Role Confirmation
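If you prefer to script this step instead of clicking through the console, here is a minimal boto3 sketch that creates an equivalent role. It assumes your local credentials have IAM permissions and reuses the role name from above.

import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets the Lambda service assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="lambda-s3-trigger-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Execution role for the S3-triggered Lambda function",
)

# Attach the two managed policies used in this tutorial
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
):
    iam.attach_role_policy(RoleName="lambda-s3-trigger-role", PolicyArn=policy_arn)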
Step 2: Creating the S3 Bucket
TLDR: Create an S3 bucket and upload a CSV file to it. (Example CSV file).
Now we will create an Amazon S3 bucket where the CSV files will be uploaded. We need to ensure that appropriate permissions are set on the bucket to allow the Lambda function to access objects.
- From the AWS Management Console, navigate to the Amazon S3 service.
- Click on “Create bucket” to begin creating a new S3 bucket.
- Enter a unique bucket name. Note that bucket names must be globally unique across all AWS accounts. (Technically, the name only needs to be unique within a partition, but since most accounts fall under the same partition, we can safely assume you will need a globally unique bucket name. AWS defines a partition as a grouping of Regions and currently has three partitions: aws (Standard Regions), aws-cn (China Regions), and aws-us-gov (AWS GovCloud (US)).)
- Choose the region where you want to create the bucket. Consider selecting a region close to your Lambda function’s region for optimal performance. You can leave the other bucket settings at their default values.
S3 bucket creation
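As a side note, the same bucket can be created from code. Below is a hedged boto3 sketch; the bucket name and region are placeholders you would replace with your own values.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Placeholder bucket name; bucket names must be unique, so change this
s3.create_bucket(Bucket="my-lambda-trigger-bucket-example")

# For any region other than us-east-1, S3 also requires a LocationConstraint:
# s3.create_bucket(
#     Bucket="my-lambda-trigger-bucket-example",
#     CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
# )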
Next, let’s upload a CSV file to this bucket. This will help us test our Lambda function later on.
- Navigate to the S3 bucket window.
- Click on the “Upload” button. In the Upload window, click on “Add Files”.
- Select a CSV file from your local storage. I will be using this grades.csv file.
- Click “Upload” to initiate the upload process.
csv file uploaded to S3 bucket
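For reference, the upload can also be done with boto3. The sketch below assumes grades.csv sits in your working directory and uses a placeholder bucket name.

import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="grades.csv",                      # local path to the CSV file
    Bucket="my-lambda-trigger-bucket-example",  # placeholder: your bucket name
    Key="grades.csv",                           # object key in S3
)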
Step 3a: Creating the Lambda Function
TLDR: Create a Lambda function using the Python runtime and the previously created IAM role.
- Navigate to Lambda > Functions from the AWS Management Console.
- Click on “Create Function” and choose the option to “Author from scratch”. Alternatively, you can use a blueprint tailored for common use cases. Although a blueprint exists for reading S3 object metadata after an S3 PUT event, we’ll create our Lambda function from scratch so we can accommodate more diverse use cases.
- Enter the function name and choose the runtime and architecture suitable for your code. For this example, we’ll use Python 3.12.
- Change the default execution role to “Use an existing role” and select the role you created earlier from the drop-down menu.
Create Lambda Function
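If you would rather create the function programmatically, the boto3 sketch below shows the equivalent call. The function name, role ARN, and zip file path are placeholders, and it assumes you have zipped a lambda_function.py containing a lambda_handler.

import boto3

lambda_client = boto3.client("lambda")

# Placeholder: a zip archive containing lambda_function.py
with open("lambda_function.zip", "rb") as f:
    zipped_code = f.read()

lambda_client.create_function(
    FunctionName="s3-csv-reader",  # placeholder function name
    Runtime="python3.12",
    Role="arn:aws:iam::123456789012:role/lambda-s3-trigger-role",  # placeholder account ID
    Handler="lambda_function.lambda_handler",
    Code={"ZipFile": zipped_code},
    Architectures=["x86_64"],
)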
Step 3b: Adding triggers to the Lambda Function
TLDR: Add an S3 PUT event trigger for .csv files to the Lambda function.
We need to add a trigger that invokes our Lambda function when a CSV file is uploaded to our S3 bucket, which will be a PUT event in this case.
1. After creating the Lambda function, open it from the functions list and click on “Add Trigger” in the function overview.
Adding trigger
2. In the “Add trigger” window, choose S3 as the source.
- For Bucket, select the bucket we created.
- For Event types, select PUT and deselect “All object create events”. We only want to trigger the lambda when a new file is uploaded. This way, we can ignore files created in other ways, like HTTP POST or COPY.
- In the Suffix field, enter “.csv” — this will make sure we only trigger the function for csv files and ignore other file uploads.
- Acknowledge the Recursive invocation warning. While it shouldn’t be an issue for this example because the Lambda function lacks write access to S3, it’s crucial to exercise caution to avoid triggering Lambda functions recursively. This means ensuring that the Lambda function’s operations do not inadvertently trigger another invocation of the same Lambda function.
Trigger Configuration
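Under the hood, this wizard does two things: it grants S3 permission to invoke the function, and it writes a notification configuration on the bucket. The hedged boto3 sketch below shows roughly the same setup; the function name, ARNs, and bucket name are placeholders.

import boto3

lambda_client = boto3.client("lambda")
s3 = boto3.client("s3")

bucket = "my-lambda-trigger-bucket-example"  # placeholder bucket name
function_arn = "arn:aws:lambda:us-east-1:123456789012:function:s3-csv-reader"  # placeholder ARN

# 1. Grant the S3 service permission to invoke the function
lambda_client.add_permission(
    FunctionName="s3-csv-reader",
    StatementId="s3-invoke-permission",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn=f"arn:aws:s3:::{bucket}",
)

# 2. Point the bucket's PUT events for *.csv objects at the function
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": function_arn,
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}},
        }]
    },
)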
Step 4: Coding the “Easy” Lambda Function
TLDR: Code the Lambda function so that it reads the data from the uploaded CSV file and prints out each row. Run a manual test with a simulated PUT event for the CSV file we uploaded to S3 in Step 2 (grades.csv).
After creating the Lambda function trigger, navigate to the “Code” tab. Here, you will find the default code provided by Lambda.
Lambda Function Code
In the Lambda function code source editor, write the following Python code to read the contents of the CSV file. We will import the Boto3 library, which is the Amazon Web Services (AWS) SDK for Python. It provides an easy-to-use interface to interact with AWS services programmatically.
import boto3
import csv

def lambda_handler(event, context):
    s3 = boto3.client('s3')

    # Retrieve the bucket and key information from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Read the CSV file from S3
    obj = s3.get_object(Bucket=bucket, Key=key)
    rows = obj['Body'].read().decode('utf-8').split('\n')

    # Print each row to the CloudWatch log
    for row in csv.reader(rows):
        print(row)
Click “Deploy” to update the Lambda function with the new code we just wrote.
Now let’s run the code manually to make sure it works as expected.
1. Click the “Test” button, which is just beside the “Deploy” button.
2. You will be asked to create a new test event. This will simulate an event that triggers our Lambda function.
3. Name Your Test Event: Assign a name to your test event for reference. A descriptive name like “s3-csv-put” is ideal for easy identification.
4. Event Template: Search for and choose the “s3-put” template. This template mimics the event structure generated when an object is uploaded to an S3 bucket, which is our Lambda trigger.
Setup Test Event
5. Configure the Test Event JSON: Within the JSON editor, replace the placeholder values with actual data relevant to your S3 bucket and file. Replace the bucket name (“example-bucket”) with the name of your S3 bucket, and the object key (“test%2Fkey”) with the name of the CSV file you want to test (grades.csv). A trimmed-down sketch of the fields our handler actually uses appears below, after the test results.
Event JSON
6. Trigger the Test Event: With the test event configured, initiate the test by clicking the “Test” button once again. This action will simulate an S3 upload event and trigger the execution of your Lambda function.
Lambda will execute the function based on the test event and display the results in the “Execution Results” window. The output should look like this:
Lambda Function Output
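As an aside, the only fields our handler reads from the s3-put template are the bucket name and object key, so a trimmed-down event like the sketch below (using this tutorial's example values, with a placeholder bucket name) is enough to exercise the code.

test_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "my-lambda-trigger-bucket-example"},  # placeholder bucket
            "object": {"key": "grades.csv"},                         # the uploaded CSV
        }
    }]
}

# With valid AWS credentials and access to the bucket, the handler can even
# be called locally for a quick sanity check:
# lambda_handler(test_event, None)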
Step 5: Setting up Layers for Lambda Function
TLDR: Add layers when your Lambda function requires external libraries. In this case, we will use pandas to calculate the average score for each student.
Suppose we want to use pandas to handle the CSV data efficiently and calculate the average score of each student. Here’s a revised version of the Lambda function incorporating pandas:
import boto3
import pandas as pd

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Retrieve bucket name and file key from the event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_key = event['Records'][0]['s3']['object']['key']

    # Download the CSV file from S3
    response = s3.get_object(Bucket=bucket_name, Key=file_key)
    csv_content = response['Body']

    # Read CSV data using pandas
    df = pd.read_csv(csv_content)

    # Calculate the average of the numerical columns
    numerical_columns = df.select_dtypes(include='number')
    averages = numerical_columns.mean()
    print(averages.to_dict())

    return {
        'statusCode': 200,
        'body': averages.to_dict()
    }
Now, if we rerun the test case, we’ll observe an issue: an import error for pandas (“No module named ‘pandas’”). This arises because pandas is not included in the Lambda execution environment by default.
Lambda Function Import Error
To resolve this, we need to make pandas available to the function by attaching a Lambda layer that contains it.
To attach the pandas layer to the AWS Lambda function, follow these steps:
1. In the Lambda function window, scroll down to find the “Layers” section.
2. Click on the “Add a layer” button.
3. In the “Add layer” dialog, select “AWS layers”. AWS provides a publicly available layer containing pandas. You can also choose a custom layer you may have created, or specify a layer ARN directly. We will use the public pandas layer provided by AWS.
4. Once selected, click “Add” to attach the layer to your Lambda function.
Add Layer
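For completeness, a layer can also be attached from code. The boto3 sketch below is illustrative only: the layer ARN shown is a placeholder, so copy the actual ARN of the AWS-provided pandas layer for your region and runtime from the “Add layer” dialog.

import boto3

lambda_client = boto3.client("lambda")

# Placeholder ARN: replace with the pandas layer ARN shown in your region
pandas_layer_arn = "arn:aws:lambda:us-east-1:123456789012:layer:AWSSDKPandas-Python312:1"

lambda_client.update_function_configuration(
    FunctionName="s3-csv-reader",  # placeholder function name
    Layers=[pandas_layer_arn],     # note: this call replaces the function's existing layer list
)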
After attaching the pandas layer, click on the “Test” button to run the Lambda function. You should now see the result displaying the data available in the CSV file, processed using the pandas library. This confirms that the Lambda function can now successfully import pandas for data manipulation and analysis.
No more Import Errors
Step 6: Testing the triggers
TLDR: Upload a CSV file to S3 and check the Lambda invocation result in the CloudWatch logs.
For the final evaluation, we’ll upload a new CSV file to S3 and verify that our Lambda function triggers correctly. Rename the file to “grades_1.csv” and upload it to the designated S3 bucket.
Once uploaded, the configured Lambda function will trigger automatically, processing the CSV data according to its logic. You can then inspect the output in the Lambda function’s CloudWatch Logs.
To view the processed data in the CloudWatch Logs:
1. Navigate to the AWS Lambda console.
2. Locate and select the Lambda function set up for processing CSV files.
3. On the Lambda function’s details page, navigate to the “Monitoring” tab.
4. Click on the “View logs in CloudWatch” button.
Lambda console
5. In the CloudWatch Logs console, find and select the log stream corresponding to the Lambda function’s execution.
CloudWatch Logs console
6. Review the log entries to observe the processed data from the CSV file. The log entries will display outputs generated by the Lambda function’s execution, such as the calculated averages.
Averages printed in logs
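If you prefer to pull the logs from code rather than the console, a minimal boto3 sketch looks like the one below. It assumes the function is named s3-csv-reader, since Lambda writes to the /aws/lambda/<function-name> log group by default.

import boto3

logs = boto3.client("logs")

# Fetch recent log events from the function's default log group
response = logs.filter_log_events(
    logGroupName="/aws/lambda/s3-csv-reader",  # placeholder function name
    limit=50,
)
for event in response["events"]:
    print(event["message"], end="")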
By reviewing the log entries, we can ensure that the Lambda function executed successfully and processed the CSV file as expected.