Unraveling Recursive Loops in AWS Lambda
The Next Station board game series begins with a London edition, and the latest addition features Tokyo. This game tasks players with redesigning the city's underground subway system. The goal is to optimize connections, serve as many attractions as possible, and make efficient use of the tunnels under the city. One of the fundamental rules of the game is that players cannot circle back to the starting station – in other words, loops are not permitted. For more information, visit: boardgamegeek
Last year, I awoke to a startling email from the AWS billing service. My personal AWS account had amassed charges amounting to $4,000. Initially, I dismissed it as a bizarre dream, a shepherd's nightmare, if you will. But reality struck: it was a genuine email, and the AWS billing page echoed the same staggering figure.
My AWS account is typically a sandbox for experimentation, with monthly expenses hovering around $100. So, what triggered this financial avalanche?
A closer inspection of the bill pinpointed an unexpected surge in costs associated with Amazon S3 (Simple Storage Service) and AWS Lambda. It dawned on me: the culprit was my recent Lambda experiment. In an effort to automate object processing in a bucket, adding a watermark to each object and storing it again, I had overlooked the need to dismantle my setup afterwards. A simple oversight, yet it nearly led to my financial undoing.
Upon further investigation, the root of the problem became clear: an infinite loop. An S3 trigger was continuously invoking the Lambda function, which, in turn, saved the modified object back into the same bucket—a relentless cycle akin to an endless subway line with no stations.
The purpose of this post is to illuminate the perilous potential of infinite loops within Lambda functions and to share strategies for averting, detecting, and halting such cycles before they spiral out of control.
Invoking Lambda and recursive loops
AWS Lambda is an event-driven compute service that runs your code when certain events occur. You can invoke Lambda functions directly using the Lambda console, a function URL HTTP(S) endpoint, the Lambda API, an AWS SDK, the AWS Command Line Interface (AWS CLI), and AWS toolkits. You can also configure other AWS services to invoke your function in response to events or external requests, or on a schedule. For another AWS service to invoke your function directly, you need to create a trigger for your Lambda.
So, a trigger is a resource you configure to allow another AWS service to invoke your function when certain events or conditions occur. Your function can have multiple triggers. Each trigger acts as a client invoking your function independently, and each event that Lambda passes to your function has data from only one trigger. You can find a list of AWS services that can be used as Lambda triggers here: Lambda invokers
For example, an event can be generated when an item is added to an Amazon SQS queue or published to an Amazon Simple Notification Service (Amazon SNS) topic. Lambda passes events to your function as JSON objects, which contain information about the change in the system state. When an event causes your function to run, this is called an invocation.
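To make the event-driven model concrete, here is a minimal Python handler (not from my original setup, just an illustration) that logs the incoming event before doing any work:

import json

def lambda_handler(event, context):
    # The event is a plain dict describing what triggered the invocation,
    # e.g. an S3 object-created record or a batch of SQS messages.
    print(json.dumps(event))
    return {'statusCode': 200}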
An AWS Lambda recursive loop is an error that occurs when a Lambda function inadvertently calls itself repeatedly without an exit condition, leading to a potentially endless cycle of function invocations.
Let's look back at my infinite subway line with no stations. The Lambda recursive loop was created by an S3 object put event invoking the Lambda function, which then wrote another object into the same bucket. That write invoked the function again, and the cycle continued.
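As a hedged sketch (the watermarking helper is a placeholder, not my original code), the problematic handler looked roughly like this: it reads the uploaded object and writes a modified copy back into the same bucket that triggered it.

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()

        watermarked = add_watermark(body)  # hypothetical helper

        # Writing back to the SAME bucket fires the S3 trigger again,
        # which invokes this function again: an infinite loop.
        s3.put_object(Bucket=bucket, Key=key, Body=watermarked)

def add_watermark(data):
    # Placeholder for the real watermarking logic
    return data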
Another example would be an Amazon SQS queue invoking your Lambda function. Your Lambda function would then send the processed event back to the same Amazon SQS queue, which would in turn invoke your function again.
AWS built-in recursive loop detection method
On July 13, 2023, AWS introduced a new feature to detect and stop recursive loops in Lambda functions for certain supported AWS services and SDKs.
Lambda uses an AWS X-Ray trace header primitive called Lineage to track the number of times a function has been invoked with an event. (You do not need to configure active X-Ray tracing for this feature to work.) If your function is invoked more than 16 times in the same chain of requests, then Lambda automatically stops the next function invocation in that request chain and notifies you. If your function is configured with multiple triggers, then invocations from other triggers aren't affected.
At the time of writing this blog post, the feature detects recursive loops between a Lambda function and Amazon SQS, and between a Lambda function and Amazon SNS. It also detects loops between Lambda functions that invoke each other synchronously or asynchronously.
Also, the feature only works if the Lambda function code uses one of the following SDKs:
- Node.js
- Python
- Java 8, Java 11 and Java 17
- .NET
- Ruby
If your design intentionally uses recursive patterns, then you can raise an AWS Support request to turn off Lambda recursive loop detection.
Find more information here: Lambda recursive loop detection
Custom recursive loop detection methods
As we can observe, there are still some services that could potentially trigger an infinite loop, which AWS has yet to provide detection support for. The problem I faced falls into one of these scenarios.
In the scenario where a Lambda function is triggered by an S3 put event, it writes an object back to the S3 bucket. This action, in turn, triggers the same Lambda function again, creating a recursive loop of invocations and writes.
Below, I have gathered some possible approaches to prevent recursive loops when using AWS Lambda and S3:
Vary source and destination locations
In my endless subway project, I determined that the most effective way to prevent this issue was by using two separate S3 buckets: one for reading the original objects and another for storing the watermarked objects. By directing the output objects to a different bucket, I eliminated the risk of triggering additional events from the source bucket, thereby avoiding the recursive loop.
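Compared with the loop-causing sketch above, only the destination of the put_object call changes; the destination bucket name here is a hypothetical placeholder.

import boto3

s3 = boto3.client('s3')
DESTINATION_BUCKET = 'my-watermarked-objects'  # hypothetical placeholder

def lambda_handler(event, context):
    for record in event['Records']:
        source_bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        body = s3.get_object(Bucket=source_bucket, Key=key)['Body'].read()

        # ... watermarking step omitted for brevity ...

        # Writing to a DIFFERENT bucket breaks the loop: the destination
        # bucket has no trigger pointing back at this function.
        s3.put_object(Bucket=DESTINATION_BUCKET, Key=key, Body=body)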
There are, however, scenarios where the processed object must be written back to the source bucket. The next solutions describe how to handle some of them. (Ref: AWS blog)
Event source filtering
Use S3 event source filtering to trigger the Lambda function only for objects that match certain prefix and/or suffix criteria. For example, you can define a naming convention for files that should trigger the Lambda function. When the Lambda function processes a file, it can rename it or move it to a different prefix (folder) that does not trigger the function.
Check the below function trigger configuration:
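Here is a hedged sketch of such a configuration, attaching the S3 trigger with prefix and suffix filters using boto3; the bucket name, function ARN, and prefixes are placeholders:

import boto3

s3 = boto3.client('s3')

# Only objects uploaded under 'incoming/' with a .jpg suffix invoke the function;
# the function can then write its output under 'processed/', which never matches.
s3.put_bucket_notification_configuration(
    Bucket='my-source-bucket',  # placeholder
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': 'arn:aws:lambda:eu-west-1:123456789012:function:watermark',  # placeholder
                'Events': ['s3:ObjectCreated:Put'],
                'Filter': {
                    'Key': {
                        'FilterRules': [
                            {'Name': 'prefix', 'Value': 'incoming/'},
                            {'Name': 'suffix', 'Value': '.jpg'},
                        ]
                    }
                },
            }
        ]
    },
)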
Add metadata tagging
Another solution is to use metadata tagging. Add specific metadata to the S3 object after processing it. The Lambda function can check this metadata before processing to determine whether the object has already been processed.
Here is a sample Python code that can be used in a function to check object metadata:
import boto3
import json

s3 = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        object_key = record['s3']['object']['key']

        # Get the object and its metadata from S3
        response = s3.get_object(Bucket=bucket_name, Key=object_key)

        # Check for the 'original' metadata; only unprocessed objects carry it
        metadata = response.get('Metadata', {})
        if metadata.get('original') == 'true':
            file_content = response['Body'].read().decode('utf-8')
            # Other processing (e.g. adding the watermark) goes here

    return {
        'statusCode': 200,
        'body': json.dumps('File processed successfully!')
    }
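For the check above to work, the processed object must be written back without the 'original' flag (or with a different marker) so that the second invocation exits quickly. A minimal sketch of that write, where processed_content is a hypothetical variable holding the watermarked data:

# Write the processed object back to the same bucket; 'processed_content'
# is a hypothetical variable holding the watermarked data.
s3.put_object(
    Bucket=bucket_name,
    Key=object_key,
    Body=processed_content,
    Metadata={'processed': 'true'}  # no 'original' flag, so the next check skips it
)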
Note that with this solution, the Lambda function is still invoked twice for each uploaded S3 object: once for the original upload and once for the object written back, although the second invocation exits without doing any work.
The blog post also suggests another solution using an Amazon DynamoDB table to store item keys; a DynamoDB stream then triggers a second Lambda function that processes the objects and writes them back to the same source bucket. Because the same item is put into the DynamoDB table again, this does not trigger a new DynamoDB stream event.
It is evident that programming mistakes or improper use of the service can lead to a Lambda function being triggered endlessly. Building on the above solutions for preventing recursive loops in Lambda functions, I recommend some additional preventive measures.
Using Amazon SQS as middleware
Since AWS Lambda's built-in recursive loop detection supports invocation through Amazon SQS, we can use an Amazon SQS queue to trigger the Lambda function, which in turn reads the object from S3. This approach effectively guards against unintentional recursive loops, because the built-in detection can stop them.
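A minimal sketch of such a handler, assuming the S3 bucket and key are passed in the SQS message body as JSON (the field names are hypothetical):

import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Each record is an SQS message; the Lambda <-> SQS path is covered by
    # the built-in recursive loop detection, so a runaway cycle would be stopped.
    for record in event['Records']:
        message = json.loads(record['body'])
        bucket = message['bucket']   # hypothetical field
        key = message['key']         # hypothetical field

        obj = s3.get_object(Bucket=bucket, Key=key)
        # Process obj['Body'] here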
Lambda function permissions
Restrict your Lambda function’s permissions so it can only read from or write to specific S3 paths. This can prevent it from accidentally processing unintended files.
For example, scope the execution role's S3 permissions to resources such as:
- "arn:aws:s3:::${SourceBucket}/${SourcePrefix}/*"
- "arn:aws:s3:::${SourceBucket}/${SourcePrefix}"
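As a hedged sketch (the role, policy, bucket, and prefix names are placeholders), such a scoped policy could be attached to the function's execution role like this:

import json
import boto3

iam = boto3.client('iam')

policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Action': ['s3:GetObject', 's3:PutObject'],
        'Resource': [
            'arn:aws:s3:::my-source-bucket/incoming/*',  # placeholder bucket/prefix
        ],
    }],
}

# Attach the scoped inline policy to the Lambda execution role (placeholder names)
iam.put_role_policy(
    RoleName='my-lambda-execution-role',
    PolicyName='scoped-s3-access',
    PolicyDocument=json.dumps(policy),
)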
You can find more scenarios and details about possible recursive loops here: Recursive patterns that cause run-away Lambda functions
And finally, if you are curious about my billing saga, I should mention that I reached out to AWS support. The team meticulously reviewed my case, and since my account wasn't a business one, they graciously absolved me of the debt.