Lambda Exit Codes
I currently work for a consulting firm as a software developer. We specialize in helping clients migrate to the cloud, AWS in particular, and we also develop services to replace old ones when needed. Recently I ran into a problem with AWS Lambda that I found interesting, and I think other developers who work with AWS will find it interesting too.
As part of a CDC (change data capture) job we are working on, we use a Lambda function to run insert statements against a cloud database. To insert into the database, the function needs credentials. Passing those credentials in the data stream that triggers the function would be insecure, so the function calls a secret management service to get them. It then builds the connection, inserts a row of data into the database, and exits with a success. Sounds pretty straightforward, right?
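To make that flow concrete, here is a minimal Python sketch of the first version. The post does not name the runtime, the secret store, or the database, so several things here are assumptions: the secret store is taken to be AWS Secrets Manager (called through boto3's `get_secret_value`), the secret name `cdc/db-credentials` is hypothetical, and `insert_row` is a hypothetical placeholder for the real connection and insert. The optional `client` argument exists only so the sketch can be exercised with a fake client; Lambda itself calls `handler(event, context)`.

```python
import json

# Hypothetical secret name; the post does not name the secret or the database.
SECRET_ID = "cdc/db-credentials"


def fetch_credentials(client):
    """Ask AWS Secrets Manager (assumed secret store) for the credentials: one API call."""
    response = client.get_secret_value(SecretId=SECRET_ID)
    return json.loads(response["SecretString"])


def insert_row(credentials, row):
    """Hypothetical stand-in for opening a connection and running the insert."""
    # A real version would pass credentials["username"] / ["password"] to a DB driver.
    return {"user": credentials["username"], "inserted": row}


def handler(event, context, client=None):
    # Naive flow: every invocation goes back to the secret store.
    if client is None:
        import boto3  # imported lazily so the sketch can be exercised with a fake client
        client = boto3.client("secretsmanager")
    credentials = fetch_credentials(client)
    return insert_row(credentials, event["row"])
```

Every call to `handler` makes one `get_secret_value` request, so the request rate to the secret store equals the trigger rate.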
Well, actually, there is a flaw here. This particular function is triggered about ten times a second while the CDC process is running, which means it sends a request to the secret management service just as often. That is the main problem: it is inefficient, and it will likely get the function locked out of the secret management service. The solution is to store the credentials in memory, but how do you store something in the memory of a serverless function?
When a Lambda function is provisioned, you give it an amount of memory, and CPU is allocated in proportion to that memory. In the function code I simply added some global variables to hold the credentials, and I was done.
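A minimal sketch of the global-variable approach, again assuming Python, boto3, and AWS Secrets Manager, with the same hypothetical secret name. In a Python handler file, module-level variables play the role of the global variables described above: they are initialized when the execution environment starts and survive across invocations only for as long as that environment is reused. The insert step is a hypothetical placeholder.

```python
import json

# Hypothetical secret name (the post does not name the secret).
SECRET_ID = "cdc/db-credentials"

# Module-level ("global") cache. It is filled on the first invocation in an
# execution environment and reused by later invocations in that environment.
_credentials = None


def get_credentials(client=None):
    """Return cached credentials; call Secrets Manager only when the cache is empty."""
    global _credentials
    if _credentials is None:
        if client is None:
            import boto3  # lazy import so the sketch can be tested with a fake client
            client = boto3.client("secretsmanager")
        response = client.get_secret_value(SecretId=SECRET_ID)
        _credentials = json.loads(response["SecretString"])
    return _credentials


def handler(event, context, client=None):
    credentials = get_credentials(client)
    # Hypothetical insert step: a real handler would connect with these credentials.
    return {"user": credentials["username"], "inserted": event["row"]}
```

Within one warm environment, repeated invocations share a single `get_secret_value` call; a new environment starts with an empty cache and fetches once.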
Not quite. The function was still sending a request to the secret management service every time it was triggered, because it was written to exit after completing its task. Exiting ends the process, so the global variables were lost and the function went through a cold start on every trigger, which, as I mentioned before, happens about ten times a second. We were still sending requests to the secret management service far too often. So I removed the exit code. With no exit, the function just does its job and then lingers until AWS decides to recycle it. AWS does not publish how long idle functions are kept before their resources are recycled, but Yan Cui has run some experiments and written an excellent article about them. Take a look here: https://read.acloud.guru/how-long-does-aws-lambda-keep-your-idle-functions-around-before-a-cold-start-bf715d3b810
After removing the exit code, the function ran successfully with no further problems. It is triggered so often that idle time and cold starts are a non-issue. Judging by the function's logs, it runs for about 45 minutes to an hour without a cold start, then has to request the credentials again. Once an hour is much better than once every tenth of a second.
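For a rough sense of scale, assuming a steady ten triggers per second and a single warm execution environment (Lambda may run several concurrent environments, each of which fetches the credentials once after its own cold start):

```latex
\[
10\ \frac{\text{requests}}{\text{s}} \times 3600\ \frac{\text{s}}{\text{h}}
  = 36{,}000\ \frac{\text{requests}}{\text{h}}
  \quad\text{versus}\quad
  \approx 1\ \frac{\text{request}}{\text{h}}
\]
```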
I merged my changes and moved on. Problem solved. Almost.
A few weeks later, I was changing some of the function's behavior, and while I was testing in the sandbox account the function threw an exception and failed. The function was still being triggered, and after each exception it was thrown out, so every trigger caused a cold start. That meant it was once again sending requests to the secret management service several times per second. The solution was the same: remove the exit code. Now, if there is an exception, it is caught and logged, and the function moves on without exiting.
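A minimal Python sketch of the catch-and-log approach, under the same assumptions as before (Python runtime; `do_insert` is a hypothetical placeholder that fails on a malformed event). `logger.exception` records the message together with the traceback. One consequence of this design: because the handler returns normally, Lambda treats the invocation as successful, so the log entry is the main record that the insert failed.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def do_insert(event):
    """Hypothetical stand-in for the real insert; fails on a malformed event."""
    if "row" not in event:
        raise KeyError("event has no 'row' field")
    return {"inserted": event["row"]}


def handler(event, context):
    try:
        return {"status": "ok", **do_insert(event)}
    except Exception:
        # Log the traceback and return normally instead of exiting, so the
        # execution environment can be reused and cached credentials stay in memory.
        logger.exception("Insert failed for event %r", event)
        return {"status": "error"}
```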
This solution works well, but it led to the discovery of another problem, which I will talk about in my next post.