AWS Solutions Architects hate him.
AWS launched Provisioned Concurrency for Lambda at re:Invent 2019 last week — essentially a way to keep warm Lambdas provisioned for you so you don’t experience any cold start latency in your function invocations. It also may save you money if you happen to have the ideal workload for it, as it’s priced at $0.05/hr (for 1 GB of memory) instead of the usual $0.06/hr.
This theoretical 16.67% saving is not what this article’s about though — it was only as I was exploring this new feature that I was reminded of an interesting factor of Lambda I discovered a couple of years ago.
Before I dive in, I’ll preface this with: this is an FYI, to explore some aspects of Lambda that you may be unaware of. It is not something I’ll be releasing code for. You’ll see why.
The thing I noticed with Provisioned Concurrency Lambdas relates to global work done outside of the function handler, before it’s invoked. Let’s call it the init stage. For Provisioned Lambdas, this is executed in the background whenever you configure your provisioned concurrency settings, and then every hour or so after that. Work done during this stage seemed to execute at the same performance as work done in the handler function on invocation, and you’re also charged for this time. That would seem unsurprising if it weren’t for the fact I was reminded of: normal Lambda “containers”, unlike Provisioned Lambdas, actually get a performance boost while they’re in the init stage. This is presumably to aid cold starts, especially in runtimes like Java and .NET that typically have slow process start times and large class assemblies to load.
What do I mean by a perf boost? Well, we can measure it. Let’s code up an in-no-way-contrived Node.js 12.x function where we see how many PBKDF2-100k-iteration password hashes we can calculate per second. We’ll do it once outside the handler (initHashes will get run only when the Lambda container cold starts), and once inside the handler.
For those not aware, Lambda memory and CPU for your handler are linked (with some nuances for multi-core), so let’s play with the memory setting. If we do it at the highest setting, 3008 MB, we see there’s little difference in performance, getting just over 12 hashes/sec both during the init stage and the handler stage:
These numbers are about the same until we get down to around (exactly?) 1792 MB. The multi-core nuance I mentioned above is that above this point, instead of CPU increasing as memory does, you instead get an extra core — but as this code’s single-threaded, we didn’t see any difference.
Below this memory setting is where it gets interesting. We find the init performance stays the same, even when we get all the way down to 128 MB, but the handler performance degrades in direct proportion to the memory.
128 MB init = 1792 MB performance
So essentially, we’ve established that the init stage has the same performance as a 1792 MB Lambda, even if we’re only running a 128 MB one.
Maybe you can see where I’m going with this… If we can do all of our work outside of the function handler, we get $0.105/hr (1792 MB) performance for only $0.0075/hr (128 MB): a 14x cost saving 🎉
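If you want to check the arithmetic, here’s a quick sketch that scales the $0.06 per GB-hour figure quoted earlier linearly with memory. The hourlyCost helper is just a name I made up for illustration.

```javascript
const PRICE_PER_GB_HOUR = 0.06 // on-demand figure quoted earlier in the article

// Hourly cost of keeping a function of the given size busy.
function hourlyCost(memoryMb) {
  return (memoryMb / 1024) * PRICE_PER_GB_HOUR
}

console.log(hourlyCost(1792).toFixed(4)) // 0.1050
console.log(hourlyCost(128).toFixed(4)) // 0.0075
console.log((hourlyCost(1792) / hourlyCost(128)).toFixed(1)) // 14.0
```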
Hang on a minute, I hear you cry. Firstly, how are we supposed to do all of our work outside the handler, if any subsequent time we invoke that Lambda, it’s already warm and that code won’t even run? Secondly, how are we supposed to pass anything to the init stage if only the handler receives events? And finally, 1/14th is “only” a 92.86% cost saving, not the 99.93% you promised 💸
Let’s tackle that first point. There are some basic ways to ensure we always hit a cold Lambda, such as modifying any aspect of the function that would cause existing warm containers to be out-of-date. We were doing exactly that when we were fiddling with the memory settings above — each time we change that number and invoke, it’ll be fresh containers that get hit. Modifying environment variables, deploying new code, and other function config settings would achieve the same thing. The APIs to do these are probably rate-limited at a fairly strict rate though, so YMMV.
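As a rough sketch of the environment variable approach: the function below expects an AWS SDK v2 Lambda client to be passed in and bumps a marker variable. COLD_START_NONCE, withNonce and forceColdStart are hypothetical names for this example only, and I haven’t measured how quickly you’d hit the API rate limits.

```javascript
// Copy the existing variables and change one marker value.
// COLD_START_NONCE is a hypothetical name used only for this sketch.
function withNonce(variables, nonce) {
  return { ...variables, COLD_START_NONCE: String(nonce) }
}

// `lambda` is expected to be an AWS SDK v2 `AWS.Lambda` client.
async function forceColdStart(lambda, functionName) {
  const config = await lambda
    .getFunctionConfiguration({ FunctionName: functionName })
    .promise()
  const current = (config.Environment && config.Environment.Variables) || {}
  // Environment.Variables is replaced wholesale, so carry existing values over.
  return lambda
    .updateFunctionConfiguration({
      FunctionName: functionName,
      Environment: { Variables: withNonce(current, Date.now()) },
    })
    .promise()
}
```

You’d call it as forceColdStart(new AWS.Lambda(), 'my-function') before each invoke, where my-function is a placeholder for your own function name.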
Another way to achieve this is to just exit the process in the handler. The Lambda supervisor will need to restart the process when the next invoke comes in and the init code will run again. The downside to this is that the function will always return an error.
Getting data in and out
To the second point, you basically can’t pass any events in outside of the handler. If you’re just doing some sort of fixed job that doesn’t require events, then this isn’t a problem. You could try to do it via environment variables I guess, but you’d need to modify the function’s config with each invocation.
One thing you can do, though, is make HTTP calls, API calls, etc. You can read from an SQS queue, a DynamoDB table, S3, Route53, or maybe even use something crazy like Serverless Networking. (Also, if you’re using Node.js, you’ll need to spawnSync/execSync another node process to do any async work.)
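To illustrate the spawnSync trick, here’s a small sketch where the child process does the async work and hands back its result as JSON on stdout. The timer is just a stand-in for a real API call, and childScript and runAsyncWorkSync are made-up names.

```javascript
const { spawnSync } = require('child_process')

// Script run in the child process: it is free to use promises, timers,
// HTTP clients, etc., and reports its result as JSON on stdout.
const childScript = `
  const fetchJob = () => new Promise(resolve =>
    setTimeout(() => resolve({ job: 'example' }), 50)) // stand-in for an API call
  fetchJob().then(result => console.log(JSON.stringify(result)))
`

// Block until the child finishes, then parse whatever it printed.
function runAsyncWorkSync(script) {
  const child = spawnSync(process.execPath, ['-e', script], { encoding: 'utf8' })
  if (child.error) throw child.error
  if (child.status !== 0) throw new Error(`child failed: ${child.stderr}`)
  return JSON.parse(child.stdout)
}

// Synchronous from the parent's point of view, so it can sit at module level.
const job = runAsyncWorkSync(childScript)
console.log(job)
```

Starting a second node process isn’t free, so the startup time comes out of whatever init budget you have.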
If you needed your Lambda to respond synchronously, you’d have to have another normal 128 MB one (or another something) in front of it. This function could post the event to an SQS queue, invoke the cold-starting Lambda, and then wait to get a response from a second SQS queue. The cold-starting function reads from the first queue and responds to the second. Pretty messy, wouldn’t recommend, but you know, we’re talking mad science here.
Alright, here’s where it gets even more far-fetched. If you actually ran the code from earlier, you may have noticed another interesting thing: the billed duration didn’t match the entire duration of work done. In fact, the init duration isn’t included in the billed duration at all. The init stage is free.
At least, up to a point. Technically you can do up to 10 seconds of work before it starts getting included in the billed duration. Also, you’ll always have to pay something for the handler execution — the minimum billed execution time is 100ms.
To illustrate, let’s first modify our code from above and do as many hashes as we can in our handler within 10 seconds on a 1792 MB Lambda. We subtract a little buffer to make sure we definitely stay under 10 secs, though it’ll be rounded up when billed.
So we calculated 122 hashes in 10 secs. Let’s say we wanted to calculate a billion hashes this way. Using 1792 MB Lambdas, this would cost us $2,390.71.
Now let’s try it outside the handler on a 128 MB Lambda. We use the container start time (as measured by /proc/1) to accurately calculate our deadline, as some time will already have been used up by the process starting, requiring Node.js modules, etc. We also exit the process so our init code will always run, as we mentioned earlier.
Here we calculated 121 hashes, one fewer, as we needed to be a little more cautious so as not to hit the 10 second limit. Still, it was in a 128 MB Lambda and we were only billed for 100ms, 100x less than the 10 seconds we ran for.
Calculating 1 billion hashes this way would cost us $1.72. That’s roughly 1,390x cheaper: a saving of 99.93%
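For anyone wanting to reproduce these figures, here’s a sketch of the arithmetic. It assumes the $0.06 per GB-hour price from earlier, billing rounded up to 100 ms increments, and it ignores per-request charges. The billedMs and costForHashes helpers are names I made up.

```javascript
const PRICE_PER_GB_SECOND = 0.06 / 3600 // from the $0.06 per GB-hour quoted earlier

// Billed duration is rounded up to the next 100 ms, with a 100 ms minimum.
function billedMs(durationMs) {
  return Math.max(100, Math.ceil(durationMs / 100) * 100)
}

// Compute-only cost of producing totalHashes at a given rate per invocation.
function costForHashes(totalHashes, hashesPerInvoke, memoryMb, durationMs) {
  const invocations = totalHashes / hashesPerInvoke
  const gbSeconds = (memoryMb / 1024) * (billedMs(durationMs) / 1000)
  return invocations * gbSeconds * PRICE_PER_GB_SECOND
}

const inHandler = costForHashes(1e9, 122, 1792, 10000) // 10 s billed at 1792 MB
const inInit = costForHashes(1e9, 121, 128, 100) // 100 ms billed at 128 MB

console.log(inHandler.toFixed(2)) // 2390.71
console.log(inInit.toFixed(2)) // 1.72
console.log(((1 - inInit / inHandler) * 100).toFixed(2) + '%') // 99.93%
```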
I first noticed this in Jan 2018 and I was a little worried as it wasn’t documented anywhere and I thought it may be a resource abuse vulnerability. I contacted AWS security (firstname.lastname@example.org), was told the relevant teams would be contacted to investigate, and heard no more.
Since then it’s been mentioned a number of times by different AWS people as a feature, not a bug. A little thank-you-for-using-Lambda if you will.
Obviously you shouldn’t code your app like this. It’s a proof of concept that involves lots of hoop-jumping and who knows, you may very well get a slap on the wrist from AWS if you start abusing it.
However, it is a good illustration of just how much you should leverage the init stage. Do as much work as you can outside of your handler: it’s fast and cheap. Even with Provisioned Lambdas where you don’t get any perf boost or cost saving, at least it’s work that doesn’t need to happen in your handlers, which will leave them nice and responsive.
Happy hacking everyone!