AWS Serverless Architecture In Practice

Five key takeaways for designing, building and deploying serverless applications in the real world

This article was published on 4/3/2017 in VentureBeat.

The term “serverless architecture” is a recent addition to the technology lexicon, coming into common use within the last year or so, following the launch of AWS Lambda in 2014. The term is both puzzling and provocative. Case in point: while I was explaining the concept of serverless architecture to a seasoned systems engineer recently, he literally stopped me mid-sentence, worried that I had gone insane, and asked: “You realize there is actual hardware up there in the cloud, right?” Not wanting to sound crazy, I said yes. But secretly I thought to myself: “Yet, if my team doesn’t have to worry about server failures, then for all practical purposes, hardware doesn’t exist in the cloud; it might as well be unicorn fairy dust.” And that, in a nutshell, is the appeal of serverless architecture: the ability to write code on clouds of cotton candy, without a concern for the dark dungeons of server administration.

But is the reality as sweet as the magical promise? At POP, we put this question to the test when we recently deployed an app in production utilizing a serverless architecture for one of our clients. However, before we review the results, let’s dissect what serverless architecture is.

AWS Lambda is a pure compute service that allows you to deploy a single thread of execution. With Lambda, you simply write a function (in Python 2.7, JavaScript/NodeJS 4.3 or Java 8), deploy it to AWS and get charged per request and for the time your code runs, at a rate that depends on how much memory you allocate to the function. Brilliantly simple, right? Yes, at first, but then the questions start to arise. How do you actually call a Lambda function? How do you manage return values and exceptions? Applications typically contain hundreds of functions; do you deploy all of them as Lambda functions? How should you structure a serverless app given the extreme level of deployment granularity that Lambda provides?
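To make "simply write a function" concrete, here is a minimal sketch of a Python handler. The two-argument `(event, context)` signature is what Lambda's Python runtime calls; the function name `handler` and the `name` field in the event are hypothetical and exist only for illustration.

```python
def handler(event, context):
    """Entry point invoked by Lambda; `event` carries the trigger's payload."""
    # The "name" field is a made-up input used only for this illustration.
    name = event.get("name")
    if not name:
        # An uncaught exception marks the invocation as failed,
        # and the error is reported back to the caller.
        raise ValueError("missing required field: name")
    # The return value is handed back to the caller as the result.
    return {"message": "Hello, {}!".format(name)}


if __name__ == "__main__":
    # Local smoke test; the context argument is unused here, so None suffices.
    print(handler({"name": "POP"}, None))
```

Because the handler is an ordinary function, it can be exercised locally with a plain dictionary before it is ever deployed.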

To help make sense of it, first of all, think of Lambda functions as nanoservices. They should be more coarse-grained than the code-level functions you typically use to structure your app, and you shouldn’t expose all of your internal application functions as Lambdas. However, they are more fine-grained than a typical microservice, even though, like a microservice, they are mini-servers in themselves that execute their own code. Behind the scenes, Lambda functions use containers, clustered and fully managed by Amazon. As a result, each Lambda function is stateless and runs in an isolated process, making it possible for AWS to scale up or down based on usage.

You call Lambda functions by configuring triggers that execute your code. There are many different kinds of triggers available in AWS. For a web app, you can create an API Gateway endpoint that receives HTTPS traffic and passes it along to your Lambda function. For a file processing workflow, you can set up an S3 trigger that fires when S3 files are uploaded or removed. For data-centric workflows, you can configure DynamoDB triggers to call Lambda functions when data is inserted or updated, much like old-school SQL triggers. The variety and volume of triggers in AWS give Lambdas a great deal of flexibility for powering different types of application workflows.
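As a rough illustration of the file processing case, the sketch below shows a handler reading an S3 notification. The `Records`, `eventName`, `s3.bucket.name` and `s3.object.key` fields follow the S3 event notification format; the function name, bucket and key are invented for the example, and a real handler would do actual work rather than return a summary.

```python
from urllib.parse import unquote_plus


def handle_s3_event(event, context):
    """Summarize each S3 record as an (action, bucket, key) tuple."""
    results = []
    for record in event.get("Records", []):
        # eventName looks like "ObjectCreated:Put" or "ObjectRemoved:Delete".
        action = record["eventName"].split(":")[0]
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((action, bucket, key))
    return results


if __name__ == "__main__":
    # Hypothetical event, trimmed to the fields the handler reads.
    sample = {"Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "example-uploads"},
               "object": {"key": "incoming/my+file.txt"}},
    }]}
    print(handle_s3_event(sample, None))
```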

For our project, the POP team opted for a serverless architecture using AWS Lambda, API Gateway, S3 and DynamoDB to power a website and chatbot application for a major entertainment company. A serverless approach was a natural fit because the app was expected to receive a significant amount of traffic during the initial few months, with traffic likely tapering off thereafter (this spiky usage pattern is very common in the games and entertainment industry as new titles are released). Relying on AWS-managed components meant that we could achieve scalability and high availability without the cost or complexity of setting up redundant, load-balanced, auto-scaled EC2 instances. More importantly though, it meant the client only paid for actual compute time, instead of paying for idle EC2 instances running as insurance against periodic traffic spikes.

The end result was fantastic: the deployed application ran flawlessly and we slept well at night knowing that the Amazon-managed components could withstand any of the traffic spikes the application was likely to receive. The journey to get there, on the other hand, was a bit more challenging. It became obvious very quickly that serverless architecture is still in its infancy, and the tools and documentation to support it are still maturing.

Below are five key takeaways we learned when building a serverless app in the real world.

1. Allow sufficient time for Lambda and API Gateway configuration

The first thing that you need to be aware of when building a serverless web app is that there is a lot of configuration involved with each Lambda function, including:

  • Uploading the code
  • Configuring the Lambda function
  • Creating the API endpoint (e.g. specifying which HTTP methods it listens to)
  • Setting up the IAM security role for the function
  • Configuring the behavior of the HTTP request (e.g. how the request variables are received and transformed into Lambda function arguments)
  • Configuring the behavior of the HTTP response (e.g. how return variables are sent back to the caller and transformed into an HTTP/JSON format)
  • Creating the staging endpoint
  • Deploying the API
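The request and response steps in the list above are easier to picture with code. The sketch below assumes API Gateway's Lambda proxy integration, in which the request arrives as a dictionary (`httpMethod`, `queryStringParameters`, and so on) and the function returns `statusCode`, `headers` and a string `body`. That is only one way to wire an endpoint, and not necessarily how our project was configured; the handler name and the `name` parameter are hypothetical.

```python
import json


def _response(status, payload):
    """Build the response shape that the proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),  # body must be a string, not a dict
    }


def api_handler(event, context):
    """Turn HTTP request variables into arguments and a JSON reply."""
    method = event.get("httpMethod", "GET")
    # queryStringParameters is null when the request has no query string.
    params = event.get("queryStringParameters") or {}
    if method != "GET":
        return _response(405, {"error": "method not allowed"})
    name = params.get("name", "world")
    return _response(200, {"greeting": "Hello, {}".format(name)})
```

With a non-proxy integration, the same translation is expressed in the API Gateway configuration instead of in the function body.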

This seems like a significant amount of configuration for each function, and it is. You will absolutely need to build this time into your project schedule. But as annoying as the amount of configuration seems, it’s important to remember that Lambda functions aren’t really functions in the classic sense. Rather, they are nanoservices, and their configuration is part of the deployment process itself. It wouldn’t seem strange to spend this amount of time configuring and deploying an API microservice. The difference here is that with Lambda, the services are more granular, which means there are more of them, which means there is more deployment overhead. Furthermore, due to the fine-grained nature of Lambda functions, developers themselves need to be hands-on with the configuration and deployment of Lambdas, even for “local” development. For developers not used to performing these types of deployment activities, it feels onerous. (But on the flip side, it makes life really easy for IT operations.) In a sense, serverless architecture forces an extreme level of DevOps into the workflow because it so tightly couples programming with deployment.

2. Documentation is light, so be prepared to do some detective work to resolve issues

For me, the challenge was less about the number of configuration steps per se, and more about the lack of documentation for a production-ready service. When you’re frequently encountering cryptic error messages, and are faced with a huge battery of configuration options that could be causing the problem, you need lots of helpful documentation and community support to get you through it. Unfortunately, there just isn’t enough of this yet for AWS serverless development. For example, after banging my head against the wall for several hours trying to resolve a run-of-the-mill configuration problem, I found a single post on a GitHub message board, buried several pages down, that provided the critical clue and pointed me in the right direction to eventually resolve the issue. I’m used to this type of detective work when experimenting with new technologies, but it’s nerve-racking when you’re facing a looming client deadline.

The good news is that documentation and community support will increase over time. It’s easy to forget that Lambda is only two years old. And since Lambda is not an ordinary old cloud service (it represents a radical shift in software development), it’s natural for the community to take its time adopting it. As adoption increases over the next few years, vendor and community documentation will inevitably improve, as will the service itself.

3. Find the right balance between tight cohesion and loose coupling

One of the more difficult questions we wrestled with at first was how to structure the application itself. Prior to the advent of serverless architecture, functions were assumed to exist within a larger cohesive application with shared memory and a call stack, and there are very well-established patterns for designing applications under these conditions. In contrast, there is a notable dearth of published design patterns for serverless applications. Even Lambda’s close cousin, the microservice, operates much like a traditional application internally with shared memory and a call stack, so the design patterns for microservices didn’t provide much help either. With AWS Lambda, you are literally deploying a single function, outside the context of a larger application. So how do you live without the things we take for granted every day like shared functions and common configuration settings? If we were building a microservice, then we might decide to trade DRY (Don’t Repeat Yourself) principles for process independence, and elect to tolerate redundant code. But at the nanoservice level, this doesn’t make sense. Lambda functions are much more tightly coupled, so there is too much redundant code to manage.

One option is to take nanoservices to the extreme and expose every function as a Lambda. While this radical approach might sound good late at night after a six pack of Mountain Dew, in practice it’s a nightmare. It does solve the shared code problem, but it creates a ton of configuration busy work (as every function needs to be set up and deployed in AWS) and the performance of your application will slow to a crawl since every function invocation will be an out-of-process call.

At the other end of the spectrum, you can deploy a single Lambda function that acts as a controller for the entire application, using parameter values to handle the request appropriately. While this solves the shared code problem, it creates another one instead: complexity. With this approach, the controller function can quickly become bloated and unmanageable, and you could potentially reach the Lambda deployment package size limit of 50 MB. Plus, this doesn’t pass the sniff test for modern, cloud-first application design, which favors distributed solutions over monolithic apps.
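A minimal sketch of this single-controller pattern looks something like the following. The `action` and `params` fields, the route names and the two helper functions are all hypothetical; the point is only that one deployed function dispatches on a parameter value.

```python
def list_titles(params):
    """Hypothetical operation: return a fixed catalog."""
    return {"titles": ["example-title"]}


def get_title(params):
    """Hypothetical operation: echo back the requested id."""
    return {"title": params.get("id")}


# Every operation the application supports is registered in one table.
ROUTES = {
    "list_titles": list_titles,
    "get_title": get_title,
}


def controller(event, context):
    """Single Lambda entry point that dispatches on event["action"]."""
    action = event.get("action")
    operation = ROUTES.get(action)
    if operation is None:
        raise ValueError("unknown action: {!r}".format(action))
    return operation(event.get("params", {}))
```

Every new feature adds another entry to `ROUTES` and more code to the same deployment package, which is how the controller grows into the bloated function described above.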

For our project, we landed on the Goldilocks approach (not too hot, not too cold) and created a limited set of Lambda functions that mapped logically to how client callers would consume them. This felt like the right solution, but it still left us with the problem of dealing with shared code and configuration. Rather than address this problem through application design, however, we solved it by building out our development workflow.

4. Establish an efficient development workflow

The development workflow for a serverless app is different from that of your typical project. First of all, technically speaking, there is no “local” development environment, since everything runs on AWS-hosted components. In practice, however, you’ll probably run some of the bits locally to streamline development. For example, you can run a local copy of DynamoDB for database development, and you can obviously run NodeJS/Python/Java locally on your laptop. However, this is not the same thing as running a full replica of the production environment locally using a tool like Vagrant or Docker. There is a lot of plumbing that AWS API Gateway provides that you need to mock yourself if you’re doing all of your development locally. We ended up using a combination of local development and dev/test directly in AWS.

Another idiosyncrasy of serverless development is that the process scope of each Lambda function is isolated, yet the functions exist within the logical context of a larger parent application. This means there is a need to access shared code across Lambdas but no good way to share it. As we discussed above, you can reduce the impact of this by reducing the number of Lambdas to a reasonable quantity. But even after doing this, you’ll still need access to common functions and configuration settings for things like database access.

Our solution to this problem was simple but effective. First, we created a separate directory for the shared code outside the directories containing the code for each Lambda function. The shared code was updated in this parent folder. Then, we wrote a simple bash script that copied the shared code from the parent directory into each of the Lambda function directories. The script also used the AWS CLI to update the Lambda functions in AWS, deploying to the appropriate staging environment based on a command line argument. Using this approach, we were able to kill two birds with one stone: it allowed us to rapidly deploy and test our Lambda functions in AWS, and it gave us a way to update shared code in one location while ensuring it was used consistently across all Lambdas. Was this the most elegant solution ever conceived? No. Did it break every application design principle I’ve ever learned during my career as an engineer? More or less. Did it work well in practice? Absolutely. Would I use this technique again? Maybe.
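Our script was written in bash; the sketch below re-expresses the same idea in Python 3 and is not the script we shipped. It assumes a layout with one subdirectory per Lambda under a functions folder and a sibling shared folder, plus a made-up naming convention of `<function>-<stage>` for the deployed functions. The `aws lambda update-function-code` command with `--function-name` and `--zip-file` is the real AWS CLI call; everything else is illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def package_function(function_dir, shared_dir, build_dir):
    """Copy the shared code into one function directory and zip it."""
    function_dir, shared_dir, build_dir = (
        Path(function_dir), Path(shared_dir), Path(build_dir))
    target = function_dir / shared_dir.name
    if target.exists():
        shutil.rmtree(target)  # drop the stale copy of the shared code
    shutil.copytree(shared_dir, target)
    build_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        str(build_dir / function_dir.name), "zip", root_dir=str(function_dir))
    return Path(archive)


def update_command(function_name, stage, zip_path):
    """Build the AWS CLI call that uploads new code for one function."""
    # Assumed naming convention: "<function>-<stage>", e.g. "get_titles-dev".
    return ["aws", "lambda", "update-function-code",
            "--function-name", "{}-{}".format(function_name, stage),
            "--zip-file", "fileb://{}".format(zip_path)]


def deploy_all(functions_root, shared_dir, build_dir, stage, dry_run=True):
    """Package every function; run the CLI calls unless dry_run is set."""
    commands = []
    for function_dir in sorted(
            p for p in Path(functions_root).iterdir() if p.is_dir()):
        zip_path = package_function(function_dir, shared_dir, build_dir)
        command = update_command(function_dir.name, stage, zip_path)
        commands.append(command)
        if not dry_run:
            subprocess.run(command, check=True)
    return commands
```

With `dry_run=True` the function only packages the code and returns the commands it would run, which makes it easy to inspect before pointing it at a real AWS account.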

The truth is that when you have a tight timeline, few published design principles to guide you, and a lot of work to complete, you don’t have the luxury of theorizing about the best solution. You just need a solution. But now that the project has shipped and we can take a step back, we are definitely keeping an eye out for better patterns and tools to help us improve our workflow. One promising development that we’re bullish on is the Serverless Framework. This was a small open source project that quickly became a semi-big-deal in the serverless world. Serverless Framework is designed to alleviate some of the AWS configuration complexity and general weirdness of serverless application development. Initially geared toward non-DevOps-y JavaScript developers, Serverless Framework holds great promise for streamlining serverless application development in general. It is also going to support other cloud vendors like Microsoft Azure in the future. We evaluated Serverless Framework at the start of the project and initially planned on using it, but it just wasn’t ready for prime time. Fortunately, the Serverless team is hard at work on a post-Beta release, and we’ll be reevaluating the framework at that time.

5. Automate serverless infrastructure with CloudFormation

Given the complexity of the configuration for an AWS serverless application, you will definitely want to automate the creation of the app’s infrastructure. It’s challenging enough to configure a serverless application in a development environment; you certainly don’t want to manually redo these steps in QA, staging and production, where it’s just too easy to miss something. To solve this, we wrote a parameterized CloudFormation template to create the full application stack, including all of the API Gateway and Lambda configurations. The CloudFormation template was called from a bash script, which gave us more programmatic control over the stack creation/deletion process.
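Here is a rough Python sketch of what such a wrapper can look like; it is not our bash script. The `aws cloudformation create-stack` and `delete-stack` commands and their `--stack-name`, `--template-body`, `--parameters` and `--capabilities` options are real AWS CLI features, while the `Stage` parameter, the stage list and the `<app>-<stage>` stack naming are assumptions made for the example and would have to match whatever your template declares.

```python
import subprocess

# Hypothetical set of environments; adjust to your own pipeline.
STAGES = ("dev", "qa", "staging", "prod")


def stack_name(app, stage):
    """Assumed convention: one stack per app and stage."""
    return "{}-{}".format(app, stage)


def create_stack_command(app, stage, template_path, parameters):
    """Build the CLI call that creates the stack for one environment."""
    if stage not in STAGES:
        raise ValueError("unknown stage: {!r}".format(stage))
    command = ["aws", "cloudformation", "create-stack",
               "--stack-name", stack_name(app, stage),
               "--template-body", "file://{}".format(template_path),
               # Required when the template creates IAM roles.
               "--capabilities", "CAPABILITY_IAM",
               "--parameters"]
    # "Stage" must match a parameter declared in the template.
    merged = dict(parameters, Stage=stage)
    for key in sorted(merged):
        command.append(
            "ParameterKey={},ParameterValue={}".format(key, merged[key]))
    return command


def delete_stack_command(app, stage):
    """Build the CLI call that tears the stack down again."""
    return ["aws", "cloudformation", "delete-stack",
            "--stack-name", stack_name(app, stage)]


def run(command, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(command))
        return 0
    return subprocess.run(command, check=True).returncode
```

Keeping command construction separate from execution means the same template can be pointed at QA, staging or production by changing a single argument.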

There are several points worth noting about using CloudFormation for infrastructure automation on a serverless application. First, there isn’t much documentation. If the documentation for Lambda in general is light, the documentation for Lambda + CloudFormation is even lighter. Second, be prepared for tight coupling between the application code and the infrastructure scripts. Because you’re deploying individual functions, you’ll need a CloudFormation resource for each Lambda and its corresponding API Gateway endpoint. This means that low-level changes to the application code can require changes to infrastructure scripts. As such, your developers have to be involved with updating the CloudFormation scripts as they add and edit Lambda functions. You can’t really throw this over the fence to DevOps (which is the way it should be anyway).

Third, make sure you clearly understand the security model for Lambda functions and API Gateway endpoints before you create your CloudFormation script. There are some security settings that AWS makes invisibly in the background when you create an API Gateway endpoint in the Console, and you’ll need to programmatically recreate these in your CloudFormation script. Specifically, you need to grant the API Gateway permission to invoke each Lambda function. Note here that the security principal is the API Gateway itself, and that this is in addition to the permissions granted to the IAM role associated with the Lambda function (which is its own security principal). If you don’t grant the API Gateway invocation permission, your Lambda function will fail when you call it through the API Gateway (but it will work when you call the Lambda function directly). Even stranger, if you click the edit icon next to the Lambda function name in the AWS Console on the API Gateway > Your API > Your Endpoint > Integration Request page, and then click the okay checkbox button, it will automatically create the invocation permission in the background, even if you don’t change the function name (that one was a bit of a head scratcher).

Fortunately, you can easily address this issue by including the following JSON snippet in the Resources section of the CloudFormation template. This code grants the API Gateway the necessary invocation permission. You’ll need to include this snippet once for each API Gateway endpoint that calls a Lambda function:

"APIInvokePermissionForMyLambdaFunction": {
    "Type": "AWS::Lambda::Permission",
    "Properties": {
        "FunctionName": "MyLambdaFunction",
        "Action": "lambda:InvokeFunction",
        "Principal": "apigateway.amazonaws.com"
    }
}
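
Since the snippet has to be repeated for every function, it can be generated rather than copied by hand. The following is a small sketch, assuming the function names are alphanumeric so they can double as part of a CloudFormation logical ID; the helper name and the sample function names are hypothetical.

```python
import json


def permission_resources(function_names):
    """Return one AWS::Lambda::Permission resource per Lambda function."""
    resources = {}
    for name in function_names:
        # CloudFormation logical IDs must be alphanumeric.
        if not name.isalnum():
            raise ValueError("not usable as a logical ID: {!r}".format(name))
        resources["APIInvokePermissionFor" + name] = {
            "Type": "AWS::Lambda::Permission",
            "Properties": {
                "FunctionName": name,
                "Action": "lambda:InvokeFunction",
                "Principal": "apigateway.amazonaws.com",
            },
        }
    return resources


if __name__ == "__main__":
    # Hypothetical function names; merge the output into "Resources".
    print(json.dumps(permission_resources(["GetTitles", "PostMessage"]),
                     indent=4))
```

The resulting dictionary can be merged into the Resources section of the template before it is written to disk.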

Conclusion

So is serverless architecture all roses and rainbows? Is building a serverless app like bouncing on a cloud of cotton candy without a care in the world? Sadly, it isn’t. Like any new technology, it has some warts, and the kinks need to be worked out. Granted, you don’t have to do any server administration, but you do have to perform quite a bit of service configuration instead. So the real question is: Is serverless architecture worth it?

To answer this, it’s important to remember why you would choose a serverless architecture in the first place. Its primary benefit is not about streamlining work during development, but rather reducing the burden of administration after development. The value that serverless architecture provides is its support for rapidly building scalable and highly-available applications with minimal maintenance or operational support. This is a huge benefit that cannot be overstated. In fact, this benefit is so great that in my mind it unequivocally justifies any additional time spent during development, especially if this inefficiency can be reduced over time as the team learns and adapts.

The bottom line is that by spending a little more time upfront during development, serverless architecture yields substantial operational savings over the long term. So even though serverless architecture fails to fulfill my dream of coding on clouds of cotton candy, it absolutely makes sound business sense.