philschmid blog

Mount your AWS EFS volume into AWS Lambda with the Serverless Framework

#AWS #Serverless · August 12, 2020 · 8 min read

Photo by Jonathan Wolf on Unsplash

Introduction

“Just like wireless internet has wires somewhere, serverless architectures still have servers somewhere. What ‘serverless’ really means is that as a developer you don’t have to think about those servers. You just focus on code.” - serverless.com

This focus is only possible if we make some tradeoffs. Currently, all serverless FaaS services like AWS Lambda, Google Cloud Functions, and Azure Functions have limits. For example, there is no real state, and memory cannot be configured without limit.

These limitations have led to serverless architectures being used more for software development and less for machine learning, especially deep learning.

A big hurdle to overcome in serverless deep learning with tools like AWS Lambda, Google Cloud Functions, and Azure Functions is storage. TensorFlow and PyTorch are huge, and newer “state of the art” models like BERT are over 300MB in size. So far, it was only possible to use them with some compression techniques. I describe how to do this in two of my earlier posts.

But last month AWS announced mountable storage for your serverless functions. They added support for Amazon Elastic File System (EFS), a scalable and elastic NFS file system. This allows you to mount your AWS EFS filesystem to your AWS Lambda function.

In their blog post, they explain how to connect an AWS Lambda function to AWS EFS. The blog post is very nice, definitely check it out.

In this post, we are going to do the same, but a bit better: using the Serverless Framework and without the manual work.

serverless-architecture

PREVIEW: I am building a CLI tool called efsync, which enables you to automatically upload files (pip packages, ML models, …) to an EFS file system.

Until efsync is finished, you can use AWS DataSync to upload your data to an AWS EFS file system.


What is AWS Lambda?

You are probably familiar with AWS Lambda, but to make things clear: AWS Lambda is a computing service that lets you run code without managing servers. It executes your code only when required and scales automatically, from a few requests per day to thousands per second. You only pay for the compute time you consume - there is no charge when your code is not running.

AWS Lambda Logo

https://aws.amazon.com/de/lambda/features/


What is AWS EFS?

Amazon EFS is a fully-managed service that makes it easy to set up, scale, and cost-optimize file storage in the Amazon Cloud. Amazon EFS-filesystems can automatically scale from gigabytes to petabytes of data without needing to provision storage. Amazon EFS is designed to be highly durable and highly available. With Amazon EFS, there is no minimum fee or setup costs, and you pay only for what you use.


Serverless Framework

The Serverless Framework helps us develop and deploy AWS Lambda functions. It’s a CLI that offers structure, automation, and best practices right out of the box. It also allows us to focus on building sophisticated, event-driven, serverless architectures, comprised of functions and events.

Serverless Framework Logo

If you aren’t familiar with the Serverless Framework or haven’t set it up yet, take a look at this quick-start with the Serverless Framework.


Tutorial

We build an AWS Lambda function with python3.8 as runtime, which is going to import and use pip packages located on our EFS-filesystem. As an example, we use pandas and pyjokes. They could easily be replaced by TensorFlow or PyTorch.

Before we get started, make sure you have the Serverless Framework configured and an EFS-filesystem set up with the required dependencies. We are not going to cover the steps on how to install the dependencies and upload them to EFS in this blog post. You can either use AWS DataSync or start an EC2 instance, connect to it with ssh, mount the EFS-filesystem with amazon-efs-utils, and use pip install -t to install the pip packages on EFS.

We are going to:
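The EC2 route boils down to a single pip install -t call that targets a directory on the mounted filesystem. Here is a minimal sketch of that call, assuming EFS is mounted at /mnt/efs on the instance and the packages go into lib/ (the layout used later in this post). The helper name build_pip_install_command is hypothetical.

```python
import subprocess
import sys


def build_pip_install_command(packages, target_dir):
    """Build a `pip install -t <target_dir> <packages>` command as an argument list."""
    return [sys.executable, "-m", "pip", "install", "-t", target_dir, *packages]


if __name__ == "__main__":
    # Assumption: EFS is mounted at /mnt/efs on the EC2 instance.
    command = build_pip_install_command(["pandas", "pyjokes"], "/mnt/efs/lib")
    print(" ".join(command))
    # Uncomment on the instance to actually install the packages:
    # subprocess.run(command, check=True)
```

Packages with compiled extensions, such as pandas, should be installed with the same Python version as the Lambda runtime (python3.8 here).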

  • create a Python Lambda function with the Serverless Framework
  • configure the serverless.yaml and add our EFS-filesystem as mount volume
  • adjust the handler.py and import pandas and pyjokes from EFS
  • deploy & test the function

Create a Python Lambda function

First, we create our AWS Lambda function by using the Serverless CLI with the aws-python3 template.

serverless create --template aws-python3 --path serverless-efs

This CLI command creates a new directory containing a handler.py, .gitignore, and serverless.yaml file. The handler.py contains some basic boilerplate code.

import json

def hello(event, context):
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }
    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }
    return response
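Lambda calls the handler with an event and a context object, so the boilerplate can be tried locally by calling it like a normal function. A small sketch, using an empty dict as a stand-in event and None for the context (both are placeholders, not real Lambda inputs):

```python
import json


def hello(event, context):
    # Same boilerplate as the generated handler.py.
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event,
    }
    return {"statusCode": 200, "body": json.dumps(body)}


if __name__ == "__main__":
    # Placeholder event and context for a local smoke test.
    response = hello({}, None)
    print(response["statusCode"])
    print(json.loads(response["body"])["message"])
```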

Configure the serverless.yaml and add our EFS-filesystem as mount volume

I provide the complete serverless.yaml for this example, but we only go through the details we need for our EFS-filesystem and leave out all standard configurations. If you want to learn more about the serverless.yaml, I suggest you check out Scaling Machine Learning from ZERO to HERO. In that article, I go through each configuration and explain how it is used.

service: blog-serverless-efs

plugins:
  - serverless-pseudo-parameters

custom:
  efsAccessPoint: <your-efs-accesspoint>
  LocalMountPath: <mount-directory-in-aws-lambda-function>
  subnetsId: <subnetid-in-which-efs-is>
  securityGroup: <any-security-group>

provider:
  name: aws
  runtime: python3.8
  region: eu-central-1

package:
  exclude:
    - node_modules/**
    - .vscode/**
    - .serverless/**
    - .pytest_cache/**
    - __pycache__/**

functions:
  joke:
    handler: handler.handler
    environment: # Environment variables for this function
      MNT_DIR: ${self:custom.LocalMountPath}
    vpc:
      securityGroupIds:
        - ${self:custom.securityGroup}
      subnetIds:
        - ${self:custom.subnetsId}
    iamManagedPolicies:
      - arn:aws:iam::aws:policy/AmazonElasticFileSystemClientReadWriteAccess
    events:
      - http:
          path: joke
          method: get

resources:
  extensions:
    # Name of function <joke>
    JokeLambdaFunction:
      Properties:
        FileSystemConfigs:
          - Arn: 'arn:aws:elasticfilesystem:${self:provider.region}:#{AWS::AccountId}:access-point/${self:custom.efsAccessPoint}'
            LocalMountPath: '${self:custom.LocalMountPath}'

First, we need to install the serverless-pseudo-parameters plugin with the following command.

npm install serverless-pseudo-parameters

We use the serverless-pseudo-parameters plugin to reference our AWS::AccountId in the serverless.yaml. All the custom variables we need are defined under custom.

  • efsAccessPoint should be the ID of your EFS access point. You can find it in the AWS Management Console under EFS. It should look similar to this: fsap-0a31095162dd0ca44
  • LocalMountPath is the path under which EFS is mounted in the AWS Lambda function
  • subnetsId should be the ID of a subnet in which your EFS-filesystem has a mount target. If you created your filesystem in multiple Availability Zones, you can choose any one of them.
  • securityGroup can be any security group in the AWS account. We need this to deploy our AWS Lambda function into the required subnet. We can use the default security group ID. It should look like this: sg-1018g448.
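In the serverless.yaml, the region, account ID, and access point ID are combined into the access point ARN used in the FileSystemConfigs entry. The sketch below assembles the same string in Python so you can check what the final value should look like. The helper name build_access_point_arn is hypothetical, and 123456789012 is a placeholder account ID.

```python
def build_access_point_arn(region, account_id, access_point_id):
    """Assemble the EFS access point ARN from its parts."""
    return (
        f"arn:aws:elasticfilesystem:{region}:{account_id}"
        f":access-point/{access_point_id}"
    )


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a real one.
    print(build_access_point_arn("eu-central-1", "123456789012", "fsap-0a31095162dd0ca44"))
```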

We use CloudFormation extensions to mount the EFS-filesystem after our Lambda function is created, using the little snippet below. Extensions can be used to override CloudFormation resources.

resources:
  extensions:
    # Name of function <joke>
    JokeLambdaFunction:
      Properties:
        FileSystemConfigs:
          - Arn: "arn:aws:elasticfilesystem:${self:provider.region}:#{AWS::AccountId}:access-point/${self:custom.efsAccessPoint}"
            LocalMountPath: "${self:custom.LocalMountPath}"
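The key JokeLambdaFunction is the CloudFormation logical ID that the Serverless Framework generates for the joke function. To my knowledge, the framework uppercases the first letter of the function name, replaces - with Dash and _ with Underscore, and appends LambdaFunction. The sketch below mirrors that rule. Treat it as an assumption and verify the ID in the compiled CloudFormation template in the .serverless/ directory if in doubt. The helper name lambda_logical_id is hypothetical.

```python
def lambda_logical_id(function_name):
    """Derive the assumed CloudFormation logical ID for a Serverless function name."""
    # Assumed normalization rule: '-' becomes 'Dash', '_' becomes 'Underscore'.
    normalized = function_name.replace("-", "Dash").replace("_", "Underscore")
    # Uppercase only the first character and append the fixed suffix.
    return normalized[0].upper() + normalized[1:] + "LambdaFunction"


if __name__ == "__main__":
    print(lambda_logical_id("joke"))
```

If you rename the function in the serverless.yaml, the key under extensions has to change with it.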

Adjust the handler.py and import pandas and pyjokes from EFS

The last step before we can deploy is to adjust our handler.py and import pandas and pyjokes from EFS. In my example, I used /mnt/efs as LocalMountPath and installed my pip packages in lib/.

To use our dependencies from our EFS-filesystem, we have to add the lib/ directory under our LocalMountPath to Python's module search path (sys.path). Therefore, we add a small try/except statement at the top of our handler.py, which appends /mnt/efs/lib to sys.path. Lastly, we add some demo calls to show that our two dependencies work.

import os
import sys

try:
    # Make the pip packages installed on EFS importable.
    sys.path.append(os.environ['MNT_DIR'] + '/lib')  # nopep8 # noqa
except KeyError:
    # MNT_DIR is not set, e.g. when running locally; keep the default sys.path.
    pass

import json
import pyjokes
from pandas import DataFrame


def handler(event, context):
    data = {
        'Product': ['Desktop Computer', 'Tablet', 'iPhone', 'Laptop'],
        'Price': [700, 250, 800, 1200]
    }

    df = DataFrame(data, columns=['Product', 'Price'])

    body = {
        "frame": df.to_dict(),
        "joke": pyjokes.get_joke()
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response
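To see the path mechanism in isolation, here is a small sketch that runs without EFS. It uses a temporary directory as a stand-in for the mount point and a made-up module called efs_demo_pkg as a stand-in for a pip package. Both are assumptions for the demo only.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as mount_dir:
    # Recreate the <mount>/lib layout used on EFS.
    lib_dir = os.path.join(mount_dir, "lib")
    os.makedirs(lib_dir)

    # Fake "pip package" standing in for pandas or pyjokes.
    with open(os.path.join(lib_dir, "efs_demo_pkg.py"), "w") as f:
        f.write("GREETING = 'imported from the mounted directory'\n")

    # Same two steps as in handler.py: read MNT_DIR, extend sys.path.
    os.environ["MNT_DIR"] = mount_dir
    sys.path.append(os.environ["MNT_DIR"] + "/lib")

    # The import only works because lib/ is now on sys.path.
    demo = importlib.import_module("efs_demo_pkg")
    print(demo.GREETING)
```

The order matters: the sys.path entry has to be added before the first import of a package that lives on EFS, which is why it sits at the very top of handler.py.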

Deploy & Test the function

In order to deploy the function we only have to run serverless deploy.

After this process is done we should see something like this.

serverless bash deployment

To test our Lambda function we can use Insomnia, Postman, or any other REST client. Just send a GET request to our created endpoint. The answer should look like this.

insomnia-request

The first request to the cold AWS Lambda function took around 8 seconds. After it is warmed up it takes around 100-150ms as you can see in the screenshot.
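If you want to measure the cold and warm timings yourself without a REST client, a few lines of Python are enough. This is a minimal sketch using only the standard library. The endpoint URL is a placeholder that you have to replace with the one printed by serverless deploy, and measure_latencies and fetch_joke are hypothetical helper names.

```python
import time
import urllib.request


def measure_latencies(fetch, attempts=3):
    """Call `fetch` several times and return each call's duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def fetch_joke(url):
    """Send a GET request to the deployed endpoint and return the raw body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder: replace with the endpoint printed by `serverless deploy`.
    endpoint = "https://<api-id>.execute-api.eu-central-1.amazonaws.com/dev/joke"
    if "<api-id>" not in endpoint:
        timings = measure_latencies(lambda: fetch_joke(endpoint))
        for i, seconds in enumerate(timings, 1):
            print(f"request {i}: {seconds * 1000:.0f} ms")
```

The first measured request hits the cold function, so it should be noticeably slower than the following ones.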

The best thing is that our AWS Lambda function scales automatically with the number of incoming requests, up to thousands of parallel requests, without us having to do anything.

If you rebuild this, keep in mind that the first request can take a while.


You can find the GitHub repository with the complete code here.

Thanks for reading. If you have any questions, feel free to contact me or comment on this article. You can also connect with me on Twitter or LinkedIn.