philschmid blog

Scaling Machine Learning from ZERO to HERO

#AWS #Serverless #Pytorch
May 08, 2020 · 13 min read

Photo by Kyle Glenn on Unsplash


The workflow for building machine learning models often ends at the evaluation stage: you have achieved an acceptable accuracy, which you can test and demonstrate in your “research environment” and “ta-da! Mission accomplished.” But this is not all! The last and most important step in a machine learning workflow is deploying your model to work in production.

A model which does not work in production is worth nothing.

A deployed model can be defined as any unit that is seamlessly integrated into a production environment, which can take in an input and return an output. But one of the main issues companies face with machine learning is finding a way to deploy these models in such environments.

Around 40% of failed projects reportedly stalled in development and didn't get deployed into production. Source

In this post, I will show you step-by-step how to deploy your own custom-trained Pytorch model with AWS Lambda and integrate it into your production environment with an API. We are going to leverage a simplified serverless computing approach at scale.

What is AWS Lambda?

AWS Lambda is a computing service that lets you run code without managing servers. It executes your code only when required and scales automatically, from a few requests per day to thousands per second. You only pay for the compute time you consume - there is no charge when your code is not running.

AWS Lambda

AWS Lambda features


This post assumes that you have the Serverless Framework for deploying AWS Lambda functions installed and configured, as well as a working Docker environment. The Serverless Framework helps us develop and deploy AWS Lambda functions. It's a CLI that offers structure, automation, and best practices right out of the box, and it allows you to focus on building sophisticated, event-driven, serverless architectures composed of functions and events.

Serverless Framework

If you aren’t familiar or haven’t set up the Serverless Framework, take a look at this quick-start with the Serverless Framework.

By modifying the serverless YAML file, you can connect SQS and, say, create a deep learning pipeline, or even connect it to a chatbot via AWS Lex.
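As a rough sketch of what such an extension could look like (the queue ARN below is a placeholder, and the event syntax follows the Serverless Framework's documented `sqs` event type), a queue trigger is just another entry in a function's `events` list:

```yaml
functions:
  detect_damage:
    handler: handler.detect_damage
    events:
      - sqs:
          # placeholder ARN - point this at your own queue
          arn: arn:aws:sqs:eu-central-1:123456789012:my-inference-queue
```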


Before we get started, I'd like to give you some information about the model we are going to use. I trained a PyTorch image classifier in Google Colab. If you want to know what Google Colab is, take a look here. I created a dataset for car damage detection and fine-tuned a resnet50 image classifier. In this tutorial, we are using Python 3.8 with PyTorch 1.5.


What are we going to do:

  • create a Python Lambda function with the Serverless Framework
  • add Pytorch to the Lambda Environment
  • write a predict function to classify images
  • create an S3 bucket, which holds the model, and a script to upload it
  • configure the Serverless Framework to set up API Gateway for inference

The architecture we are building will look like this.


Now let’s get started with the tutorial.

Create the AWS Lambda function

First, we create our AWS Lambda function by using the Serverless CLI with the aws-python3 template.

```bash
serverless create --template aws-python3 --path scale-machine-learning-w-pytorch
```

This CLI command will create a new directory containing a, a .gitignore, and a serverless.yaml file. The contains some basic boilerplate code.

```python
import json


def hello(event, context):
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response
```
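Before deploying anything, it can help to sanity-check a handler like this locally by invoking it with a dummy event dict. Here is a minimal sketch (the handler is reproduced inline so the snippet is self-contained; the event contents are made up):

```python
import json


def hello(event, context):
    # same boilerplate handler as in
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event
    }
    return {"statusCode": 200, "body": json.dumps(body)}


# simulate what Lambda does: pass an event dict and a (here unused) context
result = hello({"httpMethod": "GET"}, None)
print(result["statusCode"])  # 200
print(json.loads(result["body"])["message"])
```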

Add Python Requirements

Next, we add our Python requirements to our AWS Lambda function. For this, we use the Serverless plugin serverless-python-requirements. It automatically bundles dependencies from a requirements.txt and makes them available. The plugin even allows you to bundle non-pure-Python modules. More on that here.

Installing the plugin

To install the plugin run the following command.

```bash
serverless plugin install -n serverless-python-requirements
```

This will automatically add the plugin to your project’s package.json and to the plugins section in the serverless.yml.

Adding Requirements to requirements.txt

We have to create a requirements.txt file on the root level with all required Python packages. Be careful: the deployment package size must not exceed 250 MB unzipped. You can find a list of all AWS Lambda limits here.

If we installed PyTorch with pip install torch, the package would be around 470 MB, which is too big to be deployed in an AWS Lambda environment. Thus, we add the link to the Python wheel file (.whl) directly in the requirements.txt. For a list of all PyTorch and torchvision wheels, consider this list.

The requirements.txt should look like this.

```
# CPU-only PyTorch wheel for Python 3.8 (keeps the package small)
https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp38-cp38-linux_x86_64.whl
torchvision==0.6.0
requests_toolbelt
```

To make the dependencies even smaller we will employ three techniques available in the serverless-python-requirements plugin:

  • zip — Compresses the dependencies from the requirements.txt into an additional zip file inside the final bundle, which is unzipped at runtime (this is what the import unzip_requirements at the top of the handler is for).
  • slim — Removes unneeded files and directories such as *.so, *.pyc, dist-info, etc.
  • noDeploy — Omits certain packages from deployment. We will use the standard list that excludes packages already built into Lambda, extended with Tensorboard.

You can see the implementation of it in the section where we are “configuring our serverless.yaml” file.

Predict function

Our Lambda function actually consists of 4 functions.

  • load_model_from_s3() loads our model archive from S3 into memory, creates our PyTorch model, and builds a list called classes, which holds the predictable classes.
  • transform_image() transforms the incoming picture into a PyTorch tensor.
  • get_prediction() uses the transformed image as input to get a prediction.
  • detect_damage() is the main handler function of our Lambda environment.

Pseudo code

```python
# load model and classes once, outside the handler
model, classes = load_model_from_s3()

def detect_damage(image):
    image_tensor = transform_image(image)
    prediction = get_prediction(image_tensor)
    return prediction
```

The working program code then looks like this.

```python
try:
    import unzip_requirements
except ImportError:
    pass
from requests_toolbelt.multipart import decoder
import torch
import torchvision
import torchvision.transforms as transforms
from PIL import Image

from torchvision.models import resnet50
from torch import nn

import boto3
import os
import tarfile
import io
import base64
import json

S3_BUCKET = os.environ['S3_BUCKET'] if 'S3_BUCKET' in os.environ else 'fallback-test-value'
MODEL_PATH = os.environ['MODEL_PATH'] if 'MODEL_PATH' in os.environ else 'fallback-test-value'

s3 = boto3.client('s3')


def load_model_from_s3():
    try:
        # get object from s3
        obj = s3.get_object(Bucket=S3_BUCKET, Key=MODEL_PATH)
        # read it in memory
        bytestream = io.BytesIO(obj['Body'].read())
        # unzip it
        tar = tarfile.open(fileobj=bytestream, mode="r:gz")
        for member in tar.getmembers():
            if member.name.endswith(".txt"):
                print("Classes file is :", member.name)
                f = tar.extractfile(member)
                classes = [line.decode() for line in f.read().splitlines()]
                print(classes)
            if member.name.endswith(".pth"):
                print("Model file is :", member.name)
                f = tar.extractfile(member)
                print("Loading PyTorch model")
                # set device to cpu
                device = torch.device('cpu')
                # create model class
                model = resnet50(pretrained=False)
                model.fc = nn.Sequential(nn.Linear(2048, 512),
                                         nn.ReLU(),
                                         nn.Dropout(0.2),
                                         nn.Linear(512, 10),
                                         nn.LogSoftmax(dim=1))
                # load downloaded model weights
                model.load_state_dict(torch.load(io.BytesIO(f.read()), map_location=device))
                model.eval()
        # return model and classes as list
        return model, classes
    except Exception as e:
        raise e


model, classes = load_model_from_s3()


def transform_image(image_bytes):
    try:
        transformations = transforms.Compose([
            transforms.Resize(255),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])
        image = Image.open(io.BytesIO(image_bytes))
        return transformations(image).unsqueeze(0)
    except Exception as e:
        print(repr(e))
        raise e


def get_prediction(image_bytes):
    tensor = transform_image(image_bytes=image_bytes)
    outputs = model.forward(tensor)
    _, y_hat = outputs.max(1)
    predicted_idx = y_hat.item()
    return classes[predicted_idx]


def detect_damage(event, context):
    try:
        content_type_header = event['headers']['content-type']
        print(event['body'])
        body = base64.b64decode(event["body"])

        picture = decoder.MultipartDecoder(body, content_type_header).parts[0]
        prediction = get_prediction(image_bytes=picture.content)

        filename = picture.headers[b'Content-Disposition'].decode().split(';')[1].split('=')[1]
        if len(filename) < 4:
            filename = picture.headers[b'Content-Disposition'].decode().split(';')[2].split('=')[1]

        return {
            "statusCode": 200,
            "headers": {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*',
                "Access-Control-Allow-Credentials": True
            },
            "body": json.dumps({'file': filename.replace('"', ''), 'predicted': prediction})
        }
    except Exception as e:
        print(repr(e))
        return {
            "statusCode": 500,
            "headers": {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*',
                "Access-Control-Allow-Credentials": True
            },
            "body": json.dumps({"error": repr(e)})
        }
```
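The Content-Disposition parsing in detect_damage is easy to get wrong, so it can help to exercise that logic in isolation. A minimal stand-alone sketch (the helper name and the sample header below are made up for illustration):

```python
def filename_from_content_disposition(header: str) -> str:
    # a typical header: 'form-data; name="file"; filename="red-car.jpg"'
    for part in header.split(';'):
        part = part.strip()
        if part.startswith('filename='):
            return part.split('=', 1)[1].strip('"')
    return ''


header = 'form-data; name="file"; filename="red-car.jpg"'
print(filename_from_content_disposition(header))  # red-car.jpg
```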

Adding the trained model to our project

As explained earlier, I trained a car damage detection model in a colab notebook, which takes an image as input and returns whether the depicted car is 01-whole or 00-damaged. I also added some code that does all the bundling magic for you: if you run the notebook, it will create a file called cardamage.tar.gz that is ready to be deployed on AWS. Keep in mind that the unzipped size of a Lambda function is limited to 250 MB, so we cannot include the model directly in the function. Instead, we download it from S3 at runtime with load_model_from_s3().

For this to work, we need an S3 bucket. You can either create one using the AWS Management Console or with the following CLI command.

```bash
aws s3api create-bucket --bucket bucket-name --region eu-central-1 --create-bucket-configuration LocationConstraint=eu-central-1

After the bucket is created, we can upload our model, either manually or using the provided Python script.

```python
import boto3


def upload_model(model_path='', s3_bucket='', key_prefix='', aws_profile='default'):
    s3 = boto3.session.Session(profile_name=aws_profile)
    client = s3.client('s3')
    client.upload_file(model_path, s3_bucket, key_prefix)


# example call:
# upload_model(model_path='./cardamage.tar.gz', s3_bucket='bucket-name', key_prefix='cardamage.tar.gz')
```

Configuring the serverless.yaml

The next step is to adjust the serverless.yaml and include the custom Python requirement configuration. We are going to edit four sections of the serverless.yaml:

  • the provider section which holds our runtime and IAM permissions.
  • the custom section where we configure the serverless-python-requirements plugin.
  • the package section where we exclude folders from production.
  • the functions section where we create the function and define the events that invoke our Lambda function.

Have a look at the complete serverless.yaml. Don't worry, I will explain all four sections in detail in a minute.

```yaml
service: car-damage-pytorch

provider:
  name: aws
  runtime: python3.8
  region: eu-central-1
  timeout: 60
  environment:
    S3_BUCKET: your-bucket-name        # the bucket created earlier
    MODEL_PATH: cardamage.tar.gz       # key of the model archive in that bucket
  iamRoleStatements:
    - Effect: 'Allow'
      Action:
        - s3:getObject
      Resource: arn:aws:s3:::your-bucket-name/*

custom:
  pythonRequirements:
    dockerizePip: true
    zip: true
    slim: true
    strip: false
    noDeploy:
      - docutils
      - jmespath
      - pip
      - python-dateutil
      - setuptools
      - six
      - tensorboard
    useStaticCache: true
    useDownloadCache: true
    cacheLocation: './cache'

package:
  individually: false
  exclude:
    - package.json
    - package-log.json
    - node_modules/**
    - cache/**
    - test/**
    - __pycache__/**
    - model/**

functions:
  detect_damage:
    handler: handler.detect_damage
    memorySize: 3008
    timeout: 60
    events:
      - http:
          path: detect
          method: post
          cors: true

plugins:
  - serverless-python-requirements
```


In the Serverless Framework, the provider section defines where our function is deployed. We are using aws as our provider; other options include google, azure, and many more. You can find a full list of providers here.

In addition, we define our runtime, our environment variables, and the permissions our Lambda function has.

As runtime, we are using python3.8. For our function to work, we need two environment variables: S3_BUCKET and MODEL_PATH. S3_BUCKET contains the name of the S3 bucket we created earlier. MODEL_PATH is the path to our cardamage.tar.gz file within that bucket. We are still missing the permission to get our model from S3 into our Lambda function. The iamRoleStatements section handles the permissions for our Lambda function; the permission we need is s3:getObject, with the ARN (Amazon Resource Name) of our S3 bucket as the resource.

```yaml
provider:
  name: aws
  runtime: python3.8
  region: eu-central-1
  timeout: 60
  environment:
    S3_BUCKET: your-bucket-name        # the bucket created earlier
    MODEL_PATH: cardamage.tar.gz       # key of the model archive in that bucket
  iamRoleStatements:
    - Effect: 'Allow'
      Action:
        - s3:getObject
      Resource: arn:aws:s3:::your-bucket-name/*
```
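The resource of the IAM statement expects the bucket's ARN. S3 object ARNs follow a fixed pattern, so (as an illustrative helper, not part of the project code) building one looks like this:

```python
def s3_object_arn(bucket: str, key_pattern: str = "*") -> str:
    # S3 object ARNs have the form arn:aws:s3:::bucket-name/key
    return f"arn:aws:s3:::{bucket}/{key_pattern}"


print(s3_object_arn("my-model-bucket"))  # arn:aws:s3:::my-model-bucket/*
print(s3_object_arn("my-model-bucket", "cardamage.tar.gz"))
```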


In the custom section of the serverless.yml, we can define configurations for plugins or other scripts. For more details, refer to this guide. As described earlier we are using the serverless-python-requirements to install and reduce the size of our dependencies at the same time so we can pack everything into the Lambda runtime. If you want to know how it works you can read here.

```yaml
custom:
  pythonRequirements:
    dockerizePip: true
    zip: true
    slim: true
    strip: false
    noDeploy:
      - docutils
      - jmespath
      - pip
      - python-dateutil
      - setuptools
      - six
      - tensorboard
    useStaticCache: true
    useDownloadCache: true
    cacheLocation: './cache'
```


The package section can be used to exclude directories/folders from the final package. This offers more control in the packaging process. You can exclude specific folders and files, like node_modules/. For more detail take a look here.

```yaml
package:
  individually: false
  exclude:
    - package.json
    - package-log.json
    - node_modules/**
    - cache/**
    - test/**
    - __pycache__/**
    - .pytest_cache/**
    - model/**
```


The fourth and last section, functions, holds the configuration of our Lambda function. Here we define the allocated memory size, a timeout, and the events. In the events section, we can define a number of events that will trigger our Lambda function. For our project, we are using http, which will automatically create an API Gateway endpoint pointing to our function. You can also define events for sqs, cron, s3 upload, and many more. You can find the full list here.

```yaml
functions:
  detect_damage:
    handler: handler.detect_damage
    memorySize: 3008
    timeout: 60
    events:
      - http:
          path: detect
          method: post
          cors: true
```

Deploying the function

In order to deploy the function, we create a deploy script in the package.json. Note that Docker needs to be up and running for the deployment.

```json
{
  "name": "blog-github-actions-aws-lambda-python",
  "description": "",
  "version": "0.1.0",
  "dependencies": {},
  "scripts": {
    "deploy": "serverless deploy"
  },
  "devDependencies": {
    "serverless": "^1.67.0",
    "serverless-python-requirements": "^5.1.0"
  }
}
```

Afterwards, we can run yarn deploy or npm run deploy to deploy our function. This can take a while, as we are creating a Python environment with Docker, installing all our dependencies in it, and then uploading everything to AWS.

After this process is done we should see something like this.

deployed function

Test and Outcome

To test our Lambda function we can use Insomnia, Postman, or any other REST client. Just add an image of a damaged or whole car as multipart form data in the request. Let's try it with this image.
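Behind the scenes, API Gateway delivers binary request bodies to the Lambda base64-encoded, which is why detect_damage starts with base64.b64decode. You can simulate that round trip with the standard library alone (the multipart body below is a hand-built toy, not a real client request):

```python
import base64

# a toy multipart/form-data body, shaped like what a REST client would send
boundary = "X-BOUNDARY"
raw_body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="file"; filename="red-car.jpg"\r\n'
    "\r\n"
    "<binary image bytes would go here>\r\n"
    f"--{boundary}--\r\n"
).encode()

# API Gateway base64-encodes the body before invoking the Lambda ...
event_body = base64.b64encode(raw_body).decode()
# ... and the handler decodes it back to the original bytes
assert base64.b64decode(event_body) == raw_body
```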

red car

result request

As a result of our test with the red car, we get 01-whole, which is correct. You can also see that the complete request took 319 ms, with a Lambda execution time of around 250 ms. To be honest, this is pretty fast.

If you rebuild the classifier, be aware that the first request can take a while: on a cold start, the Lambda first unzips and installs our dependencies and then downloads the model from S3. After this is done once, the Lambda needs around 250-1000 ms, depending on the input image size, for a classification.
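If that cold-start latency matters for your use case, one common mitigation (not part of this tutorial's setup, so treat it as a sketch) is to add a scheduled event that periodically invokes the function so a warm container stays around:

```yaml
functions:
  detect_damage:
    handler: handler.detect_damage
    events:
      - http:
          path: detect
          method: post
          cors: true
      - schedule: rate(5 minutes)  # periodic ping keeps a container warm
```

The scheduled ping will hit the error branch of the handler (there is no multipart body), but that is fine for warming purposes.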

The best thing is, our classifier automatically scales up if there are several incoming requests!

You can scale up to thousands of parallel requests without any worries.

Thanks for reading. You can find the GitHub repository with the complete code here and the colab notebook here. If you have any questions, feel free to contact me.