Autoscaling BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module
A few weeks ago we released the Terraform module sagemaker-huggingface, which makes it super easy to deploy Hugging Face Transformers like BERT from Amazon S3 or the Hugging Face Hub to Amazon SageMaker for real-time inference.
You should check out the “Deploy BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module” blog post if you want to know more about Terraform and how we have built the module.
TL;DR: this module should enable companies and individuals to easily deploy Hugging Face Transformers without heavy lifting.
Since then we have received a lot of feedback and feature requests from users. Thank you for that! By the way, if you have any feedback or feature ideas, feel free to open a thread in the forum.
Below you can find the currently supported features as well as the newly added features.
Features
- Deploy Hugging Face Transformers from hf.co/models to Amazon SageMaker
- Deploy Hugging Face Transformers from Amazon S3 to Amazon SageMaker
- 🆕 Deploy private Hugging Face Transformers from hf.co/models to Amazon SageMaker with a `hf_api_token`
- 🆕 Add Autoscaling to your Amazon SageMaker endpoints with an `autoscaling` configuration
- 🆕 Deploy Asynchronous Inference Endpoints either from hf.co/models or Amazon S3
You can find examples for all use cases in the repository of the module or in the registry. In addition to the feature updates, we also improved the naming by adding a random lower case string at the end of all resources.
Registry: https://registry.terraform.io/modules/philschmid/sagemaker-huggingface/aws/latest
Github: https://github.com/philschmid/terraform-aws-sagemaker-huggingface
Let's test some of the new features and deploy an Asynchronous Inference Endpoint with autoscaling to zero.
How to deploy an Asynchronous Endpoint with Autoscaling using the sagemaker-huggingface Terraform module
Before we get started, make sure you have Terraform installed and configured, as well as access to AWS credentials to create the necessary services. [Instructions]
What are we going to do:
- create a new Terraform configuration
- initialize the AWS provider and our module
- deploy our Asynchronous Endpoint
- test the endpoint
- destroy the infrastructure
If you want to learn about Asynchronous Inference you can check out my blog post: “Asynchronous Inference with Hugging Face Transformers and Amazon SageMaker”
Create a new Terraform configuration
Each Terraform configuration must be in its own directory including a `main.tf` file. Our first step is to create the `distilbert-terraform` directory with a `main.tf` file.
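For example, on a Unix-like shell:

```bash
mkdir distilbert-terraform
touch distilbert-terraform/main.tf
cd distilbert-terraform
```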
Initialize the AWS provider and our module
Next, we need to open the `main.tf` in a text editor and add the `aws` provider as well as our module.

Note: the snippet below assumes that you have an AWS profile `default` configured with the needed permissions.
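A minimal `main.tf` for an Asynchronous Inference Endpoint with autoscaling could look like the sketch below. The concrete values (region, model, task, framework versions, instance type, the S3 path in `async_config`, the `autoscaling` settings) and the exact input and output names are illustrative, so check the module documentation in the registry for the supported variables and outputs before applying.

```hcl
provider "aws" {
  profile = "default" # AWS profile with the needed permissions
  region  = "us-east-1"
}

module "sagemaker-huggingface" {
  source = "philschmid/sagemaker-huggingface/aws"
  # version = "..." # optionally pin a module version from the registry

  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"

  # 🆕 deploy as an Asynchronous Inference Endpoint instead of a real-time one
  async_config = {
    # S3 location where responses are stored; the bucket must already exist
    s3_output_path = "s3://my-async-inference-bucket/distilbert-async"
  }

  # 🆕 scale the endpoint between 0 and 4 instances based on traffic
  autoscaling = {
    min_capacity               = 0
    max_capacity               = 4
    scaling_target_invocations = 200
  }
}

# expose the endpoint name so we can use it from the test script;
# the module output name may differ, see the module's outputs in the registry
output "endpoint_name" {
  value = module.sagemaker-huggingface.sagemaker_endpoint_name
}
```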
When we create a new configuration, or check out an existing configuration from version control, we need to initialize the directory with `terraform init`. Initializing will download and install our AWS provider as well as the `sagemaker-huggingface` module.
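```bash
terraform init
```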
Deploy the Asynchronous Endpoint
To deploy/apply our configuration we run the `terraform apply` command. Terraform will then print out which resources are going to be created and ask us if we want to continue, which we can confirm with `yes`.
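```bash
terraform apply
```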
Now Terraform will deploy our model to Amazon SageMaker as an Asynchronous Inference Endpoint. This can take 2-5 minutes.
Test the endpoint
To test our deployed endpoint we can use the AWS SDKs. In our example we are going to use the Python SageMaker SDK (`sagemaker`), but you can easily switch this to the Java, JavaScript, .NET, or Go SDK to invoke the Amazon SageMaker endpoint. We are going to use the `sagemaker` SDK since it provides an easy-to-use `AsyncPredictor` object, which does the heavy lifting of uploading the data to Amazon S3 for us.
For initializing our predictor we need the name of our deployed endpoint, which we can get by inspecting the Terraform output with `terraform output` or by going to the SageMaker service in the AWS Management Console, as well as the Amazon S3 bucket defined in our Terraform module.
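Assuming the configuration defines an output for the endpoint name (as in the sketch above), we can read it with:

```bash
terraform output endpoint_name
```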
We create a new file `request.py` with the following snippet.

Make sure you have configured your credentials (and region) correctly and have `sagemaker` installed.
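A minimal sketch of `request.py`; the endpoint name is a placeholder you need to replace with the value from `terraform output` or the SageMaker console, and the input sentence is just an example:

```python
from sagemaker.huggingface import HuggingFacePredictor
from sagemaker.predictor_async import AsyncPredictor

# replace with the endpoint name from `terraform output` or the AWS console
ENDPOINT_NAME = "<your-endpoint-name>"

# wrap the real-time predictor in an AsyncPredictor, which uploads the
# request payload to Amazon S3 and polls for the result for us
predictor = AsyncPredictor(HuggingFacePredictor(endpoint_name=ENDPOINT_NAME))

data = {"inputs": "I love using the new Terraform module for SageMaker."}

# uploads `data` to S3, invokes the asynchronous endpoint and waits for the response
result = predictor.predict(data=data)
print(result)
```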
Now we can execute our request.
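```bash
python request.py
```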
Destroy the infrastructure
To clean up our created resources we can run `terraform destroy`, which will delete all the resources created by the module.
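```bash
terraform destroy
```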
More Examples
You can find examples of how to deploy private models and use autoscaling in the repository of the module or in the registry.
Conclusion
The sagemaker-huggingface Terraform module abstracts away all the heavy lifting of deploying Transformer models to Amazon SageMaker, which enables controlled, consistent, and understandable managed deployments following the concepts of IaC. This should help companies move faster and integrate models deployed to Amazon SageMaker into their existing applications and IaC definitions.
Thanks for reading! If you have any questions, feel free to contact me on GitHub or on the forum. You can also connect with me on Twitter or LinkedIn.