Deploy open LLMs with Terraform and Amazon SageMaker
Deploying open LLMs into production environments can often be a complex process that requires coordination between data scientists, machine learning engineers, and DevOps teams. Traditionally, data scientists and ML engineers focus on model development and are not always responsible for, or experienced in, deploying LLMs to production. This is where Infrastructure as Code (IaC) tools like Terraform come into play.
The Importance of Infrastructure as Code
“Infrastructure as Code (IaC) is the managing and provisioning of infrastructure through code instead of through manual processes. With IaC, configuration files are created that contain your infrastructure specifications, which makes it easier to edit and distribute configurations. It also ensures that you provision the same environment every time.” - Red Hat
IaC ensures:
- Consistency: By defining infrastructure in code, we ensure that every deployment is identical, eliminating the "it works on my machine" problem.
- Version Control: Infrastructure configurations can be versioned, allowing for easy rollbacks and collaborative development.
- Scalability: IaC makes it simple to replicate environments for testing or scaling purposes.
- Automation: Deployments can be automated, reducing human error and speeding up the process.
Terraform LLM SageMaker Module
The Terraform LLM SageMaker Module simplifies the process of deploying open LLMs from Hugging Face to Amazon SageMaker real-time endpoints.
It handles the creation of all necessary resources, including:
- IAM roles (if not provided)
- SageMaker Model
- SageMaker Endpoint Configuration
- SageMaker Endpoint
- Autoscaling
With this module, you can easily deploy popular models like Llama 3, Mistral, Mixtral, and Command from Hugging Face to Amazon SageMaker.
Deploy Llama 3 with Terraform
Before we get started, make sure you have Terraform installed and configured, as well as access to AWS credentials to create the necessary services.
Create a new Terraform configuration
Each Terraform configuration must be in its own directory and include a `main.tf` file. Our first step is to create the `llama-terraform` directory with a `main.tf` file.
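For example, from a terminal:

```bash
mkdir llama-terraform && cd llama-terraform
touch main.tf
```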
This configuration will deploy the Llama 3.1 model to a SageMaker endpoint, handling all the necessary setup behind the scenes.
Initialize the AWS provider and our module
Next, we open the `main.tf` file in a text editor and add the `aws` provider as well as our module.
Note: The snippet below assumes that you have an AWS profile `default` configured with the needed permissions.
Note: Make sure to replace `YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL` with a valid Hugging Face token that has access to Llama 3.1.
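A sketch of what the `main.tf` could look like. The input names shown here (`endpoint_name_prefix`, `hf_model_id`, `hf_token`, `instance_type`) are illustrative assumptions; verify the exact variable names against the module's documentation on the Terraform Registry:

```hcl
provider "aws" {
  region  = "us-east-1"
  profile = "default" # AWS profile with the needed permissions
}

# Illustrative module block; check the module docs for the exact inputs
module "sagemaker-huggingface" {
  source               = "philschmid/llm-sagemaker/aws"
  endpoint_name_prefix = "llama"
  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"
  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"
  instance_type        = "ml.g5.2xlarge"
}
```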
When we create a new configuration, or check out an existing configuration from version control, we need to initialize the directory with `terraform init`. Initializing will download and install our AWS provider as well as the llm-sagemaker module.
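From inside the `llama-terraform` directory:

```bash
terraform init
```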
Deploy the Llama 3.1 8B Instruct model
To deploy/apply our configuration, we run the `terraform apply` command. Terraform will then print out which resources are going to be created and ask us if we want to continue, which we can confirm with `yes`.
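To start the deployment:

```bash
terraform apply
```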
Now Terraform will deploy our model to Amazon SageMaker as a real-time endpoint. This can take 5-10 minutes.
Test the endpoint and run inference
To test our deployed endpoint, we can use the AWS SDK. In our example we are going to use the Python SDK (`boto3`), but you can easily switch to the Java, JavaScript, .NET, or Go SDK to invoke the Amazon SageMaker endpoint.
To be able to invoke our endpoint, we need the endpoint name. You can get the endpoint name by inspecting the Terraform output with `terraform output endpoint_name` or by going to the SageMaker service in the AWS Management Console.
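For example:

```bash
terraform output endpoint_name
```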
We create a new file `request.py` with the following snippet.
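A minimal sketch using `boto3`. The endpoint name placeholder is something you fill in from the Terraform output, and the payload schema shown here (the classic TGI `inputs`/`parameters` format) is an assumption that depends on the container version the module deploys:

```python
import json

import boto3

# SageMaker runtime client used to invoke the endpoint
client = boto3.client("sagemaker-runtime")

# Replace with the value of `terraform output endpoint_name`
ENDPOINT_NAME = "YOUR_ENDPOINT_NAME"

# Classic TGI payload: a prompt plus generation parameters
payload = {
    "inputs": "What is the capital of France?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.7},
}

response = client.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)

# The response body is a streaming object; read and decode it
print(json.loads(response["Body"].read().decode("utf-8")))
```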
Make sure you have configured your AWS credentials (and region) correctly.
Now we can execute our request.
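```bash
python request.py
```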
Destroy the infrastructure
To clean up the resources we created, we can run `terraform destroy`, which will delete all the resources created by the module.
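```bash
terraform destroy
```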
Conclusion
The llm-sagemaker Terraform module abstracts away the heavy lifting of deploying open LLMs to Amazon SageMaker, enabling controlled, consistent, and understandable managed deployments that follow IaC principles. This should help companies move faster and integrate models deployed to Amazon SageMaker into their existing applications and IaC definitions.
Give it a try and let me know what you think about the module. It's still a very basic module, so if you have feature requests, please open an issue.
Thanks for reading! If you have any questions or feedback, please let me know on Twitter or LinkedIn.