philschmid blog

Deploy BigScience T0_3B to AWS & Amazon SageMaker

#AWS #Shorts #HuggingFace #Sagemaker
October 20, 2021 · 5 min read


Earlier this week, 🌸 BigScience released the first modeling paper from the collaboration, introducing T0*. For those of you who haven't heard of 🌸 BigScience: it is an open collaboration of 600 researchers from 50 countries and more than 250 institutions, creating large multilingual neural network language models and very large multilingual text datasets together using the Jean Zay (IDRIS) supercomputer.

The paper introduces T0*, a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. You can learn more about T0* on the Hugging Face model card. In short, T0* outperforms GPT-3 on many zero-shot tasks while being 16x smaller!


Image from Multitask Prompted Training Enables Zero-Shot Task Generalization

We will take advantage of this downsizing and deploy the model to AWS & Amazon SageMaker with just a few lines of code for production-scale workloads.

Check out my other blog post “Scalable, Secure Hugging Face Transformer Endpoints with Amazon SageMaker, AWS Lambda, and CDK” to learn how you could create a secure public-facing T0_3B API.


If you’re not familiar with Amazon SageMaker: “Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.” [REF]

What we are going to do:

  • Set up the environment
  • Deploy T0_3B to Amazon SageMaker
  • Run inference and test the model


Setting up the environment

We will use an Amazon SageMaker Notebook Instance for the example. You can learn here how to set up a Notebook Instance. To get started, jump into your Jupyter Notebook or JupyterLab and create a new Notebook with the conda_pytorch_p36 kernel.

Note: Using Jupyter is optional. We could also use a laptop, another IDE, or a task scheduler like Airflow or AWS Step Functions, provided it has the appropriate permissions.

After that, we can install the required dependencies.

pip install "sagemaker>=2.48.0" --upgrade

To deploy a model on SageMaker, we need to provide an IAM role with the right permissions. The SageMaker SDK provides the get_execution_role method as an optional convenience (only available in Notebook Instances and Studio).

import sagemaker

role = sagemaker.get_execution_role()
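Outside a Notebook Instance or Studio, get_execution_role may not be able to resolve a role. Below is a minimal sketch of one possible fallback: looking the role up by name with boto3. The helper name resolve_execution_role, the role name used in the usage note, and the injectable get_role and iam_client parameters are assumptions made for illustration. The exception that is actually raised can also depend on how your local credentials are set up.

```python
def resolve_execution_role(fallback_role_name, get_role=None, iam_client=None):
    """Return an IAM role ARN, falling back to an IAM lookup by role name."""
    if get_role is None:
        import sagemaker  # imported lazily so the helper can be tried offline

        get_role = sagemaker.get_execution_role
    try:
        return get_role()
    except ValueError:
        # Assumption: the SDK reports "current identity is not a role" as ValueError.
        if iam_client is None:
            import boto3

            iam_client = boto3.client("iam")
        response = iam_client.get_role(RoleName=fallback_role_name)
        return response["Role"]["Arn"]
```

On a laptop this could be called as resolve_execution_role("sagemaker_execution_role"), where the role name is hypothetical and must match a role that exists in your account.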

Deploy T0_3B to Amazon SageMaker

To deploy T0_3B directly from the Hugging Face Model Hub to Amazon SageMaker, we need to define two environment variables when creating the HuggingFaceModel:

  • HF_MODEL_ID: defines the model ID, which will be automatically loaded from the Hugging Face Model Hub when creating the SageMaker Endpoint.
  • HF_TASK: defines the task for the used 🤗 Transformers pipeline.

from sagemaker.huggingface.model import HuggingFaceModel

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'bigscience/T0_3B',  # model ID on the Hugging Face Model Hub
    'HF_TASK': 'text2text-generation'  # NLP task you want to use for predictions
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role
)
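Conceptually, the two variables tell the inference container which model to download and which 🤗 Transformers pipeline to build around it. The sketch below is not the container's actual code. It is a rough illustration that assumes the mapping is equivalent to calling transformers.pipeline with the task and the model ID. The helper name build_pipeline_from_env and its pipeline_factory parameter are hypothetical.

```python
def build_pipeline_from_env(env, pipeline_factory=None):
    """Build a pipeline from HF_MODEL_ID and HF_TASK (illustrative only)."""
    missing = [key for key in ("HF_MODEL_ID", "HF_TASK") if key not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    if pipeline_factory is None:
        # Lazy import: the real factory downloads the model weights when called.
        from transformers import pipeline

        pipeline_factory = pipeline
    return pipeline_factory(task=env["HF_TASK"], model=env["HF_MODEL_ID"])
```

Passing a stub as pipeline_factory lets you check the configuration without downloading a multi-gigabyte model.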

After we create our HuggingFaceModel instance, we can run .deploy() and provide the required infrastructure configuration. Since the model is pretty big, we are going to use the ml.g4dn.2xlarge instance type.

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.2xlarge'
)

This will start the deployment of our model and the endpoint should be up and ready for inference after a few minutes.
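To make "pretty big" concrete, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It assumes roughly 3 billion parameters (taken from the model name), 32-bit floats, and a single GPU with 16 GiB of memory on the instance. All three are assumptions, and activations and framework overhead come on top.

```python
def weight_memory_gib(num_parameters, bytes_per_parameter=4):
    """Estimate the memory needed just to hold the model weights, in GiB."""
    return num_parameters * bytes_per_parameter / 1024 ** 3


# Assumptions: ~3 billion parameters stored as 32-bit floats, one 16 GiB GPU.
estimate = weight_memory_gib(3_000_000_000)
gpu_memory_gib = 16
print(f"weights: {estimate:.1f} GiB of {gpu_memory_gib} GiB")
# prints "weights: 11.2 GiB of 16 GiB"
```

Under these assumptions the weights fit on one GPU, with a few GiB left for everything else.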

Run inference and test the Model

The .deploy method returns a HuggingFacePredictor, which we can use to run inference against our model as soon as it is up and ready.

predictor.predict({
    'inputs': "Is this review positive or negative? Review: Best cast iron skillet you will every buy."
})
# ✅ [{'generated_text': 'Positive'}]

predictor.predict({
    'inputs': "A is the son's of B's uncle. What is the family relationship between A and B?"
})
# ✅ [{'generated_text': "B is A's cousin."}]
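The predictor object only exists in the session that created it. As a sketch of how another application might call the same endpoint, the example below uses the boto3 sagemaker-runtime client with the same JSON payload shape. The helper name invoke_t0 and its runtime_client parameter are assumptions made for illustration, and the endpoint name has to be the one SageMaker assigned at deployment.

```python
import json


def invoke_t0(endpoint_name, prompt, runtime_client=None):
    """Send one prompt to a deployed endpoint and return the decoded JSON response."""
    if runtime_client is None:
        import boto3  # lazy import so the function can be exercised with a stub client

        runtime_client = boto3.client("sagemaker-runtime")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    # The response body is a stream of JSON-encoded bytes.
    return json.loads(response["Body"].read())
```

From the notebook, the endpoint name should be available as predictor.endpoint_name, so a call could look like invoke_t0(predictor.endpoint_name, "Is this review positive or negative? Review: ...").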

After we run our inference we can delete the endpoint again.

# delete endpoint
predictor.delete_endpoint()
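A running GPU endpoint keeps incurring cost until it is deleted, so for short experiments it can help to make the cleanup automatic. This is a minimal sketch using a context manager. The name temporary_endpoint is hypothetical, and the helper relies only on the delete_endpoint method shown above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor and delete its endpoint on exit, even after an error."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()
```

Wrapping the test calls in "with temporary_endpoint(predictor) as p:" deletes the endpoint when the block exits, whether or not a prediction raised an exception.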


This short blog post shows how you can easily deploy and run inference on T0_3B in a secure, controlled, and managed environment. The endpoint can already be integrated into applications, or you could create a public-facing API out of it by adding an AWS Lambda wrapper. Check out my other blog post “Scalable, Secure Hugging Face Transformer Endpoints with Amazon SageMaker, AWS Lambda, and CDK” for this.

But the biggest thanks goes to the 🌸 BigScience collaboration for creating and sharing the results of their great work. I am so grateful that open-science & open-source exist and are being pushed forward.

Thanks for reading. If you have any questions, feel free to contact me through GitHub or on the forum. You can also connect with me on Twitter or LinkedIn.