philschmid blog

MLOps: Using the Hugging Face Hub as model registry with Amazon SageMaker

#AWS #BERT #HuggingFace #Sagemaker
November 16, 2021 · 8 min read

Photo by Florian Wehde on Unsplash


The Hugging Face Hub is the largest collection of models, datasets, and metrics in order to democratize and advance AI for everyone 🚀. The Hugging Face Hub works as a central place where anyone can share and explore models and datasets.

In this blog post you will learn how to automatically save your model weights, logs, and artifacts to the Hugging Face Hub using Amazon SageMaker and how to deploy the model afterwards for inference. 🏎

This will allow you to use the Hugging Face Hub as the backbone of your model versioning, storage & management 👔

You will be able to easily share your models inside your own private organization or with the whole Hugging Face community without heavy lifting, thanks to built-in permission and access control features. 🔒



In this demo, we will use the Hugging Face transformers and datasets libraries together with a custom Amazon sagemaker-sdk extension to fine-tune a pre-trained transformer for multi-class text classification. In particular, the pre-trained model will be fine-tuned using the emotion dataset. To get started, we need to set up the environment with a few prerequisite steps for permissions, configurations, and so on.

NOTE: You can run this demo in SageMaker Studio, on your local machine, or on a SageMaker Notebook Instance.

Development Environment and Permissions

Note: we only install the required libraries from Hugging Face and AWS. You also need PyTorch or TensorFlow if you don't have it installed already.

!pip install "sagemaker>=2.69.0" "transformers==4.12.3" --upgrade
# using an older datasets version due to incompatibility of the sagemaker notebook & aws-cli with s3fs and fsspec >= 2021.10
!pip install "datasets==1.13" --upgrade

import sagemaker
assert sagemaker.__version__ >= "2.69.0"


If you are going to use SageMaker in a local environment, you need access to an IAM role with the required permissions for SageMaker. You can find out more about it here.

import sagemaker

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")


We are using the datasets library to download and preprocess the emotion dataset. After preprocessing, the dataset will be uploaded to our sagemaker_session_bucket to be used within our training job. The emotion dataset consists of 16,000 training examples, 2,000 validation examples, and 2,000 test examples.
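For reference, the emotion dataset is a six-class problem; a quick sketch of the id-to-label mapping as listed on its dataset card (worth double-checking against the `features` attribute of the loaded dataset):

```python
# id -> class-name mapping of the emotion dataset (per its dataset card)
emotion_labels = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}

def id2label(label_id: int) -> str:
    # resolve a numeric label to its class name
    return emotion_labels[label_id]
```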


from datasets import load_dataset
from transformers import AutoTokenizer

# tokenizer used in preprocessing
tokenizer_name = 'distilbert-base-uncased'

# dataset used
dataset_name = 'emotion'

# s3 key prefix for the data
s3_prefix = 'samples/datasets/emotion'

# download tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# tokenizer helper function
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True)

# load dataset
train_dataset, test_dataset = load_dataset(dataset_name, split=['train', 'test'])

# tokenize dataset
train_dataset =, batched=True)
test_dataset =, batched=True)

# set format for pytorch
train_dataset = train_dataset.rename_column("label", "labels")
train_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
test_dataset = test_dataset.rename_column("label", "labels")
test_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])

Uploading data to sagemaker_session_bucket

After we have processed the datasets, we are going to use the new FileSystem integration to upload our dataset to S3.

import botocore
from datasets.filesystems import S3FileSystem

s3 = S3FileSystem()

# save train_dataset to s3
training_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/train'
train_dataset.save_to_disk(training_input_path, fs=s3)

# save test_dataset to s3
test_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/test'
test_dataset.save_to_disk(test_input_path, fs=s3)

Creating an Estimator and starting a training job

List of supported models:,transformers&sort=downloads

Setting up push_to_hub for our model

The script implements push_to_hub using the Trainer and TrainingArguments. To push our model to the Hub, we need to define the push_to_hub hyperparameter, set it to True, and provide our Hugging Face token. Additionally, we can configure the repository name and saving strategy using hub_model_id and hub_strategy.

You can find documentation to those parameters here.

We are going to provide our HF token securely, without exposing it to the public, using notebook_login from the huggingface_hub SDK. But be careful: your token will still be visible inside the logs of the training job. If you run, wait=True), you will see the token in the logs. A better way of providing your HF_TOKEN to your training jobs would be using AWS Secrets Manager.

You can also directly find your token at
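As mentioned, AWS Secrets Manager is the safer option for training jobs; a minimal sketch, assuming you have stored your token under a secret name such as hf-token (the secret name and region below are placeholders, not values from this post):

```python
def extract_token(response: dict) -> str:
    # Secrets Manager returns plain-text secrets under the "SecretString" key
    return response["SecretString"]

def get_hf_token(secret_id: str, region_name: str) -> str:
    """Read the Hugging Face token from AWS Secrets Manager instead of passing it in plain text."""
    import boto3  # available by default in SageMaker environments
    client = boto3.client("secretsmanager", region_name=region_name)
    return extract_token(client.get_secret_value(SecretId=secret_id))

# hub_token = get_hf_token("hf-token", "us-east-1")  # then pass it as the hub_token hyperparameter
```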

from huggingface_hub import notebook_login

notebook_login()

Now we can use HfFolder.get_token() to dynamically load our token from disk and use it as a hyperparameter. The script can be found in the GitHub repository.

from sagemaker.huggingface import HuggingFace
from huggingface_hub import HfFolder
import time

# hyperparameters, which are passed into the training job
hyperparameters = {
    'epochs': 1,                                     # number of training epochs
    'train_batch_size': 32,                          # batch size for training
    'eval_batch_size': 64,                           # batch size for evaluation
    'learning_rate': 3e-5,                           # learning rate used during training
    'model_id': 'distilbert-base-uncased',           # pre-trained model
    'fp16': True,                                    # whether to use 16-bit (mixed) precision training
    'push_to_hub': True,                             # defines if we want to push the model to the hub
    'hub_model_id': 'sagemaker-distilbert-emotion',  # the model id of the model to push to the hub
    'hub_strategy': 'every_save',                    # the strategy to use when pushing the model to the hub
    'hub_token': HfFolder.get_token()                # HuggingFace token to have permission to push
}

# define Training Job Name
job_name = f'push-to-hub-sample-{time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())}'

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point          = '',          # fine-tuning script used in the training job
    source_dir           = './scripts',         # directory where the fine-tuning script is stored
    instance_type        = 'ml.p3.2xlarge',     # instance type used for the training job
    instance_count       = 1,                   # the number of instances used for training
    base_job_name        = job_name,            # the name of the training job
    role                 = role,                # IAM role used in the training job to access AWS resources, e.g. S3
    transformers_version = '4.12',              # the transformers version used in the training job
    pytorch_version      = '1.9',               # the pytorch version used in the training job
    py_version           = 'py38',              # the python version used in the training job
    hyperparameters      = hyperparameters,     # the hyperparameters used for running the training job
)
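SageMaker passes the hyperparameters above to the entry-point script as command-line arguments. The sketch below shows how might pick them up; the argument names are assumed to mirror the dict keys, and the real script lives in the GitHub repository. Note that SageMaker serializes all hyperparameter values as strings, so booleans need explicit conversion:

```python
import argparse

def str2bool(value: str) -> bool:
    # SageMaker serializes hyperparameters as strings, so booleans arrive as "True"/"False"
    return str(value).lower() in ("true", "1")

def parse_hyperparameters(argv=None):
    # SageMaker invokes the entry point with --key value pairs built from the hyperparameters dict
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train_batch_size", type=int, default=32)
    parser.add_argument("--eval_batch_size", type=int, default=64)
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--model_id", type=str, default="distilbert-base-uncased")
    parser.add_argument("--fp16", type=str2bool, default=False)
    parser.add_argument("--push_to_hub", type=str2bool, default=False)
    parser.add_argument("--hub_model_id", type=str, default=None)
    parser.add_argument("--hub_strategy", type=str, default="every_save")
    parser.add_argument("--hub_token", type=str, default=None)
    return parser.parse_args(argv)

# example: {'epochs': 1, 'push_to_hub': True, ...} arrives as CLI arguments like these
args = parse_hyperparameters(["--epochs", "1", "--push_to_hub", "True"])
```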

After we have defined our hyperparameters and Estimator, we can start the training job.

# define a data input dictionary with our uploaded s3 uris
data = {
    'train': training_input_path,
    'test': test_input_path
}

# starting the train job with our uploaded datasets as input
# setting wait to False to not expose the HF Token, wait=False)

Since we set wait=False to hide the logs, we can use a waiter to see when our training job is done.

# adding waiter to see when training is done
waiter = huggingface_estimator.sagemaker_session.sagemaker_client.get_waiter('training_job_completed_or_stopped')

waiter.wait(TrainingJobName=huggingface_estimator.latest_training_job.name)

Accessing the model on

We can access the model on using the hub_model_id and our username.

from huggingface_hub import HfApi

whoami = HfApi().whoami()
username = whoami['name']

print(f"{username}/{hyperparameters['hub_model_id']}")

# e.g.{username}/sagemaker-distilbert-emotion

Deploying the model from Hugging Face to a SageMaker Endpoint

To deploy our model to Amazon SageMaker, we can create a HuggingFaceModel and provide the Hub configuration (HF_MODEL_ID & HF_TASK) to deploy it. Alternatively, we can use the huggingface_estimator to deploy our model from S3 with huggingface_estimator.deploy().

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration
hub = {
    'HF_MODEL_ID': f"{username}/{hyperparameters['hub_model_id']}",
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.12',
    pytorch_version='1.9',
    py_version='py38',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,     # number of instances
    instance_type='ml.m5.xlarge'  # ec2 instance type
)

Then, we use the returned predictor object to call the endpoint.

sentiment_input = {"inputs": "Winter is coming and it will be dark soon."}

predictor.predict(sentiment_input)
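For the text-classification task, the endpoint returns a list of {'label', 'score'} dictionaries. A small helper to pull out the top prediction; the response below is an illustrative shape, not the actual endpoint output:

```python
def top_prediction(response):
    # pick the entry with the highest score from the pipeline-style response
    return max(response, key=lambda entry: entry["score"])

# illustrative response shape (labels and scores are made up)
example_response = [
    {"label": "LABEL_0", "score": 0.03},
    {"label": "LABEL_4", "score": 0.92},
]
best = top_prediction(example_response)  # -> {"label": "LABEL_4", "score": 0.92}
```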

Finally, we delete the inference endpoint.

predictor.delete_endpoint()


With the push_to_hub integration of the Trainer API, we were able to automatically push our model weights and logs, based on the hub_strategy, to the Hugging Face Hub. With this, we benefit from automatic model versioning through the git-based system and built-in permission and access control features.

The combination of Amazon SageMaker and the Hugging Face Hub allows machine learning teams to easily collaborate across regions and accounts, using private and secure organizations to manage, monitor, and deploy their own models into production.

You can find the code here, and feel free to open a thread on the forum.

Thanks for reading. If you have any questions, feel free to contact me through GitHub or on the forum. You can also connect with me on Twitter or LinkedIn.