In case you missed it: on March 25th we announced a collaboration with Amazon SageMaker to make it easier to create State-of-the-Art Machine Learning models, and ship cutting-edge NLP features faster.
Together with the SageMaker team, we built 🤗 Transformers optimized Deep Learning Containers to accelerate training of Transformers-based models. Thanks AWS friends!🤗 🚀
With the new HuggingFace estimator in the SageMaker Python SDK, you can start training with a single line of code.
The announcement blog post provides all the information you need to know about the integration, including a "Getting Started" example and links to documentation, examples, and features.
If you're not familiar with Amazon SageMaker: "Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models." [REF]
Tutorial
We will use the new Hugging Face DLCs and Amazon SageMaker extension to train a distributed Seq2Seq-transformer model on the summarization task using the transformers and datasets libraries, and then upload the model to huggingface.co and test it.
In this tutorial, we will use an Amazon SageMaker Notebook Instance to run our training job. You can learn how to set up a Notebook Instance here.
What are we going to do:
Set up a development environment and install sagemaker
Choose 🤗 Transformers examples/ script
Configure distributed training and hyperparameters
Create a HuggingFace estimator and start training
Upload the fine-tuned model to huggingface.co
Test inference
We are going to fine-tune facebook/bart-large-cnn on the samsum dataset. "BART is sequence-to-sequence model trained with denoising as pretraining objective." [REF]
The samsum dataset contains about 16k messenger-like conversations with summaries.
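As an optional sanity check, we can peek at a sample with the datasets library before kicking off training (a minimal sketch; note that the samsum loading script needs the py7zr package to decompress the data):

```python
from datasets import load_dataset

# the samsum loading script needs `pip install py7zr` to decompress the data
dataset = load_dataset("samsum")

sample = dataset["train"][0]
print(sample["dialogue"])  # messenger-like conversation
print(sample["summary"])   # human-written summary
```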
Set up a development environment and install sagemaker
After our SageMaker Notebook Instance is running, we can select either Jupyter Notebook or JupyterLab and create a new notebook with the conda_pytorch_p36 kernel.
*Note: The use of Jupyter is optional: we could also launch SageMaker training jobs from anywhere we have an SDK installed, connectivity to the cloud, and appropriate permissions, such as a laptop, another IDE, or a task scheduler like Airflow or AWS Step Functions.*
After that, we can install the required dependencies:
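For example, in a notebook cell (the version pins follow the 4.4.2 DLC used later in this tutorial; treat the exact sagemaker and datasets versions as an assumption):

```python
!pip install "sagemaker>=2.31.0" "transformers==4.4.2" "datasets[s3]==1.5.0" --upgrade
```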
To run training on SageMaker we need to create a sagemaker Session and provide an IAM role with the right permission. This IAM role will be later attached to the TrainingJob enabling it to download data, e.g. from Amazon S3.
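A minimal sketch, assuming the notebook runs on a SageMaker Notebook Instance whose attached role already has the required permissions:

```python
import sagemaker

sess = sagemaker.Session()

# on a Notebook Instance this returns the role attached to the instance;
# elsewhere, pass an IAM role ARN with SageMaker permissions instead
role = sagemaker.get_execution_role()

print(f"IAM role arn used for running training: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")
```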
Choose 🤗 Transformers examples/ script
The 🤗 Transformers repository contains several example scripts for fine-tuning models on tasks from language-modeling to token-classification. In our case, we are using run_summarization.py from the seq2seq/ examples.
*Note: you can use this tutorial as-is to train your model on a different example script.*
We are going to use the transformers 4.4.2 DLC, which means we need to configure v4.4.2 as the branch from which to pull the compatible example scripts.
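The estimator can pull the training script directly from GitHub via a git_config; pinning the branch keeps the script compatible with the container's transformers version:

```python
# pin the example script to the transformers release shipped in the DLC
git_config = {
    "repo": "https://github.com/huggingface/transformers.git",
    "branch": "v4.4.2",
}
```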
Configure distributed training and hyperparameters
Next, we will define our hyperparameters and configure our distributed training strategy. As hyperparameters, we can define any Seq2SeqTrainingArguments as well as the arguments defined in run_summarization.py.
Since we are using SageMaker Data Parallelism, our total_batch_size will be per_device_train_batch_size * n_gpus.
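A sketch of the configuration; the keys mirror run_summarization.py's CLI flags, and the specific values (epochs, learning rate) are illustrative:

```python
# arguments forwarded to run_summarization.py (values are illustrative)
hyperparameters = {
    "model_name_or_path": "facebook/bart-large-cnn",
    "dataset_name": "samsum",
    "do_train": True,
    "do_eval": True,
    "predict_with_generate": True,
    "per_device_train_batch_size": 4,  # total_batch_size = 4 * 16 GPUs = 64
    "per_device_eval_batch_size": 4,
    "num_train_epochs": 3,
    "learning_rate": 5e-5,
    "fp16": True,
    "output_dir": "/opt/ml/model",  # SageMaker packages this directory as model.tar.gz
}

# enable SageMaker Data Parallelism
distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}
```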
Create a HuggingFace estimator and start training
The last step before training is creating a HuggingFace estimator. The Estimator handles the end-to-end Amazon SageMaker training. We define which fine-tuning script should be used as entry_point, which instance_type should be used, and which hyperparameters are passed in.
As instance_type we are using ml.p3dn.24xlarge, which contains 8x NVIDIA V100 GPUs, with an instance_count of 2. This means we are going to run training on 16 GPUs with a total_batch_size of 16*4=64. We are going to train a 400 million parameter model with a total_batch_size of 64, which is just wow.
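Putting it together (the transformers, PyTorch, and Python versions must match an available DLC; the ones below follow the 4.4.2 container used above):

```python
from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point="run_summarization.py",
    source_dir="./examples/seq2seq",   # location of the script in the repo at v4.4.2
    git_config=git_config,
    instance_type="ml.p3dn.24xlarge",  # 8x NVIDIA V100
    instance_count=2,                  # 2 instances -> 16 GPUs
    transformers_version="4.4.2",
    pytorch_version="1.6.0",
    py_version="py36",
    role=role,
    hyperparameters=hyperparameters,
    distribution=distribution,
)
```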
To start our training we call the .fit() method.
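```python
# starts the SageMaker training job; logs are streamed to the notebook
huggingface_estimator.fit()
```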
The reported training seconds are 2882 because billable time is multiplied by the number of instances. Dividing by the instance count, 2882/2 = 1441 seconds (~24 minutes), gives the actual duration from "Downloading the training image" to "Training job completed".
Converted to real money, training a State-of-the-Art summarization model on 16 NVIDIA Tesla V100 GPUs comes down to ~$28.
Since our model achieved a pretty good score, we are going to upload it to huggingface.co, create a model_card, and test it with the Hosted Inference widget. To upload a model, you need to create an account here.
We can download our model from Amazon S3 and unzip it using the following snippet.
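A sketch using the sagemaker SDK's S3Downloader (this assumes huggingface_estimator is still in scope; otherwise pass the S3 URI of the job's model.tar.gz directly):

```python
import os
import tarfile

from sagemaker.s3 import S3Downloader

local_path = "my_bart_model"
os.makedirs(local_path, exist_ok=True)

# download the model.tar.gz produced by the training job
S3Downloader.download(
    s3_uri=huggingface_estimator.model_data,
    local_path=".",
)

# unpack it into my_bart_model/
with tarfile.open("model.tar.gz") as tar:
    tar.extractall(path=local_path)
```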
Before we upload our model to huggingface.co, we need to create a model_card. The model_card describes the model, includes hyperparameters and results, and specifies which dataset was used for training. To create a model_card, we create a README.md in our local_path.
After we extract all the metrics we want to include, we create our README.md. In addition to the automatically generated results table, we add the metrics manually to the metadata of our model card under model-index.
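A trimmed sketch of such a README.md; the metric placeholder is filled from the evaluation results extracted above (eval_results is a hypothetical dict holding them), and the YAML front matter is what the Hub reads for model-index:

```python
# model-card template; the {rouge1} placeholder is filled from the extracted metrics
MODEL_CARD_TEMPLATE = """---
language: en
tags:
- sagemaker
- bart
- summarization
datasets:
- samsum
model-index:
- name: bart-large-cnn-samsum
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: samsum
      type: samsum
    metrics:
    - type: rouge-1
      value: {rouge1}
      name: Validation ROUGE-1
---
## bart-large-cnn-samsum

This model was fine-tuned on the samsum dataset using Amazon SageMaker
and the Hugging Face Deep Learning Container.
"""

# eval_results is a hypothetical dict of metrics parsed from the job's output
with open(f"{local_path}/README.md", "w") as f:
    f.write(MODEL_CARD_TEMPLATE.format(rouge1=eval_results["rouge1"]))
```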
After we have our unzipped model and model card located in my_bart_model, we can either use the huggingface_hub SDK to create a repository and upload it to huggingface.co, or go to https://huggingface.co/new to create a new repository and upload it manually.
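A sketch using a recent huggingface_hub release (the repository name is a placeholder, and a prior huggingface-cli login is assumed):

```python
from huggingface_hub import HfApi

api = HfApi()

repo_id = "your-username/bart-large-cnn-samsum"  # placeholder repository name
api.create_repo(repo_id=repo_id, exist_ok=True)

# push the unpacked model and the model card in one go
api.upload_folder(folder_path="my_bart_model", repo_id=repo_id)
```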
Test inference
After uploading our model, we can access it at https://huggingface.co/{hf_username}/{repository_name} and use the "Hosted Inference API" widget to test it.
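The same endpoint that powers the widget can also be called directly; a sketch with requests (the token and repository name are placeholders):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/your-username/bart-large-cnn-samsum"
headers = {"Authorization": "Bearer <your-hf-token>"}  # placeholder token

payload = {
    "inputs": (
        "Jeff: Can I train 🤗 Transformers models on Amazon SageMaker?\n"
        "Philipp: Sure, you can use the new Hugging Face Deep Learning Containers!"
    )
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # e.g. [{"summary_text": "..."}]
```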
You can find the code here. Thanks for reading! If you have any questions, feel free to contact me through GitHub or on the forum. You can also connect with me on Twitter or LinkedIn.