Combine Amazon SageMaker and DeepSpeed to fine-tune FLAN-T5 XXL
FLAN-T5, released with the Scaling Instruction-Finetuned Language Models paper, is an enhanced version of T5 that has been fine-tuned on a mixture of tasks, or, in simple words, a better T5 model in every aspect. FLAN-T5 outperforms T5 by double-digit improvements for the same number of parameters. Google has open-sourced 5 checkpoints on Hugging Face, ranging from 80M parameters up to 11B parameters.
In a previous blog post, we learned how to “Fine-tune FLAN-T5 XL/XXL using DeepSpeed & Hugging Face Transformers”. In this blog post, we look into how to integrate DeepSpeed into Amazon SageMaker so that any practitioner can train those billion-parameter models with a simple API call. Amazon SageMaker managed training allows you to train large language models without having to manage the underlying infrastructure. You can find more information about Amazon SageMaker in the documentation.
This means we will learn how to fine-tune FLAN-T5 XL & XXL using model parallelism, multiple GPUs, and DeepSpeed ZeRO on Amazon SageMaker.
The blog post is structured as follows:
- Process dataset and upload to S3
- Prepare training script and deepspeed launcher
- Fine-tune FLAN-T5 XXL on Amazon SageMaker
Before we start, let’s install the required libraries and make sure we have the correct permissions to access S3.
If you are going to use SageMaker in a local environment, you need access to an IAM Role with the required permissions for SageMaker. You can find more about it here.
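A minimal setup sketch is shown below. The library versions are not pinned here, and the fallback IAM role name sagemaker_execution_role is an assumption you should replace with a role that has the required SageMaker permissions.

```python
# install the libraries used in this example (versions not pinned here; pin as needed)
# !pip install "sagemaker" "transformers" "datasets" "s3fs" --upgrade

import boto3
import sagemaker

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if no bucket name is given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    # running outside of SageMaker Studio/Notebooks: fall back to a named role (name is an assumption)
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")
```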
1. Process dataset and upload to S3
Similar to the “Fine-tune FLAN-T5 XL/XXL using DeepSpeed & Hugging Face Transformers” post, we need to prepare a dataset to fine-tune our model. As mentioned in the beginning, we will fine-tune FLAN-T5 XXL on the CNN Dailymail dataset. This blog post does not go into detail about the dataset generation; if you want to learn the detailed steps, check out the previous post.
We define some parameters, which we use throughout the whole example; feel free to adjust them to your needs.
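A sketch of such a config could look like the following; the exact model id, dataset config, column names, and prompt template are assumptions based on the previous post.

```python
# experiment config (values are assumptions, adjust to your needs)
model_id = "google/flan-t5-xxl"  # Hugging Face model id

# dataset config
dataset_id = "cnn_dailymail"
dataset_config = "3.0.0"
text_column = "article"       # column containing the input text
summary_column = "highlights" # column containing the target summary

# prompt used to construct the instruct prompt; the document is inserted at {input}
prompt_template = "Summarize the following news article:\n{input}\nSummary:\n"
```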
Compared to the previous example, we are splitting the processing and training into two separate steps. This allows you to run the preprocessing outside of the managed SageMaker Training job. We process (tokenize) the dataset, upload it to S3, and pass it into our managed training job.
We defined a prompt_template in our config, which we will use to construct an instruct prompt for better performance of our model. Our prompt_template has a “fixed” start and end, and our document is placed in the middle. This means we need to ensure that the “fixed” template parts plus the document do not exceed the max length of the model. Therefore we calculate the max length of our document, which we will later use for padding and truncation.

We now know that our documents can be up to “500” tokens long and still fit our prompt_template correctly. In addition to our input, we need a better understanding of our “target” sequence length, meaning how long the summarizations in our dataset are. Therefore we iterate over the dataset and calculate the max input length (at most 500) and the max target length (this takes a few minutes).
We now have everything needed to process our dataset.
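A sketch of the preprocessing step, assuming the tokenizer, prompt_template, and length values from above:

```python
def preprocess_function(sample, padding="max_length"):
    # wrap each document in the instruct prompt
    inputs = [prompt_template.format(input=item) for item in sample[text_column]]

    # tokenize inputs and targets using the lengths we calculated before
    model_inputs = tokenizer(inputs, max_length=max_source_length, padding=padding, truncation=True)
    labels = tokenizer(
        text_target=sample[summary_column], max_length=max_target_length, padding=padding, truncation=True
    )

    # replace padding token ids in the labels by -100 so padding is ignored by the loss
    if padding == "max_length":
        labels["input_ids"] = [
            [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"]
        ]

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


tokenized_dataset = dataset.map(
    preprocess_function, batched=True, remove_columns=list(dataset["train"].features)
)
print(f"Keys of tokenized dataset: {list(tokenized_dataset['train'].features)}")
```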
After we processed the dataset, we are going to use the new FileSystem integration to upload our dataset to S3. We are using sess.default_bucket(); adjust this if you want to store the dataset in a different S3 bucket. We will use the S3 path later in our training script.
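The upload itself, assuming a recent datasets version with s3fs installed (the S3 prefix is an arbitrary choice):

```python
# save the processed train split directly to S3 via the datasets FileSystem integration
training_input_path = f"s3://{sess.default_bucket()}/processed/flan-t5-xxl/train"
tokenized_dataset["train"].save_to_disk(training_input_path)

print(f"uploaded training data to: {training_input_path}")
```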
2. Prepare training script and deepspeed launcher
Done! The last step before we start training is to prepare our training script and the deepspeed launcher. We learned in the introduction that we would leverage the DeepSpeed integration with the Hugging Face Trainer. In the previous post we used the deepspeed launcher to start our training on multiple GPUs. As of today, Amazon SageMaker does not support the deepspeed launcher. 😒
To overcome this limitation, we need to create a custom launcher, ds_launcher.py. The launcher is a simple Python script, which we pass our training script to. The launcher will start the real training script with the correct environment variables and parameters (a minimal sketch of such a launcher follows at the end of this section). In addition, we need to create a deepspeed_config.json to configure our training setup. In the “Fine-tune FLAN-T5 XL/XXL using DeepSpeed & Hugging Face Transformers” post we created 4 deepspeed configurations for the experiments we ran, including CPU offloading and mixed precision:
- ds_flan_t5_z3_config.json
- ds_flan_t5_z3_config_bf16.json
- ds_flan_t5_z3_offload.json
- ds_flan_t5_z3_offload_bf16.json
Depending on your setup, you can use those, e.g. if you are running on NVIDIA V100s, you have to use the config without bf16 since V100s do not support the bfloat16 data type.
When fine-tuning T5 models we cannot use fp16 since it leads to overflow issues, see: #4586, #10830, #10956
We are going to use a p4d.24xlarge AWS EC2 instance with 8x NVIDIA A100 40GB GPUs. This means we can leverage bf16, which reduces the memory footprint of the model by almost ~2x and allows us to train efficiently without offloading.
We are going to use the ds_flan_t5_z3_config_bf16.json. If you are confused by the auto values, check out the documentation.
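For reference, a ZeRO stage 3 + bf16 configuration along the lines of ds_flan_t5_z3_config_bf16.json looks roughly like the sketch below. The auto values are resolved by the Hugging Face Trainer at runtime from the training arguments; the exact file from the previous post may contain additional keys.

```python
# sketch of a ZeRO stage 3 + bf16 deepspeed config, written out as a json file
import json

ds_config = {
    "bf16": {"enabled": "auto"},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {"warmup_min_lr": "auto", "warmup_max_lr": "auto", "warmup_num_steps": "auto"},
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("configs/ds_flan_t5_z3_config_bf16.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

And here is the minimal sketch of the custom ds_launcher.py mentioned above. The argument names (--training_script, --deepspeed_config, --num_gpus) and defaults are assumptions; the idea is simply to translate the hyperparameters SageMaker passes into a deepspeed command and run it as a subprocess.

```python
# ds_launcher.py (sketch): starts the real training script via the deepspeed launcher
import argparse
import subprocess
import sys


def parse_args():
    parser = argparse.ArgumentParser()
    # arguments consumed by the launcher itself (names are assumptions)
    parser.add_argument("--training_script", type=str, help="training script to launch with deepspeed")
    parser.add_argument("--deepspeed_config", type=str, help="path to the deepspeed config json")
    parser.add_argument("--num_gpus", type=int, default=8, help="number of GPUs per node")
    # everything else is forwarded untouched to the training script
    return parser.parse_known_args()


def main():
    args, remaining = parse_args()
    command = [
        "deepspeed",
        f"--num_gpus={args.num_gpus}",
        args.training_script,
        f"--deepspeed={args.deepspeed_config}",
    ] + remaining
    print(f"launching: {' '.join(command)}")
    # run the deepspeed launcher and fail the SageMaker job if it fails
    result = subprocess.run(command)
    sys.exit(result.returncode)


if __name__ == "__main__":
    main()
```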
3. Fine-tune FLAN-T5 XXL on Amazon SageMaker
In addition to our deepspeed_parameters, we need to define the training_hyperparameters for our training script. The training_hyperparameters are passed to our training_script as CLI arguments with --key value.
If you want to better understand which batch_size and deepspeed_config work with which hardware setup, you can check out the Results & Experiments we ran.
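A sketch of how these two dictionaries could look; the script paths, epochs, batch size, and learning rate are assumptions and should be adjusted to your setup:

```python
# launcher-related parameters (consumed by ds_launcher.py)
deepspeed_parameters = {
    "deepspeed_config": "configs/ds_flan_t5_z3_config_bf16.json",  # deepspeed config file
    "training_script": "scripts/run_seq2seq_deepspeed.py",         # real training script (name is an assumption)
}

# hyperparameters passed to the training script as --key value CLI arguments
training_hyperparameters = {
    "model_id": model_id,                           # pre-trained model id
    "dataset_path": "/opt/ml/input/data/training",  # where SageMaker mounts our S3 data
    "epochs": 3,
    "per_device_train_batch_size": 8,
    "lr": 1e-4,
    "generation_max_length": max_target_length,
}
```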
In order to create a SageMaker training job, we need a HuggingFace Estimator. The Estimator then creates our Amazon SageMaker training job. Amazon SageMaker takes care of starting and managing our EC2 instances, provides the correct Hugging Face container, uploads the provided scripts, and downloads the data from our S3 bucket into the container at /opt/ml/input/data.
We created our HuggingFace estimator including the ds_launcher.py as entry_point and defined our deepspeed config and training_script in the deepspeed_parameters, which we merged with our training_hyperparameters. We can now start our training job with the .fit() method, passing our S3 path to the training script.
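Putting it all together, the estimator setup and training launch might look like this; the container versions, source directory layout, and job name are assumptions, so double-check them against the available Hugging Face DLCs for your region:

```python
import time

from sagemaker.huggingface import HuggingFace

# training job name (format is an assumption)
job_name = f'huggingface-deepspeed-{time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())}'

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point="ds_launcher.py",     # our custom deepspeed launcher
    source_dir=".",                   # directory containing launcher, training script and configs
    instance_type="ml.p4d.24xlarge",  # 8x NVIDIA A100 40GB
    instance_count=1,
    base_job_name=job_name,
    role=role,                        # IAM role from the setup step
    transformers_version="4.17",      # versions are assumptions, pick a supported DLC combination
    pytorch_version="1.10",
    py_version="py38",
    hyperparameters={**training_hyperparameters, **deepspeed_parameters},
)

# start the training job, passing the S3 path of our processed dataset as the "training" channel
huggingface_estimator.fit({"training": training_input_path})
```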
If you want to deploy your model to a SageMaker Endpoint, you can check out the Deploy FLAN-T5 XXL on Amazon SageMaker blog.
Thanks for reading! If you have any questions, feel free to contact me on Twitter or LinkedIn.