Learn how to fine-tune and deploy Mistral 7B with Hugging Face on Amazon SageMaker, leveraging techniques like QLoRA, Flash Attention, and response streaming.
Benchmark evaluating varying sizes of Llama 2 across a range of Amazon EC2 instance types under different load levels, measuring latency (ms per token) and throughput (tokens per second).
In this example, we show how to fine-tune Falcon 180B using DeepSpeed, Hugging Face Transformers, and LoRA with Flash Attention on a multi-GPU machine.