A benchmark evaluating Llama 2 models of varying sizes across a range of Amazon EC2 instance types and load levels, measuring latency (milliseconds per token) and throughput (tokens per second).
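The two metrics above are derived from the same timed generation run. A minimal sketch (the function names and the numbers are illustrative, not from the benchmark itself):

```python
# Hypothetical helpers deriving the two benchmark metrics from one timed run.

def latency_ms_per_token(total_seconds: float, num_tokens: int) -> float:
    """Average per-token latency in milliseconds."""
    return total_seconds * 1000.0 / num_tokens

def throughput_tokens_per_second(total_seconds: float, num_tokens: int) -> float:
    """Generation throughput in tokens per second."""
    return num_tokens / total_seconds

# Example: 256 tokens generated in 8.0 seconds (illustrative values only)
print(latency_ms_per_token(8.0, 256))          # 31.25 ms/token
print(throughput_tokens_per_second(8.0, 256))  # 32.0 tokens/s
```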
In this example, we show how to fine-tune Falcon 180B using DeepSpeed, Hugging Face Transformers, and LoRA with Flash Attention on a multi-GPU machine.
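Fitting a 180B-parameter model on a multi-GPU machine typically relies on DeepSpeed ZeRO stage 3 to shard parameters, gradients, and optimizer state across GPUs, passed to the Hugging Face `Trainer` via its `deepspeed` argument. A sketch of such a config (the specific values and offload choices here are assumptions, not the exact setup used in the example):

```json
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" },
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

The `"auto"` values let Transformers fill in settings from `TrainingArguments`, which keeps the config and the training script from drifting apart.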