Published on January 11, 2024
Scale LLM Inference on Amazon SageMaker with Multi-Replica Endpoints
#LLAMA #HuggingFace #LLM #SageMaker
In this blog post, you will learn how to increase the throughput of Llama 13B on Amazon SageMaker using single-instance, multi-replica endpoints.

Published on November 14, 2023
Deploy Llama 2 7B on AWS Inferentia2 with Amazon SageMaker
#GenerativeAI #Llama #SageMaker #Inferentia
In this blog post, you will learn how to compile and deploy Llama 2 7B on AWS Inferentia2 with Amazon SageMaker.

Published on September 26, 2023
Llama 2 on Amazon SageMaker: A Benchmark
#LLAMA #HuggingFace #LLM #SageMaker
A benchmark that evaluates Llama 2 models of different sizes on a range of Amazon EC2 instance types under varying load levels. It measures latency (ms per token) and throughput (tokens per second).

Published on July 26, 2023
Extended Guide: Instruction-tune Llama 2
#GenerativeAI #HuggingFace #LLM #Llama
This blog post is an extended guide to instruction-tuning Llama 2 from Meta AI.

Published on July 21, 2023
LLaMA 2 - Every Resource you need
#GenerativeAI #HuggingFace #LLM #LLaMA
All the resources you need for LLaMA 2: how to test, train, and deploy it.

Published on July 18, 2023
Fine-tune LLaMA 2 (7-70B) on Amazon SageMaker
#LLAMA #HuggingFace #LLM #SageMaker
Learn how to fine-tune LLaMA 2 using QLoRA and Hugging Face Transformers on Amazon SageMaker.