LLaMA 2 - Every Resource you need
LLaMA 2 is a large language model developed by Meta and the successor to LLaMA 1. LLaMA 2 is available free of charge for research and commercial use through providers like AWS, Hugging Face, and others. The LLaMA 2 pretrained models are trained on 2 trillion tokens and have double the context length of LLaMA 1. Its fine-tuned models have been trained on over 1 million human annotations.
This blog post includes all relevant resources to help get started quickly. It includes links to:
- What is LLaMA 2?
- Playgrounds, where you can test the model
- The research behind the model
- How good the model is, benchmarks
- How to correctly prompt the chat model
- How to train the model using PEFT
- How to deploy the model for inference
- and other resources
The official announcement from Meta can be found here: https://ai.meta.com/llama/
What is LLaMA 2?
Meta released LLaMA 2, the new state-of-the-art open large language model (LLM). LLaMA 2 represents the next iteration of LLaMA and comes with a commercially-permissive license. LLaMA 2 comes in 3 different sizes - 7B, 13B, and 70B parameters. New improvements compared to the original LLaMA include:
- Trained on 2 trillion tokens of text data
- Allows commercial use
- Uses a 4,096-token default context window (can be expanded)
- The 70B model adopts grouped-query attention (GQA)
- Available on Hugging Face Hub
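To see why grouped-query attention (GQA) matters, here is a minimal, dependency-free sketch of the head mapping it implies. The 64 query heads and 8 KV heads below are the counts reported for the 70B model; treat them as illustrative rather than a definitive implementation:

```python
# Sketch of how grouped-query attention (GQA) maps query heads to KV heads.
n_query_heads = 64   # attention heads (illustrative value for the 70B model)
n_kv_heads = 8       # shared key/value heads under GQA

group_size = n_query_heads // n_kv_heads  # 8 query heads share one KV head

# Each query head q attends using KV head q // group_size.
kv_head_for_query = [q // group_size for q in range(n_query_heads)]

# The KV cache shrinks by the grouping factor versus full multi-head attention.
kv_cache_reduction = n_query_heads // n_kv_heads
print(kv_head_for_query[:10])  # first ten query heads map to KV heads 0 and 1
print(kv_cache_reduction)      # 8x smaller KV cache
```

Sharing KV heads this way cuts the size of the key/value cache during inference, which is one of the main costs of serving long-context models.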
LLaMA Playgrounds, test it
There are a few different playgrounds available to test out interacting with LLaMA 2 Chat:
- HuggingChat allows you to chat with the LLaMA 2 70B model through Hugging Face's conversational interface. This provides a simple way to see the chatbot in action.
- Hugging Face Spaces has LLaMA 2 models in 7B, 13B and 70B sizes available to test. The interactive demos let you compare different model sizes.
- Perplexity has both the 7B and 13B LLaMA 2 models accessible through their conversational AI demo. You can chat with the models and provide feedback on the responses.
Research Behind LLaMA 2
LLaMA 2 is a base LLM pretrained on publicly available online data. Additionally, Meta released a CHAT version. The first version of the CHAT model was an SFT (supervised fine-tuned) model. After that, LLaMA-2-chat was iteratively improved through Reinforcement Learning from Human Feedback (RLHF). The RLHF process involved techniques like rejection sampling and proximal policy optimization (PPO) to further refine the chatbot. Meta only released the latest RLHF (v5) versions of the model. If you are curious about how the process worked, check out:
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Llama 2: an incredible open LLM
- Llama 2: Full Breakdown
How good is LLaMA 2, benchmarks?
Meta claims that “Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests.” You can find more insights into its performance at:
How to Prompt LLaMA 2 Chat
LLaMA 2 Chat is an open conversational model. Interacting with LLaMA 2 Chat effectively requires providing the right prompts and questions to produce coherent and useful responses. Meta didn’t choose the simplest prompt format. Below is the prompt template for single-turn and multi-turn conversations. This template follows the model's training procedure, as described in the LLaMA 2 paper. You can also take a look at LLaMA 2 Prompt Template.
Single-turn
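For a single turn, the template wraps the user message in `[INST]` tags, with an optional system message inside `<<SYS>>` tags (the `{{ }}` placeholders are to be filled in):

```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```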
Multi-turn
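For multi-turn conversations, each model answer is closed with `</s>` and every new user message opens a fresh `<s>[INST] ... [/INST]` span; the system message appears only once, in the first turn:

```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }} </s><s>[INST] {{ user_msg_3 }} [/INST]
```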
How to train LLaMA 2
LLaMA 2 is openly available, making it easy to fine-tune using techniques such as PEFT (parameter-efficient fine-tuning). There are great resources available for training your own versions of LLaMA 2:
- Extended Guide: Instruction-tune Llama 2
- Fine-tune LLaMA 2 (7-70B) on Amazon SageMaker
- Fine-tuning with PEFT
- Meta Examples and recipes for Llama model
- The EASIEST way to finetune LLAMA-v2 on local machine!
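The core idea behind the LoRA technique used in PEFT can be sketched without any libraries: instead of updating a frozen weight matrix W, you train two small matrices A and B and add their scaled product to W's output. The shapes and values below are toy examples, not real model dimensions:

```python
# Minimal, dependency-free sketch of the LoRA idea behind PEFT fine-tuning.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = Wx + (alpha / r) * B(Ax) -- the LoRA-adapted forward pass."""
    base = matvec(W, x)                 # frozen base projection
    delta = matvec(B, matvec(A, x))     # low-rank update (trainable)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy example: 2x2 frozen weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weight (identity)
A = [[1.0, 1.0]]                  # r x d_in = 1 x 2 (trainable)
B = [[0.5], [0.0]]                # d_out x r = 2 x 1 (trainable)
y = lora_forward(W, A, B, [2.0, 3.0], alpha=2, r=1)
print(y)  # base output [2.0, 3.0] plus the scaled low-rank update
```

Because only A and B are trained, the number of trainable parameters drops dramatically, which is what makes fine-tuning the 7B–70B models feasible on modest hardware.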
How to Deploy LLaMA 2
LLaMA 2 can be deployed in a local environment (llama.cpp), using managed services like Hugging Face Inference Endpoints, or through cloud platforms like AWS, Google Cloud, and Microsoft Azure.
- Deploy LLaMa 2 Using text-generation-inference and Inference Endpoints
- Deploy LLaMA 2 70B using Amazon SageMaker
- Llama-2-13B-chat locally on your M1/M2 Mac with GPU inference
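As a small sketch of what calling a deployed model looks like: text-generation-inference (TGI) servers, including those backing Hugging Face Inference Endpoints, accept a JSON body with `inputs` and a `parameters` object on their `/generate` route. The endpoint URL below is a placeholder, not a real address:

```python
import json

def build_generate_payload(prompt, max_new_tokens=256, temperature=0.7):
    """Return the JSON body TGI's /generate route expects."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

payload = build_generate_payload("[INST] What is LLaMA 2? [/INST]")
body = json.dumps(payload)
# POST `body` with Content-Type: application/json to
# https://<your-endpoint>/generate   (placeholder URL)
print(body)
```

Note that the prompt itself follows the chat template from the prompting section above; the server does not apply the template for you.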
Other Sources
This post aimed to provide a high-level overview of the key resources around LLaMA 2's release. Let me know if you would like me to expand on any section or add additional details.
Thanks for reading! If you have any questions, feel free to contact me on Twitter or LinkedIn.