Llama 2 minimum requirements

Llama 2 is Meta's family of large language models (LLMs): a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The fine-tuned variants, called Llama 2-Chat, are optimized for dialogue use cases and outperform most open-source chat models on the benchmarks Meta tested, based on human evaluations for helpfulness and safety. To allow easy access, Meta provides the models on Hugging Face, where you can download them in both transformers and native formats; some base and instruct variants have additionally been trained with fill-in-the-middle (FIM) capability, allowing them to complete text in the middle of a document rather than only at the end.

Large language models are trained on massive datasets using models with very large parameter counts (GPT-3, for example, has 175 billion parameters) and are commonly known as foundation models. Open reproductions follow the same recipe: OpenLLaMA, for instance, offers 7B and 3B models trained on 1T tokens, as well as a preview of a 13B model trained on 600B tokens.

Hardware requirements at a glance

Deploying Mistral, Llama 2, or other open LLMs locally follows the same basic pattern, and resource demands scale with model size. For GPU inference with GPTQ-style quantization, a 13B model requires roughly a 10GB card, while 65B/70B models require a 48GB card or 2 x 24GB. For system memory, plan on at least 16GB of RAM for 7B-13B class models and 64GB or more for 70B models. Disk space for quantized downloads is comparatively modest: an 8B-class model is around 4GB, while a 70B model exceeds 20GB. If you run into trouble at higher quantization levels, drop back to a q4 quantization, shut down other programs that are using a lot of memory, reduce the batch_size, or lower the precision.

To get started you can either install the Ollama CLI from ollama.ai/download, or download the weights directly by visiting the meta-llama repository on Hugging Face for the model you'd like to use; once the download completes, the files live in a local model directory. If you use a GPTQ toolchain on Windows, you will also need to pip install its prebuilt quant_cuda wheel from the command prompt before running inference.

Fine-tuning raises the bar considerably. Community threads ask whether the Llama 2 base architecture can be trained from scratch on non-English data rather than the data Llama was originally trained on, and teams fine-tuning LLaMA 70B with FSDP have reported a spectrum of issues, from minor runtime errors to intricate performance problems. On the hardware side, recent releases of Intel Extension for PyTorch add support for Intel Arc A-series graphics, so you are not strictly limited to NVIDIA cards. Before training, you also need to preprocess the data to ensure it is in the correct format.

Fine-tuning memory requirements

For full fine-tuning of a half-precision model (2 bytes per parameter) with a regular 8-bit Adam optimizer, you must budget per parameter for the weights, the gradients, and the optimizer state; with bitsandbytes optimizers such as 8-bit AdamW, the optimizer state alone adds about 2 bytes per parameter, or roughly 14 GB of GPU memory for a 7B model. Full-precision training is far more demanding: according to one article, a 176B-parameter BLOOM model takes about 5,760 GB of GPU memory, roughly 32 GB per billion parameters, and there are reports of 8 x A100 nodes being used to fine-tune Llama 2.
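These rules of thumb are easy to turn into a quick back-of-the-envelope check. The sketch below is plain Python with no dependencies and only counts weight and optimizer memory; activations, KV cache, and framework overhead come on top, so treat the results as lower bounds.

    # Rough memory estimates for a model of a given size.
    # Real usage also includes activations, KV cache, and framework overhead.
    params_billion = 7  # hypothetical size; change to 13 or 70 for larger models

    bytes_per_param_fp16 = 2    # float16/bfloat16 weights
    bytes_per_param_int4 = 0.5  # 4-bit quantized weights

    inference_fp16_gb = params_billion * bytes_per_param_fp16
    inference_int4_gb = params_billion * bytes_per_param_int4

    # Full fine-tuning: fp16 weights + fp16 gradients + ~2 bytes/param of 8-bit Adam state.
    finetune_fp16_gb = params_billion * (2 + 2 + 2)

    print(f"fp16 inference:  ~{inference_fp16_gb:.0f} GB")
    print(f"4-bit inference: ~{inference_int4_gb:.1f} GB")
    print(f"full fine-tune:  ~{finetune_fp16_gb:.0f} GB before activations")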
The smaller 7 billion and 13 billion parameter models can run on most modern laptops and desktops with at least 8GB of RAM and a decent CPU; having only 7 billion parameters makes the smallest variants a good choice for individuals experimenting on their own hardware. Once installation finishes, run a few test prompts to confirm the model is operational; community reports suggest a 13B model can run without going out of memory even at a sequence length of 8192, at around 8 tokens per second on capable hardware. The hardware required to run Llama 2 on a Windows machine ultimately depends on which Llama 2 model you want to use.

Meta's Getting Started guide provides instructions and resources to start building with Llama 2. For a managed local setup, head over to ollama.ai/download, download the Ollama CLI for macOS, and install the command-line tool. Note that finetuning, evaluating, and deploying Llama 2 models in a managed cloud environment generally requires GPU compute of V100 / A100 SKUs.

Unlike Llama 1, Llama 2 comes in a range of parameter sizes (7B, 13B, and 70B) as well as pretrained and fine-tuned variations, and Meta has been very clear about its intention to support Llama 2 as a free-to-use model because of the positive impact this can have on the AI ecosystem. With enough fine-tuning, Llama 2 proves itself a capable generative model for the commercial applications and research purposes allowed by its license, whether you deploy it on a local machine to create a chatbot or adapt it to other natural language generation tasks; both LLaMA and Alpaca-style instruction-tuned models can also generate code. Llama 3 continues this direction as part of a broader initiative to democratize access to cutting-edge AI. For a fair comparison between Llama 2 and Llama 3 models, benchmarks are typically run in native precision (float16 for Llama 2 and bfloat16 for Llama 3) rather than in any quantized precision.

On the GPU side, if you use ExLlama, currently the most performant and efficient GPTQ library, a 7B model requires only a 6GB card. ExLlamaV2 is only a few weeks old, and the framework is likely to become faster and easier to use. Running huge models such as Llama 2 70B on a single consumer GPU is possible thanks to quantization: quantization to mixed precision is intuitive in that the precision of the model is aggressively lowered where it has the least impact. For the 7B and 13B models, LoRA-based fine-tuning consumes much less memory and can therefore run on fewer or cheaper instances. Without quantization, the arithmetic is simple: to get a 65B-parameter model down to roughly 130-140GB you have to load it in bfloat16/float16, which is half precision at 2 bytes per parameter, i.e. 65 * 2 = ~130GB.
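If you load the weights through Hugging Face transformers, that half-precision choice is made explicitly at load time. Here is a minimal sketch, assuming you have been granted access to the gated meta-llama checkpoints, have accelerate installed for device_map, and have roughly 14GB of GPU memory free for the 7B chat model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo; requires approved access

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # keep fp16 weights instead of upcasting to fp32
        device_map="auto",          # spread layers across available GPUs (needs accelerate)
    )

    prompt = "Explain in one sentence what hardware a 13B model needs."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))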
Fine-tuning the largest models brings its own challenges. Teams fine-tuning LLaMA 70B with FSDP report three main issues, the first being that FSDP only wraps the model after the pre-trained weights have been loaded, so every rank initially needs its own full copy of the model in CPU memory (the arithmetic for this is worked through below). Full parameter fine-tuning, which updates all the parameters of all the layers of the pre-trained model, is the most demanding option of all.

Variations: Llama 2 comes in 7B, 13B, and 70B parameter sizes, in pretrained and fine-tuned form; each model card is the repository for one variant, with links to the others in the index at the bottom. For Llama 2 model access, the required Meta AI license agreement has to be completed first. The Llama 2 models were trained using bfloat16, but the original inference code uses float16; either way, you still have to specify the dtype when loading the model, otherwise it will default to float32 as per the docs. The output from the raw 70B model is excellent, arguably the best output seen from a raw pretrained open model, and LLaMA-2-7b and Mistral-7b have been two of the most popular open-source LLMs since their release. The entire point is that Llama 2 is an open model: it is free to use, it is available through Amazon SageMaker JumpStart for fine-tuning and deployment, and Llama 3 follows with four new open models built on the same architecture, soon available on all major cloud and model API platforms. Model architecture: Llama 2 is an auto-regressive language model that uses an optimized transformer architecture; Meta's developer documentation covers the details.

Memory requirements

One or more powerful GPUs, preferably NVIDIA cards with CUDA support, are recommended for model training and inference. A 30B/33B model requires a 24GB card or 2 x 12GB, and you should expect it to run roughly 2.5 times slower than a 13B model on the same machine; one user reports getting 10.5 tokens per second with little context, dropping to about 3.5 tokens per second at 2k context. For CPU-only setups, the bare minimum is roughly a Ryzen 7 class CPU and 64GB of RAM. To try other quantization levels, pull the other model tags. For recommendations on hardware configurations that handle CodeLlama, WizardLM, Falcon, and Open-LLaMA models smoothly, see the guide "Best Computer for Running LLaMA and LLama-2 Models".

For fine-tuning, make sure to point the training script at the location of your Llama 2 weights and tokenizer, then run the preprocessing script and pass your JSONL file as --input.

On Apple hardware, llama.cpp is a port of Llama in C/C++ that makes it possible to run Llama 2 locally using 4-bit integer quantization on Macs, and it also supports Linux and Windows; there is a dedicated guide for setting up and running Llama 2 on Mac systems with Apple silicon. From Python, the usual entry point is the llama.cpp wrapper, llama-cpp-python.
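As one illustration of that route, here is a minimal llama-cpp-python sketch. The GGUF file name is a placeholder for whichever quantized Llama 2 build you downloaded, and n_gpu_layers can be set to 0 on machines without a supported GPU:

    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path to a local quantized file
        n_ctx=2048,       # context window
        n_gpu_layers=35,  # offload this many layers to the GPU; use 0 for CPU-only
    )

    result = llm(
        "Q: How much RAM does a 13B model need? A:",
        max_tokens=128,
        stop=["Q:"],
    )
    print(result["choices"][0]["text"])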
For a self-hosted installation the workflow is straightforward: install the LLM you want to use locally (in most UIs this is simply a matter of clicking the install button), download the Llama 2 model itself (it is available on Hugging Face), set up your Python environment, and create a requirements.txt file listing the dependencies. A Dockerfile can package the whole thing into an image that starts a CPU-based inference server; the build and run commands are shown further below. On the hardware side, a modern CPU with at least 8 cores is recommended for backend operations and data preprocessing, and resource demands grow with model size, with larger models requiring more powerful hardware. Desktop tools cover much of this for you: LM Studio, for example, supports any ggml Llama, MPT, or StarCoder model on Hugging Face (Llama 2, Orca, Vicuna, Nous Hermes, WizardCoder, MPT, and so on). These models take text as input and generate text as output. By accessing the model you agree to the Llama 2 license terms, the acceptable use policy, and Meta's privacy policy. Reference: the paper "Llama 2: Open Foundation and Fine-Tuned Chat Models".

A common question is what acceleration to expect from a GPU and which GPU to procure. NVIDIA A10 GPUs have been around for a couple of years; they are much cheaper than the newer A100 and H100, yet still very capable of running AI workloads, and their price point makes them cost-effective. Published memory footprint tables for Llama 2 report, for each model (Llama 2-7B and up), the precision, the number of GPUs used, the GPU memory consumed, and the platform, and similar tables list Falcon and CodeLlama hardware requirements for 4-bit quantization. If memory is tight during training, the batch size can also be reduced, though this might slow training down.

Llama 3 shares the same basic model architecture as Llama 2, so it can easily be integrated into any software ecosystem that currently supports Llama 2, and Meta's benchmarks show its new tokenizer offers improved token efficiency, yielding up to 15% fewer tokens compared to Llama 2. Before diving into a local installation of Llama 3, make sure your system meets the minimum requirements.

For serving, one guide shows how to accelerate Llama 2 inference using the vLLM library for the 7B and 13B models, and with multi-GPU vLLM for 70B.
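A minimal sketch of that vLLM route, assuming vLLM is installed, the gated meta-llama weights are accessible, and a single GPU has enough VRAM for the 7B model (for 70B you would shard with tensor_parallel_size):

    from vllm import LLM, SamplingParams

    # For multi-GPU 70B serving, add tensor_parallel_size=<number of GPUs>.
    llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
    sampling = SamplingParams(temperature=0.7, max_tokens=128)

    prompts = [
        "List the minimum RAM needed to run a 13B model locally.",
        "Explain 4-bit quantization in two sentences.",
    ]
    for output in llm.generate(prompts, sampling):
        print(output.outputs[0].text)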
Llama 2 is the latest iteration of the Llama language model series, designed to understand and generate human-like text based on the data it is trained on, and its introduction represents a significant leap for the open-source AI arena. For developers and researchers it is a way to use the power of modern LLMs without relying on cloud-based platforms, which also matters if you are concerned about data privacy when using third-party models. The foundation models are provided along with the chat models to anybody that requests access, supplemental materials are available on Meta's Llama 2 webpage, and related projects such as OpenLLaMA provide PyTorch and JAX weights of their pre-trained models. The same approach carries over to the next generation, Meta Llama 3, which like Llama 2 is licensed for commercial use, and to derivative families such as Orca Mini, whose original models were based on LLaMA in 3, 7, and 13 billion parameter sizes and whose v3 models are based on Llama 2 in 7, 13, and 70 billion parameter sizes.

A note on precision: the checkpoints uploaded on the Hugging Face Hub use torch_dtype = 'float16', which the AutoModel API uses to cast the checkpoints from torch.float32 down to torch.float16 at load time.

Scaling up is mostly a question of VRAM. If you have installed Llama 2 13B and want to go faster or bigger, step up the VRAM, for example to a 4060 Ti 16GB or a 3090 24GB; to reach 70B models you will want two 3090s or two 4090s to run them at reasonable speed. Running the 70B model unquantized is much more demanding: Llama 2 70B in fp16 is around 130GB of weights, so it cannot run on 2 x 24GB cards, and you need 2 x 80GB, 4 x 48GB, or 6 x 24GB GPUs to run it in fp16. It is also possible to run LLaMA and Llama 2 models on the CPU with GPTQ-format models and llama.cpp, and the 4-bit hardware requirement tables cover Open-LLaMA models as well. Inference is not limited to NVIDIA hardware either: Llama 2 7B and Llama 2-Chat 7B inference has been demonstrated on Intel Arc A770 graphics on Windows and WSL2 via Intel Extension for PyTorch, and NVIDIA provides a Quick Start guide for deploying Llama 2 for inference with Triton.

Fine-tuning questions come up just as often: what the minimum GPU requirement is for the 7B model using FSDP alone (full_shard parameter parallelism), why an out-of-memory error appears exactly at the model = FSDP(model, ...) call according to the logs, and how total required memory differs across the Llama 2 model series when fine-tuning with a context length of 512 tokens and a batch size of 8 on a single p4de.24xlarge node. As a rough guide for local experiments, plan on a minimum of 16 GB of RAM for an 8B-class model and 32 GB or more as the models get larger.

For the quickest start of all, configure the Python wrapper of llama.cpp as shown earlier, or install Ollama, head over to Terminal, and run: ollama run mistral
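Once the Ollama daemon is running, it also exposes a small local HTTP API, which makes it easy to script. A minimal sketch, assuming you have already pulled a model (for example with ollama pull llama2) and that the server is listening on its default port:

    import json
    import urllib.request

    payload = json.dumps({
        "model": "llama2",  # any model you have pulled locally
        "prompt": "Why does a 70B model need more memory than a 7B model?",
        "stream": False,
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])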
Tutorials now cover the full process of fine-tuning the smaller Llama models end to end, including video walkthroughs of the FineTune LLMs repository. On the model side, Meta Llama 3 comes in two variants, an 8 billion parameter base model (Meta-Llama-3-8b) and a more robust 70 billion parameter model, each released in both pretrained and instruct fine-tuned versions to support a broad range of application environments; Llama 3 will soon be available on all major platforms. To run Llama 3 models locally, your system must meet the same kind of hardware prerequisites as Llama 2.

Challenges with fine-tuning LLaMA 70B start with host memory: if each process/rank within a node loads the Llama-70B model, it requires roughly 70 * 4 * 8 GB, about 2TB of CPU RAM, where 4 is the number of bytes per parameter and 8 is the number of ranks per node. Full training is starker still: training LLaMA-1 7B with a batch size of 32 needs a minimum of roughly 1,324 GB of GPU VRAM. Although the LLaMA models were trained on A100 80GB GPUs, it is possible to run them on different and smaller multi-GPU hardware for inference, and published summaries list the minimum GPU requirements and recommended AIME systems for running a specific LLaMA model with near-realtime reading performance.

For local inference, 70B models generally require at least 64GB of RAM; if you run into issues with higher quantization levels, try the q4 model, clear the cache, or shut down other programs that are using a lot of memory, and if you have 16GB of RAM, try the 13B model instead. GGML is a weight quantization method that can be applied to any model, and by default Ollama uses 4-bit quantization. Ollama itself is an open-source framework that lets you run large language models locally on your computer: after installing the command-line tool, download Llama 2 with ollama pull llama2, or pull the larger version with ollama pull llama2:13b. If you deploy through a hosted template instead, open the Llama application dashboard in your web browser using the "DashboardUrl" provided in the "Outputs" tab and confirm that it loads without errors. When using managed services such as AI Notebooks or AI Training, ensure your code is actually running on GPU(s). For GPTQ setups on Windows, it does not matter where you put the quantization wheel, you just have to install it; but since your command prompt is already navigated to the GPTQ-for-LLaMa folder, you might as well place the .whl file in there. The latest Intel Extension for PyTorch release (v2.1.10+xpu) officially supports Intel Arc A-series graphics on WSL2, built-in Windows, and built-in Linux.

Use of the models is governed by the Meta license, and if you access or use Llama 2 you agree to its Acceptable Use Policy. Within those terms, Llama 2 enables you to create chatbots or can be adapted for a wide range of natural language generation tasks, and Code Llama extends the family with code-specialized models in four sizes (7B, 13B, 34B, and 70B parameters).

If you use parameter-efficient methods such as QLoRA, memory requirements are greatly reduced (see "Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA"). Since you are not training all the parameters but only a subset, you add LoRA adapters to the model using Hugging Face peft; make sure to use peft >= 0.6 before calling get_peft_model.
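A minimal sketch of what that looks like, assuming transformers and peft are installed and the gated weights are accessible; the rank, alpha, and target modules below are illustrative defaults rather than values from any particular recipe:

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",  # gated repo; requires approved access
        torch_dtype=torch.float16,
        device_map="auto",
    )

    lora_config = LoraConfig(
        r=16,                                 # adapter rank
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections in Llama-style models
        task_type=TaskType.CAUSAL_LM,
    )

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of parameters are trainable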
Reports from individual practitioners vary. Some people have been able to fine-tune Llama 2 13B using a single NVIDIA Titan RTX 24G, although it may take several weeks; fine-tuning Llama 2 requires substantial computational power, and reading about fine-tuning other LLMs turns up indications of far larger memory requirements still. In general, full fine-tuning can achieve the best results, but it is also the most resource-intensive and time-consuming approach: it requires the most GPU resources and takes the longest. Running a large language model at full precision normally needs a large amount of GPU memory and a strong CPU, on the order of 280GB of VRAM for a 70B model, so ensure your GPU has enough memory; the general recommendation is a powerful GPU with at least 8GB of VRAM, preferably an NVIDIA GPU with CUDA support. Community numbers help calibrate expectations: a 3070 with a Ryzen 5 3600 runs a 13B model at roughly 6 tokens per second, 32GB of RAM should be enough for a 33B model to launch and slowly generate text, and Llama 2 70B in 4-bit GPTQ can run on 2 x 24GB cards, which many people are already doing.

Llama 2 is being released with a very permissive community license and is available for commercial use and research. Through the Llama 2 Acceptable Use Policy, Meta is committed to promoting safe and fair use of its tools and features, in the belief that the open release of LLMs, when done safely, will be a net benefit to society. The model is a product of extensive research and development, capable of performing a wide range of NLP tasks from simple text generation to complex problem-solving, and it is predominantly used by individual researchers and companies precisely because of its modest hardware requirements. Architecturally it is a transformer network, and Group Query Attention (GQA) has now been added to Llama 3 8B as well. Evaluating a model across such tasks, and using the API documentation for testing, will give you a comprehensive view of its strengths and limitations.

On the tooling side, the local-inference ecosystem goes back to March 2023, when the developer Georgi Gerganov created "llama.cpp", a tool that can run Meta's GPT-3-class LLaMA models locally on a Mac laptop; much of what is described here is made possible thanks to the llama.cpp project. To enable GPU support in that stack, certain environment variables must be set before compiling. Step-by-step repositories walk through installing prerequisites like Python and Git, cloning the necessary repositories, downloading and converting the Llama models, and finally running the model with example prompts; with Ollama, interacting with the model is as simple as running ollama run llama2 right from the terminal.

The Llama 2 family ships as six checkpoints: Llama 2 7B, 7B-chat, 13B, 13B-chat, 70B, and 70B-chat. To proceed with accessing a gated checkpoint such as Llama-2-70b-chat-hf, visit the Llama downloads page and register using the same email address associated with your huggingface.co account; once the request is approved, the weights can be pulled directly from the Hugging Face Hub.
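A minimal sketch of pulling the files programmatically with huggingface_hub once the request has been approved; the token string is a placeholder for your own credentials:

    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="meta-llama/Llama-2-7b-chat-hf",  # any Llama 2 repo you have access to
        token="hf_your_access_token_here",        # placeholder; or log in with huggingface-cli
    )
    print("Model files downloaded to:", local_dir)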
With the quantization technique of reducing the weights to 4 bits, even the powerful Llama 2 70B model can be deployed on 2 x A10 GPUs, and the resource requirements for deploying and using Llama 2 on a platform such as Azure will likewise depend on the specific model you plan to use and the size of the data you plan to process. Open access to Llama also allows many more developers to scrutinize the models. The DeepSpeed-Chat training framework now provides system support for the Llama and Llama 2 models across all three stages of its training pipeline, the newer Llama 3 variants all run on various types of consumer hardware with a context length of 8K tokens, and the Code Llama models are each trained with 500B tokens of code and code-related data, apart from 70B, which is trained on 1T tokens.

The usual questions about minimum hardware (CPU, GPU, and RAM for each model size) have fairly consistent answers by now. Hardware recommendations: ensure a minimum of 8 GB of RAM for a 3B model, 16 GB for a 7B model, and 32 GB for a 13B variant; an RTX 3000 series GPU or higher is ideal, and a medium-spec GPU such as an RTX 3090 is enough to follow the minimum setup steps for running Llama 2 locally. Desktop tools such as LM Studio (downloaded and installed like the Ollama app) list their own minimum requirements: an M1/M2/M3 Mac, or a Windows PC with a processor that supports AVX2. Use the Llama-2-7b-chat weights to start with the chat application. For llama.cpp on an M1/M2 Mac there is a documented one-line install command. Separately, the CPU-server setup described earlier can be built and run with docker build -t llama-cpu-server . followed by docker run -p 5000:5000 llama-cpu-server.

Real-world reports help set expectations. One user runs llama2-70b-guanaco-qlora-ggml at q6_K on a Ryzen 9 7950X with a 4090 (24GB) and 96GB of RAM and gets about 1 token per second, with htop showing roughly 56GB of system RAM used and 18-20GB of VRAM for the offloaded layers. Another installed the 13B model on a CPU-only machine and found that while it performs fine with simple questions like "tell me a joke", a real task against a knowledge base takes about 10 to 15 minutes per request. A third encountered CUDA out-of-memory errors when training the 7B model with FSDP enabled (bf16, without peft) even on 8 x RTX 3090 (24G) cards.

For fine-tuning, run the preprocessing script that has already been prepared for you; the expected format is a JSONL file with {'input': 'xxx', 'output': 'yyy'} pairs. Then modify the model and training configuration as needed. PEFT (Parameter-Efficient Fine-Tuning) keeps the memory budget manageable: basically, one quantizes the base model in 8 or 4 bits and trains only a small set of adapter weights on top. You can then run the following command to perform a LoRA finetune of Llama 2 7B with two GPUs (each having at least 16GB of VRAM): tune run --nnodes 1 --nproc_per_node 2 lora_finetune_distributed --config llama2/7B_lora
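To make the quantization half of that recipe concrete, here is a minimal sketch of loading the base model in 4 bits with bitsandbytes before attaching the adapters. The configuration values are common defaults rather than a tuned recipe, and the model name again assumes approved access to the gated weights:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # normal-float 4-bit, the usual QLoRA choice
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
    )

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",
        quantization_config=bnb_config,
        device_map="auto",
    )
    # A 7B model loaded this way needs only a few GB of VRAM, which is what makes
    # LoRA fine-tuning on a single consumer GPU practical.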