| Initial Release | 18 March 2023 |
|---|---|
| Original Author / Maintainer | Georgi Gerganov |
| GitHub Link | Link |
| License | MIT License |
| Batch Generation | |
| Chat | |
| Training | |
| Quantization | |
| Run on CPU alone | |
| Run on GPU / CUDA | |
| GUI | Via oobabooga or other programs |
| Code Bindings | Python, .NET/C#, Go, Node.js, Ruby, Scala 3 |
| Model Formats | GGML |
LLaMA.cpp (also written as llama.cpp or LLaMA C++) is a C++ implementation of the transformer model underlying LLaMA and other models. The implementation originally targeted running 4-bit quantized models on the CPU of a MacBook under macOS, but it now also supports Linux and Microsoft Windows, as well as running models on one or more GPUs.
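To illustrate what "4-bit quantized" means here, the following is a simplified sketch of block-wise 4-bit quantization in the spirit of GGML's block formats (one shared scale per block of 32 values). The block size, rounding scheme, and function names are illustrative assumptions, not the actual llama.cpp implementation:

```python
# Simplified sketch of block-wise 4-bit quantization (GGML-style blocks).
# NOT the real llama.cpp code: block size and rounding are illustrative.

BLOCK_SIZE = 32  # GGML quantization formats group weights into small blocks

def quantize_block(values):
    """Quantize a block of floats to 4-bit codes (0..15) plus one scale."""
    assert len(values) == BLOCK_SIZE
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs else 1.0   # map [-max, max] onto [-7, 7]
    quants = [max(0, min(15, round(v / scale) + 8)) for v in values]
    return scale, quants

def dequantize_block(scale, quants):
    """Recover approximate floats from the 4-bit codes."""
    return [(q - 8) * scale for q in quants]

# Round-trip a block of sample weights and measure the worst-case error.
original = [i / 10.0 for i in range(-16, 16)]   # 32 sample values
scale, quants = quantize_block(original)
restored = dequantize_block(scale, quants)
max_err = max(abs(a - b) for a, b in zip(original, restored))
```

Storing one float scale plus 32 four-bit codes per block is what lets a model's weights shrink to roughly a quarter of their 16-bit size while keeping the per-weight error bounded by half a quantization step.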
LLaMA.cpp is notable as one of the first tools that allowed large language models to run on personal computers without depending on GPU capacity.
Related Work
LLaMA.cpp is the main tool used with the GGML model format and library, which were developed in conjunction with it.