LLaMA.cpp

From llamawiki.ai
Initial release: 18 March 2023
Original author / maintainer: Georgi Gerganov
License: MIT License
Batch generation: Yes
Chat: Yes
Training: No
Quantization: Yes
Runs on CPU alone: Yes
Runs on GPU / CUDA: Yes
GUI: via oobabooga or other programs
Code bindings: Python, .NET/C#, Go, Node.js, Ruby, Scala 3
Model formats: GGML

LLaMA.cpp (also written as llama.cpp or LLaMA C++) is a C++ implementation of the transformer model underlying LLaMA and other large language models. The implementation originally targeted running 4-bit quantized models on the CPU of a MacBook under macOS, but it now also runs under Linux and Microsoft Windows, and supports offloading models to one or more GPUs.

LLaMA.cpp is notable for being among the first tools that allowed large language models to be run on personal computers without requiring a GPU.

Related Work

LLaMA.cpp is the main tool used with the GGML model format and library, which were developed in conjunction with it.

See Also