Use Unsloth LoRA Adapter with Ollama in 3 Steps
Use llama.cpp to convert an Unsloth LoRA adapter to GGML (.bin) and use it in Ollama, all with a single GPU
Background
I recently ran my first LLM fine-tuning session with Unsloth, and their option to save only the LoRA adapter is awesome. I researched a little more and found that the adapter can be used in Ollama directly with the ADAPTER instruction.
Overview
- Download the LoRA adapter
- Convert the adapter to ggml-adapter-model.bin with llama.cpp
- Add the ADAPTER instruction to an Ollama Modelfile
- Usage
- From Unsloth get_chat_template to Ollama Templates
- Conclusion
Download the LoRA Adapter from Hugging Face
Set up huggingface-cli first by following this guide.
Then download the model. In this example, you can use my pacozaa/tinyllama-alpaca-lora:
huggingface-cli download pacozaa/tinyllama-alpaca-lora
After running the command, the CLI prints where the adapter is saved. Note that path down.
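Alternatively, if you prefer Python, a minimal sketch with the huggingface_hub library (the same package that powers huggingface-cli) does the download and prints the path in one go:

from huggingface_hub import snapshot_download

# Download the adapter repository and print its local path,
# the same path huggingface-cli reports.
adapter_dir = snapshot_download("pacozaa/tinyllama-alpaca-lora")
print(adapter_dir)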
Convert with llama.cpp — convert-lora-to-ggml.py
Run the llama.cpp convert script:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
python convert-lora-to-ggml.py [path to LoRA adapter folder]
This script will print the location of the resulting .bin file (ggml-adapter-model.bin); note it down.
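Before converting, you can sanity-check that the folder really is a PEFT LoRA adapter. A small sketch, assuming adapter_dir is the path you noted from the download step:

import json
import os

adapter_dir = "/path/to/lora-adapter"  # the path noted from the download step

# The convert script reads adapter_config.json and adapter_model.bin,
# so both should be present in this folder.
print(os.listdir(adapter_dir))

# Peek at the LoRA hyperparameters (rank r and lora_alpha determine
# the scaling applied when the adapter is merged at runtime).
with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
    cfg = json.load(f)
print(cfg["r"], cfg["lora_alpha"], cfg["target_modules"])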
Add ADAPTER instruction
Run the following commands:
ollama pull tinyllama
touch ModelfileTinyllama
Then add the content below to the ModelfileTinyllama file.
NOTE: Ollama models are usually the chat fine-tuned variants, so if your adapter was trained on a pre-trained base model you should change FROM to that base model. In this case, tinyllama is already fine-tuned for chat, but we override its template with the instruction format we fine-tuned on.
# Base model pulled from the Ollama library
FROM tinyllama:latest
# The converted LoRA adapter from the previous step
ADAPTER ./ggml-adapter-model.bin
# Alpaca-style instruction template used during fine-tuning
TEMPLATE """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
{{ if .System }}### Instruction:
{{ .System }}{{ end }}
{{ if .Prompt }}### Input:
{{ .Prompt }}{{ end }}
### Response:
"""
SYSTEM """Continue the Fibonacci sequence."""
# Stop sequences keep the model from generating the next turn itself
PARAMETER stop "### Response:"
PARAMETER stop "### Instruction:"
PARAMETER stop "### Input:"
PARAMETER stop "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request."
# Cap the response length
PARAMETER num_predict 200
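To see what the model actually receives, here is the same substitution done by hand in Python (purely illustrative; Ollama renders the TEMPLATE with Go templates internally):

def render(system, prompt):
    # Mirror the TEMPLATE above: .System fills the Instruction
    # section and .Prompt fills the Input section.
    text = ("Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n")
    if system:
        text += f"### Instruction:\n{system}\n"
    if prompt:
        text += f"### Input:\n{prompt}\n"
    text += "### Response:\n"
    return text

print(render("Continue the Fibonacci sequence.", "1, 1, 2, 3, 5, 8"))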
Now create and run the model:
ollama create tinyadap -f ./ModelfileTinyllama
ollama run tinyadap
Or you can try the one I uploaded to the Ollama library:
ollama run pacozaa/tinyllama-alpaca-lora
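You can also call the model programmatically. A minimal sketch against Ollama's local REST API (the server listens on port 11434 by default; requests is the only dependency):

import requests

# Ask the locally running Ollama server for a single,
# non-streamed completion from the adapter-backed model.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "tinyadap",
        "prompt": "1, 1, 2, 3, 5, 8",
        "stream": False,
    },
)
print(resp.json()["response"])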
Usage
I have set up the .System prompt as the ### Instruction: content, and the default is Continue the Fibonacci sequence.
If you want to change it to something else, run ollama run pacozaa/tinyllama-alpaca-lora and then use /set system.
For example:
>>> /set system You're a kitty. Answer using kitty sounds.
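The same override works over the REST API: the system field replaces the Modelfile's SYSTEM prompt for that single request:

import requests

# Override the default SYSTEM prompt for one request only.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "pacozaa/tinyllama-alpaca-lora",
        "prompt": "1, 1, 2, 3, 5, 8",
        "system": "You're a kitty. Answer using kitty sounds.",
        "stream": False,
    },
)
print(resp.json()["response"])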
From Unsloth get_chat_template to Ollama Templates
Let’s take a look at more of Unsloth’s pre-made chat templates.
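On the Unsloth side, picking one of these templates looks roughly like the sketch below. This is based on Unsloth's chat_templates module as used in their notebooks; the exact arguments and model name may differ across versions, so treat it as an outline:

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load a base model plus tokenizer (unsloth/tinyllama-bnb-4bit is
# one of Unsloth's pre-quantized checkpoints).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/tinyllama-bnb-4bit",
    load_in_4bit=True,
)

# Wrap the tokenizer so it formats conversations with the chosen
# template; "chatml", "vicuna", "alpaca", and "unsloth" correspond
# to the Modelfile snippets below.
tokenizer = get_chat_template(tokenizer, chat_template="chatml")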
The corresponding Modelfile snippets could look something like these…
ChatML Template
Reference: https://ollama.com/library/qwen
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
Unsloth Template
TEMPLATE """{{ .System }}
>>> User: {{ .Prompt }}
>>> Assistant:
"""
PARAMETER stop ">>> User:"
PARAMETER stop ">>> Assistant:"
Vicuna Template
Reference: https://ollama.com/library/wizard-vicuna
TEMPLATE """{{ .System }}
USER: {{ .Prompt }}
ASSISTANT:
"""
PARAMETER stop "User:"
PARAMETER stop "Assistant:"
Llama 2 Template
Reference: https://ollama.com/library/llama2
Mistral Template
Reference: https://ollama.com/library/mistral
Conclusion
Now you can train a model with Unsloth’s Google Colab or Kaggle notebooks and run it locally with just the LoRA adapter, which is faster than saving and downloading the entire GGUF file.