
Rick LLM
Make Llama 3.1 8B talk in Rick Sanchez’s style
Table of Contents
- 1. Introduction
- 2. Project Design
- 3. Configuration
- 4. Running the project
- 4.1 Create the dataset
- 4.2 Configure your Lambda Labs account
- 4.3 Create an SSH key
- 4.4 Launching a Lambda Labs instance
- 4.5 Fetching the instance IP
- 4.6 Syncing the local filesystem with the remote one
- 4.7 Configuring Lambda Labs instance
- 4.8 Finetuning the model
- 4.9 Terminate the Lambda Labs instance
- 4.10 Creating the Ollama model
- 5. Contributing
Introduction
This project shows you how to make Llama 3.1 8B speak like Rick Sanchez by:
- Creating a custom dataset from Rick and Morty transcripts in ShareGPT format
- Finetuning the model using Unsloth's optimizations on Lambda Labs GPUs
- Converting and deploying the model to Ollama for local use
It's a fun way to learn LLM finetuning while creating your own Rick-speaking AI assistant.
Wubba lubba dub dub!
Project Design
The project can be divided into three main parts:
- Dataset creation: Creating a custom dataset from Rick and Morty transcripts in ShareGPT format.
- Model finetuning: Finetuning the model using Unsloth's optimizations on Lambda Labs GPUs.
- Model deployment: Converting and deploying the model to Ollama for local use.
Let's begin with the dataset creation.
Dataset Creation
To train the LLM, we need an instruct dataset, i.e. a collection of conversations the model learns to imitate. In this case, we want the model to speak like Rick Sanchez, so we'll create a dataset from Rick and Morty transcripts in ShareGPT format.
This dataset will be pushed to Hugging Face, so we can use it later in the finetuning process. I've already created the dataset in The Neural Maze organization, so you can use it directly.
You have all the code for this part here.
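For reference, a single training example in ShareGPT format looks roughly like this (the field names follow the ShareGPT convention; the dialogue below is made up, not taken from the actual dataset):
{
  "conversations": [
    { "from": "human", "value": "Rick, what do you think about school?" },
    { "from": "gpt", "value": "School is not a place for smart people, Morty! *burp*" }
  ]
}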
Model finetuning
Now that we have the dataset, we can start the finetuning process. We'll use the Unsloth library, which provides a set of optimizations that make finetuning LLMs faster and more memory-efficient.
We are not going to apply a full finetuning; instead, we'll apply a LoRA finetuning. LoRA is a technique that lets us adapt the model without retraining all of its weights: only small low-rank adapter matrices are trained. This saves a lot of time and resources, although it may not be as accurate as a full finetuning.
Since you might not have access to a local GPU (that's my case, at least), I've designed this process to be fully remote. This means that you'll need to have access to a cloud GPU. I've used Lambda Labs for this, but you can use any other cloud provider that supports GPUs.
You have all the finetuning code under the rick_llm folder.
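To give you an idea of what happens inside that folder, here is a minimal sketch of an Unsloth LoRA finetune. The model id, dataset id, and hyperparameters are assumptions for illustration, and the exact trainer arguments may vary with your unsloth/trl versions; the real values live in the rick_llm code.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load Llama 3.1 8B in 4-bit so it fits on a single GPU (model id is an assumption)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters: only these small low-rank matrices get trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The real script first renders the ShareGPT conversations into plain text
# with the model's chat template; here we assume a ready-made "text" column
dataset = load_dataset("theneuralmaze/rick-and-morty-sharegpt", split="train")  # assumed id

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()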
Model deployment
Once the model is finetuned, we need to convert it to a format that Ollama can use. The two files we need are:
- The model weights in GGUF format
- The Ollama Modelfile
These two files will be located under the ollama_files folder.
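For illustration, a minimal Modelfile for a GGUF model could look something like this (the file name, parameter, and system prompt below are made up; use the Modelfile the project generates):
FROM ./rick_llm.gguf
PARAMETER temperature 0.8
SYSTEM You are Rick Sanchez, the smartest man in the universe. Answer in his sarcastic, chaotic style.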
Configuration
Environment Variables
Create a .env file in the project root with the following variables:
OPENAI_API_KEY="PUT_YOUR_OPENAI_API_KEY_HERE"
HUGGINGFACE_TOKEN="PUT_YOUR_HUGGINGFACE_TOKEN_HERE"
LAMBDA_API_KEY="PUT_YOUR_LAMBDA_API_KEY_HERE"
You can use the .env.example file as a template. Simply copy it to .env and fill in the values.
The variables are needed for the following:
- OPENAI_API_KEY: To use the OpenAI API to clean the dataset.
- HUGGINGFACE_TOKEN: To use the Hugging Face API to push the dataset to Hugging Face.
- LAMBDA_API_KEY: To use the Lambda Labs API to launch, monitor, and terminate the GPU instances.
Running the project
Create the dataset (Optional)
The first thing we need to do is to create the dataset. This is optional, since I already created the dataset in The Neural Maze organization. In case you still want to create it, you can do it with the following command:
make create-dataset
This will create the dataset and push it to Hugging Face.
Don't forget to change the dataset name in the src/dataset.py file!
Configure your Lambda Labs account
You need to go to Lambda Labs and create an account. Once you have an account, you can create a new API key. This key will be used to interact with the Lambda Labs API (launching, listing, and terminating instances).
Don't forget to add the key to your
.env
file.
Create an SSH key
You need to create an SSH key to be able to sync the local filesystem with the remote one. You can do this with the following command:
make generate-ssh-key
This will create the key and add it to your Lambda Labs account.
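Under the hood, that target roughly amounts to generating a key pair and registering the public key via Lambda's API. Here is a minimal Python sketch of that flow; the key name, file path, and key type are assumptions, and the ssh-keys endpoint is taken from Lambda's public Cloud API docs:
import os
import subprocess

import requests

# Generate an ed25519 key pair with no passphrase (path and type are assumptions)
key_path = os.path.expanduser("~/.ssh/rick_llm")
subprocess.run(["ssh-keygen", "-t", "ed25519", "-f", key_path, "-N", ""], check=True)

# Register the public key with Lambda Labs; the API uses HTTP basic auth
# with the API key as the username and an empty password
with open(key_path + ".pub") as f:
    public_key = f.read().strip()

resp = requests.post(
    "https://cloud.lambdalabs.com/api/v1/ssh-keys",
    auth=(os.environ["LAMBDA_API_KEY"], ""),
    json={"name": "rick-llm-key", "public_key": public_key},  # key name is an assumption
)
resp.raise_for_status()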
Launching a Lambda Labs instance
Once you have the SSH key, you can launch a Lambda Labs instance with the following command, which will also attach the key to the instance:
make launch-lambda-instance
For availability reasons, I'm using the gpu_1x_a100_sxm4 instance type. If you want to change it, you can do it here.
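If you're curious what the make target does, the underlying API call looks roughly like this (region name and SSH key name are assumptions; endpoint and payload follow Lambda's public Cloud API docs):
import os

import requests

# Launch one gpu_1x_a100_sxm4 instance with our SSH key attached
resp = requests.post(
    "https://cloud.lambdalabs.com/api/v1/instance-operations/launch",
    auth=(os.environ["LAMBDA_API_KEY"], ""),
    json={
        "region_name": "us-east-1",          # assumption: pick any region with capacity
        "instance_type_name": "gpu_1x_a100_sxm4",
        "ssh_key_names": ["rick-llm-key"],   # assumption: the key created earlier
        "quantity": 1,
    },
)
resp.raise_for_status()
print(resp.json()["data"]["instance_ids"])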
Fetching the instance IP
Once the instance is launched, you can fetch the instance IP with the following command:
make get-lambda-ip
You might need to wait a few minutes for the instance to be ready. To verify the instance is ready, you can run the make list-instances command or go to the Lambda Labs dashboard.
Copy the IP address, since you'll need it to connect to the instance.
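For reference, fetching the IP boils down to listing your instances and reading their ip field. A minimal sketch, again assuming Lambda's public Cloud API:
import os

import requests

# List all instances and print their status and public IP
resp = requests.get(
    "https://cloud.lambdalabs.com/api/v1/instances",
    auth=(os.environ["LAMBDA_API_KEY"], ""),
)
resp.raise_for_status()
for instance in resp.json()["data"]:
    print(instance["name"], instance["status"], instance.get("ip"))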
Syncing the local filesystem with the remote one
Once you have the IP address, you can sync the local filesystem with the remote one with the following command:
rsync -av .env Makefile src/lambda/requirements_lambda.txt ubuntu@<INSTANCE_IP>:/home/ubuntu/
rsync -av src/rick_llm ubuntu@<INSTANCE_IP>:/home/ubuntu/src/
Now connect to the instance using SSH:
ssh ubuntu@<INSTANCE_IP>
Configuring Lambda Labs instance
Inside the instance, you need to install some dependencies before running the finetuning process. You can do this with the following command:
make lambda-setup
Finetuning the model
Phew! We're almost there. Now we can finetune the model. You can do this with the following command:
make finetune
This will start the finetuning process. You can check the progress of the finetuning by checking the logs.
When the finetuning is finished, both the GGUF file and the Modelfile will be pushed to Hugging Face (in this case, to The Neural Maze organization). If you want to push them to your own, simply change the name here.
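For reference, Unsloth bundles the GGUF conversion and upload into a single helper. Continuing from the finetuning sketch above, the export step could look like this (repo id and quantization method are assumptions):
import os

# Convert the merged model to GGUF and push it to Hugging Face in one call
model.push_to_hub_gguf(
    "theneuralmaze/rick-llm",      # assumed repo id; use your own
    tokenizer,
    quantization_method="q8_0",    # assumed quantization
    token=os.environ["HUGGINGFACE_TOKEN"],
)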
Terminate the Lambda Labs instance
Once the finetuning is finished, you can terminate the Lambda Labs instance with the following command:
make terminate-instance
Creating the Ollama model
Now that we have the GGUF on Hugging Face, we need to download it locally. The following command will download the GGUF file to the ollama_files folder.
make download-model
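Under the hood, this is essentially a huggingface_hub download. A minimal sketch, where the repo id and file name are assumptions:
from huggingface_hub import hf_hub_download

# Download the GGUF file into the ollama_files folder
hf_hub_download(
    repo_id="theneuralmaze/rick-llm",  # assumed repo id
    filename="rick_llm.gguf",          # assumed file name
    local_dir="ollama_files",
)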
Now, you can use the Ollama CLI to create the model.
ollama create rick-llm -f ollama_files/Modelfile
Once the model is created, you can start chatting with your Rick-speaking AI assistant.
ollama run rick-llm
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request