Showing posts with label Weight-Decomposed Low-Rank Adaptation. Show all posts
Showing posts with label Weight-Decomposed Low-Rank Adaptation. Show all posts

Saturday, July 13, 2024

Finetune Large Language Models with DoRA (Weight-Decomposed Low-Rank Adaptation)

 

Introduction.

The DoRA (Weight-Decomposed Low-Rank Adaptation) algorithm offers an advanced approach to fine-tuning Large Language Models (LLMs) by decomposing weight matrices into magnitude and direction components. Traditional methods like Low-Rank Adaptation (LoRA) improve parameter efficiency but often face performance and stability trade-offs. DoRA addresses these issues by leveraging the Frobenius norm to separate the weight matrix into a stable magnitude and a fine-tuned direction. This decomposition ensures efficient learning while maintaining model expressiveness and stability. Key advantages of DoRA include enhanced parameter efficiency, improved generalization, faster adaptation to new tasks, and minimal inference overhead.

Key Points:

  • Decomposes weights into magnitude and direction.
  • Enhances parameter efficiency without compromising performance.
  • Improves training stability and generalization.
  • Facilitates faster adaptation to new tasks.
  • Maintains efficient inference with minimal overhead.

Video Tutorial.

Code: Finetune Large Language Models with DoRA (Train).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model
from datasets import Dataset
import transformers
# pip install peft
# Sample QA Data
data = {
'question': [
"What is the capital of France?",
"Who painted the Mona Lisa?",
"What is the tallest mountain in the world?",
"When did World War II end?",
"Who wrote the play 'Romeo and Juliet'?",
"What is the chemical symbol for gold?"
],
'context': [
"Paris is the capital and most populous city of France.",
"The Mona Lisa is a half-length portrait painting by Italian Renaissance artist Leonardo da Vinci.",
"Mount Everest is Earth's highest mountain above sea level, located in the Mahalangur Himal sub-range of the Himalayas.",
"World War II (WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
"Romeo and Juliet is a tragedy written by William Shakespeare early in his career about two young star-crossed lovers whose deaths ultimately reconcile their feuding families.",
"Gold is a chemical element with the symbol Au and atomic number 79. In its purest form, it is a bright, slightly reddish yellow, dense, soft, malleable, and ductile metal."
],
'answer': [
"Paris",
"Leonardo da Vinci",
"Mount Everest",
"1945",
"William Shakespeare",
"Au"
]
}
dataset = Dataset.from_dict(data)

# Load Llama Model and Tokenizer
tokenizer = AutoTokenizer.from_pretrained("D:\\OLLAMA_MODELS\\meta-llama\\Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("D:\\OLLAMA_MODELS\\meta-llama\\Meta-Llama-3-8B-Instruct")
# Ensure padding token is set
if tokenizer.pad_token is None:
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
# Configure LoRA
peft_config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
use_dora =True
)

# Create PEFT Model
model = get_peft_model(model, peft_config)

# Preprocess Data
def generate_prompt(data_point):
return f"""[INST] {data_point["question"]} [/INST] {data_point["context"]} {data_point["answer"]} [/INST]"""

dataset = dataset.map(lambda data_point: {"text": generate_prompt(data_point)})

# Tokenize Data
def tokenize(prompt):
result = tokenizer(prompt["text"])
return {
"input_ids": result["input_ids"],
"attention_mask": result["attention_mask"],
}
tokenized_dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Training Arguments (Optimized for CPU)
training_args = TrainingArguments(
per_device_train_batch_size=1, # Very small batch size for CPU
gradient_accumulation_steps=8, # Accumulate gradients over multiple steps
num_train_epochs=3,
learning_rate=1e-4, # Smaller learning rate for CPU
logging_steps=10,
output_dir="./llama-3-finetuned-qa-cpu",
)

# Create Trainer
trainer = Trainer(
model=model,
train_dataset=tokenized_dataset,
args=training_args,
data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# Fine-tune!
model.config.use_cache = False
trainer.train()

# Save the Fine-tuned Model
model.save_pretrained("./llama-3-finetuned-qa-cpu")

Code: Finetune Large Language Models with DoRA (Test).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Load Fine-Tuned Model and Tokenizer
model_path = "E:\\Niraj_Work\\DL_Projects\\llm_projects\\llm_advance_1\\llama-3-finetuned-qa-cpu" # Path to your saved model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Ensure Model is on CPU
device = torch.device("cpu")
model.to(device)
if tokenizer.pad_token is None:
# tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token = tokenizer.eos_token
# Load Your Question-Answering Dataset (Replace with your dataset)
# Assuming you have a list of dictionaries, each with 'question', 'context', and 'answer' keys
eval_data = [
{"question": "What is the capital of France?", "context": "Paris is the capital and most populous city of France.", "answer": "Paris"},
{"question": "Who painted the Mona Lisa?", "context": "The Mona Lisa is a half-length portrait painting by Italian Renaissance artist Leonardo da Vinci.", "answer": "Leonardo da Vinci"},
]

# Function to generate the prompt
def generate_prompt(data_point):
return f"""[INST] {data_point["question"]} [/INST] {data_point["context"]} {data_point["answer"]} [/INST]"""


# Test the Model
for data_point in eval_data:
input_text = generate_prompt(data_point)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device) # Move input to CPU

# Generate Answer
generation_output = model.generate(
input_ids=input_ids,
max_new_tokens=50, # Adjust as needed
num_beams=1, # You can try increasing num_beams if you have enough memory
early_stopping=True,
)

# Extract and Print Answer
generated_answer = tokenizer.decode(generation_output[0])
print(f"Question: {data_point['question']}")
print(f"Generated Answer: {generated_answer.split('[/INST]')[-2].strip()}")
print(f"Actual Answer: {data_point['answer']}")

Reference.

  1. Liu, Shih-Yang, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. "Dora: Weight-decomposed low-rank adaptation." arXiv preprint arXiv:2402.09353 (2024).
  2. https://huggingface.co/papers/2402.09353
  3. https://www.nirajai.com/home/llm