### Fine-Tuning the Gemma 3:270M Model with a JSONL Dataset for Ollama

Gemma 3 270M (from `google/gemma-3-270m` on Hugging Face) is a lightweight variant well suited to efficient fine-tuning, especially on hardware with limited resources. The process below uses Parameter-Efficient Fine-Tuning (PEFT) with QLoRA to keep memory usage low, trains on a dataset in JSONL format (chat-style, with "user" and "assistant" roles), then merges the adapter, converts the model to GGUF (for local compatibility), and deploys it directly to Ollama.

**Prerequisites:**
- Python 3.10+ and a GPU (at minimum an NVIDIA card with CUDA 11.8+ for QLoRA).
- A Hugging Face account with an access token (accept the Gemma license on HF).
- Git installed, plus a clone of the llama.cpp repo for GGUF conversion.
- A JSONL dataset in the following format, one example per line (a quick validation sketch follows this list):
  ```
  {"messages": [{"role": "user", "content": "User question"}, {"role": "assistant", "content": "Desired answer"}]}
  ```
  Save it as `train.jsonl` (at least 100-1000 examples for good results).

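A malformed line in `train.jsonl` tends to surface much later as a confusing `load_dataset` or training error, so it can be worth checking the file up front. The following is a minimal validation sketch, assuming the `train.jsonl` layout shown above:

```python
import json

# Check that every line of train.jsonl parses and has the expected chat structure
with open("train.jsonl", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        record = json.loads(line)  # raises on malformed JSON
        messages = record["messages"]
        assert isinstance(messages, list) and len(messages) >= 2, f"line {lineno}: too few turns"
        for msg in messages:
            assert msg["role"] in {"system", "user", "assistant"}, f"line {lineno}: bad role"
            assert isinstance(msg["content"], str), f"line {lineno}: content must be a string"
print("train.jsonl looks well-formed")
```
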
#### Step 1: Set Up the Environment
Install the main dependencies with pip; note that the version specifiers are quoted so the shell does not interpret `>=` as an output redirect. Run in a terminal or in Jupyter/Colab:

```bash
pip install "torch>=2.4.0" "transformers>=4.51.3" datasets==3.3.2 accelerate==1.4.0 evaluate==0.4.3 bitsandbytes==0.45.3 trl==0.21.0 peft==0.14.0 protobuf sentencepiece
```

Log in to Hugging Face (replace `YOUR_HF_TOKEN` with your token):

```python
from huggingface_hub import login
login(token="YOUR_HF_TOKEN")
```

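If you prefer not to hard-code the token in a script or notebook, `huggingface_hub` also reads it from the `HF_TOKEN` environment variable, or you can log in once from the shell:

```bash
export HF_TOKEN="YOUR_HF_TOKEN"   # picked up automatically by huggingface_hub
# or, interactively:
huggingface-cli login
```
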
#### Step 2: Prepare the JSONL Dataset
Load the tokenizer first, since its chat template is needed to render each example into a single training string, then load and format the dataset with Hugging Face Datasets. This assumes your `train.jsonl` is ready. (If the base checkpoint's tokenizer ships without a chat template, load the tokenizer from the instruction-tuned variant `google/gemma-3-270m-it` instead.)

```python
from datasets import load_dataset
from transformers import AutoTokenizer

model_id = "google/gemma-3-270m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # important for Gemma, which has no dedicated pad token

# Load the JSONL dataset (adjust the path to your file)
dataset = load_dataset("json", data_files={"train": "train.jsonl"}, split="train")

# Render each chat into one training string via the tokenizer's chat template
def format_chat(example):
    messages = example["messages"]
    # Prepend a system prompt here if needed, e.g. {"role": "system", "content": "You are a helpful assistant."}
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)}

dataset = dataset.map(format_chat)
dataset = dataset.train_test_split(test_size=0.1)  # 90% train, 10% eval
```

Tokenize the dataset:

```python
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized_dataset = dataset.map(tokenize, batched=True)
```

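Before training, it can help to eyeball one rendered example to confirm the chat template produced what you expect; a minimal check:

```python
# Inspect one formatted example and its token count
sample = dataset["train"][0]
print(sample["text"])  # should show Gemma's turn markers around each message
print(len(tokenized_dataset["train"][0]["input_ids"]), "tokens")
```
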
#### Step 3: Fine-Tune with QLoRA
Load the model with 4-bit quantization for efficiency (a comfortable fit for 270M parameters).

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)  # standard prep step for k-bit QLoRA training

# LoRA config (target modules for Gemma: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
lora_config = LoraConfig(
    r=16,  # LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only the LoRA weights should be trainable

# Training args (SFTConfig subclasses TrainingArguments; recent trl versions expect it here)
training_args = SFTConfig(
    output_dir="./gemma3-270m-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=2e-4,
    bf16=True,  # matches bnb_4bit_compute_dtype above; use fp16=True on GPUs without bfloat16
    save_steps=500,
    logging_steps=100,
    eval_strategy="steps",  # called evaluation_strategy in older transformers releases
    eval_steps=500,
    load_best_model_at_end=True,
)

# Trainer. The LoRA adapter was already applied with get_peft_model above, so peft_config
# is not passed again; recent trl versions take processing_class instead of a tokenizer
# argument, and skip re-tokenizing a dataset that already contains input_ids.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    processing_class=tokenizer,
)

trainer.train()
trainer.save_model("./gemma3-270m-finetuned")  # saves the LoRA adapter
```

The run takes roughly 30-60 minutes depending on dataset size and hardware.

#### Step 4: Merge the Adapter and Save the Full Model
Merge the LoRA adapter into the base model for inference. Note that the base model is reloaded here without quantization, so the merged weights are saved at full precision.

```python
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

peft_model = PeftModel.from_pretrained(base_model, "./gemma3-270m-finetuned")
merged_model = peft_model.merge_and_unload()

merged_model.save_pretrained("./gemma3-270m-merged")
tokenizer.save_pretrained("./gemma3-270m-merged")
```

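Before spending time on GGUF conversion, a quick generation pass against the merged model can confirm the fine-tune took effect; a minimal sketch (the prompt is an arbitrary placeholder):

```python
# Quick smoke test of the merged model
messages = [{"role": "user", "content": "Test question"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(merged_model.device)
outputs = merged_model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
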
#### Step 5: Convert to GGUF Format
Clone llama.cpp, build it (the quantize tool used below is produced by the build), and convert the HF model to GGUF.

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
# Build the binaries (llama-quantize, llama-cli) used in the following steps
cmake -B build
cmake --build build --config Release
python convert_hf_to_gguf.py ../gemma3-270m-merged --outfile gemma3-270m-finetuned.gguf
```

Quantize for a smaller file (e.g. Q4_K_M for a good quality/size trade-off):

```bash
./build/bin/llama-quantize gemma3-270m-finetuned.gguf gemma3-270m-finetuned-q4.gguf Q4_K_M
```

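It can be worth exercising the quantized file directly with llama.cpp before involving Ollama, so that conversion problems and template problems stay separable; for example:

```bash
# Run one prompt against the quantized GGUF with llama.cpp's CLI
./build/bin/llama-cli -m gemma3-270m-finetuned-q4.gguf -p "Test question" -n 128
```
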
#### Step 6: Deploy to Ollama
Create a Modelfile for Ollama (save it as `Modelfile`). Note that Gemma's chat template uses `<start_of_turn>`/`<end_of_turn>` markers rather than ChatML's `<|im_start|>`/`<|im_end|>`, so the template must match what the model saw during fine-tuning:

```
FROM ./gemma3-270m-finetuned-q4.gguf
TEMPLATE """<start_of_turn>user
{{ if .System }}{{ .System }} {{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
"""
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
SYSTEM "You are a helpful assistant."
```

Run:

```bash
ollama create gemma3-270m-custom -f Modelfile
ollama run gemma3-270m-custom
```

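Besides the interactive `ollama run` session, the model is also reachable over Ollama's local HTTP API (port 11434 by default), e.g.:

```bash
# Query the deployed model through Ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3-270m-custom",
  "prompt": "Test question",
  "stream": false
}'
```
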
The model is now ready to use in Ollama. Test it with a simple prompt. If you hit errors, make sure the chat template matches the Gemma tokenizer's (use `apply_chat_template` to verify). For larger datasets or hyperparameter tuning, tools such as Unsloth can speed training up by around 2x.