<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://lms.onnocenter.or.id/wiki/index.php?action=history&amp;feed=atom&amp;title=LLM%3A_Fine_Tune_Ollama_qwen3%3A1.7b</id>
	<title>LLM: Fine Tune Ollama qwen3:1.7b - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://lms.onnocenter.or.id/wiki/index.php?action=history&amp;feed=atom&amp;title=LLM%3A_Fine_Tune_Ollama_qwen3%3A1.7b"/>
	<link rel="alternate" type="text/html" href="https://lms.onnocenter.or.id/wiki/index.php?title=LLM:_Fine_Tune_Ollama_qwen3:1.7b&amp;action=history"/>
	<updated>2026-04-21T03:00:49Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://lms.onnocenter.or.id/wiki/index.php?title=LLM:_Fine_Tune_Ollama_qwen3:1.7b&amp;diff=72955&amp;oldid=prev</id>
		<title>Unknown user: Created page with &quot;### Guide to Fine-Tuning the Qwen2-1.5B Model with a JSONL Dataset and Deployment to Ollama  Hello! I assume &quot;qwen3:1.7b&quot; refers to the Qwen2-1.5B or Qwen2.5-1.5B model (vers...&quot;</title>
		<link rel="alternate" type="text/html" href="https://lms.onnocenter.or.id/wiki/index.php?title=LLM:_Fine_Tune_Ollama_qwen3:1.7b&amp;diff=72955&amp;oldid=prev"/>
		<updated>2025-10-04T22:04:46Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;### Panduan Fine-Tuning Model Qwen2-1.5B dengan Dataset JSONL dan Deployment ke Ollama  Halo! Saya asumsikan &amp;quot;qwen3:1.7b&amp;quot; merujuk pada model Qwen2-1.5B atau Qwen2.5-1.5B (vers...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;### Guide to Fine-Tuning the Qwen2-1.5B Model with a JSONL Dataset and Deployment to Ollama&lt;br /&gt;
&lt;br /&gt;
Hello! I assume &amp;quot;qwen3:1.7b&amp;quot; refers to the Qwen2-1.5B or Qwen2.5-1.5B model (Alibaba's recent releases in this size class); the same workflow also applies to other small Qwen models. The model can be fine-tuned on a dataset in JSONL format (for example, the Alpaca format: `{&amp;quot;instruction&amp;quot;: &amp;quot;...&amp;quot;, &amp;quot;input&amp;quot;: &amp;quot;...&amp;quot;, &amp;quot;output&amp;quot;: &amp;quot;...&amp;quot;}`, one object per line). The process involves fine-tuning with LoRA (via Hugging Face and Unsloth for efficiency), converting to GGUF format (for Ollama), and creating a Modelfile.&lt;br /&gt;
&lt;br /&gt;
**Prerequisites:**&lt;br /&gt;
- A GPU with at least 8GB of VRAM (for a 1.5B model with 4-bit quantization).&lt;br /&gt;
- Python 3.10+, Git.&lt;br /&gt;
- A JSONL dataset ready to go (example: `train.jsonl`).&lt;br /&gt;
- Run on Google Colab or locally with CUDA.&lt;br /&gt;
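If you do not have a `train.jsonl` yet, a minimal Alpaca-style file can be written and validated with a short script (the records here are illustrative placeholders, not real training data):

```python
import json

# Illustrative Alpaca-style records; replace with your real data.
records = [
    {"instruction": "Translate to Indonesian", "input": "good morning", "output": "selamat pagi"},
    {"instruction": "What is 2 + 3?", "input": "", "output": "5"},
]

# JSONL: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Validate: every line must parse and carry the expected keys.
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
assert all({"instruction", "input", "output"}.issubset(r) for r in rows)
print(f"{len(rows)} valid records")
```

Each line is an independent JSON object, which is the shape `load_dataset("json", ...)` in Step 4 expects.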
&lt;br /&gt;
Depending on dataset size, the whole process can finish in about 1-2 hours. I will walk through it step by step with example code. Unsloth is used to speed up fine-tuning (up to roughly 2x faster).&lt;br /&gt;
&lt;br /&gt;
#### Step 1: Environment Setup and Installation&lt;br /&gt;
Run in a terminal or in Colab:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
pip install &amp;quot;unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git&amp;quot;&lt;br /&gt;
pip install --no-deps xformers &amp;quot;trl&amp;lt;0.9.0&amp;quot; peft accelerate bitsandbytes datasets transformers&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
#### Step 2: Load the Model and Tokenizer&lt;br /&gt;
Use Unsloth's quantized version for efficiency. Replace `Qwen/Qwen2.5-1.5B` if your specific model differs.&lt;br /&gt;
&lt;br /&gt;
```python&lt;br /&gt;
from unsloth import FastLanguageModel&lt;br /&gt;
import torch&lt;br /&gt;
from datasets import load_dataset&lt;br /&gt;
&lt;br /&gt;
max_seq_length = 2048  # Adjust as needed&lt;br /&gt;
dtype = None  # Auto-detect (bfloat16 if the GPU supports it)&lt;br /&gt;
load_in_4bit = True&lt;br /&gt;
&lt;br /&gt;
model, tokenizer = FastLanguageModel.from_pretrained(&lt;br /&gt;
    model_name=&amp;quot;unsloth/Qwen2.5-1.5B-bnb-4bit&amp;quot;,  # Or &amp;quot;Qwen/Qwen2-1.5B-Instruct&amp;quot;&lt;br /&gt;
    max_seq_length=max_seq_length,&lt;br /&gt;
    dtype=dtype,&lt;br /&gt;
    load_in_4bit=load_in_4bit,&lt;br /&gt;
    # token=&amp;quot;hf_...&amp;quot; if using a private or gated model&lt;br /&gt;
)&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
#### Step 3: Configure LoRA (PEFT)&lt;br /&gt;
Apply LoRA for efficient fine-tuning (only about 1% of the parameters are trained).&lt;br /&gt;
&lt;br /&gt;
```python&lt;br /&gt;
model = FastLanguageModel.get_peft_model(&lt;br /&gt;
    model,&lt;br /&gt;
    r=16,  # LoRA rank; 16 is a good starting point&lt;br /&gt;
    target_modules=[&amp;quot;q_proj&amp;quot;, &amp;quot;k_proj&amp;quot;, &amp;quot;v_proj&amp;quot;, &amp;quot;o_proj&amp;quot;, &amp;quot;gate_proj&amp;quot;, &amp;quot;up_proj&amp;quot;, &amp;quot;down_proj&amp;quot;],&lt;br /&gt;
    lora_alpha=16,&lt;br /&gt;
    lora_dropout=0,&lt;br /&gt;
    bias=&amp;quot;none&amp;quot;,&lt;br /&gt;
    use_gradient_checkpointing=&amp;quot;unsloth&amp;quot;,  # Saves memory&lt;br /&gt;
    random_state=3407,&lt;br /&gt;
    use_rslora=False,&lt;br /&gt;
    loftq_config=None,&lt;br /&gt;
)&lt;br /&gt;
```&lt;br /&gt;
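To see why LoRA is cheap, compare parameter counts for a single projection matrix at the `r=16` used above. The 1536x1536 shape below assumes Qwen2-1.5B's hidden size of 1536 (an illustrative assumption; check your model's config):

```python
# LoRA replaces the update to a frozen d_out x d_in weight with two small
# matrices, B (d_out x r) and A (r x d_in), so W_eff = W + (alpha / r) * B @ A.
# Only A and B are trained.
d_in, d_out = 1536, 1536   # assumed q_proj shape for Qwen2-1.5B
r = 16                     # LoRA rank, matching get_peft_model above

full_params = d_in * d_out            # training the whole matrix
lora_params = r * (d_in + d_out)      # training only A and B
ratio = lora_params / full_params

print(f"full: {full_params:,}  lora: {lora_params:,}  ratio: {ratio:.2%}")
```

Summed over all target modules (plus frozen embeddings), the trainable share lands in the low single-digit percent range, which is what makes 8GB of VRAM sufficient.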
&lt;br /&gt;
#### Step 4: Load and Preprocess the JSONL Dataset&lt;br /&gt;
Load the JSONL file and format it into the Qwen2 chat prompt (ChatML; similar in spirit to Alpaca).&lt;br /&gt;
&lt;br /&gt;
```python&lt;br /&gt;
# Define the prompt template for Qwen2 Instruct (ChatML)&lt;br /&gt;
alpaca_prompt = &amp;quot;&amp;quot;&amp;quot;&amp;lt;|im_start|&amp;gt;system&lt;br /&gt;
You are a helpful assistant.&amp;lt;|im_end|&amp;gt;&lt;br /&gt;
&amp;lt;|im_start|&amp;gt;user&lt;br /&gt;
{}&amp;lt;|im_end|&amp;gt;&lt;br /&gt;
&amp;lt;|im_start|&amp;gt;assistant&lt;br /&gt;
{}&amp;lt;|im_end|&amp;gt;&amp;quot;&amp;quot;&amp;quot;  # Adjust to your format; append EOS if needed&lt;br /&gt;
&lt;br /&gt;
EOS_TOKEN = tokenizer.eos_token&lt;br /&gt;
&lt;br /&gt;
def formatting_prompts_func(examples):&lt;br /&gt;
    instructions = examples[&amp;quot;instruction&amp;quot;]&lt;br /&gt;
    inputs = examples[&amp;quot;input&amp;quot;] if &amp;quot;input&amp;quot; in examples else [&amp;quot;&amp;quot;] * len(instructions)&lt;br /&gt;
    outputs = examples[&amp;quot;output&amp;quot;]&lt;br /&gt;
    texts = []&lt;br /&gt;
    for instruction, input_, output in zip(instructions, inputs, outputs):&lt;br /&gt;
        # Append the input field if present&lt;br /&gt;
        user_input = f&amp;quot;{instruction}\n{input_}&amp;quot; if input_ else instruction&lt;br /&gt;
        text = alpaca_prompt.format(user_input, output) + EOS_TOKEN&lt;br /&gt;
        texts.append(text)&lt;br /&gt;
    return {&amp;quot;text&amp;quot;: texts}&lt;br /&gt;
&lt;br /&gt;
# Load the JSONL dataset&lt;br /&gt;
dataset = load_dataset(&amp;quot;json&amp;quot;, data_files={&amp;quot;train&amp;quot;: &amp;quot;path/to/your/file.jsonl&amp;quot;}, split=&amp;quot;train&amp;quot;)&lt;br /&gt;
dataset = dataset.map(formatting_prompts_func, batched=True)&lt;br /&gt;
&lt;br /&gt;
# Split if needed (80% train, 20% eval)&lt;br /&gt;
dataset = dataset.train_test_split(test_size=0.2)&lt;br /&gt;
```&lt;br /&gt;
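Before running `dataset.map`, the mapping logic can be sanity-checked standalone. The sketch below mirrors `formatting_prompts_func` with a simplified placeholder template and EOS string (the real ChatML template and `tokenizer.eos_token` are defined above):

```python
PROMPT = "USER: {}\nASSISTANT: {}"   # placeholder; the guide uses ChatML
EOS = "[EOS]"                        # stand-in for tokenizer.eos_token

def formatting_prompts_func(examples):
    # Batched mapping: each field is a list of values.
    instructions = examples["instruction"]
    inputs = examples.get("input", [""] * len(instructions))
    outputs = examples["output"]
    texts = []
    for instruction, input_, output in zip(instructions, inputs, outputs):
        user_input = f"{instruction}\n{input_}" if input_ else instruction
        texts.append(PROMPT.format(user_input, output) + EOS)
    return {"text": texts}

batch = {
    "instruction": ["Translate to French", "Add the numbers"],
    "input": ["good morning", ""],
    "output": ["bonjour", "5"],
}
for t in formatting_prompts_func(batch)["text"]:
    print(repr(t))
```

Records with an empty `input` fall back to the bare instruction, matching the behavior of the function above.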
&lt;br /&gt;
#### Step 5: Training with SFTTrainer&lt;br /&gt;
Configure and run the training:&lt;br /&gt;
&lt;br /&gt;
```python&lt;br /&gt;
from trl import SFTTrainer&lt;br /&gt;
from transformers import TrainingArguments&lt;br /&gt;
&lt;br /&gt;
trainer = SFTTrainer(&lt;br /&gt;
    model=model,&lt;br /&gt;
    tokenizer=tokenizer,&lt;br /&gt;
    train_dataset=dataset[&amp;quot;train&amp;quot;],&lt;br /&gt;
    eval_dataset=dataset[&amp;quot;test&amp;quot;],&lt;br /&gt;
    dataset_text_field=&amp;quot;text&amp;quot;,&lt;br /&gt;
    max_seq_length=max_seq_length,&lt;br /&gt;
    dataset_num_proc=2,&lt;br /&gt;
    packing=False,  # Set True to pack short sequences together for faster training&lt;br /&gt;
    args=TrainingArguments(&lt;br /&gt;
        per_device_train_batch_size=2,  # Adjust to your VRAM&lt;br /&gt;
        gradient_accumulation_steps=4,&lt;br /&gt;
        warmup_steps=5,&lt;br /&gt;
        max_steps=60,  # Or set num_train_epochs=1 instead&lt;br /&gt;
        learning_rate=2e-4,&lt;br /&gt;
        fp16=not torch.cuda.is_bf16_supported(),&lt;br /&gt;
        bf16=torch.cuda.is_bf16_supported(),&lt;br /&gt;
        logging_steps=1,&lt;br /&gt;
        optim=&amp;quot;adamw_8bit&amp;quot;,&lt;br /&gt;
        weight_decay=0.01,&lt;br /&gt;
        lr_scheduler_type=&amp;quot;linear&amp;quot;,&lt;br /&gt;
        seed=3407,&lt;br /&gt;
        output_dir=&amp;quot;outputs&amp;quot;,&lt;br /&gt;
    ),&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
trainer_stats = trainer.train()&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
Save the model after training. Note that on a PEFT model `save_pretrained` stores only the LoRA adapter; for the GGUF conversion in Step 6 you need the merged full weights, which Unsloth can export directly:&lt;br /&gt;
```python&lt;br /&gt;
model.save_pretrained(&amp;quot;fine_tuned_qwen2_1.5b&amp;quot;)  # LoRA adapter only&lt;br /&gt;
tokenizer.save_pretrained(&amp;quot;fine_tuned_qwen2_1.5b&amp;quot;)&lt;br /&gt;
# Merge the adapter into the base weights for GGUF conversion&lt;br /&gt;
model.save_pretrained_merged(&amp;quot;fine_tuned_qwen2_1.5b&amp;quot;, tokenizer, save_method=&amp;quot;merged_16bit&amp;quot;)&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
#### Step 6: Convert to GGUF Format for Ollama&lt;br /&gt;
Use llama.cpp for the conversion. It runs on CPU or GPU and also works in Colab.&lt;br /&gt;
&lt;br /&gt;
1. Clone the repo:&lt;br /&gt;
   ```bash&lt;br /&gt;
   git clone https://github.com/ggerganov/llama.cpp&lt;br /&gt;
   cd llama.cpp&lt;br /&gt;
   pip install -r requirements.txt&lt;br /&gt;
   ```&lt;br /&gt;
&lt;br /&gt;
2. Convert the Hugging Face model to GGUF (FP16 first). In current llama.cpp the script is `convert_hf_to_gguf.py`, and the input directory must contain the full merged weights, not just a LoRA adapter:&lt;br /&gt;
   ```bash&lt;br /&gt;
   python convert_hf_to_gguf.py /path/to/fine_tuned_qwen2_1.5b --outfile fine_tuned_qwen2.gguf --outtype f16&lt;br /&gt;
   ```&lt;br /&gt;
&lt;br /&gt;
3. Quantize (optional, to save memory; Q4_K_M is a good choice for 1.5B). Build llama.cpp first so the `llama-quantize` binary exists:&lt;br /&gt;
   ```bash&lt;br /&gt;
   cmake -B build&lt;br /&gt;
   cmake --build build --config Release&lt;br /&gt;
   ./build/bin/llama-quantize fine_tuned_qwen2.gguf fine_tuned_qwen2_q4.gguf Q4_K_M&lt;br /&gt;
   ```&lt;br /&gt;
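As a rough size check before and after quantization (the ~1.54B parameter count and the ~4.85 effective bits per weight for Q4_K_M are ballpark assumptions, not exact figures):

```python
params = 1.54e9  # approximate parameter count of a 1.5B-class model

def gguf_size_gb(bits_per_weight):
    """Approximate GGUF file size in gigabytes, ignoring metadata overhead."""
    return params * bits_per_weight / 8 / 1e9

print(f"F16 (16 bpw):       ~{gguf_size_gb(16):.1f} GB")
print(f"Q4_K_M (~4.85 bpw): ~{gguf_size_gb(4.85):.1f} GB")
```

So the quantized file should come out roughly 3x smaller than the F16 export; if the numbers are far off, the conversion likely picked up the wrong directory.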
&lt;br /&gt;
#### Step 7: Deployment to Ollama&lt;br /&gt;
Install Ollama if you have not already (https://ollama.com/download).&lt;br /&gt;
&lt;br /&gt;
1. Create a file named `Modelfile` (no extension) in the same folder as the GGUF:&lt;br /&gt;
   ```&lt;br /&gt;
   FROM ./fine_tuned_qwen2_q4.gguf&lt;br /&gt;
&lt;br /&gt;
   TEMPLATE &amp;quot;&amp;quot;&amp;quot;&amp;lt;|im_start|&amp;gt;system&lt;br /&gt;
   You are a helpful assistant.&amp;lt;|im_end|&amp;gt;&lt;br /&gt;
   &amp;lt;|im_start|&amp;gt;user&lt;br /&gt;
   {{ .Prompt }}&amp;lt;|im_end|&amp;gt;&lt;br /&gt;
   &amp;lt;|im_start|&amp;gt;assistant&lt;br /&gt;
   &amp;quot;&amp;quot;&amp;quot;&lt;br /&gt;
&lt;br /&gt;
   PARAMETER stop &amp;quot;&amp;lt;|im_end|&amp;gt;&amp;quot;&lt;br /&gt;
   PARAMETER stop &amp;quot;&amp;lt;|im_start|&amp;gt;&amp;quot;&lt;br /&gt;
   # num_ctx should match max_seq_length from training&lt;br /&gt;
   PARAMETER num_ctx 2048&lt;br /&gt;
   PARAMETER temperature 0.7&lt;br /&gt;
   PARAMETER top_p 0.9&lt;br /&gt;
   ```&lt;br /&gt;
&lt;br /&gt;
   - This template matches Qwen2 Instruct (ChatML); adjust it if your format is custom.&lt;br /&gt;
&lt;br /&gt;
2. Create and run the model:&lt;br /&gt;
   ```bash&lt;br /&gt;
   ollama create my_fine_tuned_qwen -f Modelfile&lt;br /&gt;
   ollama run my_fine_tuned_qwen&lt;br /&gt;
   ```&lt;br /&gt;
&lt;br /&gt;
The model is now ready to use in Ollama! Test it with a simple prompt.&lt;br /&gt;
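Besides the interactive `ollama run`, the deployed model can be queried programmatically through Ollama's local REST API (it listens on port 11434 by default). A minimal standard-library sketch; the model name matches the `ollama create` step above, and the commented-out call requires the Ollama server to be running:

```python
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("my_fine_tuned_qwen", "Hello, who are you?")
print(req.full_url)

# With the server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

With `"stream": False` the server returns a single JSON object whose `response` field holds the full completion, rather than a stream of partial chunks.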
&lt;br /&gt;
**Additional Tips:**&lt;br /&gt;
- If the output loops endlessly (a common issue), check the stop tokens in the Modelfile, set a `PARAMETER num_predict` limit, or check rope_freq_base in the HF metadata.&lt;br /&gt;
- Large dataset? Use `num_train_epochs=1` and monitor the loss.&lt;br /&gt;
- Running out of VRAM? Lower the batch size or use an 8-bit optimizer.&lt;br /&gt;
- Sources: based on Unsloth's Qwen2 tutorials, the llama.cpp conversion workflow, and the Ollama docs for Qwen.&lt;br /&gt;
&lt;br /&gt;
If you need help debugging the code or want an example JSONL, just ask!&lt;br /&gt;
&lt;/div&gt;</summary>
		<author><name>Unknown user</name></author>
	</entry>
</feed>