LLM: tips for running on CPU

From OnnoCenterWiki
Revision as of 21:22, 16 July 2024 by Unknown user

According to ChatGPT, when running on a CPU, try:

1. Batch processing, to reduce overhead and speed up embedding.
2. Reduce model precision (float32 -> float16/int8) to speed things up, usually with only a small loss of accuracy.
3. Create a smaller version of the same model.
4. Use multi-threading.
5. Use an optimized BLAS library such as Intel MKL or OpenBLAS.
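Tip 1 can be sketched in plain Python. This is a minimal illustration, not a real model: `fake_embed_batch` is a hypothetical stand-in that only counts how often it is called, to show that embedding 100 texts with `batch_size=32` takes 4 calls instead of 100. With a real library the same idea is usually just a parameter; for example, `SentenceTransformer.encode` in sentence-transformers accepts a `batch_size` argument.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(texts: Sequence[str],
              embed_batch: Callable[[List[str]], List[List[float]]],
              batch_size: int = 32) -> List[List[float]]:
    """Call embed_batch once per batch instead of once per text."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Hypothetical stand-in for a real embedding model: it counts its calls so
# the reduction in per-call overhead is visible.
calls = 0


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    global calls
    calls += 1
    return [[float(len(t))] for t in batch]


texts = [f"document {i}" for i in range(100)]
vectors = embed_all(texts, fake_embed_batch, batch_size=32)
print(len(vectors), calls)  # 100 4
```

Larger batches amortize more overhead but use more memory, so the best `batch_size` on a given CPU is worth measuring rather than assuming.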
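To illustrate tip 2, here is a minimal sketch of symmetric per-tensor int8 quantization, assuming the weights are a plain list of floats. It also shows why the conversion is not entirely free: each weight is rounded to the nearest multiple of `scale`, so the per-weight error can be up to `scale / 2`. The actual speedup comes from the optimized int8 kernels in frameworks such as PyTorch or ONNX Runtime, not from pure-Python code like this.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map [-max|w|, max|w|] onto the integers [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.007, -0.25]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"scale={scale:.6f} max_err={max_err:.6f}")
```

Each value now fits in one byte instead of four, which cuts memory traffic; whether the rounding error matters for a particular model has to be checked on that model's own evaluation data.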
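Tips 4 and 5 often go together on a CPU: most numeric libraries delegate matrix math to a BLAS such as Intel MKL or OpenBLAS, which runs its own thread pool. The sketch below assumes that the library you load next honors the standard `OMP_NUM_THREADS`, `MKL_NUM_THREADS` and `OPENBLAS_NUM_THREADS` environment variables; `configure_cpu_threads` is a hypothetical helper name. If you use PyTorch, `torch.set_num_threads` serves a similar purpose at runtime.

```python
import os
from typing import Dict

BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def configure_cpu_threads(n_threads: int) -> Dict[str, str]:
    """Set the thread count for OpenMP, Intel MKL and OpenBLAS.

    These libraries read the variables when they are first loaded, so this
    must run before importing numpy, torch, onnxruntime, etc.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n_threads)
    return {var: os.environ[var] for var in BLAS_THREAD_VARS}


# Assumption: start with every logical core (os.cpu_count() may return None).
settings = configure_cpu_threads(os.cpu_count() or 1)
print(settings)

try:
    import numpy as np  # imported only after the variables are set
    np.show_config()    # reports which BLAS numpy was built against
except ImportError:
    pass  # numpy is optional for this sketch
```

Using all logical cores is only a starting point; on machines with hyper-threading, the number of physical cores is often faster, so it is worth measuring both.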