LLM: tips for CPU

From OnnoCenterWiki

Revision as of 21:25, 16 July 2024

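According to ChatGPT, when running on a CPU, try:

1. Batch processing, to reduce per-call overhead and speed up embedding.
2. Reduce model precision (float32 -> float16 or int8): faster inference, usually with only a small loss of accuracy.
3. Make or use a smaller version of the same model.
4. Use multi-threading across the CPU cores.
5. Use an optimized BLAS library such as Intel MKL or OpenBLAS.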
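
Tips 1, 4 and 5 can be sketched roughly as below. This is only a minimal illustration, not a tuned setup: the function names (`configure_cpu_threads`, `iter_batches`, `embed_in_batches`) and the `embed_fn` callable are hypothetical and stand in for whatever embedding call is actually used. The environment variables are the thread-count settings read by OpenMP, Intel MKL and OpenBLAS; they normally need to be set before the numeric library is imported.

```python
import os


def configure_cpu_threads(n_threads=None):
    """Set the thread-count variables read by OpenMP, MKL and OpenBLAS.

    Call this before importing numpy/torch, otherwise the libraries may
    already have chosen their thread pool size.
    """
    # os.cpu_count() reports logical cores; using the physical core count
    # is sometimes faster, so treat this default as a starting point only.
    n = n_threads or os.cpu_count() or 1
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)
    return n


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(texts, embed_fn, batch_size=32):
    """Call `embed_fn` once per batch instead of once per text.

    `embed_fn` is a hypothetical callable: it takes a list of strings and
    returns one vector per string, in the same order.
    """
    vectors = []
    for batch in iter_batches(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

A good batch size depends on text length and available RAM, so it is worth timing a few values (for example 8, 16, 32) on the actual machine.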

I use the intfloat model, sir; it is fairly fast on CPU:

https://huggingface.co/intfloat/multilingual-e5-large
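
A possible way to run this model on CPU is sketched below, assuming the `sentence-transformers` and `torch` packages are installed (the page does not say which library was actually used). The helper names are hypothetical. The "query: " / "passage: " prefixes follow the usage notes on the model card, and the optional `int8` flag applies PyTorch dynamic quantization to the Linear layers as one way to try tip 2; its effect on speed and accuracy should be measured, not assumed.

```python
def add_e5_prefix(texts, kind="passage"):
    """Prepend the 'query: ' or 'passage: ' prefix that E5 models expect."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def load_e5_on_cpu(int8=False):
    """Load intfloat/multilingual-e5-large on CPU (downloads the weights)."""
    # Imported lazily so add_e5_prefix works without these packages.
    import torch
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/multilingual-e5-large", device="cpu")
    if int8:
        # Assumption: dynamic int8 quantization of the Linear layers only.
        model = torch.quantization.quantize_dynamic(
            model, {torch.nn.Linear}, dtype=torch.qint8
        )
    return model


def embed_passages(model, texts, batch_size=8):
    """Embed documents in batches and return L2-normalized vectors."""
    return model.encode(
        add_e5_prefix(texts, "passage"),
        batch_size=batch_size,
        normalize_embeddings=True,
    )
```

Typical use would be `model = load_e5_on_cpu()` followed by `embed_passages(model, ["some text"])`; search queries would go through `add_e5_prefix(..., "query")` instead.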