LLM: tips for CPU
From OnnoCenterWiki
Revision as of 21:25, 16 July 2024
According to ChatGPT, when running on a CPU, try:
1. Batch processing, to reduce per-call overhead and speed up embedding.
2. Reduce model precision (float32 to float16 or int8) for a speedup, usually with only a small loss of accuracy.
3. Create a smaller version of the same model.
4. Multi-threading.
5. Use Intel MKL / OpenBLAS.
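Tips 1, 4, and 5 can be sketched in plain Python. This is only an illustration under assumptions, not a measured recipe: the helper names (configure_cpu_threads, batched, embed_all) and the embed_batch callback are hypothetical, and the environment variables are the ones commonly read by OpenMP, Intel MKL, and OpenBLAS when those libraries are loaded.

```python
import os


def configure_cpu_threads(num_threads=None):
    """Set the thread-count variables read by OpenMP, Intel MKL, and OpenBLAS.

    Call this before importing numpy/torch, because those libraries
    typically read the variables once, when they are loaded.
    """
    n = num_threads or os.cpu_count() or 1
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n)
    return n


def batched(items, batch_size):
    """Yield consecutive slices holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(texts, embed_batch, batch_size=32):
    """Call embed_batch once per batch instead of once per text.

    embed_batch is a hypothetical callback: it takes a list of strings
    and returns one vector per string (for example a model's encode method).
    """
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

A good batch size depends on available RAM and text length, so it is worth timing a few values rather than assuming the default of 32 used here.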
I use the intfloat model, which is reasonably fast on CPU:
https://huggingface.co/intfloat/multilingual-e5-large
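A minimal sketch of running this model on CPU, assuming the sentence-transformers package is installed (the original note does not say which library was used). The function names add_e5_prefix and embed_on_cpu and the batch size of 16 are hypothetical; the "query: " / "passage: " prefixes follow the usage described on the model card.

```python
def add_e5_prefix(texts, kind="passage"):
    """Prepend the 'query: ' or 'passage: ' prefix that E5 models expect."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def embed_on_cpu(texts, kind="passage", batch_size=16):
    """Embed texts in batches on CPU with multilingual-e5-large."""
    # Imported lazily so add_e5_prefix works without the library installed.
    from sentence_transformers import SentenceTransformer

    # The first call downloads the model weights from Hugging Face.
    model = SentenceTransformer("intfloat/multilingual-e5-large", device="cpu")
    return model.encode(
        add_e5_prefix(texts, kind),
        batch_size=batch_size,
        normalize_embeddings=True,  # unit-length vectors, so dot product equals cosine similarity
    )
```

In a real script the model would be loaded once and reused, since loading it on every call would cost far more than the embedding itself.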