title, subtitle, date, slug, draft, author, description, keywords, license, comment, weight, tags, categories, hiddenFromHomePage, hiddenFromSearch, hiddenFromRss, hiddenFromRelated, summary, resources, toc, math, lightgallery, password, message, repost
Choosing an Ideal Quantization Type for llama.cpp 2024-03-09T20:59:27-05:00 quantization-llama-cpp true
name link email avatar
James https://www.jamesflare.com /site-logo.avif
true 0
LLM
Ollama
llama.cpp
AI
false false false false
name src
featured-image featured-image.jpg
name src
featured-image-preview featured-image-preview.jpg
true false false
enable url
true
| Q Type | Size | ppl Change | Note |
| --- | --- | --- | --- |
| Q2_K_S | 2.16G | +9.0634 | @ LLaMA-v1-7B |
| Q2_K | 2.63G | +0.6717 | @ LLaMA-v1-7B |
| Q3_K_S | 2.75G | +0.5551 | @ LLaMA-v1-7B |
| Q3_K | - | - | alias for Q3_K_M |
| Q3_K_M | 3.07G | +0.2496 | @ LLaMA-v1-7B |
| Q3_K_L | 3.35G | +0.1764 | @ LLaMA-v1-7B |
| Q4_0 | 3.56G | +0.2166 | @ LLaMA-v1-7B |
| Q4_K_S | 3.59G | +0.0992 | @ LLaMA-v1-7B |
| Q4_K | - | - | alias for Q4_K_M |
| Q4_K_M | 3.80G | +0.0532 | @ LLaMA-v1-7B |
| Q4_1 | 3.90G | +0.1585 | @ LLaMA-v1-7B |
| Q5_0 | 4.33G | +0.0683 | @ LLaMA-v1-7B |
| Q5_K_S | 4.33G | +0.0400 | @ LLaMA-v1-7B |
| Q5_1 | 4.70G | +0.0349 | @ LLaMA-v1-7B |
| Q5_K | - | - | alias for Q5_K_M |
| Q5_K_M | 4.45G | +0.0122 | @ LLaMA-v1-7B |
| Q6_K | 5.15G | +0.0008 | @ LLaMA-v1-7B |
| Q8_0 | 6.70G | +0.0004 | @ LLaMA-v1-7B |
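
One way to read the table is as a size-versus-quality trade-off. The sketch below is only an illustration: it copies the LLaMA-v1-7B figures from the table, and the helper names `pick_quant` and `pareto_front` are hypothetical, not part of llama.cpp. It assumes the listed size is the only constraint, so it ignores the KV cache and other runtime memory, and the numbers may not carry over to other models.

```python
# (type, size in G, perplexity increase) copied from the table above.
# Alias rows (Q3_K, Q4_K, Q5_K) are omitted because they carry no numbers.
QUANTS = [
    ("Q2_K_S", 2.16, 9.0634), ("Q2_K", 2.63, 0.6717), ("Q3_K_S", 2.75, 0.5551),
    ("Q3_K_M", 3.07, 0.2496), ("Q3_K_L", 3.35, 0.1764), ("Q4_0", 3.56, 0.2166),
    ("Q4_K_S", 3.59, 0.0992), ("Q4_K_M", 3.80, 0.0532), ("Q4_1", 3.90, 0.1585),
    ("Q5_0", 4.33, 0.0683), ("Q5_K_S", 4.33, 0.0400), ("Q5_1", 4.70, 0.0349),
    ("Q5_K_M", 4.45, 0.0122), ("Q6_K", 5.15, 0.0008), ("Q8_0", 6.70, 0.0004),
]


def pick_quant(budget_g):
    """Return the type with the smallest ppl change whose size fits the budget.

    Hypothetical helper: returns None when nothing in the table fits.
    """
    fitting = [q for q in QUANTS if q[1] <= budget_g]
    if not fitting:
        return None
    return min(fitting, key=lambda q: q[2])[0]


def pareto_front():
    """Return the types that no smaller-or-equal-sized type beats on ppl change."""
    front = []
    best_ppl = float("inf")
    # Walk from smallest to largest; on a size tie, the lower ppl change goes first.
    for name, size, ppl in sorted(QUANTS, key=lambda q: (q[1], q[2])):
        if ppl < best_ppl:
            front.append(name)
            best_ppl = ppl
    return front


if __name__ == "__main__":
    print(pick_quant(4.0))
    print(pareto_front())
```

On these figures, the types that drop out of the front are Q4_0, Q4_1, Q5_0 and Q5_1: each has a K-quant of equal or smaller size with a lower perplexity increase.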