FlareBlog/content/en/posts/quantization-llama-cpp/index.md at 33584d0123558f9db59be901328c78bc9168a8f5

2024-03-12 00:36:05 -04:00

1.6 KiB

title, subtitle, date, slug, draft, author, description, keywords, license, comment, weight, tags, categories, hiddenFromHomePage, hiddenFromSearch, hiddenFromRss, hiddenFromRelated, summary, resources, toc, math, lightgallery, password, message, repost

Choose an Ideal Quantization Type for llama.cpp

2024-03-09T20:59:27-05:00

quantization-llama-cpp

true

name	link	email	avatar
James	https://www.jamesflare.com		/site-logo.avif

true

LLM

Ollama

llama.cpp

false

name	src
featured-image	featured-image.jpg

name	src
featured-image-preview	featured-image-preview.jpg

true

false

enable	url
true

| Quantization type | Size | Perplexity (ppl) change | Note |
| --- | --- | --- | --- |
| Q2_K_S | 2.16G | +9.0634 | @ LLaMA-v1-7B |
| Q2_K | 2.63G | +0.6717 | @ LLaMA-v1-7B |
| Q3_K_S | 2.75G | +0.5551 | @ LLaMA-v1-7B |
| Q3_K | - | - | alias for Q3_K_M |
| Q3_K_M | 3.07G | +0.2496 | @ LLaMA-v1-7B |
| Q3_K_L | 3.35G | +0.1764 | @ LLaMA-v1-7B |
| Q4_0 | 3.56G | +0.2166 | @ LLaMA-v1-7B |
| Q4_K_S | 3.59G | +0.0992 | @ LLaMA-v1-7B |
| Q4_K | - | - | alias for Q4_K_M |
| Q4_K_M | 3.80G | +0.0532 | @ LLaMA-v1-7B |
| Q4_1 | 3.90G | +0.1585 | @ LLaMA-v1-7B |
| Q5_0 | 4.33G | +0.0683 | @ LLaMA-v1-7B |
| Q5_K_S | 4.33G | +0.0400 | @ LLaMA-v1-7B |
| Q5_1 | 4.70G | +0.0349 | @ LLaMA-v1-7B |
| Q5_K | - | - | alias for Q5_K_M |
| Q5_K_M | 4.45G | +0.0122 | @ LLaMA-v1-7B |
| Q6_K | 5.15G | +0.0008 | @ LLaMA-v1-7B |
| Q8_0 | 6.70G | +0.0004 | @ LLaMA-v1-7B |
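
One way to read the table above is as a size-versus-quality trade-off: for a given size budget, pick the type with the smallest perplexity change. The Python sketch below does that with the numbers copied from the table (aliases omitted). The helper names `pick_quant_type` and `pareto_front` are hypothetical and not part of llama.cpp, and the figures only describe LLaMA-v1-7B, so treat the output as a rough guide for other models.

```python
# (type, size in G, perplexity change) copied from the table above.
# Aliases (Q3_K, Q4_K, Q5_K) are omitted because they carry no numbers of their own.
QUANT_TYPES = [
    ("Q2_K_S", 2.16, 9.0634),
    ("Q2_K", 2.63, 0.6717),
    ("Q3_K_S", 2.75, 0.5551),
    ("Q3_K_M", 3.07, 0.2496),
    ("Q3_K_L", 3.35, 0.1764),
    ("Q4_0", 3.56, 0.2166),
    ("Q4_K_S", 3.59, 0.0992),
    ("Q4_K_M", 3.80, 0.0532),
    ("Q4_1", 3.90, 0.1585),
    ("Q5_0", 4.33, 0.0683),
    ("Q5_K_S", 4.33, 0.0400),
    ("Q5_1", 4.70, 0.0349),
    ("Q5_K_M", 4.45, 0.0122),
    ("Q6_K", 5.15, 0.0008),
    ("Q8_0", 6.70, 0.0004),
]


def pick_quant_type(max_size_g):
    """Return the (type, size, ppl change) entry with the lowest perplexity
    change whose size fits the budget, or None if nothing fits."""
    candidates = [q for q in QUANT_TYPES if q[1] <= max_size_g]
    if not candidates:
        return None
    # Lowest perplexity change wins; ties go to the smaller file.
    return min(candidates, key=lambda q: (q[2], q[1]))


def pareto_front():
    """Return the types that are not beaten by another type that is
    no larger and has a lower perplexity change."""
    front = []
    best_ppl = float("inf")
    # Walk from smallest to largest; keep a type only if it improves quality.
    for name, size, ppl in sorted(QUANT_TYPES, key=lambda q: (q[1], q[2])):
        if ppl < best_ppl:
            front.append(name)
            best_ppl = ppl
    return front


if __name__ == "__main__":
    print(pick_quant_type(4.0))
    print(pareto_front())
```

With these numbers, the types that drop out of the front are Q4_0, Q4_1, Q5_0, and Q5_1: each has a K-quant neighbor in the table that is no larger and has a smaller perplexity change.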
