Files
FlareBlog/content/en/posts/quantization-llama-cpp/index.md
2024-03-12 00:36:05 -04:00

66 lines
1.6 KiB
Markdown

---
title: Choice a Ideal Quantization Type for llama.cpp
subtitle:
date: 2024-03-09T20:59:27-05:00
slug: quantization-llama-cpp
draft: true
author:
name: James
link: https://www.jamesflare.com
email:
avatar: /site-logo.avif
description:
keywords:
license:
comment: true
weight: 0
tags:
- LLM
- Ollama
- llama.cpp
categories:
- AI
hiddenFromHomePage: false
hiddenFromSearch: false
hiddenFromRss: false
hiddenFromRelated: false
summary:
resources:
- name: featured-image
src: featured-image.jpg
- name: featured-image-preview
src: featured-image-preview.jpg
toc: true
math: false
lightgallery: false
password:
message:
repost:
enable: true
url:
# See details front matter: https://fixit.lruihao.cn/documentation/content-management/introduction/#front-matter
---
<!--more-->
| Q Type | Size | ppl Change | Note |
|:---:|:---:|:---:|:---:|
| Q2\_K\_S | 2.16G | +9.0634 | @ LLaMA-v1-7B |
| Q2\_K | 2.63G | +0.6717 | @ LLaMA-v1-7B |
| Q3\_K\_S | 2.75G | +0.5551 | @ LLaMA-v1-7B |
| Q3\_K | - | - | alias for Q3\_K\_M |
| Q3\_K\_M | 3.07G | +0.2496 | @ LLaMA-v1-7B |
| Q3\_K\_L | 3.35G | +0.1764 | @ LLaMA-v1-7B |
| Q4\_0 | 3.56G | +0.2166 | @ LLaMA-v1-7B |
| Q4\_K\_S | 3.59G | +0.0992 | @ LLaMA-v1-7B |
| Q4\_K | - | - | alias for Q4\_K\_M |
| Q4\_K\_M | 3.80G | +0.0532 | @ LLaMA-v1-7B |
| Q4\_1 | 3.90G | +0.1585 | @ LLaMA-v1-7B |
| Q5\_0 | 4.33G | +0.0683 | @ LLaMA-v1-7B |
| Q5\_K\_S | 4.33G | +0.0400 | @ LLaMA-v1-7B |
| Q5\_1 | 4.70G | +0.0349 | @ LLaMA-v1-7B |
| Q5\_K | - | - | alias for Q5\_K\_M |
| Q5\_K\_M | 4.45G | +0.0122 | @ LLaMA-v1-7B |
| Q6\_K | 5.15G | +0.0008 | @ LLaMA-v1-7B |
| Q8\_0 | 6.70G | +0.0004 | @ LLaMA-v1-7B |