# Most Efficient Large Language Models for AI PC
This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data below is current as of OpenVINO 2025.1 (13 April 2025).
The tables below list the key performance indicators (KPIs) for LLM inference on built-in GPUs: peak memory usage (max RSS), first-token latency, second-token latency, and the throughput derived from the second-token latency.
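Since latencies are reported in milliseconds, the throughput column is simply 1000 divided by the second-token latency. A minimal sketch of the conversion (the example value is taken from the first data row below):

```python
def tokens_per_second(second_token_latency_ms: float) -> float:
    """Convert per-token latency in milliseconds to tokens per second."""
    return 1000.0 / second_token_latency_ms

# Example: opt-125m-gptq, INT4-MIXED, input size 32
print(tokens_per_second(3.2))  # 312.5
```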
Topology | Precision | Input Size (tokens) | Max RSS Memory (MB) | 1st Token Latency (ms) | 2nd Token Latency (ms) | 2nd Tokens/sec (= 1000 / 2nd latency) |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 922.6 | 10.4 | 3.2 | 312.5 |
opt-125m-gptq | INT4-MIXED | 1024 | 999.2 | 18.1 | 3.4 | 294.1176471 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3242 | 41.8 | 21.7 | 46.08294931 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3322.1 | 50.8 | 24.6 | 40.6504065 |
phi-2 | INT4-MIXED | 32 | 3225.9 | 62 | 24.7 | 40.48582996 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 3482.6 | 431.6 | 26.8 | 37.31343284 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 3700.1 | 411 | 29.5 | 33.89830508 |
phi-2 | INT4-MIXED | 1024 | 3623.8 | 384.1 | 29.6 | 33.78378378 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 4128.5 | 54.2 | 32.8 | 30.48780488 |
phi-2 | INT8-CW | 32 | 4183.2 | 56.3 | 33.7 | 29.6735905 |
stablelm-3b-4e1t | INT8-CW | 32 | 4301.3 | 53.2 | 34 | 29.41176471 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 4398.8 | 53.2 | 34.3 | 29.15451895 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 3449.3 | 49.8 | 34.7 | 28.8184438 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 4389.7 | 467.1 | 36.1 | 27.70083102 |
phi-2 | INT8-CW | 1024 | 4424.9 | 415.9 | 36.6 | 27.32240437 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 4639.3 | 430 | 36.9 | 27.100271 |
stablelm-3b-4e1t | INT8-CW | 1024 | 4542 | 432.5 | 37 | 27.02702703 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 3598.4 | 398.8 | 37.8 | 26.45502646 |
flan-t5-xxl | INT4-MIXED | 33 | 13686.8 | 63.7 | 39.7 | 25.18891688 |
chatglm3-6b | INT4-MIXED | 32 | 5116.2 | 69.7 | 42.9 | 23.31002331 |
chatglm3-6b | INT4-MIXED | 1024 | 4642 | 779.3 | 45.1 | 22.172949 |
flan-t5-xxl | INT4-MIXED | 1139 | 14965.1 | 243 | 48.8 | 20.49180328 |
codegen25-7b | INT4-MIXED | 32 | 5393 | 77 | 48.9 | 20.44989775 |
gpt-j-6b | INT4-MIXED | 32 | 5216.2 | 130.7 | 49.7 | 20.12072435 |
codegen25-7b | INT4-MIXED | 1024 | 5749.4 | 892.5 | 53.6 | 18.65671642 |
gpt-j-6b | INT4-MIXED | 1024 | 6187.8 | 801.8 | 53.8 | 18.58736059 |
falcon-7b-instruct | INT4-MIXED | 32 | 5357.4 | 85.2 | 55.5 | 18.01801802 |
falcon-7b-instruct | INT4-MIXED | 1024 | 5025.2 | 876.4 | 57.9 | 17.27115717 |
gemma-7b-it | INT4-MIXED | 32 | 6446.6 | 94.1 | 66.2 | 15.10574018 |
flan-t5-xxl | INT8-CW | 33 | 23323.5 | 220.7 | 67.2 | 14.88095238 |
llama-2-7b-gptq | INT4-MIXED | 32 | 4962.8 | 67.7 | 67.5 | 14.81481481 |
chatglm3-6b | INT8-CW | 32 | 7496.3 | 92.8 | 69.2 | 14.45086705 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 5781.8 | 73.7 | 69.3 | 14.43001443 |
phi-2 | FP16 | 32 | 6914.7 | 83 | 69.4 | 14.4092219 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6664.9 | 77.9 | 69.7 | 14.3472023 |
gemma-7b-it | INT4-MIXED | 1024 | 6752.7 | 1054 | 69.7 | 14.3472023 |
stablelm-3b-4e1t | FP16 | 32 | 6832 | 86.7 | 71.1 | 14.06469761 |
stable-zephyr-3b-dpo | FP16 | 32 | 6932.8 | 85.9 | 71.2 | 14.04494382 |
chatglm3-6b | INT8-CW | 1024 | 7260.3 | 732.5 | 71.2 | 14.04494382 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 5745 | 761.4 | 72.1 | 13.86962552 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 5546.3 | 792.7 | 72.2 | 13.85041551 |
gpt-j-6b | INT8-CW | 32 | 7470 | 113.1 | 72.5 | 13.79310345 |
phi-2 | FP16 | 1024 | 7064.8 | 460.8 | 72.5 | 13.79310345 |
red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 7048.8 | 505.5 | 72.7 | 13.75515818 |
stablelm-3b-4e1t | FP16 | 1024 | 7001 | 479.5 | 74.1 | 13.49527665 |
stable-zephyr-3b-dpo | FP16 | 1024 | 7099.5 | 479.4 | 74.1 | 13.49527665 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 6273.1 | 122.8 | 74.2 | 13.47708895 |
chatglm3-6b-gptq | INT4-MIXED | 32 | 5547 | 66.2 | 75.4 | 13.26259947 |
gpt-j-6b | INT8-CW | 1024 | 8324.6 | 814 | 76.7 | 13.03780965 |
chatglm3-6b-gptq | INT4-MIXED | 1024 | 5340.5 | 651.8 | 77.2 | 12.95336788 |
flan-t5-xxl | INT8-CW | 1139 | 24798.5 | 304 | 77.5 | 12.90322581 |
codegen25-7b | INT8-CW | 32 | 8018.3 | 102.6 | 78.2 | 12.78772379 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6663.8 | 939.4 | 78.8 | 12.69035533 |
mistral-7b-v0.1 | INT8-CW | 32 | 8565.8 | 110.1 | 82.2 | 12.16545012 |
codegen25-7b | INT8-CW | 1024 | 8606.8 | 857.3 | 82.3 | 12.15066829 |
falcon-7b-instruct | INT8-CW | 32 | 8045.3 | 118.3 | 83.6 | 11.96172249 |
mistral-7b-v0.1 | INT8-CW | 1024 | 8384.8 | 890.1 | 85.1 | 11.75088132 |
falcon-7b-instruct | INT8-CW | 1024 | 7768.2 | 873.9 | 86.8 | 11.52073733 |
baichuan2-13b-chat | INT4-MIXED | 32 | 9146.9 | 139.2 | 90.4 | 11.0619469 |
baichuan2-13b-chat | INT4-MIXED | 1024 | 10292 | 2716.7 | 97.2 | 10.28806584 |
gemma-7b-it | INT8-CW | 32 | 9467.9 | 136 | 97.5 | 10.25641026 |
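The INT4-MIXED and INT8-CW rows above correspond to weight-compressed OpenVINO models. As a minimal sketch of running such a model on the built-in GPU (this assumes the openvino-genai package and a locally exported model; the directory name is a placeholder, and this is not the harness used to produce these numbers):

```python
import openvino_genai

# Placeholder path to an OpenVINO model exported with INT4 weight compression.
model_dir = "phi-2-int4-ov"

# "GPU" selects the integrated GPU on Intel Core Ultra processors.
pipe = openvino_genai.LLMPipeline(model_dir, "GPU")

# Greedy decoding with a single beam and batch size 1, matching the
# test parameters listed at the end of this page.
print(pipe.generate("What is an AI PC?", max_new_tokens=128))
```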
Topology | Precision | Input Size (tokens) | Max RSS Memory (MB) | 1st Token Latency (ms) | 2nd Token Latency (ms) | 2nd Tokens/sec (= 1000 / 2nd latency) |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 1145.1 | 13.7 | 3.7 | 270.2702703 |
opt-125m-gptq | INT4-MIXED | 1024 | 1221.7 | 17.2 | 3.8 | 263.1578947 |
gemma-2b-it | INT4-MIXED | 32 | 3317.5 | 31.7 | 17.7 | 56.49717514 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3389.9 | 45.4 | 17.9 | 55.86592179 |
dolly-v2-3b | INT4-MIXED | 32 | 3423 | 54.4 | 18.4 | 54.34782609 |
gemma-2b-it | INT4-MIXED | 1024 | 3317.5 | 203.9 | 18.7 | 53.47593583 |
phi-2 | INT4-MIXED | 32 | 3377.7 | 55.4 | 19.5 | 51.28205128 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 3710 | 349 | 19.9 | 50.25125628 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3436.6 | 58.2 | 20.3 | 49.26108374 |
dolly-v2-3b | INT4-MIXED | 1024 | 3812.8 | 392 | 20.4 | 49.01960784 |
phi-2 | INT4-MIXED | 1024 | 3888.2 | 333.7 | 21.5 | 46.51162791 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 3609.7 | 54.8 | 22.1 | 45.24886878 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 3952.6 | 342.4 | 22.6 | 44.24778761 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 3982.1 | 412.4 | 24 | 41.66666667 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3796.6 | 45.4 | 25.6 | 39.0625 |
gemma-2b-it | INT8-CW | 32 | 3693.6 | 35.9 | 28.2 | 35.46099291 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 4370.2 | 479.4 | 28.2 | 35.46099291 |
gemma-2b-it | INT8-CW | 1024 | 3819.1 | 218.3 | 29 | 34.48275862 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 4323.9 | 40.5 | 30.1 | 33.22259136 |
dolly-v2-3b | INT8-CW | 32 | 4443.4 | 48.7 | 30.7 | 32.5732899 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 4473.1 | 48.1 | 31.1 | 32.15434084 |
stablelm-3b-4e1t | INT8-CW | 32 | 4372.4 | 48.8 | 31.1 | 32.15434084 |
phi-2 | INT8-CW | 32 | 4362.7 | 49 | 31.4 | 31.84713376 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 4722.6 | 358.7 | 32.2 | 31.05590062 |
dolly-v2-3b | INT8-CW | 1024 | 4843.1 | 400.8 | 32.8 | 30.48780488 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 4835.6 | 339.4 | 33 | 30.3030303 |
stablelm-3b-4e1t | INT8-CW | 1024 | 4735.2 | 341.4 | 33.1 | 30.21148036 |
phi-2 | INT8-CW | 1024 | 4743.6 | 333.3 | 33.3 | 30.03003003 |
chatglm3-6b | INT4-MIXED | 32 | 5177.2 | 52.4 | 33.8 | 29.58579882 |
flan-t5-xxl | INT4-MIXED | 33 | 13487.1 | 54.4 | 35 | 28.57142857 |
chatglm3-6b | INT4-MIXED | 1024 | 5086.7 | 525.5 | 35.8 | 27.93296089 |
llama-2-7b-gptq | INT4-MIXED | 32 | 5242.6 | 64.2 | 37.7 | 26.52519894 |
codegen25-7b | INT4-MIXED | 32 | 5509.4 | 64.2 | 38 | 26.31578947 |
flan-t5-xxl | INT4-MIXED | 1139 | 15154.2 | 208.5 | 39.2 | 25.51020408 |
gpt-j-6b | INT4-MIXED | 32 | 5150.3 | 73.6 | 40 | 25 |
chatglm3-6b-gptq | INT4-MIXED | 32 | 5753.8 | 69.7 | 40.4 | 24.75247525 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 5974.5 | 828.1 | 41.4 | 24.15458937 |
codegen25-7b | INT4-MIXED | 1024 | 5974.3 | 625 | 41.6 | 24.03846154 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 5367.1 | 53.3 | 42.3 | 23.64066194 |
chatglm3-6b-gptq | INT4-MIXED | 1024 | 5569.6 | 594.2 | 42.7 | 23.41920375 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 6355.6 | 81.5 | 43.2 | 23.14814815 |
falcon-7b-instruct | INT4-MIXED | 32 | 5449.6 | 70.8 | 43.6 | 22.93577982 |
gpt-j-6b | INT4-MIXED | 1024 | 6350.8 | 649.3 | 43.7 | 22.88329519 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 5805.1 | 73.8 | 44.6 | 22.42152466 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 5693.2 | 501.6 | 45.1 | 22.172949 |
falcon-7b-instruct | INT4-MIXED | 1024 | 5225.6 | 589.4 | 45.7 | 21.88183807 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 5716.1 | 679.8 | 47.5 | 21.05263158 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6891.4 | 919.7 | 47.5 | 21.05263158 |
zephyr-7b-beta | INT4-MIXED | 32 | 6069.5 | 76.5 | 48.2 | 20.74688797 |
baichuan2-7b-chat | INT4-MIXED | 32 | 6579.9 | 75.5 | 49.7 | 20.12072435 |
zephyr-7b-beta | INT4-MIXED | 1024 | 6031.1 | 684.8 | 51.5 | 19.41747573 |
gemma-7b-it | INT4-MIXED | 32 | 6544.9 | 88.9 | 51.8 | 19.30501931 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 7236.5 | 1629.9 | 54.6 | 18.31501832 |
gemma-7b-it | INT4-MIXED | 1024 | 6993.3 | 949.9 | 57.2 | 17.48251748 |
qwen-7b-chat | INT4-MIXED | 32 | 7311 | 80.1 | 58.3 | 17.15265866 |
gemma-2b-it | FP16 | 32 | 6159.8 | 66.1 | 62.2 | 16.07717042 |
gemma-2b-it | FP16 | 1024 | 6301.2 | 253.2 | 63.1 | 15.84786054 |
qwen-7b-chat | INT4-MIXED | 1024 | 7873 | 673.6 | 63.4 | 15.77287066 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6872.8 | 65.7 | 63.7 | 15.69858713 |
phi-2 | FP16 | 32 | 6903.9 | 67.2 | 64.2 | 15.57632399 |
dolly-v2-3b | FP16 | 32 | 6890.6 | 66.5 | 64.5 | 15.50387597 |
stable-zephyr-3b-dpo | FP16 | 32 | 7138.1 | 63.4 | 65.5 | 15.26717557 |
stablelm-3b-4e1t | FP16 | 32 | 7041.8 | 63.2 | 65.6 | 15.24390244 |
red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 7273.4 | 392.6 | 65.7 | 15.22070015 |
chatglm3-6b | INT8-CW | 32 | 7561.1 | 78.4 | 66.2 | 15.10574018 |
phi-2 | FP16 | 1024 | 7304.4 | 363.9 | 66.5 | 15.03759398 |
dolly-v2-3b | FP16 | 1024 | 7291.5 | 427.4 | 66.8 | 14.97005988 |
stable-zephyr-3b-dpo | FP16 | 1024 | 7556.6 | 339.1 | 67.5 | 14.81481481 |
stablelm-3b-4e1t | FP16 | 1024 | 7460.5 | 339.6 | 67.6 | 14.79289941 |
chatglm3-6b | INT8-CW | 1024 | 7479.9 | 585.8 | 68.8 | 14.53488372 |
flan-t5-xxl | INT8-CW | 33 | 23367.6 | 255.4 | 69.4 | 14.4092219 |
falcon-7b-instruct | INT8-CW | 32 | 8115 | 88.5 | 73.9 | 13.53179973 |
flan-t5-xxl | INT8-CW | 1139 | 24996.3 | 262.7 | 74.4 | 13.44086022 |
gpt-j-6b | INT8-CW | 32 | 7522.7 | 87.7 | 75.7 | 13.21003963 |
falcon-7b-instruct | INT8-CW | 1024 | 8033.7 | 818.8 | 76.6 | 13.05483029 |
codegen25-7b | INT8-CW | 32 | 8304.5 | 88 | 76.8 | 13.02083333 |
baichuan2-13b-chat | INT4-MIXED | 32 | 9436.6 | 131.7 | 77.3 | 12.93661061 |
gpt-j-6b | INT8-CW | 1024 | 8611.2 | 723.9 | 79.4 | 12.59445844 |
baichuan2-7b-chat | INT8-CW | 32 | 8611.4 | 93.4 | 80.3 | 12.45330012 |
mistral-7b-v0.1 | INT8-CW | 32 | 8704.2 | 94 | 80.9 | 12.36093943 |
zephyr-7b-beta | INT8-CW | 32 | 8613.3 | 94.6 | 80.9 | 12.36093943 |
qwen-7b-chat | INT8-CW | 32 | 8954.3 | 94.6 | 81.3 | 12.300123 |
codegen25-7b | INT8-CW | 1024 | 8807.5 | 793.2 | 81.3 | 12.300123 |
mistral-7b-v0.1 | INT8-CW | 1024 | 8699.6 | 837.7 | 84.2 | 11.87648456 |
baichuan2-13b-chat | INT4-MIXED | 1024 | 10511.6 | 3082 | 84.3 | 11.8623962 |
phi-3-mini-4k-instruct | FP16 | 32 | 8803.8 | 87.6 | 84.4 | 11.84834123 |
zephyr-7b-beta | INT8-CW | 1024 | 8609.6 | 872.8 | 84.4 | 11.84834123 |
baichuan2-7b-chat | INT8-CW | 1024 | 9505.5 | 1838.8 | 84.7 | 11.80637544 |
qwen-7b-chat | INT8-CW | 1024 | 9772.6 | 811.6 | 85.7 | 11.66861144 |
phi-3-mini-4k-instruct | FP16 | 1024 | 9373.8 | 462.9 | 87.2 | 11.46788991 |
starcoder | INT4-MIXED | 32 | 9531.5 | 145.8 | 88.4 | 11.31221719 |
gemma-7b-it | INT8-CW | 32 | 9620.7 | 115.4 | 92.5 | 10.81081081 |
starcoder | INT4-MIXED | 1024 | 9403.7 | 1825.5 | 92.8 | 10.77586207 |
gemma-7b-it | INT8-CW | 1024 | 10423.3 | 964.3 | 96.4 | 10.37344398 |
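Models like those listed above can be exported and weight-compressed ahead of time. A minimal sketch using optimum-intel (the model id and output directory are placeholders; the exact recipe behind the INT4-MIXED results, such as the mixed-precision ratio or group size, is not specified here):

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Placeholder Hugging Face model id; the tables include models such as phi-2.
model_id = "microsoft/phi-2"

# 4-bit weight compression, comparable in spirit to the INT4-MIXED rows.
quant_config = OVWeightQuantizationConfig(bits=4)

model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=quant_config
)
model.save_pretrained("phi-2-int4-ov")
```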
Topology | Precision | Input Size (tokens) | Max RSS Memory (MB) | 1st Token Latency (ms) | 2nd Token Latency (ms) | 2nd Tokens/sec (= 1000 / 2nd latency) |
---|---|---|---|---|---|---|
opt-125m-gptq | INT4-MIXED | 32 | 1127.1 | 12.3 | 5 | 200 |
opt-125m-gptq | INT4-MIXED | 1024 | 1257.5 | 50.9 | 5.5 | 181.8181818 |
phi-2 | INT4-MIXED | 32 | 3258.8 | 69.5 | 26.8 | 37.31343284 |
dolly-v2-3b | INT4-MIXED | 32 | 3157.7 | 71.4 | 26.9 | 37.17472119 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3126.6 | 70.7 | 27.5 | 36.36363636 |
gemma-2b-it | INT4-MIXED | 32 | 3752.9 | 62.6 | 27.6 | 36.23188406 |
gemma-2b-it | INT4-MIXED | 1024 | 3720.2 | 768 | 28.4 | 35.21126761 |
dolly-v2-3b | INT4-MIXED | 1024 | 3566 | 1114.5 | 29.9 | 33.44481605 |
phi-2 | INT4-MIXED | 1024 | 3661.8 | 1086.6 | 29.9 | 33.44481605 |
red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 3524.1 | 1112.5 | 30.4 | 32.89473684 |
stablelm-3b-4e1t | INT4-MIXED | 32 | 3494.2 | 82.1 | 34.5 | 28.98550725 |
phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3851.1 | 97.2 | 37.3 | 26.80965147 |
stablelm-3b-4e1t | INT4-MIXED | 1024 | 3836.3 | 1111.2 | 37.6 | 26.59574468 |
gemma-2b-it | INT8-CW | 32 | 4432.2 | 90 | 40.1 | 24.93765586 |
phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 4168.5 | 1435.3 | 40.7 | 24.57002457 |
gemma-2b-it | INT8-CW | 1024 | 4412.8 | 820.4 | 40.9 | 24.44987775 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 4213.6 | 122 | 41.7 | 23.98081535 |
phi-2 | INT8-CW | 32 | 4249 | 103.8 | 42.1 | 23.75296912 |
dolly-v2-3b | INT8-CW | 32 | 4249.7 | 103.9 | 42.5 | 23.52941176 |
stablelm-3b-4e1t | INT8-CW | 32 | 4394.7 | 104.6 | 44 | 22.72727273 |
red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 4619.9 | 1181.3 | 44.4 | 22.52252252 |
phi-2 | INT8-CW | 1024 | 4689.7 | 1166.8 | 45 | 22.22222222 |
dolly-v2-3b | INT8-CW | 1024 | 4685.4 | 1192.5 | 45.3 | 22.07505519 |
stablelm-3b-4e1t | INT8-CW | 1024 | 4572.9 | 1210.4 | 46.9 | 21.32196162 |
stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3311.9 | 112.3 | 49.8 | 20.08032129 |
chatglm3-6b | INT4-MIXED | 32 | 4960 | 141.1 | 50.9 | 19.64636542 |
chatglm3-6b | INT4-MIXED | 1024 | 4793.1 | 2082.5 | 52.4 | 19.08396947 |
gpt-j-6b | INT4-MIXED | 32 | 5136.9 | 136.3 | 53.3 | 18.76172608 |
flan-t5-xxl | INT4-MIXED | 33 | 13564.8 | 91.3 | 54 | 18.51851852 |
stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 3720.9 | 1385.5 | 55.2 | 18.11594203 |
phi-3-mini-4k-instruct | INT8-CW | 32 | 5144.3 | 111.5 | 56.5 | 17.69911504 |
gpt-j-6b | INT4-MIXED | 1024 | 6309.5 | 2268.2 | 57.2 | 17.48251748 |
codegen25-7b | INT4-MIXED | 32 | 5257.9 | 160.4 | 58.7 | 17.03577513 |
chatglm3-6b-gptq | INT4-MIXED | 32 | 5783.9 | 160.9 | 58.9 | 16.97792869 |
llama-2-7b-gptq | INT4-MIXED | 32 | 5203.2 | 160.5 | 59 | 16.94915254 |
flan-t5-xxl | INT4-MIXED | 1139 | 15446.9 | 451.5 | 59.6 | 16.77852349 |
phi-3-mini-4k-instruct | INT8-CW | 1024 | 5723.8 | 1589.4 | 59.8 | 16.72240803 |
chatglm3-6b-gptq | INT4-MIXED | 1024 | 5488.1 | 1991.8 | 61.1 | 16.36661211 |
stable-zephyr-3b-dpo | INT8-CW | 32 | 4228.1 | 158.5 | 61.2 | 16.33986928 |
codegen25-7b | INT4-MIXED | 1024 | 5904.5 | 2448.2 | 62.9 | 15.89825119 |
mistral-7b-v0.1 | INT4-MIXED | 32 | 5669.7 | 178.4 | 63.3 | 15.79778831 |
qwen-7b-chat-gptq | INT4-MIXED | 32 | 6092.5 | 171.4 | 64.7 | 15.45595054 |
falcon-7b-instruct | INT4-MIXED | 32 | 5624.3 | 197.3 | 65.1 | 15.3609831 |
mistral-7b-v0.1 | INT4-MIXED | 1024 | 5730.7 | 2562.1 | 65.8 | 15.19756839 |
falcon-7b-instruct | INT4-MIXED | 1024 | 5562.6 | 2566.4 | 66.5 | 15.03759398 |
stable-zephyr-3b-dpo | INT8-CW | 1024 | 4670 | 1548.6 | 67.1 | 14.90312966 |
llama-2-7b-gptq | INT4-MIXED | 1024 | 6361.5 | 2368.4 | 67.6 | 14.79289941 |
zephyr-7b-beta | INT4-MIXED | 32 | 5918.5 | 178.6 | 67.8 | 14.74926254 |
qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6784.8 | 2446.5 | 68.9 | 14.5137881 |
zephyr-7b-beta | INT4-MIXED | 1024 | 6098.7 | 2548 | 70.2 | 14.24501425 |
gemma-2b-it | FP16 | 32 | 7458.3 | 95.2 | 70.7 | 14.14427157 |
gemma-2b-it | FP16 | 1024 | 7199.2 | 1407.4 | 71.6 | 13.96648045 |
baichuan2-7b-chat | INT4-MIXED | 32 | 6542.8 | 157.5 | 71.8 | 13.9275766 |
phi-2 | FP16 | 32 | 6826.6 | 104.5 | 74 | 13.51351351 |
dolly-v2-3b | FP16 | 32 | 6826.4 | 104.5 | 74.2 | 13.47708895 |
red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6797.2 | 108.9 | 74.4 | 13.44086022 |
stablelm-3b-4e1t | FP16 | 32 | 6739.5 | 110.5 | 75.9 | 13.17523057 |
baichuan2-7b-chat | INT4-MIXED | 1024 | 7262 | 2959.3 | 76.1 | 13.14060447 |
phi-2 | FP16 | 1024 | 7623.9 | 1465.3 | 78.6 | 12.72264631 |
dolly-v2-3b | FP16 | 1024 | 7645.3 | 1489.3 | 78.9 | 12.67427123 |
red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 7595.3 | 1499.6 | 79 | 12.65822785 |
qwen-7b-chat | INT4-MIXED | 32 | 7296.4 | 152.4 | 80.1 | 12.48439451 |
stablelm-3b-4e1t | FP16 | 1024 | 7550.6 | 1474.6 | 80.7 | 12.39157373 |
qwen-7b-chat | INT4-MIXED | 1024 | 7997.9 | 2530 | 84.4 | 11.84834123 |
gpt-j-6b | INT8-CW | 32 | 7431.5 | 154.2 | 84.6 | 11.82033097 |
chatglm3-6b | INT8-CW | 32 | 7445.3 | 154.4 | 85.1 | 11.75088132 |
chatglm3-6b | INT8-CW | 1024 | 7415.9 | 2680.7 | 86.6 | 11.54734411 |
stable-zephyr-3b-dpo | FP16 | 32 | 7029.8 | 149.8 | 87.4 | 11.4416476 |
gpt-j-6b | INT8-CW | 1024 | 8402.4 | 2379.2 | 88.6 | 11.28668172 |
flan-t5-xxl | INT8-CW | 33 | 20214.9 | 162.5 | 91.6 | 10.91703057 |
codegen25-7b | INT8-CW | 32 | 8327.7 | 159.4 | 95.4 | 10.48218029 |
falcon-7b-instruct | INT8-CW | 32 | 8515.7 | 196.9 | 95.9 | 10.42752868 |
stable-zephyr-3b-dpo | FP16 | 1024 | 7648.4 | 2633.5 | 95.9 | 10.42752868 |
flan-t5-xxl | INT8-CW | 1139 | 22256 | 540.9 | 97.1 | 10.29866117 |
falcon-7b-instruct | INT8-CW | 1024 | 8313.2 | 2786.1 | 97.5 | 10.25641026 |
codegen25-7b | INT8-CW | 1024 | 8852.2 | 2705.5 | 99.6 | 10.04016064 |
All models listed here were tested with the following parameters:

- Framework: PyTorch
- Beam: 1
- Batch size: 1
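For reference, the first- and second-token latency KPIs reported above can be approximated with a simple timing streamer. This is a minimal sketch using the openvino-genai streaming callback, not the benchmarking harness behind these tables; the model path is a placeholder, and the callback may fire per subword rather than strictly per token:

```python
import time
import openvino_genai

model_dir = "phi-2-int4-ov"  # placeholder path to an exported OpenVINO model
pipe = openvino_genai.LLMPipeline(model_dir, "GPU")

timestamps = []

def streamer(subword: str) -> bool:
    # Record the arrival time of each generated chunk.
    timestamps.append(time.perf_counter())
    return False  # returning False lets generation continue

start = time.perf_counter()
pipe.generate("What is an AI PC?", max_new_tokens=64, streamer=streamer)

# 1st latency: time from request to the first generated token.
first_latency_ms = (timestamps[0] - start) * 1000
# 2nd latency: average gap between subsequent tokens.
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
second_latency_ms = sum(gaps) / len(gaps) * 1000

print(f"1st latency: {first_latency_ms:.1f} ms | "
      f"2nd latency: {second_latency_ms:.1f} ms | "
      f"throughput: {1000 / second_latency_ms:.1f} tokens/s")
```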