Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the Intel® Core™ Ultra processor family and AI PCs. The data below was collected with OpenVINO 2025.1 and is current as of 13 April 2025.

The tables below list the key performance indicators for inference on built-in GPUs.
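
The last column of each table is derived directly from the per-token ("2nd") latency: a 2nd latency of L milliseconds corresponds to 1000 / L tokens per second. As a quick sanity check, the short Python snippet below reproduces that conversion for the first row of the first table; the values are copied from the table and the helper function name is illustrative only.

```python
# Convert the average per-token ("2nd") latency, reported in milliseconds,
# into the "2nd Token per sec" value shown in the tables.

def tokens_per_second(second_latency_ms: float) -> float:
    """Tokens generated per second for a given average per-token latency (ms)."""
    return 1000.0 / second_latency_ms

# Values from the first row of the first table (opt-125m-gptq, INT4-MIXED, input size 32):
second_latency_ms = 3.2
print(f"{tokens_per_second(second_latency_ms):.1f} tokens/s")  # -> 312.5 tokens/s
```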

| Topology | Precision | Input Size | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd Token per sec (1000 / 2nd latency) |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 922.6 | 10.4 | 3.2 | 312.5 |
| opt-125m-gptq | INT4-MIXED | 1024 | 999.2 | 18.1 | 3.4 | 294.1176471 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3242 | 41.8 | 21.7 | 46.08294931 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3322.1 | 50.8 | 24.6 | 40.6504065 |
| phi-2 | INT4-MIXED | 32 | 3225.9 | 62 | 24.7 | 40.48582996 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 3482.6 | 431.6 | 26.8 | 37.31343284 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 3700.1 | 411 | 29.5 | 33.89830508 |
| phi-2 | INT4-MIXED | 1024 | 3623.8 | 384.1 | 29.6 | 33.78378378 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 4128.5 | 54.2 | 32.8 | 30.48780488 |
| phi-2 | INT8-CW | 32 | 4183.2 | 56.3 | 33.7 | 29.6735905 |
| stablelm-3b-4e1t | INT8-CW | 32 | 4301.3 | 53.2 | 34 | 29.41176471 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 4398.8 | 53.2 | 34.3 | 29.15451895 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 3449.3 | 49.8 | 34.7 | 28.8184438 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 4389.7 | 467.1 | 36.1 | 27.70083102 |
| phi-2 | INT8-CW | 1024 | 4424.9 | 415.9 | 36.6 | 27.32240437 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 4639.3 | 430 | 36.9 | 27.100271 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 4542 | 432.5 | 37 | 27.02702703 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 3598.4 | 398.8 | 37.8 | 26.45502646 |
| flan-t5-xxl | INT4-MIXED | 33 | 13686.8 | 63.7 | 39.7 | 25.18891688 |
| chatglm3-6b | INT4-MIXED | 32 | 5116.2 | 69.7 | 42.9 | 23.31002331 |
| chatglm3-6b | INT4-MIXED | 1024 | 4642 | 779.3 | 45.1 | 22.172949 |
| flan-t5-xxl | INT4-MIXED | 1139 | 14965.1 | 243 | 48.8 | 20.49180328 |
| codegen25-7b | INT4-MIXED | 32 | 5393 | 77 | 48.9 | 20.44989775 |
| gpt-j-6b | INT4-MIXED | 32 | 5216.2 | 130.7 | 49.7 | 20.12072435 |
| codegen25-7b | INT4-MIXED | 1024 | 5749.4 | 892.5 | 53.6 | 18.65671642 |
| gpt-j-6b | INT4-MIXED | 1024 | 6187.8 | 801.8 | 53.8 | 18.58736059 |
| falcon-7b-instruct | INT4-MIXED | 32 | 5357.4 | 85.2 | 55.5 | 18.01801802 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 5025.2 | 876.4 | 57.9 | 17.27115717 |
| gemma-7b-it | INT4-MIXED | 32 | 6446.6 | 94.1 | 66.2 | 15.10574018 |
| flan-t5-xxl | INT8-CW | 33 | 23323.5 | 220.7 | 67.2 | 14.88095238 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 4962.8 | 67.7 | 67.5 | 14.81481481 |
| chatglm3-6b | INT8-CW | 32 | 7496.3 | 92.8 | 69.2 | 14.45086705 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 5781.8 | 73.7 | 69.3 | 14.43001443 |
| phi-2 | FP16 | 32 | 6914.7 | 83 | 69.4 | 14.4092219 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6664.9 | 77.9 | 69.7 | 14.3472023 |
| gemma-7b-it | INT4-MIXED | 1024 | 6752.7 | 1054 | 69.7 | 14.3472023 |
| stablelm-3b-4e1t | FP16 | 32 | 6832 | 86.7 | 71.1 | 14.06469761 |
| stable-zephyr-3b-dpo | FP16 | 32 | 6932.8 | 85.9 | 71.2 | 14.04494382 |
| chatglm3-6b | INT8-CW | 1024 | 7260.3 | 732.5 | 71.2 | 14.04494382 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 5745 | 761.4 | 72.1 | 13.86962552 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 5546.3 | 792.7 | 72.2 | 13.85041551 |
| gpt-j-6b | INT8-CW | 32 | 7470 | 113.1 | 72.5 | 13.79310345 |
| phi-2 | FP16 | 1024 | 7064.8 | 460.8 | 72.5 | 13.79310345 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 7048.8 | 505.5 | 72.7 | 13.75515818 |
| stablelm-3b-4e1t | FP16 | 1024 | 7001 | 479.5 | 74.1 | 13.49527665 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 7099.5 | 479.4 | 74.1 | 13.49527665 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 6273.1 | 122.8 | 74.2 | 13.47708895 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 5547 | 66.2 | 75.4 | 13.26259947 |
| gpt-j-6b | INT8-CW | 1024 | 8324.6 | 814 | 76.7 | 13.03780965 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 5340.5 | 651.8 | 77.2 | 12.95336788 |
| flan-t5-xxl | INT8-CW | 1139 | 24798.5 | 304 | 77.5 | 12.90322581 |
| codegen25-7b | INT8-CW | 32 | 8018.3 | 102.6 | 78.2 | 12.78772379 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6663.8 | 939.4 | 78.8 | 12.69035533 |
| mistral-7b-v0.1 | INT8-CW | 32 | 8565.8 | 110.1 | 82.2 | 12.16545012 |
| codegen25-7b | INT8-CW | 1024 | 8606.8 | 857.3 | 82.3 | 12.15066829 |
| falcon-7b-instruct | INT8-CW | 32 | 8045.3 | 118.3 | 83.6 | 11.96172249 |
| mistral-7b-v0.1 | INT8-CW | 1024 | 8384.8 | 890.1 | 85.1 | 11.75088132 |
| falcon-7b-instruct | INT8-CW | 1024 | 7768.2 | 873.9 | 86.8 | 11.52073733 |
| baichuan2-13b-chat | INT4-MIXED | 32 | 9146.9 | 139.2 | 90.4 | 11.0619469 |
| baichuan2-13b-chat | INT4-MIXED | 1024 | 10292 | 2716.7 | 97.2 | 10.28806584 |
| gemma-7b-it | INT8-CW | 32 | 9467.9 | 136 | 97.5 | 10.25641026 |

| Topology | Precision | Input Size | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd Token per sec (1000 / 2nd latency) |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 1145.1 | 13.7 | 3.7 | 270.2702703 |
| opt-125m-gptq | INT4-MIXED | 1024 | 1221.7 | 17.2 | 3.8 | 263.1578947 |
| gemma-2b-it | INT4-MIXED | 32 | 3317.5 | 31.7 | 17.7 | 56.49717514 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3389.9 | 45.4 | 17.9 | 55.86592179 |
| dolly-v2-3b | INT4-MIXED | 32 | 3423 | 54.4 | 18.4 | 54.34782609 |
| gemma-2b-it | INT4-MIXED | 1024 | 3317.5 | 203.9 | 18.7 | 53.47593583 |
| phi-2 | INT4-MIXED | 32 | 3377.7 | 55.4 | 19.5 | 51.28205128 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 3710 | 349 | 19.9 | 50.25125628 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3436.6 | 58.2 | 20.3 | 49.26108374 |
| dolly-v2-3b | INT4-MIXED | 1024 | 3812.8 | 392 | 20.4 | 49.01960784 |
| phi-2 | INT4-MIXED | 1024 | 3888.2 | 333.7 | 21.5 | 46.51162791 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 3609.7 | 54.8 | 22.1 | 45.24886878 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 3952.6 | 342.4 | 22.6 | 44.24778761 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 3982.1 | 412.4 | 24 | 41.66666667 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3796.6 | 45.4 | 25.6 | 39.0625 |
| gemma-2b-it | INT8-CW | 32 | 3693.6 | 35.9 | 28.2 | 35.46099291 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 4370.2 | 479.4 | 28.2 | 35.46099291 |
| gemma-2b-it | INT8-CW | 1024 | 3819.1 | 218.3 | 29 | 34.48275862 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 4323.9 | 40.5 | 30.1 | 33.22259136 |
| dolly-v2-3b | INT8-CW | 32 | 4443.4 | 48.7 | 30.7 | 32.5732899 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 4473.1 | 48.1 | 31.1 | 32.15434084 |
| stablelm-3b-4e1t | INT8-CW | 32 | 4372.4 | 48.8 | 31.1 | 32.15434084 |
| phi-2 | INT8-CW | 32 | 4362.7 | 49 | 31.4 | 31.84713376 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 4722.6 | 358.7 | 32.2 | 31.05590062 |
| dolly-v2-3b | INT8-CW | 1024 | 4843.1 | 400.8 | 32.8 | 30.48780488 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 4835.6 | 339.4 | 33 | 30.3030303 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 4735.2 | 341.4 | 33.1 | 30.21148036 |
| phi-2 | INT8-CW | 1024 | 4743.6 | 333.3 | 33.3 | 30.03003003 |
| chatglm3-6b | INT4-MIXED | 32 | 5177.2 | 52.4 | 33.8 | 29.58579882 |
| flan-t5-xxl | INT4-MIXED | 33 | 13487.1 | 54.4 | 35 | 28.57142857 |
| chatglm3-6b | INT4-MIXED | 1024 | 5086.7 | 525.5 | 35.8 | 27.93296089 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 5242.6 | 64.2 | 37.7 | 26.52519894 |
| codegen25-7b | INT4-MIXED | 32 | 5509.4 | 64.2 | 38 | 26.31578947 |
| flan-t5-xxl | INT4-MIXED | 1139 | 15154.2 | 208.5 | 39.2 | 25.51020408 |
| gpt-j-6b | INT4-MIXED | 32 | 5150.3 | 73.6 | 40 | 25 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 5753.8 | 69.7 | 40.4 | 24.75247525 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 5974.5 | 828.1 | 41.4 | 24.15458937 |
| codegen25-7b | INT4-MIXED | 1024 | 5974.3 | 625 | 41.6 | 24.03846154 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 5367.1 | 53.3 | 42.3 | 23.64066194 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 5569.6 | 594.2 | 42.7 | 23.41920375 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 6355.6 | 81.5 | 43.2 | 23.14814815 |
| falcon-7b-instruct | INT4-MIXED | 32 | 5449.6 | 70.8 | 43.6 | 22.93577982 |
| gpt-j-6b | INT4-MIXED | 1024 | 6350.8 | 649.3 | 43.7 | 22.88329519 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 5805.1 | 73.8 | 44.6 | 22.42152466 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 5693.2 | 501.6 | 45.1 | 22.172949 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 5225.6 | 589.4 | 45.7 | 21.88183807 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 5716.1 | 679.8 | 47.5 | 21.05263158 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6891.4 | 919.7 | 47.5 | 21.05263158 |
| zephyr-7b-beta | INT4-MIXED | 32 | 6069.5 | 76.5 | 48.2 | 20.74688797 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 6579.9 | 75.5 | 49.7 | 20.12072435 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 6031.1 | 684.8 | 51.5 | 19.41747573 |
| gemma-7b-it | INT4-MIXED | 32 | 6544.9 | 88.9 | 51.8 | 19.30501931 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 7236.5 | 1629.9 | 54.6 | 18.31501832 |
| gemma-7b-it | INT4-MIXED | 1024 | 6993.3 | 949.9 | 57.2 | 17.48251748 |
| qwen-7b-chat | INT4-MIXED | 32 | 7311 | 80.1 | 58.3 | 17.15265866 |
| gemma-2b-it | FP16 | 32 | 6159.8 | 66.1 | 62.2 | 16.07717042 |
| gemma-2b-it | FP16 | 1024 | 6301.2 | 253.2 | 63.1 | 15.84786054 |
| qwen-7b-chat | INT4-MIXED | 1024 | 7873 | 673.6 | 63.4 | 15.77287066 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6872.8 | 65.7 | 63.7 | 15.69858713 |
| phi-2 | FP16 | 32 | 6903.9 | 67.2 | 64.2 | 15.57632399 |
| dolly-v2-3b | FP16 | 32 | 6890.6 | 66.5 | 64.5 | 15.50387597 |
| stable-zephyr-3b-dpo | FP16 | 32 | 7138.1 | 63.4 | 65.5 | 15.26717557 |
| stablelm-3b-4e1t | FP16 | 32 | 7041.8 | 63.2 | 65.6 | 15.24390244 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 7273.4 | 392.6 | 65.7 | 15.22070015 |
| chatglm3-6b | INT8-CW | 32 | 7561.1 | 78.4 | 66.2 | 15.10574018 |
| phi-2 | FP16 | 1024 | 7304.4 | 363.9 | 66.5 | 15.03759398 |
| dolly-v2-3b | FP16 | 1024 | 7291.5 | 427.4 | 66.8 | 14.97005988 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 7556.6 | 339.1 | 67.5 | 14.81481481 |
| stablelm-3b-4e1t | FP16 | 1024 | 7460.5 | 339.6 | 67.6 | 14.79289941 |
| chatglm3-6b | INT8-CW | 1024 | 7479.9 | 585.8 | 68.8 | 14.53488372 |
| flan-t5-xxl | INT8-CW | 33 | 23367.6 | 255.4 | 69.4 | 14.4092219 |
| falcon-7b-instruct | INT8-CW | 32 | 8115 | 88.5 | 73.9 | 13.53179973 |
| flan-t5-xxl | INT8-CW | 1139 | 24996.3 | 262.7 | 74.4 | 13.44086022 |
| gpt-j-6b | INT8-CW | 32 | 7522.7 | 87.7 | 75.7 | 13.21003963 |
| falcon-7b-instruct | INT8-CW | 1024 | 8033.7 | 818.8 | 76.6 | 13.05483029 |
| codegen25-7b | INT8-CW | 32 | 8304.5 | 88 | 76.8 | 13.02083333 |
| baichuan2-13b-chat | INT4-MIXED | 32 | 9436.6 | 131.7 | 77.3 | 12.93661061 |
| gpt-j-6b | INT8-CW | 1024 | 8611.2 | 723.9 | 79.4 | 12.59445844 |
| baichuan2-7b-chat | INT8-CW | 32 | 8611.4 | 93.4 | 80.3 | 12.45330012 |
| mistral-7b-v0.1 | INT8-CW | 32 | 8704.2 | 94 | 80.9 | 12.36093943 |
| zephyr-7b-beta | INT8-CW | 32 | 8613.3 | 94.6 | 80.9 | 12.36093943 |
| qwen-7b-chat | INT8-CW | 32 | 8954.3 | 94.6 | 81.3 | 12.300123 |
| codegen25-7b | INT8-CW | 1024 | 8807.5 | 793.2 | 81.3 | 12.300123 |
| mistral-7b-v0.1 | INT8-CW | 1024 | 8699.6 | 837.7 | 84.2 | 11.87648456 |
| baichuan2-13b-chat | INT4-MIXED | 1024 | 10511.6 | 3082 | 84.3 | 11.8623962 |
| phi-3-mini-4k-instruct | FP16 | 32 | 8803.8 | 87.6 | 84.4 | 11.84834123 |
| zephyr-7b-beta | INT8-CW | 1024 | 8609.6 | 872.8 | 84.4 | 11.84834123 |
| baichuan2-7b-chat | INT8-CW | 1024 | 9505.5 | 1838.8 | 84.7 | 11.80637544 |
| qwen-7b-chat | INT8-CW | 1024 | 9772.6 | 811.6 | 85.7 | 11.66861144 |
| phi-3-mini-4k-instruct | FP16 | 1024 | 9373.8 | 462.9 | 87.2 | 11.46788991 |
| starcoder | INT4-MIXED | 32 | 9531.5 | 145.8 | 88.4 | 11.31221719 |
| gemma-7b-it | INT8-CW | 32 | 9620.7 | 115.4 | 92.5 | 10.81081081 |
| starcoder | INT4-MIXED | 1024 | 9403.7 | 1825.5 | 92.8 | 10.77586207 |
| gemma-7b-it | INT8-CW | 1024 | 10423.3 | 964.3 | 96.4 | 10.37344398 |

| Topology | Precision | Input Size | Max RSS Memory (MB) | 1st Latency (ms) | 2nd Latency (ms) | 2nd Token per sec (1000 / 2nd latency) |
|---|---|---|---|---|---|---|
| opt-125m-gptq | INT4-MIXED | 32 | 1127.1 | 12.3 | 5 | 200 |
| opt-125m-gptq | INT4-MIXED | 1024 | 1257.5 | 50.9 | 5.5 | 181.8181818 |
| phi-2 | INT4-MIXED | 32 | 3258.8 | 69.5 | 26.8 | 37.31343284 |
| dolly-v2-3b | INT4-MIXED | 32 | 3157.7 | 71.4 | 26.9 | 37.17472119 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 32 | 3126.6 | 70.7 | 27.5 | 36.36363636 |
| gemma-2b-it | INT4-MIXED | 32 | 3752.9 | 62.6 | 27.6 | 36.23188406 |
| gemma-2b-it | INT4-MIXED | 1024 | 3720.2 | 768 | 28.4 | 35.21126761 |
| dolly-v2-3b | INT4-MIXED | 1024 | 3566 | 1114.5 | 29.9 | 33.44481605 |
| phi-2 | INT4-MIXED | 1024 | 3661.8 | 1086.6 | 29.9 | 33.44481605 |
| red-pajama-incite-chat-3b-v1 | INT4-MIXED | 1024 | 3524.1 | 1112.5 | 30.4 | 32.89473684 |
| stablelm-3b-4e1t | INT4-MIXED | 32 | 3494.2 | 82.1 | 34.5 | 28.98550725 |
| phi-3-mini-4k-instruct | INT4-MIXED | 32 | 3851.1 | 97.2 | 37.3 | 26.80965147 |
| stablelm-3b-4e1t | INT4-MIXED | 1024 | 3836.3 | 1111.2 | 37.6 | 26.59574468 |
| gemma-2b-it | INT8-CW | 32 | 4432.2 | 90 | 40.1 | 24.93765586 |
| phi-3-mini-4k-instruct | INT4-MIXED | 1024 | 4168.5 | 1435.3 | 40.7 | 24.57002457 |
| gemma-2b-it | INT8-CW | 1024 | 4412.8 | 820.4 | 40.9 | 24.44987775 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 32 | 4213.6 | 122 | 41.7 | 23.98081535 |
| phi-2 | INT8-CW | 32 | 4249 | 103.8 | 42.1 | 23.75296912 |
| dolly-v2-3b | INT8-CW | 32 | 4249.7 | 103.9 | 42.5 | 23.52941176 |
| stablelm-3b-4e1t | INT8-CW | 32 | 4394.7 | 104.6 | 44 | 22.72727273 |
| red-pajama-incite-chat-3b-v1 | INT8-CW | 1024 | 4619.9 | 1181.3 | 44.4 | 22.52252252 |
| phi-2 | INT8-CW | 1024 | 4689.7 | 1166.8 | 45 | 22.22222222 |
| dolly-v2-3b | INT8-CW | 1024 | 4685.4 | 1192.5 | 45.3 | 22.07505519 |
| stablelm-3b-4e1t | INT8-CW | 1024 | 4572.9 | 1210.4 | 46.9 | 21.32196162 |
| stable-zephyr-3b-dpo | INT4-MIXED | 32 | 3311.9 | 112.3 | 49.8 | 20.08032129 |
| chatglm3-6b | INT4-MIXED | 32 | 4960 | 141.1 | 50.9 | 19.64636542 |
| chatglm3-6b | INT4-MIXED | 1024 | 4793.1 | 2082.5 | 52.4 | 19.08396947 |
| gpt-j-6b | INT4-MIXED | 32 | 5136.9 | 136.3 | 53.3 | 18.76172608 |
| flan-t5-xxl | INT4-MIXED | 33 | 13564.8 | 91.3 | 54 | 18.51851852 |
| stable-zephyr-3b-dpo | INT4-MIXED | 1024 | 3720.9 | 1385.5 | 55.2 | 18.11594203 |
| phi-3-mini-4k-instruct | INT8-CW | 32 | 5144.3 | 111.5 | 56.5 | 17.69911504 |
| gpt-j-6b | INT4-MIXED | 1024 | 6309.5 | 2268.2 | 57.2 | 17.48251748 |
| codegen25-7b | INT4-MIXED | 32 | 5257.9 | 160.4 | 58.7 | 17.03577513 |
| chatglm3-6b-gptq | INT4-MIXED | 32 | 5783.9 | 160.9 | 58.9 | 16.97792869 |
| llama-2-7b-gptq | INT4-MIXED | 32 | 5203.2 | 160.5 | 59 | 16.94915254 |
| flan-t5-xxl | INT4-MIXED | 1139 | 15446.9 | 451.5 | 59.6 | 16.77852349 |
| phi-3-mini-4k-instruct | INT8-CW | 1024 | 5723.8 | 1589.4 | 59.8 | 16.72240803 |
| chatglm3-6b-gptq | INT4-MIXED | 1024 | 5488.1 | 1991.8 | 61.1 | 16.36661211 |
| stable-zephyr-3b-dpo | INT8-CW | 32 | 4228.1 | 158.5 | 61.2 | 16.33986928 |
| codegen25-7b | INT4-MIXED | 1024 | 5904.5 | 2448.2 | 62.9 | 15.89825119 |
| mistral-7b-v0.1 | INT4-MIXED | 32 | 5669.7 | 178.4 | 63.3 | 15.79778831 |
| qwen-7b-chat-gptq | INT4-MIXED | 32 | 6092.5 | 171.4 | 64.7 | 15.45595054 |
| falcon-7b-instruct | INT4-MIXED | 32 | 5624.3 | 197.3 | 65.1 | 15.3609831 |
| mistral-7b-v0.1 | INT4-MIXED | 1024 | 5730.7 | 2562.1 | 65.8 | 15.19756839 |
| falcon-7b-instruct | INT4-MIXED | 1024 | 5562.6 | 2566.4 | 66.5 | 15.03759398 |
| stable-zephyr-3b-dpo | INT8-CW | 1024 | 4670 | 1548.6 | 67.1 | 14.90312966 |
| llama-2-7b-gptq | INT4-MIXED | 1024 | 6361.5 | 2368.4 | 67.6 | 14.79289941 |
| zephyr-7b-beta | INT4-MIXED | 32 | 5918.5 | 178.6 | 67.8 | 14.74926254 |
| qwen-7b-chat-gptq | INT4-MIXED | 1024 | 6784.8 | 2446.5 | 68.9 | 14.5137881 |
| zephyr-7b-beta | INT4-MIXED | 1024 | 6098.7 | 2548 | 70.2 | 14.24501425 |
| gemma-2b-it | FP16 | 32 | 7458.3 | 95.2 | 70.7 | 14.14427157 |
| gemma-2b-it | FP16 | 1024 | 7199.2 | 1407.4 | 71.6 | 13.96648045 |
| baichuan2-7b-chat | INT4-MIXED | 32 | 6542.8 | 157.5 | 71.8 | 13.9275766 |
| phi-2 | FP16 | 32 | 6826.6 | 104.5 | 74 | 13.51351351 |
| dolly-v2-3b | FP16 | 32 | 6826.4 | 104.5 | 74.2 | 13.47708895 |
| red-pajama-incite-chat-3b-v1 | FP16 | 32 | 6797.2 | 108.9 | 74.4 | 13.44086022 |
| stablelm-3b-4e1t | FP16 | 32 | 6739.5 | 110.5 | 75.9 | 13.17523057 |
| baichuan2-7b-chat | INT4-MIXED | 1024 | 7262 | 2959.3 | 76.1 | 13.14060447 |
| phi-2 | FP16 | 1024 | 7623.9 | 1465.3 | 78.6 | 12.72264631 |
| dolly-v2-3b | FP16 | 1024 | 7645.3 | 1489.3 | 78.9 | 12.67427123 |
| red-pajama-incite-chat-3b-v1 | FP16 | 1024 | 7595.3 | 1499.6 | 79 | 12.65822785 |
| qwen-7b-chat | INT4-MIXED | 32 | 7296.4 | 152.4 | 80.1 | 12.48439451 |
| stablelm-3b-4e1t | FP16 | 1024 | 7550.6 | 1474.6 | 80.7 | 12.39157373 |
| qwen-7b-chat | INT4-MIXED | 1024 | 7997.9 | 2530 | 84.4 | 11.84834123 |
| gpt-j-6b | INT8-CW | 32 | 7431.5 | 154.2 | 84.6 | 11.82033097 |
| chatglm3-6b | INT8-CW | 32 | 7445.3 | 154.4 | 85.1 | 11.75088132 |
| chatglm3-6b | INT8-CW | 1024 | 7415.9 | 2680.7 | 86.6 | 11.54734411 |
| stable-zephyr-3b-dpo | FP16 | 32 | 7029.8 | 149.8 | 87.4 | 11.4416476 |
| gpt-j-6b | INT8-CW | 1024 | 8402.4 | 2379.2 | 88.6 | 11.28668172 |
| flan-t5-xxl | INT8-CW | 33 | 20214.9 | 162.5 | 91.6 | 10.91703057 |
| codegen25-7b | INT8-CW | 32 | 8327.7 | 159.4 | 95.4 | 10.48218029 |
| falcon-7b-instruct | INT8-CW | 32 | 8515.7 | 196.9 | 95.9 | 10.42752868 |
| stable-zephyr-3b-dpo | FP16 | 1024 | 7648.4 | 2633.5 | 95.9 | 10.42752868 |
| flan-t5-xxl | INT8-CW | 1139 | 22256 | 540.9 | 97.1 | 10.29866117 |
| falcon-7b-instruct | INT8-CW | 1024 | 8313.2 | 2786.1 | 97.5 | 10.25641026 |
| codegen25-7b | INT8-CW | 1024 | 8852.2 | 2705.5 | 99.6 | 10.04016064 |

All models listed here were tested with the following parameters:

  • Framework: PyTorch
  • Beam: 1
  • Batch size: 1
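
For reference, the snippet below is a minimal sketch of how a comparable measurement (greedy decoding, beam = 1, batch size = 1) could be taken with the OpenVINO GenAI Python API on the built-in GPU. The model directory path and the prompt are placeholders, this is not the exact harness used to produce the tables above, and timing per streamed subword is only an approximation of per-token latency.

```python
# Minimal sketch: time 1st-token and average 2nd-token latency with OpenVINO GenAI.
# "./model_dir" and the prompt are placeholders; this is not the benchmarking
# harness used to generate the numbers above.
import time
import openvino_genai

# Model previously exported/converted to OpenVINO IR, run on the integrated GPU.
pipe = openvino_genai.LLMPipeline("./model_dir", "GPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128

token_times = []

def streamer(subword: str) -> bool:
    # Called once per generated subword; record its arrival time.
    token_times.append(time.perf_counter())
    return False  # False means: continue generating

start = time.perf_counter()
pipe.generate("The weather today is", config, streamer)

first_latency_ms = (token_times[0] - start) * 1000
second_latency_ms = (token_times[-1] - token_times[0]) / (len(token_times) - 1) * 1000
print(f"1st latency: {first_latency_ms:.1f} ms")
print(f"2nd latency: {second_latency_ms:.1f} ms "
      f"(~{1000 / second_latency_ms:.1f} tokens/s)")
```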