Qwen3 0.6B benchmarks

Here are a few benchmarks of Qwen3 0.6B (Q4_0) on a Dimensity 9000+:

Specs: 64-bit LPDDR5X-7500 (60.0 GB/s), 1xX2 (3350 MHz), 3xA710 (3200 MHz), 4xA510 (1800 MHz)
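The 60.0 GB/s figure follows from the bus width and transfer rate quoted above. A minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 1e9 bytes); the function name is only illustrative:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8          # 64 bits -> 8 bytes per transfer
    transfers_per_second = transfer_rate_mt_s * 1e6  # MT/s -> transfers per second
    return bytes_per_transfer * transfers_per_second / 1e9


# 64-bit LPDDR5X-7500, as listed in the specs line
print(f"{peak_bandwidth_gb_s(64, 7500):.1f} GB/s")  # prints "60.0 GB/s"
```

This is a theoretical ceiling; sustained bandwidth seen by the CPU cores is normally lower.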

All benchmarks were run with llama.cpp build 6602 (72b24d96), compiled with clang 20.1.8 (Fedora 20.1.8-4.fc42) for aarch64-redhat-linux-gnu, with ubatch = 64. The A510-only tests were run with mmap enabled; all other runs show mmap = 0 in their tables.

Compilation options: -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DGGML_OPENMP=off
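The post does not say how each run was pinned to specific cores. One way to reproduce a run is sketched below, assuming `taskset` for pinning and the `llama-bench` options `-m`, `-t`, `-ub`, `-mmp`, `-p` and `-n`. The CPU numbering (0-3 = A510, 4-6 = A710, 7 = X2) and the model file name are assumptions, not taken from the post; check the actual numbering on the device first.

```python
import shlex

# ASSUMPTION: hypothetical core numbering, verify it on the actual device
# (e.g. via /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq).
CORES = {"A510": [0, 1, 2, 3], "A710": [4, 5, 6], "X2": [7]}


def bench_command(cpus, mmap, model="qwen3-0.6b-q4_0.gguf"):
    """Build a pinned llama-bench command line (model path is hypothetical)."""
    cpu_list = ",".join(str(c) for c in cpus)
    return [
        "taskset", "-c", cpu_list,   # pin the process to the chosen cores
        "llama-bench",
        "-m", model,
        "-t", str(len(cpus)),        # one thread per pinned core
        "-ub", "64",                 # ubatch = 64, as in the post
        "-mmp", str(int(mmap)),      # 1 = mmap enabled, 0 = disabled
        "-p", "512",                 # pp512 test
        "-n", "128",                 # tg128 test
    ]


# Example: the "2xA710+X2" configuration from the 3rd run
cmd = bench_command(CORES["A710"][:2] + CORES["X2"], mmap=False)
print(shlex.join(cmd))
```

The command is only printed here; passing `cmd` to `subprocess.run` would execute it on a machine where both tools are installed.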

1st run: One A510 core vs. one A710 core vs. one X2 core

One A510 core

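| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | pp512 | 14.83 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | tg128 | 4.34 ± 0.00 |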

One A710 core

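| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | pp512 | 96.77 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | tg128 | 27.20 ± 0.00 |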

One X2 core

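| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | pp512 | 143.94 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | tg128 | 39.32 ± 0.00 |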
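To put the three single-core results side by side, the throughput of each core can be expressed relative to one A510 core. A small sketch using only the numbers from the 1st run; note that the A510 run used mmap while the other two did not, so the comparison is not perfectly like for like:

```python
# Single-core results from the 1st run, in tokens per second
single_core = {
    "A510": {"pp512": 14.83, "tg128": 4.34},
    "A710": {"pp512": 96.77, "tg128": 27.20},
    "X2": {"pp512": 143.94, "tg128": 39.32},
}


def relative_to(baseline, results):
    """Return each core's throughput as a multiple of the baseline core."""
    base = results[baseline]
    return {
        core: {test: tps / base[test] for test, tps in tests.items()}
        for core, tests in results.items()
    }


ratios = relative_to("A510", single_core)
for core, tests in ratios.items():
    print(core, {test: f"{r:.2f}x" for test, r in tests.items()})
```

On these figures one A710 core is roughly 6.5x and the X2 core roughly 9.7x faster than one A510 core in prompt processing, which is far more than the clock speed difference alone.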

2nd run: Two A510 cores vs. two A710 cores vs. A710+X2

Two A510 cores

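| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | pp512 | 25.97 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | tg128 | 6.92 ± 0.00 |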

Two A710 cores

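| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | pp512 | 184.00 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | tg128 | 48.63 ± 0.00 |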

A710+X2

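| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | pp512 | 196.54 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | tg128 | 52.45 ± 0.00 |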

3rd run: 3 A510 cores vs. 3 A710 cores vs. 2xA710+X2

3 A510 cores

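| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | pp512 | 39.05 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | tg128 | 10.40 ± 0.00 |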

3 A710 cores

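| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | pp512 | 267.38 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | tg128 | 64.33 ± 0.00 |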

2xA710+X2

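| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | pp512 | 284.89 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | tg128 | 65.91 ± 0.00 |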

4th run: 4 A510 cores vs. 3xA710+X2

4 A510 cores

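| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | pp512 | 43.76 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | tg128 | 10.51 ± 0.00 |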

3xA710+X2

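| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | 0 | pp512 | 359.16 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | 0 | tg128 | 74.01 ± 0.00 |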
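The runs so far make it possible to check how well throughput scales with thread count on identical cores. A sketch using the pp512 and tg128 numbers for the A510 and A710 clusters; the mixed A710+X2 configurations are left out because their cores are not identical, and the helper name is only illustrative:

```python
# Tokens per second by thread count, from runs 1 to 4 (homogeneous cores only)
results = {
    "A510 pp512": {1: 14.83, 2: 25.97, 3: 39.05, 4: 43.76},
    "A510 tg128": {1: 4.34, 2: 6.92, 3: 10.40, 4: 10.51},
    "A710 pp512": {1: 96.77, 2: 184.00, 3: 267.38},
    "A710 tg128": {1: 27.20, 2: 48.63, 3: 64.33},
}


def scaling(by_threads):
    """Map thread count -> (speedup over 1 thread, parallel efficiency)."""
    base = by_threads[1]
    return {n: (tps / base, tps / base / n) for n, tps in by_threads.items()}


for name, by_threads in results.items():
    for n, (speedup, efficiency) in scaling(by_threads).items():
        print(f"{name} x{n}: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

With these figures the A710 cluster keeps above 90% efficiency for prompt processing at three threads, while the fourth A510 core adds very little, especially for token generation.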

5th run: All cores

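| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 8 | 64 | 0 | pp512 | 86.80 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 8 | 64 | 0 | tg128 | 22.08 ± 0.00 |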
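Two quick calculations help read the last two runs: how much slower the all-cores run is than 3xA710+X2, and roughly how much memory traffic the fastest token generation result implies. The sketch below assumes, as a rough simplification not stated in the post, that every generated token reads the full 358.78 MiB of weights once; the 60.0 GB/s peak is the figure from the specs line.

```python
MODEL_SIZE_MIB = 358.78  # model size reported by llama-bench
PEAK_GB_S = 60.0         # theoretical peak from the specs line

big4 = {"pp512": 359.16, "tg128": 74.01}  # 3xA710+X2 (4th run)
all8 = {"pp512": 86.80, "tg128": 22.08}   # all cores (5th run)

# How many times faster the four big cores are than all eight cores together
slowdown = {test: big4[test] / all8[test] for test in big4}


def effective_bandwidth_gb_s(tokens_per_s, model_size_mib=MODEL_SIZE_MIB):
    """Rough weight traffic in GB/s, ASSUMING all weights are read once per token."""
    return tokens_per_s * model_size_mib * 1024**2 / 1e9


bw = effective_bandwidth_gb_s(big4["tg128"])
print({test: f"{s:.2f}x" for test, s in slowdown.items()})
print(f"approx. {bw:.1f} GB/s, {bw / PEAK_GB_S:.0%} of theoretical peak")
```

On these numbers, using all eight threads is about 4.1x slower for prompt processing and about 3.4x slower for token generation than using only the four big cores, and the best token generation result corresponds to a bit under half of the theoretical memory bandwidth.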