Gemma 3N E2B benchmarks

Here are a few benchmarks of Gemma 3N E2B (Q4_0) on a Snapdragon 730G:

Specs: 32-bit LPDDR4X-3733 (14.9 GB/s), 2x A76 (2208 MHz, downclocks to 2169 MHz), 6x A55 (1804 MHz)
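
The 14.9 GB/s figure follows from the bus width and the transfer rate. A minimal sketch of that arithmetic, assuming the quoted number is the theoretical peak (32 bits per transfer at 3733 MT/s); the function name is only illustrative:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8          # 32 bits -> 4 bytes
    bytes_per_second = bytes_per_transfer * transfer_rate_mt_s * 1e6
    return bytes_per_second / 1e9


# Values taken from the specs line: 32-bit bus, LPDDR4X-3733.
bandwidth = peak_bandwidth_gb_s(32, 3733)
print(f"{bandwidth:.1f} GB/s")  # prints "14.9 GB/s"
```

Sustained bandwidth on a real device is normally lower than this peak.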

All benchmarks were run with llama.cpp build 5891 (0d922676), with mmap disabled.

Compilation options: -DGGML_NATIVE=off -DGGML_OPENMP=off -DGGML_CPU_ARM_ARCH=armv8.2-a+fp16+dotprod
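
For anyone reproducing the runs, the setup above could be scripted roughly as follows. This is a sketch under assumptions, not the author's actual procedure: the notes do not say how threads were pinned to a core type, so the use of `taskset`, the core numbering (cores 0-5 as A55, cores 6-7 as A76), the binary path, and the model file name are all hypothetical. The `llama-bench` flags shown (`-m`, `-t`, `-mmp`, `-p`, `-n`) match the columns in the tables (threads, mmap, pp512, tg128).

```python
from typing import List


def affinity_mask(cores: List[int]) -> str:
    """Hex CPU affinity mask (no 0x prefix) with one bit set per core index."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")


def bench_command(model_path: str, cores: List[int]) -> List[str]:
    """Build a taskset + llama-bench command line with one thread per pinned core."""
    return [
        "taskset", affinity_mask(cores),   # pinning method is an assumption
        "./llama-bench",                    # binary location is an assumption
        "-m", model_path,
        "-t", str(len(cores)),              # threads column
        "-mmp", "0",                        # mmap disabled, as in the notes
        "-p", "512",                        # pp512 test
        "-n", "128",                        # tg128 test
    ]


# Hypothetical core numbering: 6 and 7 are assumed to be the A76 cores.
print(" ".join(bench_command("model.gguf", [6, 7])))
```

The actual core numbering should be checked on the device before relying on a mask like this.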

1st run: One A55 core vs. one A76 core

One A55 core

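| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 3.21 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 1.05 ± 0.00 |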

One A76 core

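| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 13.65 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 5.80 ± 0.00 |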

2nd run: Two A55 cores vs. two A76 cores

Two A55 cores

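| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 6.46 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 2.13 ± 0.00 |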

Two A76 cores (best configuration for TG; 2-3 t/s faster than all cores in real-world usage)

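| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 23.06 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 6.81 ± 0.00 |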

3rd run: Six A55 cores vs. all cores

Six A55 cores

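| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | pp512 | 18.18 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | tg128 | 4.41 ± 0.00 |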

All cores (best configuration for PP, though only about 19% faster than two A76 cores: 27.51 vs. 23.06 t/s)

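| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | pp512 | 27.51 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | tg128 | 5.26 ± 0.00 |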
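
To compare how well each core type scales, the t/s figures from the runs above can be turned into a speedup and a parallel efficiency relative to a single core of the same type. The sketch below only rearranges the measured numbers; the dictionary layout and function names are illustrative, and the all-cores run is left out because a mixed A55/A76 configuration has no single-core baseline.

```python
# Measured t/s from the runs above, keyed by core type, test, and thread count.
RESULTS = {
    "A55": {"pp512": {1: 3.21, 2: 6.46, 6: 18.18},
            "tg128": {1: 1.05, 2: 2.13, 6: 4.41}},
    "A76": {"pp512": {1: 13.65, 2: 23.06},
            "tg128": {1: 5.80, 2: 6.81}},
}


def speedup(core: str, test: str, threads: int) -> float:
    """Throughput with `threads` cores divided by single-core throughput."""
    runs = RESULTS[core][test]
    return runs[threads] / runs[1]


def efficiency(core: str, test: str, threads: int) -> float:
    """Speedup divided by thread count; 1.0 would be perfect linear scaling."""
    return speedup(core, test, threads) / threads


for core, tests in RESULTS.items():
    for test, runs in tests.items():
        for threads in sorted(runs):
            if threads == 1:
                continue  # the single-core run is the baseline
            print(f"{core} {test} x{threads}: "
                  f"speedup {speedup(core, test, threads):.2f}, "
                  f"efficiency {efficiency(core, test, threads):.2f}")
```

On these numbers, a second A76 core helps prompt processing far more than token generation, which would be consistent with token generation being limited more by memory bandwidth than by compute.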