Gemma 3N E2B benchmarks
Here are a few benchmarks of Gemma 3N E2B (Q4_0) on a Snapdragon 730G:
Specs: 32-bit LPDDR4X-3733 (14.9 GB/s), 2xA76 (2208 MHz, downclocks to 2169 MHz), 6xA55 (1804 MHz)
All benchmarks were run with llama.cpp build 5891 (0d922676), with mmap disabled.
Compilation options: -DGGML_NATIVE=off -DGGML_OPENMP=off -DGGML_CPU_ARM_ARCH=armv8.2-a+fp16+dotprod
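For reference, the 14.9 GB/s figure in the specs is just transfer rate times bus width. The sketch below redoes that arithmetic and also computes a naive token-generation ceiling, under the assumption (mine, not measured) that every byte of the 3.34 GiB model file has to be streamed from RAM once per generated token.

```python
# Peak theoretical bandwidth of the memory interface listed in the specs.
transfers_per_second = 3733e6   # LPDDR4X-3733: 3733 mega-transfers per second
bus_width_bits = 32             # 32-bit bus

bandwidth_gb_s = transfers_per_second * bus_width_bits / 8 / 1e9
print(f"peak bandwidth: {bandwidth_gb_s:.1f} GB/s")        # peak bandwidth: 14.9 GB/s

# ASSUMPTION: the whole 3.34 GiB file is read once per generated token.
model_bytes = 3.34 * 2**30
naive_tg_ceiling = bandwidth_gb_s * 1e9 / model_bytes
print(f"naive tg ceiling: {naive_tg_ceiling:.2f} t/s")     # naive tg ceiling: 4.16 t/s
```

Some tg128 results below are above this naive ceiling, so the "whole file per token" assumption evidently does not hold for this model. One possible explanation, which is only a guess, is that part of the file (for example embedding tables) is looked up rather than streamed in full.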
1st run: One A55 core vs. one A76 core
One A55 core
model | size | params | backend | threads | mmap | test | t/s |
---|---|---|---|---|---|---|---|
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 3.21 ± 0.00 |
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 1.05 ± 0.00 |
One A76 core
model | size | params | backend | threads | mmap | test | t/s |
---|---|---|---|---|---|---|---|
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 13.65 ± 0.00 |
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 5.80 ± 0.00 |
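To put the single-core numbers side by side, here is a small sketch that computes how much faster one A76 core is than one A55 core. The values are copied from the two tables above; the variable names are just for illustration.

```python
# Single-thread results (t/s) copied from the 1st run tables.
a55 = {"pp512": 3.21, "tg128": 1.05}
a76 = {"pp512": 13.65, "tg128": 5.80}

# Ratio of A76 throughput to A55 throughput for each test.
speedup = {test: a76[test] / a55[test] for test in a55}

for test, ratio in speedup.items():
    print(f"{test}: one A76 core is {ratio:.2f}x one A55 core")
# pp512: one A76 core is 4.25x one A55 core
# tg128: one A76 core is 5.52x one A55 core
```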
2nd run: Two A55 cores vs. two A76 cores
Two A55 cores
model | size | params | backend | threads | mmap | test | t/s |
---|---|---|---|---|---|---|---|
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 6.46 ± 0.00 |
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 2.13 ± 0.00 |
Two A76 cores (best configuration for TG; 2-3 t/s faster than all cores in real-world usage)
model | size | params | backend | threads | mmap | test | t/s |
---|---|---|---|---|---|---|---|
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 23.06 ± 0.00 |
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 6.81 ± 0.00 |
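A quick way to read the 1st and 2nd runs together is to look at how each core type scales from one thread to two. The sketch below does that with the numbers from the tables above. The interpretation in the note afterwards is my own reading, not something the benchmark proves.

```python
# Throughput (t/s) copied from the 1-thread and 2-thread tables.
one_thread = {
    "A55": {"pp512": 3.21, "tg128": 1.05},
    "A76": {"pp512": 13.65, "tg128": 5.80},
}
two_threads = {
    "A55": {"pp512": 6.46, "tg128": 2.13},
    "A76": {"pp512": 23.06, "tg128": 6.81},
}

def scaling(core, test):
    """Speedup from doubling the thread count (2.0 would be perfect scaling)."""
    return two_threads[core][test] / one_thread[core][test]

for core in one_thread:
    for test in ("pp512", "tg128"):
        print(f"{core} {test}: {scaling(core, test):.2f}x")
# A55 pp512: 2.01x
# A55 tg128: 2.03x
# A76 pp512: 1.69x
# A76 tg128: 1.17x
```

The A55 cores scale essentially linearly, while the second A76 core adds only about 17% for TG. That would be consistent with a single A76 already being close to a memory limit during token generation, but this is an inference from the numbers only.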
3rd run: 6 A55 cores vs. all cores
6 A55 cores
model | size | params | backend | threads | mmap | test | t/s |
---|---|---|---|---|---|---|---|
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | pp512 | 18.18 ± 0.00 |
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | tg128 | 4.41 ± 0.00 |
All cores (best configuration for PP at 27.51 t/s, though 2xA76 comes close at 23.06 t/s)
model | size | params | backend | threads | mmap | test | t/s |
---|---|---|---|---|---|---|---|
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | pp512 | 27.51 ± 0.00 |
gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | tg128 | 5.26 ± 0.00 |
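If you want to compare the configurations programmatically, the table rows can be parsed directly. This is a minimal sketch: `parse_row` and the configuration labels are hypothetical helpers I made up for this post, and the rows are pasted verbatim from the tables above.

```python
def parse_row(row):
    """Split one llama-bench markdown row into the fields used here."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    model, size, params, backend, threads, mmap, test, tps = cells
    mean, _, stddev = tps.partition("±")
    return {"threads": int(threads), "test": test,
            "tps": float(mean), "stddev": float(stddev)}

# Labels are illustrative; rows are copied from the tables in this post.
RUNS = {
    "1x A55": [
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 3.21 ± 0.00 |",
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 1.05 ± 0.00 |",
    ],
    "1x A76": [
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 13.65 ± 0.00 |",
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 5.80 ± 0.00 |",
    ],
    "2x A55": [
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 6.46 ± 0.00 |",
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 2.13 ± 0.00 |",
    ],
    "2x A76": [
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 23.06 ± 0.00 |",
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 6.81 ± 0.00 |",
    ],
    "6x A55": [
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | pp512 | 18.18 ± 0.00 |",
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | tg128 | 4.41 ± 0.00 |",
    ],
    "all cores": [
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | pp512 | 27.51 ± 0.00 |",
        "gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | tg128 | 5.26 ± 0.00 |",
    ],
}

# Track the fastest configuration for each test (pp512, tg128).
best = {}
for label, rows in RUNS.items():
    for result in map(parse_row, rows):
        test, tps = result["test"], result["tps"]
        if test not in best or tps > best[test][1]:
            best[test] = (label, tps)

for test, (label, tps) in best.items():
    print(f"best {test}: {label} at {tps:.2f} t/s")
# best pp512: all cores at 27.51 t/s
# best tg128: 2x A76 at 6.81 t/s
```

The parser strips a leading or trailing `|` before splitting, so it should also accept rows pasted straight from llama-bench output, which normally starts each row with a pipe.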