Gemma 3N E2B benchmarks
Here are a few benchmarks of Gemma 3N E2B (Q4_0) on a Snapdragon 730G:
Specs: 32-bit LPDDR4X-3733 (14.9 GB/s), 2x A76 (2208 MHz, downclocks to 2169 MHz), 6x A55 (1804 MHz)
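For reference, the 14.9 GB/s figure follows from the bus width and transfer rate. A minimal sketch of that arithmetic (the function name is only illustrative, and this is the theoretical peak, not a measured value):

```python
# Theoretical peak bandwidth = bus width in bytes x transfers per second.
BUS_WIDTH_BITS = 32        # from the spec line above
TRANSFER_RATE_MT_S = 3733  # LPDDR4X-3733, in mega-transfers per second


def peak_bandwidth_gb_s(bus_width_bits, rate_mt_s):
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * rate_mt_s * 1e6 / 1e9


bandwidth = peak_bandwidth_gb_s(BUS_WIDTH_BITS, TRANSFER_RATE_MT_S)
print(f"{bandwidth:.1f} GB/s")  # prints "14.9 GB/s"
```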
All benchmarks were run with llama.cpp build 5891 (0d922676), with mmap disabled.
Compilation options: -DGGML_NATIVE=off -DGGML_OPENMP=off -DGGML_CPU_ARM_ARCH=armv8.2-a+fp16+dotprod
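The post does not say how each run was restricted to specific cores, so the following is only a sketch of one way to reproduce the configurations: it builds `taskset`-pinned `llama-bench` command lines. The core numbering (CPUs 0-5 as A55, 6-7 as A76), the model filename, and the helper names are assumptions; check the actual layout on your device before using the masks.

```python
import shlex

# ASSUMPTION: CPUs 0-5 are the A55 cores and CPUs 6-7 are the A76 cores.
A55_CPUS = [0, 1, 2, 3, 4, 5]
A76_CPUS = [6, 7]


def cpu_mask(cpus):
    """Return a hex affinity mask (no 0x prefix) for the given CPU indices."""
    return format(sum(1 << c for c in cpus), "x")


def bench_command(model_path, cpus):
    """Build a llama-bench command pinned to `cpus`, one thread per CPU."""
    return [
        "taskset", cpu_mask(cpus),
        "./llama-bench",
        "-m", model_path,
        "-t", str(len(cpus)),
        "-mmp", "0",   # mmap disabled, as in the post
        "-p", "512",   # pp512
        "-n", "128",   # tg128
    ]


# Hypothetical model filename, for illustration only.
MODEL = "gemma-3n-E2B-it-Q4_0.gguf"

for cpus in (A55_CPUS[:1], A76_CPUS[:1], A55_CPUS[:2], A76_CPUS,
             A55_CPUS, A55_CPUS + A76_CPUS):
    print(shlex.join(bench_command(MODEL, cpus)))
```

The commands are only printed here, not executed, so they can be reviewed before being pasted into a shell on the device.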
1st run: One A55 core vs. one A76 core
One A55 core

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 3.21 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 1.05 ± 0.00 |

One A76 core

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 13.65 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 5.80 ± 0.00 |

2nd run: Two A55 cores vs. two A76 cores
Two A55 cores

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 6.46 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 2.13 ± 0.00 |

Two A76 cores (best configuration for TG; in real-world usage it is 2-3 t/s faster than using all cores)

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 23.06 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 6.81 ± 0.00 |

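To see how well each core type scales from one to two threads, the speedup can be computed from the first two runs. This is just arithmetic on the numbers reported above; the dictionary layout and names are illustrative.

```python
# t/s from the 1st and 2nd runs: {core type: {test: (1 thread, 2 threads)}}
RESULTS = {
    "A55": {"pp512": (3.21, 6.46), "tg128": (1.05, 2.13)},
    "A76": {"pp512": (13.65, 23.06), "tg128": (5.80, 6.81)},
}


def speedup(one_thread, two_threads):
    """Throughput ratio when going from one thread to two."""
    return two_threads / one_thread


scaling = {
    core: {test: speedup(*pair) for test, pair in tests.items()}
    for core, tests in RESULTS.items()
}

for core, tests in scaling.items():
    for test, ratio in tests.items():
        print(f"{core} {test}: {ratio:.2f}x")
```

The A55 cores scale almost linearly (about 2.0x on both tests), while the second A76 core adds about 1.7x on PP and only about 1.2x on TG. That would be consistent with token generation on the A76 cores already being limited by something other than compute, such as memory bandwidth, though these runs do not test that directly.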
3rd run: Six A55 cores vs. all cores
Six A55 cores

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | pp512 | 18.18 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | tg128 | 4.41 ± 0.00 |

All cores (best configuration for PP, though 2xA76 is not far behind: 23.06 vs. 27.51 t/s)

| model | size | params | backend | threads | mmap | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | pp512 | 27.51 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | tg128 | 5.26 ± 0.00 |

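Putting all six configurations side by side makes the trade-off easier to read. The sketch below only re-uses the t/s values from the tables; the labels and helper name are illustrative.

```python
# (pp512 t/s, tg128 t/s) for every configuration benchmarked above.
CONFIGS = {
    "1x A55": (3.21, 1.05),
    "1x A76": (13.65, 5.80),
    "2x A55": (6.46, 2.13),
    "2x A76": (23.06, 6.81),
    "6x A55": (18.18, 4.41),
    "all cores": (27.51, 5.26),
}


def percent_slower(value, best):
    """How far `value` falls short of `best`, as a percentage of `best`."""
    return (best - value) / best * 100


best_pp = max(CONFIGS, key=lambda name: CONFIGS[name][0])
best_tg = max(CONFIGS, key=lambda name: CONFIGS[name][1])

pp_gap = percent_slower(CONFIGS["2x A76"][0], CONFIGS[best_pp][0])
tg_gap = percent_slower(CONFIGS["all cores"][1], CONFIGS[best_tg][1])

print(f"best PP: {best_pp}, best TG: {best_tg}")
print(f"2x A76 is {pp_gap:.1f}% slower than {best_pp} on pp512")
print(f"all cores is {tg_gap:.1f}% slower than {best_tg} on tg128")
```

In these runs, using all cores costs about 23% of TG throughput compared to the two A76 cores, while the two A76 cores give up about 16% of PP throughput compared to all cores.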