
Fun

A useless operating system

This started when I saw those cheap retro gaming consoles with only 4GB of storage and 512MB of RAM, and thought: could I make a complete retro gaming distribution in that same space?

The answer: not really. I could squeeze things into 4GB of storage, but not into 512MB of RAM. Not even close.

But anyway, here's the download link: https://drive.google.com/file/d/1-LzuryJ2MBLoBvcXmFuBgm9OQ3V_TUy0/view?usp=sharing. Use it at your own risk. (You can also drop it directly onto a Ventoy USB.)

There's no sound (I was too lazy to install it). Wireless drivers and other extras took up too much space, so they're excluded as well.

This is a simple Debian installation with ES-DE and RetroArch (only cores under 20MB in size were kept).

When you start the OS, an ES-DE instance starts on VT8. Stopping ES-DE makes it restart. You can log in as emustation (the password is the same).

It's too much of a resource hog, and Batocera exists, so it's probably useless. But it's at least fun to try :)

Do the Dimensity 9000 and the 10750H hold up well in the benchmarks? (again)

I took a look at https://www.phoronix.com/review/16-armlinux-sep2018/ and decided to run the Dimensity 9000 through those benchmarks.

The result: https://openbenchmarking.org/result/2602284-YOSH-260227012. The Dimensity wins most of the benchmarks by a landslide, except pgbench (it still has a lead, but the Socionext Developerbox brute-forces its way close) and the Perl interpreter test (I blame proot for that).

The X2 core completely destroys anything from 2018 (obviously), and the total run time is so low that nothing comes close (thanks to the X2 core again). Against the desktop leaderboard, though, the 9000 Plus pales in comparison. I have not tested that, but it should rank near the bottom.

I've also run a few of the benchmarks on my 10750H, and it lands at roughly 4960X-5960X performance: https://openbenchmarking.org/result/2602287-YOSH-YOSHI9552.

A summary: https://docs.google.com/spreadsheets/d/1MC92otAyJLy6xrpeCMe5lM960kpfgVo6Gvjeg3wx6aE/edit?usp=sharing.

Qwen3 0.6B benchmarks

Here are a few benchmarks of Qwen3 0.6B (Q4_0) on a Dimensity 9000+:

Specs: 64-bit LPDDR5X-7500 (60.0 GB/s), 1x X2 (3350 MHz), 3x A710 (3200 MHz), 4x A510 (1800 MHz)

All benchmarks are done using llama.cpp build 6602 (72b24d96), compiled with clang 20.1.8 (Fedora 20.1.8-4.fc42) for aarch64-redhat-linux-gnu, with ubatch = 64. Tests on the A510 cores are done with mmap enabled.

Compilation options: -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DGGML_OPENMP=off
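For reference, the whole setup can be sketched as the commands below. The model filename and build directory are placeholders, not the exact ones I used; core numbering for taskset is SoC-specific, so check your device first.

```shell
# Build llama.cpp with the options above (clang, no OpenMP).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DGGML_OPENMP=off
cmake --build build --config Release -j

# Run the benchmark with ubatch = 64 on a single core, pinning the
# process with taskset (cpu7 as the X2 core is an assumption here);
# -mmp 0 disables mmap, matching the A710/X2 runs.
taskset -c 7 ./build/bin/llama-bench -m qwen3-0.6b-q4_0.gguf -t 1 -ub 64 -mmp 0
```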

1st run: One A510 core vs. one A710 core vs. one X2 core

One A510 core

| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | pp512 | 14.83 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | tg128 | 4.34 ± 0.00 |

One A710 core

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | pp512 | 96.77 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | tg128 | 27.20 ± 0.00 |

One X2 core

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | pp512 | 143.94 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 1 | 64 | 0 | tg128 | 39.32 ± 0.00 |

2nd run: Two A510 cores vs. two A710 cores vs. A710+X2

Two A510 cores

| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | pp512 | 25.97 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | tg128 | 6.92 ± 0.00 |

Two A710 cores

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | pp512 | 184.00 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | tg128 | 48.63 ± 0.00 |

A710+X2

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | pp512 | 196.54 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 2 | 64 | 0 | tg128 | 52.45 ± 0.00 |

3rd run: 3 A510 cores vs. 3 A710 cores vs. 2xA710+X2

3 A510 cores

| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | pp512 | 39.05 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | tg128 | 10.40 ± 0.00 |

3 A710 cores

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | pp512 | 267.38 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | tg128 | 64.33 ± 0.00 |

2xA710+X2

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | pp512 | 284.89 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 3 | 64 | 0 | tg128 | 65.91 ± 0.00 |

4th run: 4 A510 cores vs. 3xA710+X2

4 A510 cores

| model | size | params | backend | threads | n_ubatch | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | pp512 | 43.76 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | tg128 | 10.51 ± 0.00 |

3xA710+X2

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | 0 | pp512 | 359.16 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 4 | 64 | 0 | tg128 | 74.01 ± 0.00 |

5th run: All cores

| model | size | params | backend | threads | n_ubatch | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 8 | 64 | 0 | pp512 | 86.80 ± 0.00 |
| qwen3 0.6B Q4_0 | 358.78 MiB | 596.05 M | CPU | 8 | 64 | 0 | tg128 | 22.08 ± 0.00 |

Gemma 3N E2B benchmarks

Here are a few benchmarks of Gemma 3N E2B (Q4_0) on a Snapdragon 730G:

Specs: 32-bit LPDDR4X-3733 (14.9 GB/s), 2x A76 (2208 MHz, downclocks to 2169 MHz), 6x A55 (1804 MHz)

All benchmarks are done using llama.cpp build 5891 (0d922676), with mmap disabled.

Compilation options: -DGGML_NATIVE=off -DGGML_OPENMP=off -DGGML_CPU_ARM_ARCH=armv8.2-a+fp16+dotprod
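The per-cluster numbers below rely on pinning llama-bench to specific cores. A sketch of how that can be done with taskset (the core numbering is an assumption; on many Snapdragon 730G devices cpu6-cpu7 are the A76 cores, but verify against sysfs first, and the model filename is a placeholder):

```shell
# Identify the big (A76) cores: on big.LITTLE SoCs they report a
# higher cpuinfo_max_freq than the little (A55) cores.
for c in /sys/devices/system/cpu/cpu[0-9]*; do
  echo "$c: $(cat $c/cpufreq/cpuinfo_max_freq 2>/dev/null)"
done

# Pin llama-bench to the two assumed A76 cores with 2 threads and
# mmap disabled, matching the "Two A76 cores" run below.
taskset -c 6,7 ./build/bin/llama-bench -m gemma-3n-e2b-q4_0.gguf -t 2 -mmp 0
```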

1st run: One A55 core vs. one A76 core

One A55 core

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 3.21 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 1.05 ± 0.00 |

One A76 core

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | pp512 | 13.65 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 1 | 0 | tg128 | 5.80 ± 0.00 |

2nd run: Two A55 cores vs. two A76 cores

Two A55 cores

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 6.46 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 2.13 ± 0.00 |

Two A76 cores (best configuration for TG; 2-3 t/s more in real-world usage compared to all cores)

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | pp512 | 23.06 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 2 | 0 | tg128 | 6.81 ± 0.00 |

3rd run: 6 A55 cores vs. all cores

6 A55 cores

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | pp512 | 18.18 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 6 | 0 | tg128 | 4.41 ± 0.00 |

All cores (best configuration for PP, but the difference from two A76 cores is negligible)

| model | size | params | backend | threads | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | pp512 | 27.51 ± 0.00 |
| gemma3n E2B Q4_0 | 3.34 GiB | 4.46 B | CPU | 8 | 0 | tg128 | 5.26 ± 0.00 |

Run Codeforces 1952J codes 20000x faster

I have created a transpiler from the Codeforces 1952J language to C++; the resulting compiled code is 20000x* faster than the reference implementation.

Code
#include <bits/stdc++.h>
using namespace std;
#define int long long
#ifndef yoshi_likes_e4
#define endl '\n'
#endif
#define problem ""
#define multitest 0
#define debug(x) cerr << #x << " = " << x << endl;
void init()
{
}
map<string, bool> var_type;
void Add_variable(string k)
{
    bool ok = 0;
    try
    {
        std::stoi(k);
        ok = true;
    }
    catch (const std::invalid_argument &e)
    {
    }
    catch (const std::out_of_range &e)
    {
    }
    if (!ok)
    {
        int pos = k.find('[');
        if (pos == string::npos)
            var_type[k] = 0;
        else
            var_type[string(k.begin(), k.begin() + pos)] = 1;
    }
}
void Yoshi()
{
    vector<vector<string>> code;
    string s;
    while (getline(cin, s))
    {
        stringstream t(s);
        code.push_back({});
        string x;
        while (t >> x)
            code.back().push_back(x);
    }
    map<int, int> label_id;
    int lid = 0;
    for (auto &lines : code)
    {
        if (lines[0] == "simp")
        {
            int v = stoi(lines[2]) - 1;
            if (label_id.find(v) == label_id.end())
                label_id[v] = lid++;
        }
        if (lines[0] == "vibe")
            Add_variable(lines[2]), Add_variable(lines[4]);
        if (lines[0] == "bruh")
            Add_variable(lines[1]), Add_variable(lines[5]);
        if (lines[0] == "*slaps")
            Add_variable(lines[1]), Add_variable(lines[5].substr(0, lines[5].size() - 1));
        if (lines[0] == "rip")
            Add_variable(lines[2]), Add_variable(lines[6]);
        if (lines[0] == "yoink")
            Add_variable(lines[1]);
        if (lines[0] == "yeet")
            Add_variable(lines[1]);
    }
    vector<string> var0, var1;
    for (auto &[u, v] : var_type)
        if (v)
            var1.push_back(u);
        else
            var0.push_back(u);
    cout << R""""(#include <bits/stdc++.h>
using namespace std;
void input(int &x)
{
    string s;
    getline(cin, s);
    x = stoi(s);
}
void input(vector<int> &x)
{
    string s;
    getline(cin, s);
    stringstream t(s);
    while (t >> s)
        x.push_back(stoi(s));
})"""";
    cout << "\nint main(){\ncin.tie(0)->sync_with_stdio(0);\n";
    if (var0.size())
    {
        cout << "int ";
        for (auto &i : var0)
            cout << i << (&i != &var0.back() ? ", " : ";\n");
    }
    if (var1.size())
    {
        cout << "vector<int> ";
        for (auto &i : var1)
            cout << i << (&i != &var1.back() ? ", " : ";\n");
    }
    for (auto &lines : code)
    {
        if (label_id.find(&lines - &code[0]) != label_id.end())
            cout << "L" << label_id[&lines - &code[0]] << ":\n";
        if (lines[0] == "simp")
            cout << "goto L" << label_id[stoi(lines[2]) - 1] << ";\n";
        if (lines[0] == "vibe")
            cout << "if (" << lines[2] << " > " << lines[4] << ')' << "\n";
        if (lines[0] == "bruh")
            cout << lines[1] << " = " << lines[5] << ";\n";
        if (lines[0] == "*slaps")
            cout << lines[5].substr(0, lines[5].size() - 1) << " += " << lines[1] << ";\n";
        if (lines[0] == "rip")
            cout << lines[2] << " -= " << lines[6] << ";\n";
        if (lines[0] == "yoink")
            cout << "input(" << lines[1] << ");\n";
        if (lines[0] == "yeet")
            cout << "cout << " << lines[1] << " << \"\\n\";\n";
        if (lines[0] == "go")
            cout << "return 0;\n";
    }
    cout << "}" << endl;
}
signed main()
{
#ifndef yoshi_likes_e4
    ios::sync_with_stdio(0);
    cin.tie(0);
    if (fopen(problem ".inp", "r"))
    {
        freopen(problem ".inp", "r", stdin);
        freopen(problem ".out", "w", stdout);
    }
#endif
    init();
    int t = 1;
#if multitest
    cin >> t;
#endif
    while (t--)
        Yoshi();
}
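Using the transpiler is a three-step pipeline: build it, feed it a 1952J program on stdin, then compile the generated C++. A sketch (the file names are placeholders):

```shell
# 1. Build the transpiler itself (source saved as transpile.cpp).
g++ -O2 -o transpile transpile.cpp

# 2. Transpile: the 1952J program goes in on stdin,
#    the generated C++ comes out on stdout.
./transpile < program.1952j > generated.cpp

# 3. Compile and run the generated program.
g++ -O2 -o generated generated.cpp
./generated
```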

*: Selection sort performance:

\(n=3000\):

| Implementation | Runtime |
| --- | --- |
| Reference | 41.0s |
| Compiled | 2ms |

\(n=5000\):

| Implementation | Runtime |
| --- | --- |
| Reference | 123.3s |
| Compiled | 6ms |

SymPy

I have created a SymPy package that runs on ARM64, based on Alpine, just for fun (to see how small it can be). Note: it requires Termux with proot/chroot, or any rooted ARM64 computer (e.g. an RPi).

Download link

  • Usage:
    • Extract the file into your home directory (e.g. by tar -xzf ../sympy.cmax.tgz -C /home/...).
    • Run chroot /home/.../alpine /bin/bash -l
    • You should see a Python shell with SymPy loaded. (Note: exiting the shell stops the chroot. To prevent this (e.g. for customization), remove the exit 0 line in the .../alpine/etc/profile file.)
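The steps above as one sequence (the home path is a placeholder, as in the post; both commands assume the archive is in the current directory's parent and that you have root or are inside proot):

```shell
# Extract the Alpine+SymPy rootfs into the home directory.
tar -xzf ../sympy.cmax.tgz -C /home/user

# Enter the chroot; its /etc/profile drops straight into a
# Python shell with SymPy already imported.
chroot /home/user/alpine /bin/bash -l
```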