Qwen3 FAQ
There are other Qwen3 FAQs out there, which is what made me write my own. I just put random commonly asked questions here for fun.
1. HOW TO DISABLE THINKING 🔥🔥🔥🔥
Use /no_think in your prompts, or just add it to your system prompt.
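As a minimal sketch, assuming you are serving the model behind a local OpenAI-compatible endpoint (the URL and model name below are placeholders, adjust them to your setup), appending the switch to the user message looks like this:

```python
import requests

# Hypothetical local OpenAI-compatible server (e.g. one started by your
# inference backend). URL and model name are assumptions, not fixed values.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "qwen3-30b-a3b",  # placeholder model name
    "messages": [
        # Appending /no_think to the user turn disables the thinking block.
        {"role": "user", "content": "Summarize what an MoE model is in one sentence. /no_think"},
    ],
}

resp = requests.post(URL, json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```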
2. What are the system requirements?
In progress
3. What can I run?
FOR MOE MODELS, REMEMBER TO USE TENSOR OVERRIDES!!!
Any memory sizes here refer to VRAM unless explicitly stated otherwise.
This list is ordered from best to worst. There is also a rough sizing sketch after the list.
If you have a GPU server with \(\geq 150\) GB of free VRAM: go for the 235B model! You won't be disappointed.
Else, if you have a CPU platform with fast RAM (\(\geq 200\) GB/s and \(\geq 96\) GB), you can try running the 235B model. NOTE: You should add GPUs (\(24\) GB will work fine) and offload as much as possible to them (this will boost your speed by a huge margin).
Else, if you have a GPU gaming rig or server with \(\leq 32\) GB of VRAM, or one that is RAM-limited (\(\leq 96\) GB), the 32B model is best for you. It performs well, only slightly below the 235B.
If you want fast inference on big GPU rigs, you can also try the 32B model.
For platforms with \(12-16\) GB of VRAM, running 14B or 30BA3B is advised. You can get very high performance with 30BA3B on an ordinary computer with 16 GB of VRAM.
30BA3B has relatively high performance even on DDR4 RAM and VRAM-limited machines. It can reach 20 t/s on an ordinary machine with an 8 GB GPU and DDR5 RAM. Think of it as a "flash" model.
30BA3B doesn't require expensive GPUs; anything \(\geq 8\) GB and relatively recent already gives a usable experience.
8B is also a viable option for 8 GB platforms. You can choose between 30BA3B (slower) or 8B (faster, but somewhat less intelligent).
4B is a viable option for 4-6 GB platforms and ordinary DDR4/5 CPU inference. You can use this model on 6GB GPUs while keeping a bit of memory for the system.
1.7B is suitable for fast, ordinary CPU inference. You can even run it on high-end phones!!!
0.6B is suitable for pretty much anything, except devices with single-channel DDR4-or-older RAM.
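If you want a back-of-the-envelope feel for what fits, here is a rough sketch. The formula (parameter count times bits-per-weight divided by 8, plus some headroom for the KV cache and runtime buffers) and the overhead number are my own ballpark assumptions, not official figures:

```python
# Rough VRAM estimate: weights take about params * bits / 8 bytes,
# plus a fudge factor for KV cache and runtime buffers.
def fits_in_vram(params_b: float, bits_per_weight: float, vram_gb: float,
                 overhead_gb: float = 2.0) -> bool:
    weight_gb = params_b * bits_per_weight / 8  # e.g. 32B at ~4.5 bits ≈ 18 GB
    return weight_gb + overhead_gb <= vram_gb

# Example: a 32B model on a 24 GB card.
print(fits_in_vram(32, 4.5, 24))   # True: ~18 GB of weights plus overhead fits
print(fits_in_vram(32, 8.0, 24))   # False: ~32 GB of weights alone does not
```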
4. What are tensor overrides?
Think of it as a way to force some tensors, not whole layers, onto the GPU. You would like to load everything (except the FFN tensors*) onto the GPU, then load as many FFN tensors as your remaining VRAM allows.
*: The FFN tensors are really big, but with MoE models they can still be processed efficiently on the CPU, since only a few experts are active per token. The other parts (shared experts, attention, etc.) need to be put on the GPU because they are activated on every token.
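As a concrete sketch, llama.cpp exposes this through the --override-tensor (-ot) flag, which maps tensors whose names match a regex onto a device. The model file name, context size, and the exact regex below are assumptions for a typical Qwen3 MoE GGUF; inspect your own model's tensor names (the expert FFN tensors usually contain "ffn_..._exps") before copying this:

```python
import subprocess

# Launch llama.cpp's llama-server so that all layers go to the GPU (-ngl 99)
# but the huge MoE expert FFN tensors are pinned to the CPU via -ot.
# Model path and regex are placeholders for a typical Qwen3 MoE GGUF.
cmd = [
    "llama-server",
    "-m", "Qwen3-235B-A22B-Q4_K_M.gguf",   # placeholder model file
    "-ngl", "99",                          # offload all layers to the GPU...
    "-ot", r"ffn_.*_exps=CPU",             # ...but keep expert FFN tensors in RAM
    "-c", "16384",                         # context size (example value)
]
subprocess.run(cmd, check=True)
```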