Qwen3 FAQ

There are other Qwen3 FAQs, with this one made me do my FAQ. I just put random commonly asked questions here for fun.

1. HOW TO DISABLE THINKING 🔥🔥🔥🔥

Use /no_think in your prompts. Or just add it to system prompt.

2. What are the system requirements?

In progress

3. What can I run?

FOR MOE MODELS, REMEMBER TO USE TENSOR OVERRIDES!!!

Any units of information here refers to VRAM size if not explicitly stated.

This list is from best to worst.

If you have a GPU server with \(\geq 150\) GB of free VRAM: go for the 235B model! You won't be disappointed.

Else, if you have CPU platforms with fast RAM (\(\geq 200\) GB/s and \(\geq 96\) GB), you can try running the 235B model. NOTE: You should add GPUs (\(24\) GB will work fine) then offload to them as much as possible (this will boost your speed by a huge margin).

Else, If you have a GPU gaming rig or server with \(\leq 32\) GB of VRAM, or is RAM-limited (\(\leq 96\) GB), the 32B model is best for you. It performs well, maybe a slight bit less than the 235B.

If you want fast inference on big GPU rigs, you can also try the 32B model.

For platforms with \(12-16\) GB of VRAM, running 14B or 30BA3B is advised. You can get very high performance with 30BA3B using an ordinary computer with 16GB VRAM.

30BA3B have relatively high performance even on DDR4 RAM and VRAM-limited machines. It can reach 20t/s on an ordinary machine with a 8GB GPU and DDR5 RAM. Think of it as like a "flash" model.

30BA3B doesn't require expensive GPUs, anything \(\geq 8\) GB and relatively recent is already an usable experience.

8b is also a viable option for 8GB platforms. You can choose between 30BA3B (slower) or 8b (faster, but somewhat less intelligent).

4B is a viable option for 4-6 GB platforms and ordinary DDR4/5 CPU inference. You can use this model on 6GB GPUs while keeping a bit of memory for the system.

1.7B is suitable for fast, ordinary CPU inference. You can even do it on high end phones!!!

0.6B is suitable for anything except single-channel RAM DDR \(\leq 4\) devices.

4. What are tensor overrides?

Think of it as a way to force some tensors, not layers onto the GPU. You would to like to load everything (except the FFN*) on to the GPU. Then, load the FFN accordingly if you have enough VRAM.

*: The FFN tensors are really big, but they can be processed efficiently with MoE models. Other parts (such as shared experts, etc...) needs to be put on the GPU because they are activated every time.

Run Codeforces 1952J codes 20000x faster

I have created a Codeforces 1952J language to C++ transpiler, the resulting compiled code is 20000x* faster compared to the reference implementation.

Code
#include <bits/stdc++.h>
using namespace std;
#define int long long
#ifndef yoshi_likes_e4
#define endl '\n'
#endif
#define problem ""
#define multitest 0
#define debug(x) cerr << #x << " = " << x << endl;
void init()
{
}
map<string, bool> var_type;
void Add_variable(string k)
{
    bool ok = 0;
    try
    {
        std::stoi(k);
        ok = true;
    }
    catch (const std::invalid_argument &e)
    {
    }
    catch (const std::out_of_range &e)
    {
    }
    if (!ok)
    {
        int pos = k.find('[');
        if (pos == string::npos)
            var_type[k] = 0;
        else
            var_type[string(k.begin(), k.begin() + pos)] = 1;
    }
}
void Yoshi()
{
    vector<vector<string>> code;
    string s;
    while (getline(cin, s))
    {
        stringstream t(s);
        code.push_back({});
        string x;
        while (t >> x)
            code.back().push_back(x);
    }
    map<int, int> label_id;
    int lid = 0;
    for (auto &lines : code)
    {
        if (lines[0] == "simp")
        {
            int v = stoi(lines[2]) - 1;
            if (label_id.find(v) == label_id.end())
                label_id[v] = lid++;
        }
        if (lines[0] == "vibe")
            Add_variable(lines[2]), Add_variable(lines[4]);
        if (lines[0] == "bruh")
            Add_variable(lines[1]), Add_variable(lines[5]);
        if (lines[0] == "*slaps")
            Add_variable(lines[1]), Add_variable(lines[5].substr(0, lines[5].size() - 1));
        if (lines[0] == "rip")
            Add_variable(lines[2]), Add_variable(lines[6]);
        if (lines[0] == "yoink")
            Add_variable(lines[1]);
        if (lines[0] == "yeet")
            Add_variable(lines[1]);
    }
    vector<string> var0, var1;
    for (auto &[u, v] : var_type)
        if (v)
            var1.push_back(u);
        else
            var0.push_back(u);
    cout << R""""(#include <bits/stdc++.h>
using namespace std;
void input(int &x)
{
    string s;
    getline(cin, s);
    x = stoi(s);
}
void input(vector<int> &x)
{
    string s;
    getline(cin, s);
    stringstream t(s);
    while (t >> s)
        x.push_back(stoi(s));
})"""";
    cout << "\nint main(){\ncin.tie(0)->sync_with_stdio(0);\n";
    if (var0.size())
    {
        cout << "int ";
        for (auto &i : var0)
            cout << i << (&i != &var0.back() ? ", " : ";\n");
    }
    if (var1.size())
    {
        cout << "vector<int> ";
        for (auto &i : var1)
            cout << i << (&i != &var1.back() ? ", " : ";\n");
    }
    for (auto &lines : code)
    {
        if (label_id.find(&lines - &code[0]) != label_id.end())
            cout << "L" << label_id[&lines - &code[0]] << ":\n";
        if (lines[0] == "simp")
            cout << "goto L" << label_id[stoi(lines[2]) - 1] << ";\n";
        if (lines[0] == "vibe")
            cout << "if (" << lines[2] << " > " << lines[4] << ')' << "\n";
        if (lines[0] == "bruh")
            cout << lines[1] << " = " << lines[5] << ";\n";
        if (lines[0] == "*slaps")
            cout << lines[5].substr(0, lines[5].size() - 1) << " += " << lines[1] << ";\n";
        if (lines[0] == "rip")
            cout << lines[2] << " -= " << lines[6] << ";\n";
        if (lines[0] == "yoink")
            cout << "input(" << lines[1] << ");\n";
        if (lines[0] == "yeet")
            cout << "cout << " << lines[1] << " << \"\\n\";\n";
        if (lines[0] == "go")
            cout << "return 0;\n";
    }
    cout << "}" << endl;
}
signed main()
{
#ifndef yoshi_likes_e4
    ios::sync_with_stdio(0);
    cin.tie(0);
    if (fopen(problem ".inp", "r"))
    {
        freopen(problem ".inp", "r", stdin);
        freopen(problem ".out", "w", stdout);
    }
#endif
    init();
    int t = 1;
#if multitest
    cin >> t;
#endif
    while (t--)
        Yoshi();
}

*: Selection sort performance:

\(n=3000\):

Implementation Runtime
Reference 41.0s
Compiled 2ms

\(n=5000\):

Implementation Runtime
Reference 123.3s
Compiled 6ms

Shortest path between 2 special nodes

This problem is one of our practice problems, and I have been struggling a bit to solve it. There's a "solution" on GFG that's basically useless.

The solution is plainly \(O(VE)\) and basically have no relations to the actual solution.

I'll explain 2 real solutions:

Solution 1. BFS with multiple sources

Do normal BFS but you push all the sources to the queue. Create a \(source_v\) array which will be the optimal source for the \(v\)-th node. Then, for each edge \((u,v)\), let \(Result=min(Result,d[u]+d[v])\) if \(source[u] \neq source[v]\).

Complexity: \(O(V+E)\).

Solution 2. SPFA with multiple sources

You do normal SPFA relaxation (start from node 1, then add node 2 as a source, etc...) but update the \(Result\) variable when there's a node that were NOT a source and is special. The relaxation is only performed if \(d[u]+1<Result\).

Complexity: \(O(VE)\) (I don't know if it can be hacked, it runs relatively fast (~2x slower than BFS)).

CHUNGKHOAN

Recently I revisited this problem which was intended to solve in \(O(n)\) (the problem asks to find the largest subarray which has \(max-min<=T\)). But I can only think of a \(O(n*log(n))\) solution, so I coded the fastest \(O(n*log(n))\) solution yet using vectorized binary search and linear sparse table:

Code
#include <algorithm>
#include <functional>
#include <iostream>
using namespace std;
typedef int vec __attribute__((vector_size(32)));
int lsb(int x)
{
    return x & -x;
}
int smin[4000005], arr[4000005], smax[4000005], mmin[4000005], mmax[4000005];
int b = 16;
int n, t;
int smmin(int r, int size = b)
{
    int dist_from_r = __lg(mmin[r] & ((1 << size) - 1));
    return r - dist_from_r;
}
int smmax(int r, int size = b)
{
    int dist_from_r = __lg(mmax[r] & ((1 << size) - 1));
    return r - dist_from_r;
}
int pmax(int x, int y)
{
    return arr[x] > arr[y] ? x : y;
}
int pmin(int x, int y)
{
    return arr[x] < arr[y] ? x : y;
}
// RMQ constant time insanity that I found online
void build()
{
    int curr_mask = 0;
    for (int i = 0; i < n; i++)
    {
        curr_mask = (curr_mask << 1) & ((1LL << b) - 1);
        while (curr_mask > 0 && pmin(i, i - __lg(lsb(curr_mask))) == i)
        {
            curr_mask ^= lsb(curr_mask);
        }
        curr_mask |= 1;
        mmin[i] = curr_mask;
    }
    curr_mask = 0;
    for (int i = 0; i < n; i++)
    {
        curr_mask = (curr_mask << 1) & ((1LL << b) - 1);
        while (curr_mask > 0 && pmax(i, i - __lg(lsb(curr_mask))) == i)
        {
            curr_mask ^= lsb(curr_mask);
        }
        curr_mask |= 1;
        mmax[i] = curr_mask;
    }
    for (int i = 0; i < (n >> 4); i++)
    {
        smin[i] = smmin(b * i + b - 1);
        smax[i] = smmax(b * i + b - 1);
    }
    for (int j = 1; (1 << j) <= (n >> 4); j++)
        for (int i = 0; i + (1 << j) <= (n >> 4); i++)
        {
            smin[(n >> 4) * j + i] = pmin(smin[(n >> 4) * (j - 1) + i], smin[(n >> 4) * (j - 1) + i + (1 << (j - 1))]);
            smax[(n >> 4) * j + i] = pmax(smax[(n >> 4) * (j - 1) + i], smax[(n >> 4) * (j - 1) + i + (1 << (j - 1))]);
        }
}
vec query8_0(vec &s, vec &t)
{
    vec res;
    for (int i = 0; i < 8; i++)
    {
        int l = s[i], r = t[i];
        if (r - l + 1 <= b)
        {
            res[i] = arr[smmax(r, r - l + 1)] - arr[smmin(r, r - l + 1)];
            continue;
        }
        int ansmax = pmax(smmax(l + b - 1), smmax(r)), ansmin = pmin(smmin(l + b - 1), smmin(r));
        int x = (l >> 4) + 1, y = (r >> 4) - 1;
        if (x <= y)
        {
            int j = __lg(y - x + 1);
            ansmax = pmax(ansmax, pmax(smax[(n >> 4) * j + x], smax[(n >> 4) * j + y - (1 << j) + 1]));
            ansmin = pmin(ansmin, pmin(smin[(n >> 4) * j + x], smin[(n >> 4) * j + y - (1 << j) + 1]));
        }
        res[i] = arr[ansmax] - arr[ansmin];
    }
    return res;
}
bool cmp_l(vec &a, vec &b)
{
    vec c = a <= b;
    return c[0] + c[1] + c[2] + c[3] + c[4] + c[5] + c[6] + c[7];
}
int maxvec(vec a)
{
    return max(max(max(a[0], a[1]), max(a[2], a[3])), max(max(a[4], a[5]), max(a[6], a[7])));
}
vec nm;
vec bs_min(vec &l)
{
    // SIMD binary search
    vec s = l, r = nm;
    while (cmp_l(l, r))
    {
        vec mid = (l + r) >> 1, q4 = query8_0(s, mid);
        l = l - (mid - l + 1) * (q4 <= t);
        r = r - (mid - r - 1) * (q4 > t);
    }
    return r - s + 1;
}
const int BUF_SZ = 33550336;
inline namespace Input
{
char buf[BUF_SZ];
int pos;
int len;
char next_char()
{
    if (pos == len)
    {
        pos = 0;
        len = (int)fread(buf, 1, BUF_SZ, stdin);
        if (!len)
        {
            return EOF;
        }
    }
    return buf[pos++];
}
int read_int()
{
    int x;
    char ch;
    int sgn = 1;
    while (!isdigit(ch = next_char()))
    {
        if (ch == '-')
        {
            sgn *= -1;
        }
    }
    x = ch - '0';
    while (isdigit(ch = next_char()))
    {
        x = x * 10 + (ch - '0');
    }
    return x * sgn;
}
}; // namespace Input
inline namespace Output
{
char buf[BUF_SZ];
int pos;
void flush_out()
{
    fwrite(buf, 1, pos, stdout);
    pos = 0;
}
void write_char(char c)
{
    if (pos == BUF_SZ)
    {
        flush_out();
    }
    buf[pos++] = c;
}
void write_int(int x)
{
    static char num_buf[100];
    if (x < 0)
    {
        write_char('-');
        x *= -1;
    }
    int len = 0;
    for (; x >= 10; x /= 10)
    {
        num_buf[len++] = (char)('0' + (x % 10));
    }
    write_char((char)('0' + x));
    while (len)
    {
        write_char(num_buf[--len]);
    }
    write_char('\n');
} // auto-flush output when program exits
#include <cassert>
void init_output()
{
    assert(atexit(flush_out) == 0);
}
}; // namespace Output
signed main()
{
    init_output();
    t = read_int();
    n = read_int();
    for (int i = 0; i < n; i++)
        arr[i] = read_int();
    build();
    int res = 0;
    nm = vec{n - 1, n - 1, n - 1, n - 1, n - 1, n - 1, n - 1, n - 1};
    vec shift{0, 1, 2, 3, 4, 5, 6, 7};
    for (int i = 0; i < n / 8; i++)
    {
        vec vi = shift + 8 * i;
        res = max(res, maxvec(bs_min(vi)));
        // Fast cut
        if (n - res < 8 * (i + 1))
            break;
    }
    if (n % 8)
    {
        int n4 = (n >> 3) << 3;
        vec c{n4, n4, n4, n4, n4, n4, n4, n4};
        for (int i = 0; i < n % 8; i++)
            c[i]++;
        res = max(res, maxvec(bs_min(c)));
    }
    write_int(res);
}

Hàng cây

1. Subtask 1

Ta lặp qua từng phần tử trong mảng, chọn ra những phần tử thỏa mãn rồi tính giá trị. Đpt: \(O(nq)\).

2. Subtask 2

Ta sẽ xử lý trước các truy vấn \([l,r]\) với l,r thuộc mảng. Ta đặt các mảng next, prev, p để xử lý thao tác xóa phần tử (tăng \(l\) lên -> xóa dần phần tử đi). Ta lặp r, sau đó chèn các phần tử nằm trong khoảng \([0,r]\) vào mảng p. Mảng next sẽ lưu \(i+1\) với phần tử thứ \(i\), và mảng prev sẽ lưu phần tử thứ \(i-1\). Khi chúng ta muốn xóa phần tử thứ \(i\) trong mảng p thì ta sẽ cập nhật next và prev để bỏ qua phần tử đấy và cập nhật chênh lệch. Khi chúng ta xử lý được hết \([l,r]\) trong mảng thì chỉ cần cnp để tìm đáp án.

Code
void update(int *p, int *prev, int *next, int i, long long &sum, int size)
{
    if (next[i] != size && prev[i] != -1)
    {
        sum += abs(p[next[i]] - p[prev[i]]);
    }
    if (next[i] != size)
    {
        sum -= abs(p[next[i]] - p[i]);
        prev[next[i]] = prev[i];
    }
    if (prev[i] != -1)
    {
        sum -= abs(p[prev[i]] - p[i]);
        next[prev[i]] = next[i];
    }
}
void sub2(vector<int> &p, vector<query *> &queries)
{
    int distinct = 0;
    bool cnt[401];
    fill(cnt, cnt + 401, 0);
    for (int i = 0; i < n; i++)
    {
        if (!cnt[p[i]])
        {
            cnt[p[i]] = 1;
            distinct++;
        }
    }
    long long mp[401][401];
    vector<int> p2 = p;
    sort(p.begin(), p.end());
    unique(p.begin(), p.end());
    while (p.size() != distinct)
        p.pop_back();
    int p3[n], prev[n], next[n];
    for (int i = 0; i < p.size(); i++)
    {
        vector<int> idx[401];
        int counter = 0;
        long long sum = 0;
        for (int j = 0; j < n; j++)
        {
            if (p2[j] <= p[i])
            {
                p3[counter] = p2[j];
                prev[counter] = counter - 1;
                next[counter] = counter + 1;
                idx[p2[j]].push_back(counter);
                if (counter >= 1)
                    sum += abs(p3[counter - 1] - p2[j]);
                counter++;
            }
        }
        mp[p[0]][p[i]] = sum;
        for (int j = 1; j <= i; j++)
        {
            for (auto &k : idx[p[j - 1]])
            {
                update(p3, prev, next, k, sum, counter);
            }
            mp[p[j]][p[i]] = sum;
        }
    }
    for (int i = 0; i < q; i++)
    {
        int l = queries[i]->l, r = queries[i]->r;
        int ll = lower_bound(p.begin(), p.end(), l) - p.begin();
        int rr = upper_bound(p.begin(), p.end(), r) - p.begin() - 1;
        if (ll > rr)
        {
            cout << 0 << '\n';
            continue;
        }
        cout << mp[p[ll]][p[rr]] << '\n';
    }
}

Đpt: \(O((n+h)*h+q\log(h))\).

3. Subtask 3

Với subtask 3, ta chỉ cần phải chèn và xóa các phần tử 1 cách trâu. Do giới hạn của subtask này khiến cho chỉ có \(O(n)\) phần tử bị chèn thêm / xóa đi. Ta vẫn sử dụng chặt nhị phân để tìm kết quả (chuyển \([l,r]\) thành \([l',r']\) gần nhất mà \(l',r'\) thuộc mảng).

Code
void add(map<int, int> &s, int i, int x, long long &sum)
{
    auto it = s.lower_bound(i), it2 = it;
    if (it != s.begin())
        it2--;
    if (it != s.end() && it != s.begin())
    {
        sum -= abs((*it).second - (*it2).second);
    }
    if (it != s.end())
    {
        sum += abs((*it).second - x);
    }
    if (it != s.begin())
    {
        sum += abs((*it2).second - x);
    }
    s[i] = x;
}
void rmv(map<int, int> &s, int i, long long &sum)
{
    auto it = s.lower_bound(i), it2 = it, it3 = it;
    if (it != s.begin())
        it3--;
    it2++;
    if (it2 != s.end() && it != s.begin())
    {
        sum += abs((*it2).second - (*it3).second);
    }
    if (it2 != s.end())
    {
        sum -= abs((*it2).second - (*it).second);
    }
    if (it != s.begin())
    {
        sum -= abs((*it3).second - (*it).second);
    }
    s.erase(it);
}
void sub3(vector<int> &p, vector<query *> &queries)
{
    map<int, vector<int>> idx;
    for (auto &i : p)
        idx[i].push_back(&i - &p.front());
    long long sum = 0;
    map<int, int> s;
    sort(p.begin(), p.end());
    unique(p.begin(), p.end());
    while (p.size() != idx.size())
        p.pop_back();
    int oldl = queries.front()->l, oldr = queries.front()->r, oldll, oldrr;
    oldll = lower_bound(p.begin(), p.end(), oldl) - p.begin();
    oldrr = upper_bound(p.begin(), p.end(), oldr) - p.begin() - 1;
    for (int i = oldll; i <= oldrr; i++)
    {
        for (auto &k : idx[p[i]])
        {
            add(s, k, p[i], sum);
        }
    }
    cout << sum << '\n';
    for (int i = 1; i < q; i++)
    {
        int l = queries[i]->l, r = queries[i]->r;
        int ll = lower_bound(p.begin(), p.end(), l) - p.begin();
        int rr = upper_bound(p.begin(), p.end(), r) - p.begin() - 1;
        while (ll > oldll)
        {
            for (auto &k : idx[p[oldll]])
            {
                rmv(s, k, sum);
            }
            oldll++;
        }
        while (rr > oldrr)
        {
            oldrr++;
            for (auto &k : idx[p[oldrr]])
            {
                add(s, k, p[oldrr], sum);
            }
        }
        if (ll > rr)
        {
            cout << 0 << '\n';
            continue;
        }
        cout << sum << '\n';
    }
}

Đpt: \(O((n+q)\log(n))\).

4. Subtask 4

Ta cắt mảng ra thành các phần, mỗi phần có kích thước \(BLOCK\). Sử dụng thuật toán gần giống với sub2 trên mỗi phần tuy nhiên ta sẽ lưu vị trí của \(l',r'\) thay vì lưu \(l',r'\), ta sẽ lấy được kết quả cho mỗi phần. Ta sử dụng RMQ để tìm phần tử gần nhất với 2 đầu mút nằm trong khoảng \([l,r]\). (Vì việc gộp các phần lại cần phải có 2 phần tử đầu mút để tìm chênh lệch đúng). Ta sẽ sắp xếp lại các truy vấn theo \(l\) rồi theo \(r\) để tính vị trí của \(l',r'\) nhanh hơn (mất \(O(q*n/BLOCK)\) thay vì \(O(q*n/BLOCK*\log(BLOCK))\)). Việc xử lý như subtask 2 sẽ mất \(O(BLOCK*n)\).

Code
struct result
{
    long long query;
    int first, last;
};
struct block
{
    // blk^2
    long long mp[BLOCK][BLOCK];
    // blk*log(blk)
    int rmq[lgblk][BLOCK], rmq2[lgblk][BLOCK];
    // blk
    vector<int> p, p2;
    result query(int l, int r, int ll, int rr)
    {
        if (ll > rr)
            return {0, -1, -1};
        int lg = _lg[rr - ll + 1];
        return {mp[ll][rr], p2[min(rmq[lg][ll], rmq[lg][rr - (1 << lg) + 1])],
                p2[max(rmq2[lg][ll], rmq2[lg][rr - (1 << lg) + 1])]};
    }
};
void process(vector<int> p, block &blk)
{
    map<int, vector<int>> idx;
    for (int i = 0; i < BLOCK; i++)
    {
        idx[p[i]].push_back(i);
    }
    blk.p2 = p;
    sort(p.begin(), p.end());
    unique(p.begin(), p.end());
    vector<int> orders(BLOCK);
    while (p.size() != idx.size())
        p.pop_back();
    for (int i = 0; i < BLOCK; i++)
        orders[i] = lower_bound(p.begin(), p.end(), blk.p2[i]) - p.begin();
    blk.p = p;
    for (int i = 0; i < p.size(); i++)
    {
        blk.rmq[0][i] = idx[p[i]].front();
        blk.rmq2[0][i] = idx[p[i]].back();
    }
    for (int i = 1; i <= lgblk; i++)
    {
        for (int j = 0; j + (1 << i) <= p.size(); j++)
        {
            blk.rmq[i][j] =
                min(blk.rmq[i - 1][j], blk.rmq[i - 1][j + (1 << (i - 1))]);
            blk.rmq2[i][j] =
                max(blk.rmq2[i - 1][j], blk.rmq2[i - 1][j + (1 << (i - 1))]);
        }
    }
    int p3[n], prev[n], next[n];
    for (int j = 0; j < p.size(); j++)
    {

        vector<int> idx[BLOCK];
        int counter = 0;
        long long sum = 0;
        for (int i = 0; i < BLOCK; i++)
        {
            if (orders[i] <= j)
            {
                p3[counter] = blk.p2[i];
                prev[counter] = counter - 1;
                next[counter] = counter + 1;
                idx[orders[i]].push_back(counter);
                if (counter >= 1)
                    sum += abs(p3[counter - 1] - blk.p2[i]);
                counter++;
            }
        }
        blk.mp[0][j] = sum;
        for (int i = 1; i <= j; i++)
        {
            for (auto k : idx[i - 1])
            {
                update(p3, prev, next, k, sum, counter);
            }
            blk.mp[i][j] = sum;
        }
    }
}
void sub4(vector<int> &p, vector<query *> &queries)
{
    block blks[n / BLOCK];
    for (int i = 0; i < n / BLOCK; i++)
    {
        process(vector<int>(p.begin() + i * BLOCK, p.begin() + (i + 1) * BLOCK),
                blks[i]);
    }
    vector<int> ll(n / BLOCK), rr(n / BLOCK);
    sort(queries.begin(), queries.end(),
        [](query *a, query *b)
        { return a->l < b->l; });
    for (auto &i : queries)
    {
        for (int j = 0; j < n / BLOCK; j++)
        {
            int psz = blks[j].p.size();
            while (ll[j] < psz && blks[j].p[ll[j]] < i->l)
                ll[j]++;
        }
        i->ll = ll;
    }
    sort(queries.begin(), queries.end(),
        [](query *a, query *b)
        { return a->r < b->r; });
    vector<long long> qres(q);
    for (auto &i : queries)
    {
        long long res = 0;
        vector<int> acc;
        for (int j = 0; j < n / BLOCK; j++)
        {
            int psz = blks[j].p.size();
            while (rr[j] < psz && blks[j].p[rr[j]] <= i->r)
                rr[j]++;
            result x = blks[j].query(i->l, i->r, i->ll[j], rr[j] - 1);
            res += x.query;
            if (x.first != -1)
            {
                acc.push_back(x.first);
                acc.push_back(x.last);
            }
        }
        for (int i = 1; i + 1 < acc.size(); i += 2)
            res += abs(acc[i] - acc[i + 1]);
        qres[i->id] = res;
    }
    for (auto &i : qres)
        cout << i << '\n';
}

Chọn \(BLOCK=\sqrt{n}\) ta có đpt \(O((n+q)\sqrt{n})\).

SymPy

I have created a SymPy package that runs on ARM64, based on Alpine (note: requires Termux,proot/chroot OR any rooted computer that is ARM64 (e.g. RPi,etc.)) just for fun (to see how small it can be).

Download link

  • Usage:
    • Extract this file to your home directory (e.g by tar -xzf ../sympy.cmax.tgz /home/...).
    • Run chroot /home/.../alpine /bin/bash -l
    • You should see a Python shell with SymPy loaded. (Note: exiting the shell will stop the chroot. To stop this (e.g. for customization,etc..) remove the exit 0 line in the .../alpine/etc/profile file.)

First

This is my first post on my site v3.

  • New features:
    • Move to MkDocs
    • Faster & less bloated (only <0.5s build time vs 13s for (unoptimized) v2)
    • ...