
LEVI - KERNEL RITUS
Once upon a time, there was a matrix multiplication. It wasn't particularly large — 64 by 64, to be precise — and it had, at its core, only one modest wish: to be multiplied. Quickly, quietly, without fuss. Like a decent person who drinks their morning coffee and then goes to work.
Instead, it was handed to cuBLAS.
Now, cuBLAS — and this must be said in the interest of fairness — is a magnificent library. For large matrices, there is nothing finer. cuBLAS is like an ocean liner: majestic, powerful, built for the great crossings. The problem arises when you take an ocean liner to the bakery.
Our little 64×64 matrix wanted to go to the bakery. And cuBLAS fired up the liner.
The engines were warmed. The crew was assembled. The captain was notified. Forms were filled out, dispatch tables consulted, and memory regions allocated that nobody needed. By the time the actual multiplication took place — it lasted the blink of an eye — the entire administrative apparatus surrounding it had already taken longer than the calculation itself.
This is like ordering a removal van to deliver a letter to the post office. It works. But it is not what one would call elegant.
LEVI Edge is the bicycle.
It makes no grand preparations. It does not park a removal van in the driveway. It does not telephone the captain. It takes the small matrix, multiplies it, and returns the result. Done. Two to four times faster than the ocean liner, and with considerably less diesel.
The usage is of a simplicity that borders on impertinence:
import levi_edge
levi_edge.patch()
Two lines. After that, all your small matrix multiplications run faster. The rest of your code remains exactly as it was. You need not rewrite anything, restructure anything, or learn anything new. It is as though someone came into your office overnight and sharpened all your pencils. The next morning, you simply notice that everything flows a little better.
Large matrices? Still handed to cuBLAS. LEVI Edge is not megalomaniacal. It knows what it is for. It also knows what it is not for. This kind of self-awareness is so rare in the software industry that one might almost put it in a museum.
Who This Is For
LEVI Edge is for machines that have very little room but are nonetheless expected to think. For Jetson Nanos tucked inside robots. For small GPUs sitting in sensors. For transformer attention heads which — and this is not a joke — are typically only 32 to 128 wide and yet behave as though they require the full machinery of a mainframe.
One might say: LEVI Edge is for everyone who must achieve a great deal with very little. Most of us know the feeling. Not only in computer science.
Who This Is Not For
If you are training GPT, you do not need LEVI Edge. If your matrices are so large that looking at them induces vertigo, you need cuBLAS, and rightly so. If you do not have a GPU, you first need a GPU and only then LEVI Edge. And if you do not know what a matrix is — well, then you first need an evening with a good textbook, a cup of tea, and some patience.
We cannot, regrettably, fix everything. Only that which can be fixed.
How It Works (For the Curious)
LEVI Edge intercepts your matrix multiplications — politely, mind you, not like a highwayman but rather like an attentive doorman — and checks: Is the matrix small enough? Are all dimensions under 256? Is it float32? Is it on the GPU?
If yes, a hand-tuned CUDA kernel takes over. There are two varieties: a simple one for the very small matrices, with clever loop unrolling and no unnecessary ballast. And one with shared memory for the slightly larger candidates that deserve a bit more attention.
If no, the matrix is politely passed on to cuBLAS, and nobody notices that anything happened at all.
It is a bit like a good tailor: he alters only what needs altering. The rest he leaves in peace.
The Moral of the Story
In the world of computers, there exists an old and widely held misconception: that bigger is always better. Bigger libraries. Bigger frameworks. Bigger layers of abstraction. More overhead, more dispatch, more of everything.
Sometimes this is even true. For the big problems, you need big tools.
But for the small problems — and there are so many more small problems than big ones — you need small tools. Precise ones. Fast ones. The kind that do their work and don't spend days talking about it afterwards.
LEVI Edge is a small tool. It solves a small problem. But it solves it well.
And that, if one is honest, is more than can be said for most things in this world.
pip install levi-edge
Two lines of code. Two to four times faster. Zero excuses.
Forgotten Forge — Because good code does not need to be a poem. But it doesn't hurt, either.
More Information on our GitHub Repository!



