
LEVI - KERNEL RITUS

Traditional GPU libraries like cuBLAS are optimized for massive workloads, such as training giant neural networks with millions or billions of parameters. But what happens when you need to multiply small matrices? You get:

  • Setup overhead that takes longer than the actual computation

  • Memory allocation optimized for gigabytes, not kilobytes

  • Complex dispatch logic that assumes large-scale parallelism

  • Enterprise features you don't need for edge computing

Result: A Ferrari stuck in city traffic.

LEVI GPU Library fills the gap between "basic code" and "industrial-strength cuBLAS" with adaptive kernel selection that picks the right tool for each job.

The Sweet Spot: LEVI dominates at matrix sizes from 64×64 to 256×256, where edge computing lives.

LEVI uses proprietary Kernel Ritus technology to automatically select optimal algorithms:
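
As a rough illustration of this selection step, a host-side dispatcher only has to inspect the problem size before launching one of the two kernels profiled below. This is a minimal sketch: the levi_sgemm name, the 128 threshold, and the launch shapes are illustrative assumptions, not LEVI's published API.

// Hypothetical dispatcher: the names, threshold, and launch shapes
// below are illustrative assumptions, not LEVI's actual interface.
__global__ void sgemm_simple(const float*, const float*, float*, int);
__global__ void sgemm_tiled(const float*, const float*, float*, int);

void levi_sgemm(const float* A, const float* B, float* C, int n) {
    if (n <= 128) {
        // Small problem: one thread per output element and no shared
        // memory staging, so launch setup stays minimal.
        dim3 block(16, 16);
        dim3 grid((n + 15) / 16, (n + 15) / 16);
        sgemm_simple<<<grid, block>>>(A, B, C, n);
    } else {
        // Medium problem: shared-memory tiles amortize global loads.
        dim3 block(32, 32);
        dim3 grid((n + 31) / 32, (n + 31) / 32);
        sgemm_tiled<<<grid, block>>>(A, B, C, n);
    }
}

A single size threshold keeps the dispatch branch cheap to evaluate, which matters when many small multiplications are launched back to back.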

 

Two-Kernel Architecture:

 

// Simple Kernel (small matrices)

  • Minimal setup overhead

  • Cache-optimized access patterns

  • Loop unrolling for better IPC

  • Perfect for edge devices
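
A minimal sketch of a kernel in this spirit, assuming the sgemm_simple name and a modest unroll factor (both illustrative, not LEVI's actual code):

__global__ void sgemm_simple(const float* A, const float* B,
                             float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    float acc = 0.0f;
    // Unrolling the short inner loop exposes independent multiply-adds
    // and raises instructions per clock on small trip counts.
    #pragma unroll 4
    for (int k = 0; k < n; ++k) {
        // Consecutive threads read consecutive elements of B's row k,
        // so global loads coalesce into full cache lines.
        acc += A[row * n + k] * B[k * n + col];
    }
    C[row * n + col] = acc;
}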
     

// Tiled Kernel (medium+ matrices)

  • Shared memory utilization

  • Bank conflict avoidance

  • Optimized for throughput

  • Competitive with cuBLAS
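
A sketch of the classic shared-memory tiling this profile describes; the TILE width and the padding column are common choices assumed here, not values taken from LEVI's source:

#define TILE 32

__global__ void sgemm_tiled(const float* A, const float* B,
                            float* C, int n) {
    // The +1 padding column is the standard bank-conflict-avoidance
    // trick: it keeps column-strided tile accesses on distinct banks.
    __shared__ float As[TILE][TILE + 1];
    __shared__ float Bs[TILE][TILE + 1];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Stage one tile of A and B in shared memory; zero-fill
        // out-of-range elements so edge tiles need no special case.
        As[threadIdx.y][threadIdx.x] =
            (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();

        // Each staged element is reused TILE times from fast shared
        // memory instead of being re-fetched from global memory.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < n && col < n)
        C[row * n + col] = acc;
}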

Perfect For Edge Computing

 

Why Edge Needs Different Optimization:

 

Traditional Data Centers:

  • Batch size: 1024+ samples

  • Matrix size: 2048×2048+

  • Memory: Abundant

  • Power: Unlimited
     

Edge Computing:

  • Batch size: 1-32 samples

  • Matrix size: 64×64 to 512×512

  • Memory: Limited

  • Power: Battery-constrained
     

LEVI targets exactly this gap.

When to Use LEVI vs cuBLAS:

 

Use LEVI when:

✅ Matrix size < 512×512

✅ Edge/mobile deployment

✅ Power/memory constraints

✅ Batch processing many small problems

Use cuBLAS when:

✅ Matrix size ≥ 512×512

✅ Data center deployment

✅ Maximum absolute throughput needed

✅ Deep learning training

More Information on our GitHub Repository!

https://github.com/forgottenforge/levi-gpu
