
LEVI - KERNEL RITUS
Traditional GPU libraries like cuBLAS are optimized for massive workloads - think training giant neural networks with billions of parameters. But what happens when you need to multiply small matrices? You get:
- Setup overhead that takes longer than the actual computation
- Memory allocation optimized for gigabytes, not kilobytes
- Complex dispatch logic that assumes large-scale parallelism
- Enterprise features you don't need for edge computing

Result: a Ferrari stuck in city traffic.
LEVI GPU Library fills the gap between "basic code" and "industrial-strength cuBLAS" with adaptive kernel selection that picks the right tool for each job.
The Sweet Spot: LEVI dominates in the 64×64 to 256×256 matrix range where edge computing lives.
LEVI uses proprietary Kernel Ritus technology to automatically select optimal algorithms:
Two-Kernel Architecture:
// Simple Kernel (small matrices)
- Minimal setup overhead
- Cache-optimized access patterns
- Loop unrolling for better IPC
- Perfect for edge devices
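As a rough illustration, here is what a minimal kernel in this spirit might look like in CUDA. This is a sketch under stated assumptions: the name levi_simple_gemm and the unroll factor are illustrative, not LEVI's actual API.

```cuda
// Hypothetical sketch of a "simple" small-matrix kernel; the name
// levi_simple_gemm is illustrative, not LEVI's actual API.
// Computes C = A * B for row-major n×n matrices.
__global__ void levi_simple_gemm(const float* A, const float* B,
                                 float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    float acc = 0.0f;
    // Unrolling the inner loop improves instructions-per-cycle on
    // small, latency-bound problem sizes.
    #pragma unroll 4
    for (int k = 0; k < n; ++k) {
        acc += A[row * n + k] * B[k * n + col];
    }
    C[row * n + col] = acc;
}
```

Nothing fancy: no shared-memory staging, no multi-stage pipeline, so launch and setup cost stays near the floor, which is the point for tiny matrices.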
// Tiled Kernel (medium+ matrices)
- Shared memory utilization
- Bank conflict avoidance
- Optimized for throughput
- Competitive with cuBLAS
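For comparison, here is a minimal sketch of a tiled shared-memory kernel using the classic padding trick to sidestep bank conflicts. Again, levi_tiled_gemm and TILE are assumed names for illustration, not LEVI's internals.

```cuda
// Hypothetical sketch of a tiled GEMM kernel matching the traits
// listed above; levi_tiled_gemm and TILE are illustrative names.
#define TILE 32

__global__ void levi_tiled_gemm(const float* A, const float* B,
                                float* C, int n) {
    // The +1 padding staggers addresses across shared-memory banks,
    // avoiding bank conflicts on column-wise accesses.
    __shared__ float As[TILE][TILE + 1];
    __shared__ float Bs[TILE][TILE + 1];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        // Stage one tile of A and one tile of B into fast shared memory.
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] =
            (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();

        // Each element loaded once from global memory is reused
        // TILE times here, which is where the throughput comes from.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < n && col < n)
        C[row * n + col] = acc;
}
```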
Perfect For Edge Computing
Why Edge Needs Different Optimization:
Traditional Data Centers:
- Batch size: 1024+ samples
- Matrix size: 2048×2048+
- Memory: Abundant
- Power: Unlimited

Edge Computing:
- Batch size: 1-32 samples
- Matrix size: 64×64 to 512×512
- Memory: Limited
- Power: Battery-constrained
LEVI targets exactly this gap.
When to Use LEVI vs cuBLAS:
Use LEVI when:
✅ Matrix size < 512×512
✅ Edge/mobile deployment
✅ Power/memory constraints
✅ Batch processing many small problems
Use cuBLAS when:
✅ Matrix size ≥ 512×512
✅ Data center deployment
✅ Maximum absolute throughput needed
✅ Deep learning training
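To make that guidance concrete, here is a hedged host-side dispatch sketch that routes small problems to the simple kernel shown earlier and larger ones to cuBLAS. The 512 threshold mirrors the rule of thumb above; gemm_dispatch is a hypothetical helper, not part of LEVI's public API.

```cuda
// Illustrative size-based dispatch following the guidance above.
// gemm_dispatch and the 512 cutoff are assumptions for this sketch.
#include <cublas_v2.h>

void gemm_dispatch(cublasHandle_t handle, const float* dA,
                   const float* dB, float* dC, int n) {
    const float alpha = 1.0f, beta = 0.0f;
    if (n < 512) {
        // Small problem: launch the lightweight kernel directly,
        // skipping library setup overhead.
        dim3 block(16, 16);
        dim3 grid((n + 15) / 16, (n + 15) / 16);
        levi_simple_gemm<<<grid, block>>>(dA, dB, dC, n);
    } else {
        // Large problem: hand off to cuBLAS for peak throughput.
        // cuBLAS is column-major; computing B^T * A^T = (A*B)^T with
        // swapped operands yields row-major C without transposes.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, dB, n, dA, n, &beta, dC, n);
    }
}
```

The design point is that the branch costs nothing relative to a kernel launch, so adaptive selection is essentially free at call time.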
More information is available in our GitHub repository!



