
Technical Writing

Deep dives into LLMs, quantization, on-device AI, inference runtimes, and digital health — written for engineers who want the full picture.

Compression
TurboQuant
Google solved the KV cache memory bottleneck — 6× compression, 8× speedup, zero accuracy loss, no retraining.
AI Safety
NemoClaw — NVIDIA's Security Layer for LLM Agents
OpenClaw became the fastest-growing open source project in history. Then the security incidents started.
Inference
LLM Inference Runtimes
GGUF, TensorRT, QNN & everything in between. A deep dive into every major runtime across CPUs, NVIDIA GPUs, Qualcomm NPUs, and Apple Silicon.
Compression
Knowledge Distillation in LLMs
How do you fit a trillion-parameter mind into a phone? Temperature, soft labels, and teaching a small model to think like a large one.
Architecture
SSM vs Mamba vs Transformers
Attention is quadratic. The world is sequential. How three paradigms try to model time, and why it matters more than ever.
Foundations
The Brain Behind ChatGPT: How AI Actually Learns
From a blank model to production AI — a plain-English deep dive into every concept, technique, and piece of hardware behind modern LLMs.
Architecture
Sarvam AI: India's Own LLMs
From a 2B parameter Indic language model to a sovereign 105B MoE flagship — how India built its first full-stack foundation models.
Architecture
The Big LLM Architecture Comparison
A deep dive into 20+ modern LLM architectures — from DeepSeek V3 to GLM-5 — comparing MoE, MLA, sliding window attention, and more.
RAG & Retrieval
Vector Databases: The Complete Guide
From concept to implementation: indexing strategies, similarity search, HNSW, IVF, and choosing the right vector store for your RAG pipeline.
RAG & Retrieval
Building a RAG System with SQLite
From concept to implementation: a comprehensive guide to building retrieval-augmented generation systems with SQLite as the vector store.
RAG & Retrieval
How RAG Makes AI Smarter: A Visual Guide
Discover how Retrieval-Augmented Generation combines information retrieval with AI to create more accurate, up-to-date answers.
RAG & Retrieval
Feed Your Own Documents to Local LLMs
How to integrate your documents with LLMs using retraining, RAG pipelines, and context window uploads — trade-offs explained.
Foundations
CNNs Demystified: From Image Recognition to ECG Analysis
How convolutional neural networks work, why they excel at spatial data, and how they translate to biomedical signal processing.