
Technical Writing

Deep dives into LLMs, quantization, on-device AI, inference runtimes, and digital health — written for engineers who want the full picture.

Compression
TurboQuant
Google solved the KV cache memory bottleneck — 6× compression, 8× speedup, zero accuracy loss, no retraining.
AI Safety
NemoClaw — NVIDIA's Security Layer for LLM Agents
OpenClaw became the fastest-growing open source project in history. Then the security incidents started.
Inference
LLM Inference Runtimes
GGUF, TensorRT, QNN & everything in between. A deep dive into every major runtime across CPUs, NVIDIA GPUs, Qualcomm NPUs, and Apple Silicon.
Compression
Knowledge Distillation in LLMs
How do you fit a trillion-parameter mind into a phone? Temperature, soft labels, and teaching a small model to think like a large one.
Architecture
SSM vs Mamba vs Transformers
Attention is quadratic. The world is sequential. How three paradigms try to model time, and why it matters more than ever.
Foundations
The Brain Behind ChatGPT: How AI Actually Learns
From a blank model to production AI — a plain-English deep dive into every concept, technique, and piece of hardware behind modern LLMs.
Architecture
Sarvam AI: India's Own LLMs
From a 2B parameter Indic language model to a sovereign 105B MoE flagship — how India built its first full-stack foundation models.
Architecture
The Big LLM Architecture Comparison
A deep dive into 20+ modern LLM architectures — from DeepSeek V3 to GLM-5 — comparing MoE, MLA, sliding window attention, and more.
RAG & Retrieval
Vector Databases: The Complete Guide
From concept to implementation: indexing strategies, similarity search, HNSW, IVF, and choosing the right vector store for your RAG pipeline.
RAG & Retrieval
Building a RAG System with SQLite
From concept to implementation: a comprehensive guide to building retrieval-augmented generation systems with SQLite as the vector store.
RAG & Retrieval
How RAG Makes AI Smarter: A Visual Guide
Discover how Retrieval-Augmented Generation combines information retrieval with AI to create more accurate, up-to-date answers.
RAG & Retrieval
Feed Your Own Documents to Local LLMs
How to integrate your documents with LLMs using retraining, RAG pipelines, and context window uploads — trade-offs explained.
Foundations
CNNs Demystified: From Image Recognition to ECG Analysis
How convolutional neural networks work, why they excel at spatial data, and how they translate to biomedical signal processing.