Rudrite Research — AI & ML papers, made legible
Rudrite Research — the frontier, made legible<br>
Interactive, animated, visual explainers of landmark AI & ML papers — the systems and ideas behind the models you use, redrawn and made legible. Free and open.<br>
Browse all 100 explainers · Guided reading tracks<br>
Attention Is All You Need<br>
FlashAttention<br>
PagedAttention (vLLM)<br>
Megatron-LM<br>
DeepSeek-R1<br>
GPT-3: Language Models are Few-Shot Learners<br>
ZeRO: Zero Redundancy Optimizer<br>
Mixtral of Experts<br>
Training Compute-Optimal Large Language Models<br>
Mamba: Linear-Time Sequence Modeling with Selective State Spaces<br>
BERT: Pre-training of Deep Bidirectional Transformers<br>
DeepSeek-V3<br>
Qwen3<br>
OLMo 2<br>
MiniMax-01<br>
Gemma 4<br>
Scaling Laws for Neural Language Models<br>
Adam: A Method for Stochastic Optimization<br>
Deep Residual Learning for Image Recognition<br>
Denoising Diffusion Probabilistic Models<br>
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity<br>
LoRA: Low-Rank Adaptation of Large Language Models<br>
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism<br>
GSPMD: General and Scalable Parallelization for ML Computation Graphs<br>
Pathways: Asynchronous Distributed Dataflow for ML<br>
Ring Attention with Blockwise Transformers for Near-Infinite Context<br>
Efficiently Scaling Transformer Inference<br>
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving<br>
Fast Inference from Transformers via Speculative Decoding<br>
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models<br>
Training language models to follow instructions with human feedback<br>
Direct Preference Optimization: Your Language Model is Secretly a Reward Model<br>
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models<br>
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters<br>
Constitutional AI: Harmlessness from AI Feedback<br>
DAPO: An Open-Source LLM Reinforcement Learning System at Scale<br>
Tree of Thoughts: Deliberate Problem Solving with Large Language Models<br>
ReAct: Synergizing Reasoning and Acting in Language Models<br>
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision<br>
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality<br>
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model<br>
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty<br>
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration<br>
RoFormer: Enhanced Transformer with Rotary Position Embedding<br>
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale<br>
Learning Transferable Visual Models From Natural Language Supervision<br>
High-Resolution Image Synthesis with Latent Diffusion Models<br>
Scalable Diffusion Models with Transformers<br>
Robust Speech Recognition via Large-Scale Weak Supervision<br>
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention<br>
Group Sequence Policy Optimization<br>
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving<br>
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion<br>
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding<br>
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints<br>
YaRN: Efficient Context Window Extension of Large Language Models<br>
Efficient Streaming Language Models with Attention Sinks<br>
Generative Adversarial Networks<br>
Segment Anything<br>
Visual Instruction Tuning<br>
s1: Simple test-time scaling<br>
Tülu 3: Pushing Frontiers in Open Language Model Post-Training<br>
Let's Verify Step by Step<br>
Self-Consistency Improves Chain of Thought Reasoning in Language Models<br>
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks<br>
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?<br>
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits<br>
KAN: Kolmogorov–Arnold Networks<br>
Differential Transformer<br>
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models<br>
RWKV: Reinventing RNNs for the Transformer Era<br>
Titans: Learning to Memorize at Test Time<br>
Byte Latent Transformer: Patches Scale Better Than Tokens<br>
The Llama 3 Herd of Models<br>
Mistral 7B<br>
Phi-4 Technical Report<br>
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning<br>
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads<br>
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis<br>
Flow Matching for Generative Modeling<br>
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty<br>
Rewarding Doubt: Calibrated Confidence Expression of LLMs<br>
Why Language Models Hallucinate<br>
τ-bench: Tool-Agent-User Interaction in Real-World Domains<br>
ToolRL: Reward is All Tool Learning Needs<br>
Group-in-Group Policy Optimization for LLM Agent Training<br>
MiniMax-M1: Scaling Test-Time Compute with Lightning Attention<br>
ProRL: Prolonged RL Expands Reasoning Boundaries<br>
The Entropy Mechanism...