source: "raw/articles/github-karpathyautoresearch-ai-agents-running-research-on-single-gpu-nanochat-tr.md"
Summary: Autoresearch - AI Agents for Autonomous ML Research
TL;DR: Karpathy's project gives AI agents a small LLM training setup to experiment autonomously overnight, modifying code and hyperparameters to improve model performance within fixed 5-minute training budgets.
Key Points
- AI agents autonomously modify `train.py` (contains the full GPT model, optimizer, and training loop) while humans program the `program.md` instruction files
- Fixed 5-minute wall-clock training budget per experiment, allowing ~12 experiments/hour or ~100 overnight
- Uses validation bits per byte (val_bpb) as the optimization metric: lower is better, and the metric is independent of vocabulary size
- Built on simplified single-GPU implementation of nanochat training code
- Requires single NVIDIA GPU (tested on H100), Python 3.10+, and uv package manager
- Agent only touches one file (`train.py`), while `prepare.py` handles data prep and utilities (unchangeable)
- Self-contained design with no external dependencies beyond PyTorch
- Includes platform-specific forks for macOS, Windows, and AMD systems
- Designed for overnight autonomous research: "You wake up in the morning to a log of experiments and (hopefully) a better model"
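The fixed wall-clock budget and the experiment counts above can be illustrated with a minimal sketch. This is not the repository's code: `experiments_per_window` and `train_with_budget` are hypothetical names, the 8-hour night is an assumption, and the arithmetic ignores startup and evaluation overhead.

```python
import time
from typing import Callable


def experiments_per_window(window_hours: float, budget_minutes: float = 5.0) -> int:
    """Count how many fixed-budget runs fit back to back in a time window.

    Assumption: no overhead between runs (no startup, compile, or eval time).
    """
    return int(window_hours * 60 // budget_minutes)


def train_with_budget(
    step_fn: Callable[[], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call step_fn until the wall-clock budget is spent; return steps completed.

    The clock is injectable so the loop can be tested without waiting.
    """
    start = clock()
    steps = 0
    while clock() - start < budget_seconds:
        step_fn()  # hypothetical: one optimizer step of the training loop
        steps += 1
    return steps


per_hour = experiments_per_window(1)    # 60 min / 5 min per run
per_night = experiments_per_window(8)   # assumes an 8-hour night
```

Under these assumptions, one hour holds 12 runs and an 8-hour night holds 96, in line with the "~12 experiments/hour or ~100 overnight" figures. Stopping on elapsed time rather than step count means every experiment costs the same wall-clock time regardless of how the agent changes model size or batch size.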
Concepts Covered
- Autonomous AI Research — core premise of agents conducting ML research without human intervention
- GPT Model Architecture — implementation includes full GPT model that agents can modify
- Hyperparameter Optimization — agents experiment with architecture, batch size, optimizer settings
- Training Loop Optimization — agents modify the core training logic and procedures
- Validation Metrics — uses bits per byte metric for fair comparison across different model configurations
- Single-GPU Training — simplified training setup designed for individual researchers
- Muon Optimizer — mentions specific optimizer implementation alongside AdamW
- BPE Tokenization — includes tokenizer training as part of setup process
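To show why bits per byte allows fair comparison across configurations, here is a small sketch of the standard conversion: summed cross-entropy in nats, divided by ln 2 times the number of bytes in the text. The function name `bits_per_byte` and the example numbers are hypothetical, and the repository's own evaluation may differ in details such as how special tokens are counted.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (in nats) over a text into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_loss_nats / (math.log(2) * total_bytes)


# Hypothetical numbers: the same 1000-byte text under two tokenizers.
# Larger vocabulary: fewer tokens, higher loss per token.
coarse = bits_per_byte(250 * 2.0, 1000)  # 250 tokens at 2.0 nats/token
# Smaller vocabulary: more tokens, lower loss per token.
fine = bits_per_byte(500 * 1.0, 1000)    # 500 tokens at 1.0 nats/token
```

In this made-up case the per-token losses differ by a factor of two, yet both configurations score the same bits per byte, because the normalization is by bytes of text rather than by tokens. A per-token loss would have made the smaller-vocabulary model look better without it predicting the text any better.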
Images/Figures
img-0.png: Teaser image showing progress visualization (referenced as progress.png in repository)