Ongoing · 5 chapters · since Feb 2026

Model Context Length Race

5 articles, oldest first

1

Claude Opus 4.6 released with 1M token context window

Anthropic released Claude Opus 4.6 on February 5, 2026, featuring a 1 million token context window. The model recorded top scores on coding and agentic reasoning benchmarks at release. A 1M context limit in a production model loosens the architectural constraint that made RAG pipelines necessary for many large codebases and document sets.
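Whether a given codebase or document set actually fits in a 1M-token window can be estimated before calling any API. The sketch below uses the common rule of thumb of roughly 4 characters per token for English text; real tokenizer counts vary by model, and the limit constant is just the figure from the release above.

```python
# Rough check of whether a document set fits in a 1M-token context window.
# Assumes ~4 characters per token (a heuristic, not a real tokenizer).

CONTEXT_LIMIT = 1_000_000
CHARS_PER_TOKEN = 4

def estimated_tokens(texts):
    """Estimate total tokens for a list of document strings."""
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN

def fits_in_context(texts, limit=CONTEXT_LIMIT):
    """True if the estimated token count is within the context limit."""
    return estimated_tokens(texts) <= limit

docs = ["def main():\n    pass\n" * 1000]  # ~22,000 chars of source
print(estimated_tokens(docs), fits_in_context(docs))
```

With a more accurate tokenizer the same check works unchanged; only `estimated_tokens` would swap in a real token count.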

2
shipped

OpenAI launches GPT-5.4 with 1M-token context window

OpenAI released GPT-5.4 on March 5, 2026, featuring a 1-million-token context window and native multi-step autonomous workflow execution. The model scored 75% on the OSWorld-V desktop task benchmark, surpassing average human performance. The release brings GPT to parity with Claude on long-context tasks and advances autonomous agent reliability significantly.

3
research

Ulysses Sequence Parallelism: Training with Million-Token Contexts


4
shipped

Google launches Gemini 3.1 Ultra with 2M-token context

Google released Gemini 3.1 Ultra on March 20, 2026, with a 2-million-token context window, double the capacity of the largest competing model. The model supports native multimodal reasoning across text, images, and audio without intermediate conversion or transcription steps. Sandboxed code execution is included natively, positioning the model for agentic and developer-facing workflows.

5
research

Researchers from MIT, NVIDIA, and Zhejiang University Propose TriAttention: A KV Cache Compression Method That Matches Full Attention at 2.5× Higher Throughput

Long-chain reasoning is one of the most compute-intensive tasks in modern large language models. When a model like DeepSeek-R1 or Qwen3 works through a complex math problem, it can generate tens of thousands of tokens before arriving at an answer. Every one of those tokens must be stored in what is known as the KV cache.
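The memory cost driving this line of work is easy to estimate: for every token, each attention layer stores one key and one value vector per KV head. A back-of-envelope sketch, using an illustrative configuration rather than any specific model's:

```python
# Back-of-envelope KV cache size for a long reasoning trace.
# Per token per layer, attention stores a key and a value vector, so:
#   memory = 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens
# The configuration below is illustrative, not a real model's.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """Total KV cache size in bytes (2 = one key + one value per head)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens

# A 50k-token trace on a hypothetical 60-layer model with 8 KV heads,
# head dim 128, fp16 (2 bytes per element):
gb = kv_cache_bytes(tokens=50_000, layers=60, kv_heads=8, head_dim=128) / 1e9
print(f"{gb:.1f} GB")  # → 12.3 GB
```

At these sizes the cache, not the weights, dominates memory growth during generation, which is the pressure compression methods like the one above aim to relieve.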