OpenAI GPT-5 Launch 2025: Features, Benchmarks, and What It Means

OpenAI released GPT-5 on August 7, 2025, with three headline upgrades: a single unified architecture, expert-level reasoning, and a 400,000-token context window, according to “Introducing GPT-5 for developers”. The context window is large enough to fit the entire script of a TV series or a large codebase in a single prompt. Industry figures describe the release as a shift toward more capable, versatile AI systems, and developers and enterprises are likely to push these limits hard. That ambition is reflected in the OpenAI roadmap.

Unified System & How GPT-5 Decides

According to OpenAI, GPT-5 is a unified system with multiple operating modes: a fast, responsive model and a deeper reasoning mode for complex tasks, controlled through the reasoning_effort setting. When users specify high reasoning effort or include cues like “think through this,” the router activates the reasoning mode, which increases depth, precision, and safety at the cost of more tokens. All three sizes (GPT-5, GPT-5 Mini, and GPT-5 Nano) support these modes. This design gives developers fine control: they can choose the trade-off between speed and depth deliberately.

Benchmarks & Safety Improvements

GPT-5 scores 74.9% on SWE-bench Verified versus o3’s 69.1%, and 88% on Aider Polyglot, the latter representing a one-third reduction in error rate compared to o3. Experts note that this leap signals growing confidence in automated code generation. The model also achieves 84.2% on MMMU (multimodal understanding) and 46.2% on HealthBench Hard, up from 31.6% for o3, indicating substantial gains in medical reasoning. With web search enabled on anonymized prompts, GPT-5 produces about 45% fewer factual errors than GPT-4o; in thinking mode, it produces roughly 80% fewer than o3. Its 400,000-token context window also improves performance on long documents. Taken together, the results point to safer, more accurate responses on demanding tasks.
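
The error-rate framing used above can be checked with simple arithmetic. The following is a minimal sketch, assuming benchmark scores are plain accuracy percentages so that the error rate is 100 minus the score. The function name relative_error_reduction is illustrative and does not come from any OpenAI material. It applies the calculation to the SWE-bench Verified scores quoted above.

```python
def relative_error_reduction(new_score: float, old_score: float) -> float:
    """Fraction by which the error rate shrinks when a score rises.

    Assumes both scores are accuracy percentages in [0, 100], so the
    error rate is simply 100 - score (an assumption of this sketch).
    """
    old_error = 100.0 - old_score
    new_error = 100.0 - new_score
    if old_error <= 0:
        raise ValueError("old_score must be below 100 to compute a reduction")
    return (old_error - new_error) / old_error


# SWE-bench Verified scores quoted in the text: GPT-5 74.9%, o3 69.1%.
swe_bench_reduction = relative_error_reduction(74.9, 69.1)
print(f"SWE-bench Verified error reduction vs o3: {swe_bench_reduction:.1%}")
```

By this measure, the 5.8-point gain on SWE-bench Verified corresponds to roughly a 19% relative drop in errors. A percentage-point gain and an error-rate reduction are different figures and should not be compared directly.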

Models, Sizes & Pricing Tiers

OpenAI prices GPT-5 at $1.25 per 1 million input tokens and $10 per 1 million output tokens. GPT-5 Mini costs $0.25 input / $2 output per million tokens, and GPT-5 Nano costs $0.05 input / $0.40 output. All variants share API features such as reasoning_effort, verbosity controls, custom tools, streaming, and built-in tools including web search and image generation. The tiers scale cost with capability: higher precision costs more, while lightweight tasks stay affordable.
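
To make the tier differences concrete, here is a minimal cost-estimation sketch built only from the per-million-token prices quoted above. The dictionary keys, the function name estimate_cost, and the example workload are illustrative assumptions. The sketch ignores any discounts or surcharges that the text does not mention.

```python
# Prices in USD per 1 million tokens, as quoted in the text above.
# The dictionary keys are labels chosen for this sketch.
PRICES_PER_MILLION = {
    "gpt-5": {"input": 1.25, "output": 10.00},
    "gpt-5-mini": {"input": 0.25, "output": 2.00},
    "gpt-5-nano": {"input": 0.05, "output": 0.40},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
for name in PRICES_PER_MILLION:
    print(f"{name}: ${estimate_cost(name, 20_000, 2_000):.4f}")
```

For this hypothetical workload the estimate comes to about $0.045 on GPT-5, $0.009 on GPT-5 Mini, and $0.0018 on GPT-5 Nano. Output tokens cost eight times as much as input tokens on every tier, so verbose responses dominate the bill.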

API Capabilities & Enterprise Reach

OpenAI says GPT-5 powers agentic tasks by chaining tool calls reliably in sequence or in parallel, handling tool errors, and scaling reasoning across long dialogues. In frontend web development comparisons, GPT-5 beats o3 70% of the time. Enterprises including BNY, Lowe’s, SoftBank, T-Mobile, and Figma already use GPT-5 via ChatGPT Business and API products. Paid business-tier users number around 5 million. The platform rollout spans Microsoft 365 Copilot, GitHub Copilot, and Azure AI Foundry, among others, putting tool-using models directly into everyday workflows.

What Developers Should Watch

GPT-5 offers API parameters such as reasoning_effort (minimal, low, medium, high) and verbosity that directly influence the trade-off between performance and cost. For example, on SWE-bench Verified, GPT-5 uses 22% fewer output tokens and 45% fewer tool calls than o3 under high reasoning effort while delivering higher accuracy. Prompt quality also affects outputs: complex, well-structured prompts make better use of the model’s reasoning strength and accuracy. In high-risk domains like health and legal, human verification remains essential.
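
The sketch below shows one way to keep these settings explicit in application code. It only assembles and validates a plain dictionary and does not call any API. The function name build_request, the dictionary layout, and the verbosity levels are assumptions made for illustration; only the four reasoning_effort values come from the text above. The exact request format should be checked against the OpenAI API reference.

```python
from typing import Any

# Effort levels listed in the text above.
REASONING_EFFORTS = ("minimal", "low", "medium", "high")
# Assumption: the text does not list verbosity levels; these are placeholders.
VERBOSITY_LEVELS = ("low", "medium", "high")


def build_request(prompt: str, reasoning_effort: str = "medium",
                  verbosity: str = "medium", model: str = "gpt-5") -> dict[str, Any]:
    """Assemble request settings as a plain dict without sending anything."""
    if reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    # Hypothetical layout: the real wire format may nest these fields differently.
    return {
        "model": model,
        "input": prompt,
        "reasoning_effort": reasoning_effort,
        "verbosity": verbosity,
    }


# A cheap, fast call for a trivial edit, and a deeper call for a hard bug.
quick = build_request("Rename this variable.", reasoning_effort="minimal", verbosity="low")
deep = build_request("Find the race condition in this module.", reasoning_effort="high")
print(quick)
print(deep)
```

Validating the settings in one place makes the speed-versus-depth choice visible at each call site and catches typos before a request is sent.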

According to Michael Truell, co-founder and CEO of Cursor: “GPT-5 is the smartest coding model we’ve used… remarkably intelligent, easy to steer, and even to have a personality we haven’t seen in any other model. It has become our daily driver for everything from scoping and planning PRs to completing end-to-end builds.”

Release Timeline & Access

August 7, 2025 – OpenAI launches the GPT-5 API for developers, introducing a unified architecture and reasoning modes alongside the GPT-5 Mini and Nano tiers. Source: OpenAI launch post.
February 13, 2026 – OpenAI retires older models, including GPT-4o, GPT-4.1, and the OpenAI o-series mini versions; GPT-5 Instant and GPT-5 Thinking are no longer available. Source: OpenAI Help Center.
April 23, 2026 – OpenAI releases GPT-5.5, which becomes the default in ChatGPT and Codex for paid users. Source: TechCrunch coverage.

Additional Domains: Health & Clinical Strength

In a biomedical NLP evaluation across 12 datasets spanning tasks such as named entity recognition, relation extraction, classification, and summarization, GPT-5 achieves a macro-average score of 0.557 under five-shot prompting against GPT-4o’s 0.508, per “Benchmarking GPT-5 for biomedical natural language processing”. That edge in text-based medical reasoning does not extend to imaging: specialized mammography models exceed 80% accuracy, but the GPT-5 family reaches only 52-64% on similar tasks, as shown by the cross-sectional evaluation “Evaluating GPT-5 as a Multimodal Clinical Reasoner”. Experts note that this gap underscores the need for specialist models in medical imaging, even as text-based performance improves.

Performance on Economically Valuable Jobs

GPT-5 evaluations include tests across more than 40 occupations, including law, logistics, sales, and engineering. With reasoning enabled, GPT-5 matches or surpasses expert performance in roughly half the cases. The comparison across economically weighted tasks highlights GPT-5’s readiness for knowledge work automation. It also raises expectations, and invites regulatory scrutiny, for deployment in jobs involving trust, safety, and ethics.