
Claude Opus 4.5 costs $15 per million input tokens and $75 per million output tokens. That makes it roughly 5x more expensive than Claude Sonnet and 15x more expensive than Claude Haiku. At those prices, a careless integration can burn through thousands of dollars in a single day. So why do we use it for certain client workloads, and when is it a waste of money?
We have been running production workloads on every Claude model tier since early 2025. Here is the decision framework we use to choose the right model for each use case, backed by actual cost data and quality measurements.
The first thing to understand is that model cost is not just the per-token price. It is the per-token price multiplied by the number of tokens you need to get an acceptable result. This is where Opus 4.5 starts to justify its price for certain workloads. In our testing across 6 production applications, Opus 4.5 required an average of 40% fewer retry loops to produce acceptable output compared to Sonnet. For complex reasoning tasks like code review, legal document analysis, and multi-step data extraction, the retry reduction was even higher at 60%.
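The arithmetic is worth making explicit. Here is a minimal sketch of retry-adjusted cost; the blended rates and token counts are illustrative assumptions, not measurements from our testing:

```python
def cost_per_accepted(blended_usd_per_mtok: float,
                      tokens_per_pass: float,
                      avg_passes: float) -> float:
    """Dollars spent to reach one acceptable output, retries included."""
    return blended_usd_per_mtok * tokens_per_pass * avg_passes / 1_000_000

# Illustrative only: a cheaper model that retries often can cost more per
# accepted result than its sticker price suggests.
print(cost_per_accepted(6.0, 45_000, 3.2))   # ~0.86: cheap model, many passes
print(cost_per_accepted(30.0, 70_000, 1.3))  # ~2.73: pricey model, few passes
```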
Let us put real numbers on this. One of our clients runs an automated contract analysis pipeline. With Sonnet, the pipeline processes a typical 20-page contract in 3.2 passes on average, consuming about 150,000 tokens total at a cost of roughly $0.90 per contract. With Opus 4.5, the same contract is processed in 1.3 passes on average, consuming about 95,000 tokens total at a cost of roughly $3.50 per contract.
On raw cost per contract, Sonnet wins easily. But here is the catch: 12% of Sonnet's outputs required human review and correction, compared to 2% for Opus 4.5. A paralegal spending 20 minutes reviewing and correcting an extraction at $45 per hour costs $15 per correction, so the expected review overhead is 12% of $15, or $1.80 per contract, for Sonnet, and 2% of $15, or $0.30, for Opus 4.5. Sonnet's true cost per contract becomes $0.90 plus $1.80, totaling $2.70. Opus 4.5's true cost becomes $3.50 plus $0.30, totaling $3.80.
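If you want to drop that into a script, the arithmetic is a few lines; the function name is ours, the figures are the ones above:

```python
def true_cost_per_contract(model_cost: float, correction_rate: float,
                           minutes_per_fix: float = 20,
                           hourly_rate: float = 45) -> float:
    """Model spend plus expected human-review overhead per contract."""
    fix_cost = hourly_rate * minutes_per_fix / 60  # $15 per correction
    return model_cost + correction_rate * fix_cost

print(true_cost_per_contract(0.90, 0.12))  # Sonnet: 2.70
print(true_cost_per_contract(3.50, 0.02))  # Opus:   3.80
```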
Sonnet still wins on cost in that scenario. But for this particular client, the 2% error rate versus 12% error rate mattered more than the cost difference because errors in contract analysis have downstream legal implications. They happily pay the Opus premium for the reliability.
Here is where model selection gets interesting. Not every task in a pipeline needs the same model. We use a pattern we call "tiered inference" where different steps in a workflow use different model tiers based on the complexity and criticality of each step.
For example, in a document processing pipeline, step one is document classification, determining what type of document you are looking at. Haiku handles this with 98% accuracy at a fraction of a cent per document. Step two is key information extraction, pulling out names, dates, and amounts. Sonnet handles this with 94% accuracy for about $0.15 per document. Step three is complex reasoning, determining whether contract terms are favorable, identifying conflicting clauses, or summarizing the document's implications. This is where Opus 4.5 earns its keep.
By using tiered inference, the same pipeline that would cost $10.50 per document with Opus end-to-end costs about $4.20 per document with comparable quality. The trick is identifying which steps genuinely need the most capable model and which steps are just expensive busywork for a model that is overqualified.
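A stripped-down version of the routing looks like this, using the Anthropic Python SDK. The step names and prompts are simplified placeholders, not our production pipeline, and the model aliases may change, so check the current model list in the API docs before copying them:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One model tier per pipeline step. Aliases are placeholders; verify them
# against the current model list before use.
TIERS = {
    "classify": "claude-haiku-4-5",   # cheap, fast: what kind of document?
    "extract": "claude-sonnet-4-5",   # mid-tier: names, dates, amounts
    "reason": "claude-opus-4-5",      # premium: judge terms, find conflicts
}

def run_step(step: str, prompt: str) -> str:
    """Route a single pipeline step to the tier that matches its difficulty."""
    response = client.messages.create(
        model=TIERS[step],
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def process_document(doc: str) -> str:
    doc_type = run_step("classify", f"Classify this document by type:\n\n{doc}")
    fields = run_step("extract", f"Extract all names, dates, and amounts:\n\n{doc}")
    return run_step(
        "reason",
        f"Document type: {doc_type}\nKey fields: {fields}\n\n"
        f"Identify unfavorable or conflicting terms:\n\n{doc}",
    )
```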
We have identified four categories of tasks where Opus 4.5 consistently outperforms cheaper models by a large enough margin to justify its cost.
The first is multi-step reasoning with long context. When you need the model to hold a complex problem in working memory across a context of 50,000 or more tokens, Opus makes significantly fewer reasoning errors. We see this most clearly in code review tasks where the model needs to understand the full architecture of a module to identify a subtle bug.
The second is nuanced text generation. When the output needs to be indistinguishable from expert human writing, particularly for client-facing content, Opus produces noticeably better prose. The difference is most apparent in tone, structure, and the ability to maintain a consistent voice across long documents.
The third is ambiguous instructions. When the prompt is not perfectly specified, because real-world inputs are never perfectly specified, Opus is much better at inferring intent. This matters enormously for user-facing applications where you cannot control the quality of input.
The fourth is one-shot complex tasks. When you need to get it right the first time because retries are expensive or impossible, like generating a database migration script or writing a critical email, the higher reliability of Opus pays for itself.
Categories where Opus is a waste of money include simple classification tasks, structured data extraction from well-formatted inputs, code generation for straightforward CRUD operations, and summarization of clearly written text. For these tasks, Sonnet or even Haiku will give you equivalent results at a fraction of the cost.
Our production cost data across all client workloads in Q4 2025 breaks down as follows. Total LLM spend was approximately $47,000 per month. Of that, Opus 4.5 accounted for 31% of spend but only 8% of total tokens. Sonnet accounted for 52% of spend and 45% of tokens. Haiku accounted for 17% of spend and 47% of tokens.
If we had run everything on Opus, our monthly spend would have been approximately $180,000. If we had run everything on Sonnet, it would have been approximately $35,000, but our error rates would have increased by an estimated 15 to 20 percentage points on complex tasks, creating downstream costs in human review.
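Running the same what-if analysis on your own workloads takes only a few lines. The blended rates and monthly token volumes below are round placeholders, not our Q4 books; plug in your own bill:

```python
# What-if spend modeling: price out moving every workload to a single tier.
RATES = {"opus": 30.0, "sonnet": 6.0, "haiku": 2.0}        # blended $/MTok, assumed
VOLUME = {"opus": 0.5e9, "sonnet": 2.8e9, "haiku": 2.9e9}  # tokens/month, assumed

actual = sum(RATES[m] * v / 1e6 for m, v in VOLUME.items())
total_tokens = sum(VOLUME.values())
print(f"actual mix: ${actual:,.0f}/month")
for model, rate in RATES.items():
    print(f"all-{model}: ${rate * total_tokens / 1e6:,.0f}/month")
```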
The practical advice: default to Sonnet for everything, measure quality, and upgrade specific pipeline steps to Opus only when you can demonstrate that the quality improvement justifies the cost. Never start with Opus and optimize down because you will never get around to the optimization. Start cheap, find the pain points, and upgrade surgically.
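To make "demonstrate that the quality improvement justifies the cost" concrete, gate each upgrade on an expected-cost comparison. A sketch with made-up numbers; the dataclass and the decision rule are ours to illustrate, not a library API:

```python
from dataclasses import dataclass

@dataclass
class StepStats:
    model_cost: float          # model dollars per item at the current tier
    error_rate: float          # fraction of outputs a human must correct
    upgrade_model_cost: float  # model dollars per item one tier up
    upgrade_error_rate: float  # error rate one tier up, from a sampled eval
    review_cost: float         # dollars per human correction

def should_upgrade(s: StepStats) -> bool:
    """Upgrade a step only when the review savings beat the added model spend."""
    current = s.model_cost + s.error_rate * s.review_cost
    upgraded = s.upgrade_model_cost + s.upgrade_error_rate * s.review_cost
    return upgraded < current

# Illustrative numbers for a hypothetical extraction step.
step = StepStats(model_cost=0.15, error_rate=0.10,
                 upgrade_model_cost=0.60, upgrade_error_rate=0.01,
                 review_cost=20.0)
print(should_upgrade(step))  # True: $0.80/item upgraded vs $2.15/item current
```

Note that a rule like this captures only dollars. As the contract analysis example shows, downstream risk can justify an upgrade even when the pure cost comparison says no.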
Claude Opus 4.5 is not overpriced. It is a premium tool for premium problems. The mistake is using it for problems that do not need a premium solution.
About the Author
Fordel Studios
AI-native app development for startups and growing teams. 14+ years of experience shipping production software.
We love talking shop. If this article resonated, let's connect.