
Prompt Coupling is the New Vendor Lock-in

Read the full article: Prompt Coupling: Formalizing Cross-Model Prompt Dependencies in LLM Systems (SSRN)

What Happened

A new paper formalizes prompt coupling: the invisible switching cost that makes your prompts inseparable from specific LLMs. Format-level dependencies alone cause up to 78.3 percentage points of accuracy variation, and no existing tool addresses them.

Our Take

Prompts are not portable. A formatting change as minor as removing the colons from field delimiters swings accuracy by up to 78.3 percentage points on LLaMA-2-13B. The same prompt's performance on GPT-4-32k varies by as much as 300% depending on whether it is formatted as JSON or as plain text. Abhishek Sharma's paper formalizes this as prompt coupling, borrowing Constantine's 1974 software-engineering coupling taxonomy and applying it to LLM-based systems.
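To make the format dependency concrete, here is a minimal Python sketch of the same logical prompt rendered two ways. The field names and task are illustrative assumptions, not taken from the paper's benchmarks.

```python
import json

def render_prompt(fields: dict, fmt: str) -> str:
    """Render identical prompt content in a model-specific surface format."""
    if fmt == "plain":
        # Colon-delimited field lines: the delimiter style whose removal
        # the paper reports swinging LLaMA-2-13B accuracy by up to 78.3pp.
        return "\n".join(f"{key}: {value}" for key, value in fields.items())
    if fmt == "json":
        return json.dumps(fields, indent=2)
    raise ValueError(f"unknown format: {fmt!r}")

# Hypothetical task, for illustration only.
fields = {"task": "classify sentiment", "text": "the unit arrived broken"}
print(render_prompt(fields, "plain"))
print(render_prompt(fields, "json"))
```

The two renderings carry identical content; only the framing changes, and that framing is exactly the axis along which the paper measures accuracy swings.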

The practical implication: every agentic tool you use (Claude Code, Cursor, Aider) deepens coupling at four simultaneous layers — encoding, interpretation, system wrapping, and execution. Analysis of Aider's 134 production model configurations confirms the pattern. None of the 10+ prompt optimization tools surveyed (DSPy, Prompty, PromptLayer) performs format-level cross-model compilation. The gap is explicit and unaddressed.

The paper introduces promptc, a transparent HTTP proxy that compiles prompts into model-specific formats without requiring code changes to existing tools. With Gartner projecting that 75% of enterprises will run multi-model strategies by mid-2026, prompt coupling is moving from a theoretical concern to a real switching cost.
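The proxy pattern can be sketched in a few lines. Everything below is an assumption for illustration: the routing table, the `fields`/`prompt` request shape, and the per-model format preferences are hypothetical, not promptc's actual design.

```python
import json

# Hypothetical per-model format preferences (illustrative, not promptc's).
PREFERRED_FORMAT = {
    "gpt-4-32k": "json",
    "llama-2-13b": "plain",
}

def recompile(request: dict) -> dict:
    """Rewrite a tool's structured prompt into the target model's format.

    `request` is assumed to carry a `model` name and a `fields` dict of
    structured prompt content; a real proxy would parse the provider's
    actual wire format instead.
    """
    fmt = PREFERRED_FORMAT.get(request["model"], "plain")
    compiled = dict(request)
    fields = request["fields"]
    if fmt == "json":
        compiled["prompt"] = json.dumps(fields)
    else:
        compiled["prompt"] = "\n".join(f"{k}: {v}" for k, v in fields.items())
    return compiled
```

Because the rewrite happens in the proxy, the calling tool (Claude Code, Cursor, Aider) never changes; that is the "no code changes" property the paper claims for promptc.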

What To Do

Use DSPy for content optimization, but account for format coupling separately: run identical tasks on both Claude and GPT-4 with model-native formatting before picking a primary model. Switching later costs more than you think.
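A minimal harness for that pre-commitment test might look like the following. The `call` function is a placeholder for your provider SDK, and the exact-match scoring is a simplification; both are assumptions, not a prescribed method.

```python
from typing import Callable

def evaluate(call: Callable[[str], str],
             render: Callable[[dict], str],
             tasks: list[tuple[dict, str]]) -> float:
    """Score one model on a shared task set using its native formatting.

    call:   sends a prompt to one model and returns its reply (stub this
            with your provider SDK)
    render: compiles task fields into that model's preferred prompt format
    tasks:  (fields, expected_answer) pairs
    """
    correct = sum(
        call(render(fields)).strip() == expected
        for fields, expected in tasks
    )
    return correct / len(tasks)
```

Run `evaluate` once per candidate model, each with its own `render`, and compare scores on the same task set before committing to a primary model.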

Builder's Brief

Who

Teams building multi-LLM or model-agnostic pipelines

What changes

Switching cost between models is now quantifiable; architectural decisions around prompt abstraction layers become higher-stakes

When

Weeks

Watch for

Prompt portability tooling appearing in LangChain or LlamaIndex changelogs

What Skeptics Say

The 78.3pp accuracy swings likely reflect prompt-engineering debt, not an irreducible structural problem. Teams that already abstract prompts behind an interface layer neutralize most of what this paper formalizes.

1 comment

Tariq Osei

every clever output format hack you built is now technical debt tied to one vendor. congrats
