
Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

Read the full article: "Powering the agents: Workers AI now runs large models, starting with Kimi K2.5" on Cloudflare

What Happened

Kimi K2.5 is now on Workers AI, helping you power agents entirely on Cloudflare’s Developer Platform. Learn how we optimized our inference stack and reduced inference costs for internal agent use cases.

Our Take

Honestly? Part of this is shoving large models onto cheap infrastructure and calling it innovation. But Kimi K2.5 on Workers AI does show the industry moving past centralized GPU farms for simple agent tasks, and that's smart infrastructure work. The story is cost per token, not model size: we're finally seeing practical, low-cost ways to run inference without standing up a dedicated cluster.

What To Do

Start experimenting with cloud-based inference platforms for your internal agent workloads.
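As a first experiment, you can hit Workers AI over Cloudflare's REST endpoint (`POST /accounts/{account_id}/ai/run/{model}`) from any existing service. A minimal sketch follows; the Kimi K2.5 model ID used here is an assumption, so check the Workers AI model catalog for the real name:

```typescript
// Sketch: one inference call to Workers AI via the REST API.
// NOTE: the model ID below is a hypothetical placeholder for Kimi K2.5.
const MODEL = "@cf/moonshotai/kimi-k2.5";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Pure helper: assemble the URL and JSON body for one inference call.
function buildRunRequest(accountId: string, model: string, messages: ChatMessage[]) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    body: JSON.stringify({ messages }),
  };
}

// Fire the request (needs a real account ID and an API token with Workers AI access).
async function runInference(accountId: string, token: string, prompt: string) {
  const { url, body } = buildRunRequest(accountId, MODEL, [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body,
  });
  return res.json();
}
```

Keeping request construction in a pure helper makes it easy to swap models or compare token costs across providers later.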

Builder's Brief

Who

Teams building agent pipelines on Cloudflare Workers

What changes

Teams can run LLM inference without leaving the Cloudflare ecosystem, eliminating the hop to a third-party GPU provider

When

Now

Watch for

Latency and tokens-per-second benchmarks published by independent teams vs. dedicated GPU providers
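Inside a Worker, the "no third-party hop" claim comes down to the AI binding: inference is one method call on `env`. A rough sketch, assuming an `AI` binding configured in wrangler and the same hypothetical Kimi K2.5 model ID:

```typescript
// Sketch of a Worker calling Workers AI through its AI binding.
// Assumptions: the binding is named "AI" in wrangler config, and
// "@cf/moonshotai/kimi-k2.5" is a placeholder model ID.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const worker = {
  async fetch(_req: Request, env: Env): Promise<Response> {
    // The model runs on Workers AI itself -- no external GPU provider hop.
    const answer = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
      messages: [{ role: "user", content: "Draft a status update for the team." }],
    });
    return new Response(JSON.stringify(answer), {
      headers: { "Content-Type": "application/json" },
    });
  },
};
// In a real Worker this object would be the module entry point:
// export default worker;
```

Because the binding is just an interface, you can hand the handler a stubbed `env` in tests and keep real inference calls out of CI.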

What Skeptics Say

Cloudflare's edge inference stack is still catching up to dedicated GPU clouds on throughput and pricing. Kimi K2.5 is a mid-tier model, and the real test, whether agent workloads tolerate the latency and cost profile of CDN-native inference at scale, hasn't happened yet.
