Transformers.js v4: Now Available on NPM!
What Happened
Transformers.js v4 shipped to NPM with a first-class WebGPU backend. Models load faster, and GPU-accelerated inference now runs in the browser without a server round-trip.
Our Take
For client-side embedding pipelines — semantic search, RAG pre-filtering, offline classification — this eliminates OpenAI API calls entirely. Running `Xenova/all-MiniLM-L6-v2` locally costs $0 per query. Developers defaulting to remote embedding APIs for every keystroke are paying a latency tax on work the client can do.
Teams shipping browser-native search or offline-capable AI features should test WebGPU inference now. Pure server-side stacks can skip this release.
What To Do
Use `Xenova/all-MiniLM-L6-v2` via Transformers.js v4's WebGPU backend for client-side embeddings instead of proxying to OpenAI: per-query API cost drops to zero, with no accuracy tradeoff for short-text similarity.
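The recommendation above can be sketched as a client-side similarity search. This is a minimal sketch, not the library's canonical usage: the Transformers.js call appears only in the comment (assuming its `pipeline` API with a `device: 'webgpu'` option and mean-pooled, normalized output), and the `embed` stub with toy 3-d vectors is a hypothetical stand-in so the ranking logic runs anywhere:

```javascript
// In the browser, embeddings would come from Transformers.js (assumed API):
//   import { pipeline } from '@huggingface/transformers';
//   const extractor = await pipeline('feature-extraction',
//     'Xenova/all-MiniLM-L6-v2', { device: 'webgpu' });
//   const embed = async (text) =>
//     Array.from((await extractor(text, { pooling: 'mean', normalize: true })).data);

// Hypothetical stand-in embedder: toy 3-d vectors keyed by known strings.
const toyVectors = {
  'how do I reset my password': [0.90, 0.10, 0.10],
  'password recovery steps':    [0.88, 0.15, 0.05],
  'quarterly revenue report':   [0.05, 0.90, 0.20],
};
const embed = (text) => toyVectors[text];

// Cosine similarity: dot product divided by the product of magnitudes.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na  += a[i] * a[i];
    nb  += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents against a query, most similar first -- no network call.
function search(query, docs) {
  return docs
    .map((doc) => ({ doc, score: cosine(embed(query), embed(doc)) }))
    .sort((x, y) => y.score - x.score);
}

const results = search('how do I reset my password', [
  'quarterly revenue report',
  'password recovery steps',
]);
console.log(results[0].doc); // 'password recovery steps'
```

With real embeddings, documents can be embedded once and cached client-side, so each query costs one local embedding pass plus a few dot products rather than a round-trip to a remote API.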