ongoing, 32 chapters, since Jan 2026

Multimodal Arms Race

32 articles, oldest first

1

With Apple’s new Creator Studio Pro, AI is a tool to aid creation, not replace it

Apple’s Creator Studio Pro leverages AI to help creators with tedious tasks — like finding clips or building slides — without trying to do the work for them.

2

Apple buys Israeli startup Q.ai as the AI race heats up

Q.ai is an Israeli startup specializing in imaging and machine learning, particularly technologies that enable devices to interpret whispered speech and enhance audio in noisy environments.

3

Training Design for Text-to-Image Models: Lessons from Ablations


4

ElevenLabs CEO: Voice is the next interface for AI

ElevenLabs CEO argued at Web Summit Qatar that voice is the next interface for AI, as OpenAI, Google, and Apple push conversational systems into wearables, new hardware, and everyday interactions.

5
opinion

Last Week in AI #335 - Opus 4.6, Codex 5.3, Gemini 3 Deep Think, GLM 5, Seedance 2.0

A crazy packed edition of Last Week in AI! Plus some small updates.

6
shipped

Alibaba releases Qwen 3.5 with 2-hour video analysis

Alibaba released Qwen 3.5, a multimodal model capable of processing videos up to two hours in length. The model ships with open weights, so it can be deployed without per-token API costs. The release extends long-context video understanding to developers who previously relied on closed commercial APIs.

7
opinion

LWiAI Podcast #234 - Opus 4.6, GPT-5.3-Codex, Seedance 2.0, GLM-5

An action-packed episode!

8
shipped

WordPress.com adds an AI Assistant that can edit, adjust styles, create images, and more

The feature is designed to work inside the website to understand its content and layout, allowing site owners to make changes with natural language commands. The WordPress AI assistant doesn't need precisely tailored prompts, either.

9
shipped

Google adds music-generation capabilities to the Gemini app

Users will be able to use text, images, and videos as a reference to generate music.

10
data-backed

Seven frontier models released in 28 days

Seven frontier-class AI models launched within a 28-day window spanning late January to late February 2026, including Opus 4.6, GPT-5.3-Codex, Gemini 3 Deep Think, Sonnet 4.6, and Gemini 3.1 Pro. The release cadence marks a clear acceleration over prior years, when comparable capability jumps arrived quarterly at most. The pace raises practical questions for engineering teams about how to structure model dependencies in production applications.

11
opinion

Last Week in AI #336 - Sonnet 4.6, Gemini 3.1 Pro, Anthropic vs Pentagon

Anthropic releases Sonnet 4.6, Google rolls out its latest AI model, Gemini 3.1 Pro, and the Pentagon threatens to cut off Anthropic in an AI safeguards dispute

12
shipped

Anthropic launches Claude Memory for all users

Anthropic launched Claude Memory on March 1, 2026, enabling persistent context across sessions for all users. The feature leverages the 1-million-token context windows available in Opus and Sonnet models. Integration with Microsoft PowerPoint and Excel is included at launch.

13
research

PRX Part 3 — Training a Text-to-Image Model in 24h!


14
shipped

Luma launches creative AI agents powered by its new ‘Unified Intelligence’ models

Luma introduced Luma Agents, powered by its new “Unified Intelligence” models, designed to coordinate multiple AI systems and generate end-to-end creative work across text, images, video and audio.

15
shipped

ChatGPT can now create interactive visuals to help you understand math and science concepts

Instead of just reading an explanation or looking at a static diagram, users can now engage directly with interactive visuals.

16
opinion

LWiAI Podcast #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk

OpenAI launches GPT-5.4 with Pro and Thinking versions, Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro, Where things stand with the Department of War Anthropic

17
shipped

Gamma adds AI image-generation tools in bid to take on Canva and Adobe

The company's new product, called Gamma Imagine, will let users employ text prompts to create brand-specific assets like interactive charts and visualizations, marketing collateral, social graphics, and infographics.

18
opinion

Last Week in AI #339 - DLSS 5, OpenAI Superapp, MiniMax M2.7

DLSS 5 looks like a real-time generative AI filter for video games, OpenAI Reportedly Pivoting to a Focus on Business and Productivity Only, and more!

19
shipped

Gemini 3.1 Flash Live: Making audio AI more natural and reliable

Gemini 3.1 Flash Live is now available across Google products.

20
shipped

Build with Veo 3.1 Lite, our most cost-effective video generation model

Veo 3.1 Lite is now available in paid preview through the Gemini API and for testing in Google AI Studio.

21
shipped

Google now lets you direct avatars through prompts in its Vids app

Google is adding a way to customize and instruct avatars for video creation in the Vids app.

22
shipped

Microsoft takes on AI rivals with three new foundational models

Six months after the group's formation, MAI released models that can transcribe voice into text and generate audio and images.

23
shipped

Meta debuts Muse Spark AI model from Superintelligence Labs

Meta released Muse Spark, the first model from its Superintelligence Labs division led by Alexandr Wang, on April 8, 2026. The model, developed under the codename Avocado, scores 86.4% on figure understanding benchmarks but 42.5% on abstract reasoning tasks. The release follows Meta's $14 billion investment in Wang's team and highlights that perception and reasoning capabilities remain difficult to develop in tandem.

24
research

Multimodal Embedding & Reranker Models with Sentence Transformers


25
funding

Alibaba leads $290 million investment for building a new kind of AI model as LLM limits emerge

Startup Shengshu plans to use the money for a "general world model," paving the way for more practical robot applications.

26
shipped

Alibaba just revealed it’s behind a viral AI video model dominating leaderboards

A mysterious AI video model that has ascended global leaderboards has been confirmed as a project under Alibaba.

27
shipped

Liquid AI Releases LFM2.5-VL-450M: a 450M-Parameter Vision-Language Model with Bounding Box Prediction, Multilingual Support, and Sub-250ms Edge Inference

Liquid AI just released LFM2.5-VL-450M, an updated version of its earlier LFM2-VL-450M vision-language model. The new release introduces bounding box prediction, improved instruction following, expanded multilingual understanding, and function calling support — all within a 450M-parameter footprint

28
research

A Coding Implementation of MolmoAct for Depth-Aware Spatial Reasoning, Visual Trajectory Tracing, and Robotic Action Prediction

In this tutorial, we walk through MolmoAct step by step and build a practical understanding of how action-reasoning models can reason in space from visual observations. We set up the environment, load the model, prepare multi-view image inputs, and explore how MolmoAct produces depth-aware reasoning

29
announcement

Apple is building smart glasses without a display to serve as an AI wearable

According to Bloomberg reporter Mark Gurman, Apple is developing smart glasses that skip the display entirely and instead function as an AI wearable. The article Apple is building smart glasses without a display to serve as an AI wearable appeared first on The Decoder.

30
research

New AI model generates 45-minute lip-synced video from one photo and runs in real time

A single image becomes a talking character: LPM 1.0 generates real-time video with lip sync, facial expressions, and emotional reactions. For now, it remains a research project.

31
announcement

OpenAI's leaked memo says new "Spud" model will make all its products "significantly better"

An internal OpenAI memo lays out five strategic priorities for the enterprise business - including a new model codenamed "Spud," a platform play for AI agents, and the blunt accusation that Anthropic is overstating its revenue by 8 billion dollars.

32
opinion

I tested ChatGPT Plus vs. Gemini Pro to see which is better - and if it's worth switching

Considering ditching ChatGPT Plus for Gemini Pro? I tested both on the same 10 tasks. Here's which came out on top.