
Black Forest Labs releases Flux 2 with Mistral‑3 24B vision‑language model


Black Forest Labs just released its newest model, and the timing feels deliberate. They rolled out Flux 2 with a “multi-reference” feature that seems to tie text and visual cues more tightly together. Most of the chatter still circles around raw parameter counts, but I keep wondering how the parts actually work.

The thing is, Flux 2 isn’t a single block; it’s split into two modules that each handle a different side of generation. One module deals with the meaning behind what you see and say, while the other pulls those pieces into a single layout. That separation looks like an attempt to preserve details - shapes, materials, spatial relationships - which older models often dropped.

For anyone building image-text pipelines, the architecture might end up mattering more than the headline numbers. The section that follows shows how the two modules talk to each other, and hints at why this hybrid style could be useful in real-world apps.

Hybrid architecture with Mistral vision language model

Flux 2 combines two core components. A vision-language model, Mistral-3 24B, interprets both text and image inputs, while a second module, the Rectified Flow Transformer, handles the logical layout and ensures that details like shapes and materials appear correctly. Flux 2 also uses a VAE image encoder to store and restore images efficiently without losing quality.
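To make the division of labor concrete, here is a minimal sketch of that two-stage flow in Python. All class and function names are illustrative stand-ins; Black Forest Labs has not published this interface.

```python
from dataclasses import dataclass


@dataclass
class Conditioning:
    """Joint text+image semantics, as the vision-language model might produce."""
    tokens: list


def vlm_encode(prompt: str, reference_images: list) -> Conditioning:
    # Stand-in for Mistral-3 24B: fuse text and image inputs
    # into a single conditioning signal.
    return Conditioning(tokens=[prompt] + reference_images)


def flow_transform(cond: Conditioning) -> list:
    # Stand-in for the Rectified Flow Transformer: turn the
    # conditioning into a latent image layout.
    return [f"latent<{t}>" for t in cond.tokens]


def vae_decode(latents: list) -> str:
    # Stand-in for the VAE decoder: reconstruct pixels from latents.
    return f"image({len(latents)} latent chunks)"


cond = vlm_encode("a copper kettle on slate", ["ref1.png", "ref2.png"])
image = vae_decode(flow_transform(cond))
print(image)  # image(3 latent chunks)
```

The point of the sketch is the hand-off: semantics are resolved once, up front, and the layout module only ever sees the fused conditioning, which is plausibly why details survive better than in single-module designs.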

These systems work together to let the model create new content or edit existing images.

Four models for different users

The Flux 2 family includes four main versions, each tuned for different performance needs and levels of control:

- Flux 2 [pro]: The highest-quality model, intended to match leading closed-source systems. It is available through the BFL Playground, the BFL API, and launch partners.
- Flux 2 [flex]: Designed for developers who want to adjust parameters like step count or guidance scale to trade speed for quality. It is also available through the Playground and API.
- Flux 2 [dev]: A 32-billion-parameter model released with open weights.
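For a sense of what the [flex] tier's tunable parameters look like in practice, here is a hypothetical request payload. The field names and model identifier are assumptions for illustration, not BFL's documented API schema.

```python
import json

# Hypothetical payload shape for a [flex]-style request; every key
# here is an assumption, not BFL's published schema.
payload = {
    "model": "flux-2-flex",                      # hypothetical identifier
    "prompt": "product shot of a ceramic mug",
    "steps": 28,                                  # [flex] exposes step count...
    "guidance": 4.0,                              # ...and guidance scale
    "reference_images": ["mug_a.png", "mug_b.png"],
}

body = json.dumps(payload)
print(body)
```

Lowering `steps` would trade quality for speed; raising `guidance` would push outputs closer to the prompt. That speed-versus-quality dial is the stated reason [flex] exists as a separate tier.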


Can one model really handle both vision and language? Black Forest Labs seems to think so, rolling out Flux 2 - a set of image generators that promise up to four-megapixel results and can take as many as ten reference pictures at once. The system mixes Mistral-3 24B, a vision-language model that reads text and images, with a Rectified Flow Transformer that tries to keep composition, shapes and material cues intact.

Developers get a choice: hit a lightweight API endpoint or grab the fully open weights and run it locally. Open weights might invite a lot of tinkering, but it’s still unclear how much the community will actually contribute. Likewise, the real benefit of feeding multiple references is hard to gauge - there are no public benchmarks or user studies yet.

The company touts high-resolution fidelity, yet we haven’t seen proof that the architecture holds up across very different subjects. All in all, Flux 2 adds some interesting tools to Black Forest Labs’ lineup, but we’ll have to wait and see how it performs in everyday use.

Common Questions Answered

What are the two core components of Flux 2 and how do they work together?

Flux 2 combines the Mistral‑3 24B vision‑language model, which interprets both text and image inputs, with a Rectified Flow Transformer that manages logical layout and ensures accurate shapes and material cues. Together they enable the system to generate coherent images while preserving detailed visual semantics.

How does the "multi‑reference" feature of Flux 2 enhance image generation?

The multi‑reference feature allows Flux 2 to ingest up to ten reference images simultaneously, providing richer visual context for the model. This capability helps the model produce more consistent and detailed outputs, especially when replicating complex compositions.
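The ten-image cap and the pooling of references can be sketched as follows. The pooling strategy (simple averaging) and the embedding function are assumptions; in Flux 2 the embedding role would fall to Mistral-3 24B.

```python
MAX_REFERENCES = 10  # the cap Flux 2 advertises


def embed(image_path: str) -> list:
    # Stand-in embedder: returns a tiny fake feature vector.
    return [float(len(image_path)), 1.0]


def pool_references(paths: list) -> list:
    # Enforce the advertised cap, then average the reference
    # embeddings into one conditioning vector (averaging is an
    # assumed strategy, not BFL's documented one).
    if len(paths) > MAX_REFERENCES:
        raise ValueError(f"Flux 2 accepts at most {MAX_REFERENCES} references")
    vecs = [embed(p) for p in paths]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]


cond = pool_references(["a.png", "bb.png"])
print(cond)  # [5.5, 1.0]
```

Whatever the real fusion mechanism is, the contract is the same: many references in, one conditioning signal out, with a hard limit of ten inputs.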

What role does the VAE image encoder play in Flux 2's architecture?

The VAE image encoder stores and restores images efficiently, compressing visual data without sacrificing quality. By integrating this encoder, Flux 2 can maintain high-fidelity outputs while managing the computational load of large image generation tasks.
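Some rough arithmetic shows why a VAE encoder matters at this scale. Latent-diffusion VAEs commonly downsample each spatial side by 8x into a handful of latent channels; both figures below are assumed, since BFL has not published Flux 2's exact factors.

```python
# Assumed values: 8x spatial downsampling, 16 latent channels.
width, height, channels = 2048, 2048, 3
downsample, latent_channels = 8, 16

pixel_values = width * height * channels
latent_values = (width // downsample) * (height // downsample) * latent_channels

print(pixel_values // latent_values)  # 12
```

Under these assumptions the flow transformer works on roughly one twelfth as many values as the raw image holds, which is what keeps four-megapixel generation computationally tractable.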

What resolution does Flux 2 claim to achieve, and why is this significant?

Flux 2 claims to generate images at four‑megapixel resolution, which is notable for a model that also processes multiple reference images and complex textual prompts. This high resolution demonstrates the effectiveness of its hybrid architecture in delivering detailed, large‑scale visuals.
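To put the four-megapixel figure in concrete terms, here is the pixel count for a square image at that scale. The 2048x2048 resolution is illustrative; BFL states only the megapixel figure.

```python
def megapixels(w: int, h: int) -> float:
    # One megapixel = one million pixels.
    return w * h / 1_000_000


# An illustrative square resolution near the claimed four megapixels.
print(round(megapixels(2048, 2048), 2))  # 4.19
```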