Mistral Medium 3.5: Open-Weight 128B Performance and Free Usage Deep Dive
Mistral's April 2026 flagship is a single 128B model that retired two specialist models, opened its weights to the world, and showed that Europe's AI challengers aren't bluffing.
By Free AI News Editorial · · · 9 min read
On April 30, 2026, Mistral AI dropped a model that made a quiet but significant argument: you do not need to be locked into a proprietary API to get frontier-class performance on real-world coding tasks. Mistral Medium 3.5 is a single 128-billion-parameter dense model that consolidates three previously separate Mistral products into one set of freely downloadable weights. For developers who have been weighing open-source models against closed-weight offerings from OpenAI and Anthropic, this release moved the comparison meaningfully closer. Here is everything you need to know about what shipped, how it performs, and how to access it for free.
What Is Mistral Medium 3.5 and Why Does It Matter?
Mistral Medium 3.5 is Mistral AI's first flagship merged model. Rather than shipping separate dedicated models for reasoning, coding, and general instruction-following, the company merged all three capability sets into a single 128B dense architecture. The model was released on April 29-30, 2026 with weights published on Hugging Face under a modified MIT license.
The "medium" name is a bit misleading at this scale. At 128 billion parameters, this is not a lightweight model by any ordinary definition. What Mistral means by "medium" is its position in their product line -- between their smaller edge-optimized models (including Mistral Small 4, which launched earlier in 2026 as a 22B Apache-licensed model) and a theoretical larger flagship. Medium 3.5 sits at the top of what Mistral currently offers in the open-weight category.
The timing also signals ambition. The release followed Mistral's announcement of an 830 million dollar debt financing round to build a 13,800-GPU data center outside Paris. Medium 3.5 is widely understood to be the first flagship model trained substantially on that expanded compute footprint. For an AI lab that positions itself as Europe's primary challenger to U.S. and Chinese AI dominance, this release is both a technical statement and a strategic one.
How Does Mistral Medium 3.5 Perform on Key Benchmarks?
Mistral led their launch announcement with two specific benchmarks, which tells you a lot about what the model was engineered to do well:
| Benchmark | Score | What It Measures |
|---|---|---|
| SWE-Bench Verified | 77.6% | Real-world bug fixes on open-source Python repos |
| tau3-Telecom | 91.4% | Multi-turn agentic tool calling (telecom domain) |
| MMLU (Mixtral 8x7B reference) | 70.6% | Broad academic knowledge across 57 subjects |
The 77.6% SWE-Bench Verified result deserves particular attention. SWE-Bench Verified is the de facto industry standard for real-world coding ability. Models are given actual bug reports from open-source GitHub repositories and graded on whether their generated patches pass the actual test suites -- no partial credit, no self-reported evaluation. A score in the high 70s means the model produces patches that work in real codebases, not just synthetic coding exercises. Medium 3.5's 77.6% exceeds its own predecessor Devstral 2 and outperforms Qwen3.5 397B A17B -- a sparse MoE model with vastly more total parameters -- on the same benchmark.
That said, the proprietary frontier models from Anthropic and OpenAI have been reporting higher SWE-Bench Verified scores in their 2026 releases. Mistral is not claiming to beat Claude Opus 4 or GPT-5.4 here. The story they are telling is: "we got competitive enough at this size that you can actually run it yourself." That framing resonates with the developers and enterprises who have been waiting for an open-weight model that can handle serious agentic coding workloads without requiring a closed-source API subscription. You can explore how this stacks up across the full landscape in our AI model comparison section.
Can You Use Mistral Medium 3.5 for Free?
Yes -- in multiple ways. Here is a breakdown of your free and low-cost access options:
- Self-hosting (completely free): The model weights are freely available on Hugging Face at mistralai/Mistral-Medium-3.5-128B. Download, run locally, build commercial products -- no per-token cost, no usage caps, no API key required. The modified MIT license permits all of this. Hardware requirements are discussed below.
- Le Chat free tier: Mistral's Le Chat chat interface offers a free tier with access to Medium 3.5 for basic conversations. Free users do not get Work Mode (the advanced agentic feature), but the core chat capability is accessible without a subscription.
- Le Chat Pro (paid, $14.99/month): Unlocks Vibe CLI with Medium 3.5, Work Mode for multi-step agentic tasks, and higher usage limits. A student discount brings this to $5.99/month with a verified student email.
- Le Chat Team ($24.99/seat/month): Includes all Pro features with team collaboration and higher rate limits for professional deployments.
- API (metered, not free): Via the Mistral API or OpenRouter at $1.50 per million input tokens and $7.50 per million output tokens. Expensive for large-scale use, but cheap for low-volume testing and experimentation.
For context on how this pricing fits into the broader landscape of free and paid AI models, see our AI free tier tracker, which monitors pricing changes across all major providers.
How Do You Self-Host Mistral Medium 3.5?
Self-hosting is the most cost-effective path for teams running high-volume workloads or who need air-gapped, on-premises deployment. According to Mistral AI's official release and the Hugging Face model card, the minimum hardware requirement is four GPUs. At Q4 quantization, the model fits in approximately 70GB of VRAM, which puts it within reach of quad-GPU server setups or high-memory professional workstations.
Mistral publishes vLLM configuration examples directly on the Hugging Face page. The recommended setup for production use is an eight-GPU configuration with tensor parallelism, which maximizes throughput for concurrent requests. The configuration also exposes a configurable reasoning effort setting -- a key feature that lets you dial up or down how much compute the model spends on each request. Short chat replies get fast responses; complex multi-step agentic tasks get deeper reasoning. All from the same weights.
The Hacker News community noted at launch that this is approaching consumer-level territory: at Q4 quantization, you can theoretically run Mistral Medium 3.5 on a Mac Studio with 192GB of unified memory, though this is not a production-recommended path. For a realistic enterprise self-host, expect four to eight A100 or H100 GPUs in a standard server chassis. For teams exploring local AI options across different scales, our open source AI hub covers the full spectrum from 7B models upward.
What Is Le Chat Work Mode and the Vibe CLI Agent?
Alongside the model release, Mistral shipped two new product features that run on Medium 3.5 weights:
- Le Chat Work Mode: A new agentic mode in Le Chat that enables multi-step cross-tool workflows. Work Mode can handle email triage, research synthesis, and multi-step project execution by chaining tool calls together. It was previously available only to enterprise users and shipped broadly to Le Chat Pro subscribers with the Medium 3.5 launch.
- Vibe CLI Remote Agents: Mistral's Vibe is a command-line coding agent that now supports remote agent spawning. You can kick off a Vibe task from the CLI or from Le Chat, let it run asynchronously in the cloud, and come back to a finished pull request on GitHub. The coding agent is powered by Medium 3.5 and replaces Devstral 2 in the Vibe stack.
Both features are described in detail in Mistral's official announcement at mistral.ai. InfoQ's coverage of the release also provides a good technical summary of the deployment architecture behind the remote agent feature. These capabilities position Mistral Medium 3.5 not just as a model but as a platform for agentic workloads -- directly competing with offerings from Anthropic's Claude.ai and OpenAI's ChatGPT with operator tools.
How Does Mistral Medium 3.5 Compare to Open-Source Rivals?
The open-weight model landscape in mid-2026 is genuinely competitive. Medium 3.5 is not the only strong option, and understanding where it sits relative to alternatives matters for choosing the right model for your use case:
- vs. Llama 4 Scout/Maverick: Meta's Llama 4 family (released earlier in 2026) uses a sparse MoE architecture with dramatically higher total parameter counts. Llama 4 Maverick is highly capable but requires more hardware to self-host at full precision. Medium 3.5's dense architecture is generally more predictable in latency.
- vs. Qwen 3.6 Apache (27B): Alibaba's Qwen 3.6 27B ships under Apache 2.0 -- a more permissive license than Medium 3.5's modified MIT. At 27B vs. 128B, it requires far less hardware to self-host. Qwen 3.6 scores approximately 72.4% on SWE-Bench Verified, meaningfully below Medium 3.5's 77.6%. The trade-off is cost vs. capability.
- vs. Mistral Small 4 (22B, Apache 2.0): Mistral's own smaller model, which we covered in a dedicated article, targets edge and lightweight deployment. Small 4 is the right choice when hardware budget is limited; Medium 3.5 is for teams that need maximum capability within the open-weight category.
- vs. DeepSeek V4 (open weights): DeepSeek's open-weight releases remain strong competition, particularly for teams with access to high-memory hardware. DeepSeek models tend to have very strong math and reasoning scores; Medium 3.5 is stronger on agentic and tool-calling benchmarks like tau3-Telecom.
The honest framing is this: if you need a fully free license (Apache 2.0, no attribution requirements), options like Qwen 3.6 and Mistral Small 4 are stronger picks. If you need the best SWE-Bench Verified score in the open-weight category as of mid-2026, and can live with the modified MIT license, Mistral Medium 3.5 wins that comparison. Check out our AI news feed for ongoing updates as competing models release new benchmarks.
🔑 Key Takeaways
- Mistral Medium 3.5 is a 128B dense open-weight model released April 29, 2026 under a modified MIT license, meaning you can download and self-host it for free from Hugging Face with no per-token cost.
- It scores 77.6% on SWE-Bench Verified -- the leading real-world coding benchmark -- outperforming larger sparse models like Qwen3.5 397B A17B, making it the strongest open-weight option for coding agent workloads as of this writing.
- The model consolidates three separate Mistral products (Magistral for reasoning, Devstral 2 for coding, and general instruction-following) into a single set of weights with configurable reasoning effort per request.
- Self-hosting is achievable on as few as four GPUs (approximately 70GB VRAM at Q4 quantization), which brings frontier-class coding performance within reach of enterprise hardware without per-token API costs.
- Free access via Le Chat's free tier lets anyone try the model in chat without a subscription, while the paid Le Chat Pro tier ($14.99/month, or $5.99/month for students) unlocks Work Mode and the Vibe CLI coding agent.
Frequently Asked Questions
Is Mistral Medium 3.5 open source?
Mistral Medium 3.5 is released as open weights under a modified MIT license, meaning the model weights are freely downloadable from Hugging Face. You can self-host, fine-tune, and build commercial products on top of it. It is not fully open source in the Apache 2.0 sense -- the modified MIT license has attribution requirements -- but for most practical purposes it is free to use and deploy.
What license does Mistral Medium 3.5 use?
Mistral Medium 3.5 uses a modified MIT license. This allows free use, modification, and distribution including for commercial applications. The modified MIT license requires attribution to Mistral AI and forbids removing the license text. Derivatives and production deployments are permitted, which makes it substantially more permissive than many proprietary model licenses while stopping short of full Apache 2.0 openness.
How many GPUs do you need to self-host Mistral Medium 3.5?
According to Mistral AI, self-hosting requires as few as four GPUs. At Q4 quantization, the model fits in approximately 70GB of VRAM, which is achievable on a quad-GPU server or high-memory workstation. Mistral's Hugging Face model card includes vLLM configuration examples for eight-GPU setups for higher throughput production workloads. An eight-GPU H100 configuration is recommended for enterprise deployment.
What models did Mistral Medium 3.5 replace?
Mistral Medium 3.5 replaces two previous specialist models: Devstral 2 (Mistral's dedicated coding model, which powered the Vibe CLI) and Magistral (the dedicated reasoning model available in Le Chat). Both are retired into the unified 128B weights. Medium 3.5 handles instruction-following, reasoning, and coding with a single model and configurable reasoning effort per request, simplifying deployment considerably.
How does Mistral Medium 3.5 compare to GPT-4o and Claude?
On SWE-Bench Verified, Mistral Medium 3.5 scores 77.6%, which is competitive but below the reported scores of Anthropic's and OpenAI's top closed-weight frontier models in 2026. Its key advantage is openness: weights are free to download and self-host, while GPT-4o and Claude models are API-only. For teams needing on-premises deployment or wanting to avoid per-token costs at scale, Medium 3.5 offers a strong and practical alternative.