Skip to main content
Faster, Cheaper, Smarter: Google Launches Gemini 3.1 Flash-Lite
Gemini

Google Unveils Gemini 3.1 Flash-Lite

For developers and enterprises, the "Holy Grail" of AI has always been the perfect balance between high intelligence and low cost. On March 3, 2026, Google DeepMind officially delivered on that promise with the launch of Gemini 3.1 Flash-Lite.

This new addition to the Gemini 3 series is Google’s most cost-effective model to date, specifically engineered to handle high-volume developer workloads at scale without sacrificing the reasoning power the Gemini brand is known for.

1. Speed Meets Savings

Gemini 3.1 Flash-Lite is designed for speed. When compared to the previous 2.5 Flash model, it boasts a 2.5X faster Time to First Token and a 45% increase in overall output speed.

But the real headline is the pricing:

  • Input Tokens: $0.25 per 1 million tokens
  • Output Tokens: $1.50 per 1 million tokens

This aggressive pricing makes it significantly cheaper than larger-tier models, allowing companies to deploy AI-driven features,like real-time content moderation or high-volume translation, at a fraction of their previous budget.

2. Adaptive Intelligence: You Control the "Thinking."

One of the standout features of 3.1 Flash-Lite is the integration of Thinking Levels in AI Studio and Vertex AI. Developers can now manually adjust how much "reasoning" the model performs for a specific task.

  • Low Thinking: Ideal for rapid-fire tasks like sorting images or simple text translation.
  • High Thinking: Perfect for complex instruction-following, such as generating dynamic dashboards, creating UI wireframes, or building multi-step SaaS agents.

3. Performance That Punches Above Its Weight

Despite being a "Lite" model, it outperforms its predecessors and competitors in key benchmarks:

  • Arena.ai Leaderboard: Achieved an impressive Elo score of 1432.
  • Reasoning Power: Scored 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing even some larger-tier models from the previous generation.

4. Real-World Use Cases

Early adopters are already putting Flash-Lite to the test:

  • Latitude: Using the model’s strict instruction-following for complex gaming narratives.
  • Cartwheel: Leveraging its speed for multimodal labeling and sorting of vast image libraries.
  • Whering: Utilizing the model for consistent, high-speed item tagging in fashion tech.

Availability: Get Started Today

Gemini 3.1 Flash-Lite is rolling out in preview starting today. Developers can access it via the Gemini API in Google AI Studio, while enterprise customers can find it within Google Cloud’s Vertex AI.

Innovation at scale requires a model that can keep up with your ambition without breaking your bank. With Gemini 3.1 Flash-Lite, the barrier to high-performance AI just got a whole lot lower.