How to Integrate Real-Time AI Chatbots in Flutter Using OpenAI, Gemini & Local Models
If you’re looking for the best Flutter app development company for your mobile application, feel free to contact us at support@flutterdevs.com.
Table of Contents
OpenAI, Gemini & Local Self-Hosted Models
Why Real-Time AI Chatbots Matter
Architectural Thinking: One UI, Multiple AI Engines
Context, Memory & Conversation Flow
Performance, Cost & Privacy Trade-Offs
Local vs On-Device: A Reality Check

Introduction
Real-time AI chatbots are no longer just a “cool feature” — they’ve become a core expectation in modern apps. Whether it’s customer support, personal assistants, fitness apps, learning platforms, or productivity tools, users want instant, natural, and intelligent interactions. And as Flutter continues to dominate cross-platform development, integrating AI-powered chat experiences has become one of the most in-demand skills for mobile developers today.
But while building a chatbot that “works” is easy, building one that feels alive — streaming responses token-by-token, handling context, switching between multiple AI engines, and even running fully offline using local self-hosted models — is the real challenge.
That’s where today’s AI ecosystem shines.
With powerful LLMs like OpenAI GPT-4.1, Google Gemini 2.0, and lightweight self-hosted models running on your own cloud server or desktop via Ollama, LM Studio, or GPT4All, developers now have multiple ways to bring intelligent, real-time conversational AI directly into Flutter apps.
In this article, we’ll explore how to integrate real-time AI chatbots in Flutter using:
- OpenAI for cloud-powered intelligence
- Gemini for fast and flexible generative AI
- Local self-hosted LLMs for privacy-first, cost-efficient AI processing
The goal isn’t just to teach you how to make API calls — but to show you how to build a production-ready, low-latency chat experience with streaming, token-by-token UI updates, and clean architecture patterns that scale.
If you’re planning to build an AI assistant, a chatbot UI, or integrate conversational intelligence into your existing product — this guide will help you build it the right way.
OpenAI, Gemini & Local Self-hosted Models
When integrating real-time AI chat into a Flutter app, you can choose from three broad categories of models. Each one brings its own style of intelligence and its own way of working inside your app.
OpenAI (Cloud Models)
OpenAI provides powerful cloud-based language models that deliver high-quality reasoning and natural conversations. These models stream responses smoothly and are ideal when you want polished, human-like chat. They’re easy to integrate and work well for most production apps.
Google Gemini (Cloud Models)
Gemini models from Google are fast, capable, and built for multimodal experiences — meaning they can understand text, images, and more. They fit naturally into the Google ecosystem, making them a strong choice when your app needs both intelligence and context-awareness.
Local Self-Hosted Models
Self-hosted local models such as Phi-3, Mistral, or LLaMA can be used without relying on third-party cloud APIs. Using tools like Ollama or LM Studio, these models run locally as a self-hosted service and are accessed from Flutter via a lightweight HTTP interface. This approach improves privacy, eliminates cloud usage costs, and gives you full control over model behavior — making it ideal for internal tools, desktop apps, and privacy-sensitive use cases.
Why Real-Time AI Chatbots Matter
Real-time chat isn’t just about fast replies — it’s about creating a natural, human-like interaction. When users see responses appearing instantly (or token-by-token), they feel like they’re talking to a real assistant, not waiting for a server.
Some key benefits you can highlight:
- Instant Feedback: Users don’t stare at a spinner.
- More Human Interaction: Streaming responses feel conversational.
- Better UX for Long Answers: Content appears as it’s generated.
- Lower Perceived Latency: Even if the model is slow, streaming makes it feel fast.
- Smarter App Features: Great for customer support, health apps, learning apps, and productivity tools.
Architectural Thinking: One UI, Multiple AI Engines
A scalable AI chatbot architecture separates UI concerns from AI intelligence.
Rather than tightly coupling Flutter widgets to a specific provider like OpenAI or Gemini, treating AI engines as interchangeable backends allows you to:
- Switch providers without rewriting UI
- Add fallbacks when one service fails
- Experiment with multiple models
- Mix cloud and self-hosted solutions
This architectural mindset is what turns a prototype into a production-ready system.
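One lightweight way to express this in Dart is a small abstraction that every provider implements. The sketch below is illustrative rather than taken from any package; ChatEngine and ChatController are hypothetical names, and the sendMessage signature deliberately matches the service classes implemented later in this article:

abstract class ChatEngine {
  // Every provider (OpenAI, Gemini, Ollama) implements the same contract.
  Future<String> sendMessage(List<Map<String, String>> messages);
}

class ChatController {
  // The UI talks only to this class, never to a concrete provider.
  ChatEngine engine;
  ChatController(this.engine);

  Future<String> ask(List<Map<String, String>> history) =>
      engine.sendMessage(history);

  // Swapping providers (or falling back when one fails) is a one-line change.
  void switchEngine(ChatEngine next) => engine = next;
}

With this in place, each service class shown later only needs to declare implements ChatEngine to plug into the same chat screen.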
Context, Memory & Conversation Flow
A chatbot is only as good as its memory. Effective AI chat systems carefully manage:
- Short-term context (recent messages)
- System-level instructions
- Long-term conversation summaries
Sending the entire conversation history on every request is expensive and unnecessary. A more scalable approach involves keeping recent messages verbatim while summarizing older context — preserving intelligence without inflating costs or latency.
This principle applies equally to OpenAI, Gemini, and self-hosted models.
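As a concrete illustration of that sliding-window idea, here is a minimal, provider-agnostic trimmer. It is a sketch under simple assumptions: trimContext and keepRecent are our own names, and the placeholder summary would in practice be generated by the model itself:

/// Keep the last [keepRecent] messages verbatim and collapse everything
/// older into a single summary message.
List<Map<String, String>> trimContext(
  List<Map<String, String>> history, {
  int keepRecent = 10,
}) {
  if (history.length <= keepRecent) return history;

  final older = history.sublist(0, history.length - keepRecent);
  final recent = history.sublist(history.length - keepRecent);

  // Stand-in summary; a real app would ask the LLM to summarize `older`.
  final summary = {
    "role": "assistant",
    "content": "Summary of earlier conversation (${older.length} messages).",
  };
  return [summary, ...recent];
}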
Performance, Cost & Privacy Trade-Offs
Each AI approach comes with trade-offs:
- Cloud models offer the highest quality but introduce usage-based costs and data sharing concerns.
- Self-hosted models reduce cost and improve privacy but require infrastructure and hardware planning.
- Streaming responses don’t reduce token usage but dramatically improve user experience.
Choosing the right approach depends on your product goals, audience, and constraints — not just model capability.
Local vs On-Device: A Reality Check
There’s an important distinction worth making:
- Self-hosted local models run as services you control.
- True on-device models are embedded directly into the mobile app and run fully offline.
While on-device inference is possible using native integrations (such as llama.cpp), it remains complex and is not yet common in mainstream Flutter production apps. Most teams today successfully use self-hosted local models as a practical middle ground.
Being clear about this distinction builds trust with both users and fellow developers.
Implementation
Dependencies Used
dependencies:
  openai_dart: ^0.5.0
  google_generative_ai: ^0.4.7
  http: ^1.2.0
Project Structure
Each provider is isolated in its own service file:
lib/
├── services/
│   ├── openai_service.dart
│   ├── gemini_service.dart
│   └── ollama_service.dart
OpenAI Service (Cloud-based)
import 'package:openai_dart/openai_dart.dart';

class OpenAIService {
  final String clientKey;
  late final OpenAIClient client;

  OpenAIService(this.clientKey) {
    client = OpenAIClient(apiKey: clientKey);
  }

  Future<String> sendMessage(List<Map<String, String>> messages) async {
    // Map our plain role/content maps onto the package's typed messages.
    final convertedMessages = messages.map((m) {
      if (m["role"] == "user") {
        return ChatCompletionMessage.user(
          content: ChatCompletionUserMessageContent.string(m["content"]!),
        );
      } else {
        return ChatCompletionMessage.assistant(
          content: m["content"]!,
        );
      }
    }).toList();

    final res = await client.createChatCompletion(
      request: CreateChatCompletionRequest(
        model: ChatCompletionModel.modelId("gpt-5.1"), // or gpt-4o-mini
        messages: convertedMessages,
        maxTokens: 600,
      ),
    );
    return res.choices.first.message.content ?? "";
  }
}
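The method above returns one complete reply per call, but the real-time feel discussed earlier comes from streaming. openai_dart also exposes a streaming variant, createChatCompletionStream, which yields the reply in small deltas. Below is a minimal sketch of a streaming method you could add to OpenAIService — streamMessage and _convert are our own names, and _convert simply extracts the same conversion logic used in sendMessage (check the package docs for the exact stream types):

  Stream<String> streamMessage(List<Map<String, String>> messages) async* {
    final stream = client.createChatCompletionStream(
      request: CreateChatCompletionRequest(
        model: ChatCompletionModel.modelId("gpt-5.1"),
        messages: _convert(messages),
      ),
    );
    await for (final res in stream) {
      if (res.choices.isEmpty) continue;
      final delta = res.choices.first.delta.content;
      if (delta != null && delta.isNotEmpty) yield delta; // one small chunk
    }
  }

  // Same conversion as in sendMessage, extracted into a helper.
  List<ChatCompletionMessage> _convert(List<Map<String, String>> messages) =>
      messages.map((m) {
        return m["role"] == "user"
            ? ChatCompletionMessage.user(
                content: ChatCompletionUserMessageContent.string(m["content"]!),
              )
            : ChatCompletionMessage.assistant(content: m["content"]!);
      }).toList();

On the UI side, a StreamBuilder (or a manual listen plus setState) can append each chunk to the visible message bubble as it arrives.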
Gemini Service (Google Generative AI)
import 'package:google_generative_ai/google_generative_ai.dart';

class GeminiService {
  final String apiKey;
  late final GenerativeModel model;

  GeminiService(this.apiKey) {
    model = GenerativeModel(
      model: "gemini-2.0-flash",
      apiKey: apiKey,
    );
  }

  Future<String> sendMessage(List<Map<String, String>> messages) async {
    // Gemini uses "model" (not "assistant") as the role for AI replies.
    final contents = messages.map((m) {
      return Content(
        m["role"] == "user" ? "user" : "model",
        [TextPart(m["content"]!)],
      );
    }).toList();

    final response = await model.generateContent(contents);
    return response.text ?? "";
  }
}
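Gemini supports streaming as well: google_generative_ai exposes generateContentStream, which yields partial responses as they are generated. A minimal sketch of a streaming method for GeminiService (streamMessage is our own name):

  Stream<String> streamMessage(List<Map<String, String>> messages) async* {
    final contents = messages.map((m) {
      return Content(
        m["role"] == "user" ? "user" : "model",
        [TextPart(m["content"]!)],
      );
    }).toList();

    // Each chunk carries the next slice of generated text.
    await for (final chunk in model.generateContentStream(contents)) {
      final text = chunk.text;
      if (text != null && text.isNotEmpty) yield text;
    }
  }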
Ollama Service (Local Self-Hosted LLM)
To use Ollama, you first need to install it on your local machine. You can download the installer from the official website at https://ollama.com.
Once installed, Ollama runs a local server automatically and exposes an HTTP API on:
http://localhost:11434
You can interact with locally hosted models by making simple HTTP requests to this endpoint. For example, you can generate a response from a model using curl:
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Hi, how are you?"
}'
This request sends a prompt to the locally running model and returns a generated response — all without calling any external cloud API. This makes Ollama a powerful option for offline usage, privacy-sensitive applications, and cost-efficient AI integration.
For a Flutter app that needs to reach the model from real devices, you can host Ollama on your own cloud server and point the service below at that URL instead of localhost.
import 'dart:convert';
import 'dart:io';

class OllamaService {
  final String baseUrl;
  final String model;

  OllamaService({
    // Replace with your cloud-hosted URL in production.
    // (On the Android emulator, the host machine is http://10.0.2.2:11434.)
    this.baseUrl = 'http://localhost:11434',
    this.model = 'gpt-oss:20b',
  });

  Future<String> sendMessage(List<Map<String, String>> messages) async {
    // Convert chat-style messages into a single prompt.
    final prompt = _buildPrompt(messages);

    final client = HttpClient();
    final request = await client.postUrl(
      Uri.parse('$baseUrl/api/generate'),
    );
    request.headers.contentType = ContentType.json;
    request.write(jsonEncode({
      "model": model,
      "prompt": prompt,
      "stream": false, // ask for one complete JSON response
    }));

    final response = await request.close();
    final body = await response.transform(utf8.decoder).join();
    client.close();

    final json = jsonDecode(body);
    return json["response"] ?? "";
  }

  String _buildPrompt(List<Map<String, String>> messages) {
    final buffer = StringBuffer();
    for (final msg in messages) {
      final role = msg["role"];
      final content = msg["content"];
      if (role == "user") {
        buffer.writeln("User: $content");
      } else {
        buffer.writeln("Assistant: $content");
      }
    }
    // Trailing cue so the model continues as the assistant.
    buffer.writeln("Assistant:");
    return buffer.toString();
  }
}
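Ollama can stream too: with "stream": true, /api/generate returns one JSON object per line, each carrying the next chunk in its response field and a done flag on the final line. Here is a sketch of a streaming method you could add to OllamaService (streamMessage is our own name):

  Stream<String> streamMessage(List<Map<String, String>> messages) async* {
    final prompt = _buildPrompt(messages);
    final client = HttpClient();
    final request = await client.postUrl(Uri.parse('$baseUrl/api/generate'));
    request.headers.contentType = ContentType.json;
    request.write(jsonEncode({
      "model": model,
      "prompt": prompt,
      "stream": true, // newline-delimited JSON chunks
    }));

    final response = await request.close();
    await for (final line in response
        .transform(utf8.decoder)
        .transform(const LineSplitter())) {
      if (line.trim().isEmpty) continue;
      final chunk = jsonDecode(line) as Map<String, dynamic>;
      final token = chunk["response"] as String?;
      if (token != null && token.isNotEmpty) yield token;
      if (chunk["done"] == true) break; // final bookkeeping chunk
    }
    client.close();
  }

Whichever provider you stream from, the UI pattern is the same: listen to the Stream<String> and append each chunk to the message bubble as it arrives.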
Final Thoughts
Real-time AI chatbots are not just about sending prompts and waiting for responses — they’re about experience. With the patterns covered here, you can build AI-powered chat experiences that feel fast, intelligent, and genuinely conversational.
Whether you choose OpenAI, Gemini, or self-hosted local models, the key is designing for flexibility and scalability from day one. The future of Flutter apps is conversational — and building it well starts with the right foundation.
❤ ❤ Thanks for reading this article ❤❤
Did I get something wrong? Let me know in the comments. I would love to improve.
Clap 👏 if this article helped you.
From Our Parent Company Aeologic
Aeologic Technologies is a leading AI-driven digital transformation company in India, helping businesses unlock growth with AI automation, IoT solutions, and custom web & mobile app development. We also specialize in AIDC solutions and technical manpower augmentation, offering end-to-end support from strategy and design to deployment and optimization.
Trusted across industries like manufacturing, healthcare, logistics, BFSI, and smart cities, Aeologic combines innovation with deep industry expertise to deliver future-ready solutions.
Feel free to connect with us, and read more articles from FlutterDevs.com.
FlutterDevs is a team of Flutter developers building high-quality, functionally rich apps. Hire a Flutter developer for your cross-platform Flutter mobile app project on an hourly or full-time basis as per your requirement! For any Flutter-related queries, you can connect with us on Facebook, GitHub, Twitter, and LinkedIn.
We welcome feedback and hope that you share what you’re working on using #FlutterDevs. We truly enjoy seeing how you use Flutter to build beautiful, interactive experiences.

