How to Integrate Real-Time AI Chatbots in Flutter Using OpenAI, Gemini & Local Models

If you’re looking for the best Flutter app development company for your mobile application, feel free to contact us at support@flutterdevs.com.


Table of Contents

Introduction

OpenAI, Gemini & Local Self-Hosted Models

OpenAI (Cloud Models)

Google Gemini (Cloud Models)

Local Self-Hosted Models

Why Real-Time AI Chatbots Matter

Architectural Thinking: One UI, Multiple AI Engines

Context, Memory & Conversation Flow

Performance, Cost & Privacy Trade-Offs

Local vs On-Device: A Reality Check

Project Setup & Dependencies

Integrating OpenAI in Flutter

Integrating Gemini in Flutter

Using Local LLMs with Ollama

Final Thoughts



Introduction

Real-time AI chatbots are no longer just a “cool feature” — they’ve become a core expectation in modern apps. Whether it’s customer support, personal assistants, fitness apps, learning platforms, or productivity tools, users want instant, natural, and intelligent interactions. And as Flutter continues to dominate cross-platform development, integrating AI-powered chat experiences has become one of the most in-demand skills for mobile developers today.

But while building a chatbot that “works” is easy, building one that feels alive — streaming responses token-by-token, handling context, switching between multiple AI engines, and even running fully offline using local self-hosted models — is the real challenge.

That’s where today’s AI ecosystem shines.

With powerful LLMs like OpenAI GPT-4.1, Google Gemini 2.0, and lightweight self-hosted models running on your own servers or desktops via Ollama, LM Studio, or GPT4All, developers now have multiple ways to bring intelligent, real-time conversational AI directly into Flutter apps.

In this article, we’ll explore how to integrate real-time AI chatbots in Flutter using:

  • OpenAI for cloud-powered intelligence
  • Gemini for fast and flexible generative AI
  • Local self-hosted LLMs for privacy-first, cost-efficient AI processing

The goal isn’t just to teach you how to make API calls — but to show you how to build a production-ready, low-latency chat experience with streaming, token-by-token UI updates, and clean architecture patterns that scale.

If you’re planning to build an AI assistant, a chatbot UI, or integrate conversational intelligence into your existing product — this guide will help you build it the right way.

OpenAI, Gemini & Local Self-hosted Models

When integrating real-time AI chat into a Flutter app, you can choose from three broad categories of models. Each one brings its own style of intelligence and its own way of working inside your app.

OpenAI (Cloud Models)

OpenAI provides powerful cloud-based language models that deliver high-quality reasoning and natural conversations. These models stream responses smoothly and are ideal when you want polished, human-like chat. They’re easy to integrate and work well for most production apps.

Google Gemini (Cloud Models)

Gemini models from Google are fast, capable, and built for multimodal experiences — meaning they can understand text, images, and more. They fit naturally into the Google ecosystem, making them a strong choice when your app needs both intelligence and context-awareness.

Local Self-Hosted Models

Self-hosted local models such as Phi-3, Mistral, or LLaMA can be used without relying on third-party cloud APIs. Using tools like Ollama or LM Studio, these models run locally as a self-hosted service and are accessed from Flutter via a lightweight HTTP interface. This approach improves privacy, eliminates cloud usage costs, and gives you full control over model behavior — making it ideal for internal tools, desktop apps, and privacy-sensitive use cases.

Why Real-Time AI Chatbots Matter

Real-time chat isn’t just about fast replies — it’s about creating a natural, human-like interaction. When users see responses appearing instantly (or token-by-token), they feel like they’re talking to a real assistant, not waiting for a server.

Some key benefits you can highlight:

  • Instant Feedback: Users don’t stare at a spinner.
  • More Human Interaction: Streaming responses feel conversational.
  • Better UX for Long Answers: Content appears as it’s generated.
  • Lower Perceived Latency: Even if the model is slow, streaming makes it feel fast.
  • Smarter App Features: Great for customer support, health apps, learning apps, and productivity tools.

Architectural Thinking: One UI, Multiple AI Engines

A scalable AI chatbot architecture separates UI concerns from AI intelligence.

Rather than tightly coupling Flutter widgets to a specific provider like OpenAI or Gemini, treating AI engines as interchangeable backends allows you to:

  • Switch providers without rewriting UI
  • Add fallbacks when one service fails
  • Experiment with multiple models
  • Mix cloud and self-hosted solutions

This architectural mindset is what turns a prototype into a production-ready system.
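
To make this concrete, here is a minimal Dart sketch of such an abstraction. The ChatEngine interface and ChatController class are hypothetical names, not part of any package; the point is simply that each provider-specific service implements the same contract, so the chat UI never knows which engine it is talking to.

/// A hypothetical, provider-agnostic contract for chat backends.
abstract class ChatEngine {
  Future<String> sendMessage(List<Map<String, String>> messages);
}

/// The UI talks only to this controller; swapping OpenAI, Gemini,
/// or a self-hosted model is just a matter of passing a different engine.
class ChatController {
  ChatController(this.engine);

  ChatEngine engine; // replace at runtime for fallbacks or experiments

  final List<Map<String, String>> _history = [];

  Future<String> ask(String userText) async {
    _history.add({"role": "user", "content": userText});
    final reply = await engine.sendMessage(_history);
    _history.add({"role": "assistant", "content": reply});
    return reply;
  }
}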

Context, Memory & Conversation Flow

A chatbot is only as good as its memory. Effective AI chat systems carefully manage:

  • Short-term context (recent messages)
  • System-level instructions
  • Long-term conversation summaries

Sending the entire conversation history on every request is expensive and unnecessary. A more scalable approach involves keeping recent messages verbatim while summarizing older context — preserving intelligence without inflating costs or latency.

This principle applies equally to OpenAI, Gemini, and self-hosted models.
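
As a rough illustration, a trimming strategy can keep the system prompt, a short summary of older turns, and only the last few messages verbatim. How the summary itself is produced (usually another model call) is out of scope here; the helper below only assembles the request payload.

/// Sketch of context trimming: system prompt + summary of older turns
/// + the most recent messages verbatim.
List<Map<String, String>> buildContext({
  required String systemPrompt,
  required List<Map<String, String>> history,
  String? olderSummary,
  int keepRecent = 8,
}) {
  final recent = history.length <= keepRecent
      ? history
      : history.sublist(history.length - keepRecent);

  return [
    {"role": "system", "content": systemPrompt},
    if (olderSummary != null)
      {"role": "system", "content": "Earlier conversation summary: $olderSummary"},
    ...recent,
  ];
}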

Performance, Cost & Privacy Trade-Offs

Each AI approach comes with trade-offs:

  • Cloud models offer the highest quality but introduce usage-based costs and data sharing concerns.
  • Self-hosted models reduce cost and improve privacy but require infrastructure and hardware planning.
  • Streaming responses don’t reduce token usage but dramatically improve user experience.

Choosing the right approach depends on your product goals, audience, and constraints — not just model capability.

Local vs On-Device: A Reality Check

There’s an important distinction worth making:

  • Self-hosted local models run as services you control.
  • True on-device models are embedded directly into the mobile app and run fully offline.

While on-device inference is possible using native integrations (such as llama.cpp), it remains complex and is not yet common in mainstream Flutter production apps. Most teams today successfully use self-hosted local models as a practical middle ground.

Being clear about this distinction builds trust with both users and fellow developers.

Implementation

Dependencies Used

dependencies:
  openai_dart: ^0.5.0
  google_generative_ai: ^0.4.7
  http: ^1.2.0

Project Structure

Each provider is isolated in its own service file:

lib/
├── services/
│   ├── openai_service.dart
│   ├── gemini_service.dart
│   └── ollama_service.dart

OpenAI Service (Cloud-based)

import 'package:openai_dart/openai_dart.dart';

class OpenAIService {
  final String clientKey;
  late final OpenAIClient client;

  OpenAIService(this.clientKey) {
    client = OpenAIClient(apiKey: clientKey);
  }

  Future<String> sendMessage(List<Map<String, String>> messages) async {
    final convertedMessages = messages.map((m) {
      if (m["role"] == "user") {
        return ChatCompletionMessage.user(
          content: ChatCompletionUserMessageContent.string(m["content"]!),
        );
      } else {
        return ChatCompletionMessage.assistant(
          content: m["content"]!,
        );
      }
    }).toList();

    final res = await client.createChatCompletion(
      request: CreateChatCompletionRequest(
        model: ChatCompletionModel.modelId("gpt-5.1"), // or gpt-4o-mini
        messages: convertedMessages,
        maxTokens: 600,
      ),
    );

    return res.choices.first.message.content ?? "";
  }
}
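
For the token-by-token experience described earlier, openai_dart also supports streaming chat completions. The sketch below is a streaming companion to sendMessage, meant to live inside OpenAIService; it assumes your installed version exposes createChatCompletionStream (check the package docs for the exact names in your version) and accepts the already-converted message list to keep it short.

// Streaming variant: yields small text deltas as the model produces them.
Stream<String> streamMessage(List<ChatCompletionMessage> messages) async* {
  final stream = client.createChatCompletionStream(
    request: CreateChatCompletionRequest(
      model: ChatCompletionModel.modelId("gpt-5.1"),
      messages: messages,
    ),
  );

  await for (final chunk in stream) {
    // Each chunk carries a small delta of the assistant's reply.
    if (chunk.choices.isEmpty) continue;
    final delta = chunk.choices.first.delta?.content;
    if (delta != null && delta.isNotEmpty) yield delta;
  }
}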

Gemini Service (Google Generative AI)

import 'package:google_generative_ai/google_generative_ai.dart';

class GeminiService {
  final String apiKey;
  late final GenerativeModel model;

  GeminiService(this.apiKey) {
    model = GenerativeModel(
      model: "gemini-2.0-flash",
      apiKey: apiKey,
    );
  }

  Future<String> sendMessage(List<Map<String, String>> messages) async {
    final contents = messages.map((m) {
      return Content(
        m["role"] == "user" ? "user" : "model",
        [TextPart(m["content"]!)],
      );
    }).toList();

    final response = await model.generateContent(contents);

    return response.text ?? "";
  }
}
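
Gemini offers the same streaming behaviour through generateContentStream. Here is a small sketch of a streaming counterpart to sendMessage, meant to live inside GeminiService (the method name streamMessage is our own):

// Streaming variant: emits partial text as the model generates it.
Stream<String> streamMessage(List<Map<String, String>> messages) async* {
  final contents = messages.map((m) {
    return Content(
      m["role"] == "user" ? "user" : "model",
      [TextPart(m["content"]!)],
    );
  }).toList();

  // generateContentStream yields partial responses as they are produced,
  // which is what drives a token-by-token chat UI.
  await for (final chunk in model.generateContentStream(contents)) {
    final text = chunk.text;
    if (text != null && text.isNotEmpty) yield text;
  }
}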

Ollama Service (Local Self-Hosted LLM)

To use Ollama, you first need to install it on your local machine. You can download the installer from the official website:

https://ollama.com

Once installed, Ollama runs a local server automatically and exposes an HTTP API on:

http://localhost:11434

You can interact with locally hosted models by making simple HTTP requests to this endpoint. For example, you can generate a response from a model using curl:

curl http://localhost:11434/api/generate -d '{
"model": "gpt-oss:20b",
"prompt": "Hi, how are you?"
}'

This request sends a prompt to the locally running model and returns a generated response — all without calling any external cloud API. This makes Ollama a powerful option for offline usage, privacy-sensitive applications, and cost-efficient AI integration.

For Flutter apps running on a real device, localhost points to the phone itself, so in practice you host Ollama on your own server or cloud instance and point the service below at that URL.

import 'dart:convert';
import 'dart:io';

class OllamaService {
  final String baseUrl;
  final String model;

  OllamaService({
    this.baseUrl = 'http://localhost:11434', // should be your cloud-hosted URL for mobile apps
    this.model = 'gpt-oss:20b',
  });

  Future<String> sendMessage(List<Map<String, String>> messages) async {
    // Convert chat-style messages into a single prompt
    final prompt = _buildPrompt(messages);

    final client = HttpClient();
    final request = await client.postUrl(
      Uri.parse('$baseUrl/api/generate'),
    );

    request.headers.contentType = ContentType.json;

    request.write(jsonEncode({
      "model": model,
      "prompt": prompt,
      "stream": false,
    }));

    final response = await request.close();
    final body = await response.transform(utf8.decoder).join();
    client.close();

    final json = jsonDecode(body);
    return json["response"] ?? "";
  }

  String _buildPrompt(List<Map<String, String>> messages) {
    final buffer = StringBuffer();

    for (final msg in messages) {
      final role = msg["role"];
      final content = msg["content"];

      if (role == "user") {
        buffer.writeln("User: $content");
      } else {
        buffer.writeln("Assistant: $content");
      }
    }

    buffer.writeln("Assistant:");
    return buffer.toString();
  }
}
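
Whichever provider you pick, the calling code looks the same, which is the payoff of the architecture discussed earlier. A minimal usage sketch (widget wiring omitted):

// Swapping OllamaService for OpenAIService or GeminiService is a one-line change,
// because all three expose the same sendMessage(List<Map<String, String>>) shape.
Future<void> main() async {
  final service = OllamaService(); // or OpenAIService(apiKey) / GeminiService(apiKey)

  final history = <Map<String, String>>[
    {"role": "user", "content": "Give me one tip for writing clean Flutter code."},
  ];

  final reply = await service.sendMessage(history);
  history.add({"role": "assistant", "content": reply});

  print(reply);
}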

Final Thoughts

Real-time AI chatbots are not about sending prompts and waiting for responses — they’re about experience. You can build AI-powered chat experiences that feel fast, intelligent, and genuinely conversational.

Whether you choose OpenAI, Gemini, or self-hosted local models, the key is designing for flexibility and scalability from day one. The future of Flutter apps is conversational — and building it well starts with the right foundation.

❤ ❤ Thanks for reading this article ❤❤

Do I need to correct something? Let me know in the comments; I would love to improve.

Clap 👏 If this article helps you.


From Our Parent Company Aeologic

Aeologic Technologies is a leading AI-driven digital transformation company in India, helping businesses unlock growth with AI automation, IoT solutions, and custom web & mobile app development. We also specialize in AIDC solutions and technical manpower augmentation, offering end-to-end support from strategy and design to deployment and optimization.

Trusted across industries like manufacturing, healthcare, logistics, BFSI, and smart cities, Aeologic combines innovation with deep industry expertise to deliver future-ready solutions.

Feel free to connect with us, and read more articles from FlutterDevs.com.

The FlutterDevs team of Flutter developers builds high-quality and functionally rich apps. Hire a Flutter developer for your cross-platform Flutter mobile app project on an hourly or full-time basis as per your requirement! For any Flutter-related queries, you can connect with us on Facebook, GitHub, Twitter, and LinkedIn.

We welcome feedback and hope that you share what you’re working on using #FlutterDevs. We truly enjoy seeing how you use Flutter to build beautiful, interactive experiences.

