
Real-Time Object Detection in Flutter Using On-Device ML


Introduction

Who This Guide Is For

What You Will Learn

Why On-Device ML?

SSD MobileNet V1 — Architecture in Brief

The 80 COCO Classes

App Architecture

Conclusion

References

Introduction

Imagine pointing your phone at a cluttered desk and watching it instantly draw labelled boxes around your coffee cup, laptop, keys, and phone — no internet connection, no server, no API fees. That experience is now achievable in production Flutter apps, and this guide walks you through every line of code required to make it happen.

We combine three powerful technologies: TensorFlow Lite for on-device neural inference, the BLoC pattern for clean, testable state management, and Flutter’s CustomPainter for pixel-perfect bounding box rendering. The result is a fully offline, privacy-preserving detector that recognises 80 everyday object categories at real-time frame rates.

What You Will Build

A Flutter camera app that: streams live YUV420 frames → converts to RGB → runs SSD MobileNet V1 TFLite inference in a background Dart isolate → applies Non-Maximum Suppression → emits BLoC states → renders bounding boxes via CustomPainter. All 100% on-device, zero network calls.

Who This Guide Is For

This article is aimed at Flutter developers with basic Dart knowledge who want to go beyond simple widgets and build production-quality ML-powered applications. No prior machine learning experience is assumed.

What You Will Learn

• On-device ML fundamentals — how TFLite works, why it is the right choice for mobile

• BLoC architecture for ML — events, states, and the full data pipeline

• YUV420 colour space conversion — the right BT.601 coefficients and why they matter

• Isolate-based inference — keeping the UI at 60fps while the model runs

• CustomPainter bounding boxes — scaling normalised coordinates, drawing corner accents

• Non-Maximum Suppression — eliminating duplicate detections with IoU

• Common bugs and fixes — label off-by-one, wrong normalisation, misaligned boxes

Why On-Device ML?

Before writing a single line of code, it is worth understanding why we run the model on the device rather than calling a cloud vision API. The trade-offs are significant:

| Dimension | Cloud API | On-Device TFLite |
| --- | --- | --- |
| Latency | 200–800 ms round-trip | 50–120 ms on CPU, 15–40 ms with GPU delegate |
| Privacy | Every frame leaves the device | No pixel ever transmitted |
| Cost | Charged per request (~600 requests/min at 10 fps) | Zero variable cost |
| Offline | Fails without connectivity | Works in a tunnel, airplane, or basement |
| Model size | Full-size server model | Quantized ~4 MB model that fits in an app bundle |
| Accuracy | Higher (larger models) | Very good for 80-class detection at production quality |

For most real-time camera use-cases, on-device wins on every dimension that matters to users. The SSD MobileNet V1 quantized model we use here is 4 MB, achieves 22+ mAP on COCO, and runs comfortably in real time on any phone released after 2019.

Understanding the Model

SSD MobileNet V1 — Architecture in Brief

Single Shot MultiBox Detector (SSD) is a one-stage object detection architecture. Unlike two-stage detectors (e.g. Faster R-CNN) that first propose regions then classify them, SSD predicts bounding boxes and class probabilities in a single forward pass — making it ideal for real-time mobile applications.

MobileNet V1 is the backbone feature extractor. It replaces standard convolutions with depthwise separable convolutions that reduce computation by 8–9× with minimal accuracy loss — perfectly matched to mobile hardware.
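The 8–9× figure follows directly from the arithmetic. A standard k×k convolution over cIn input and cOut output channels costs k²·cIn·cOut multiply-adds per output pixel, while the depthwise-separable version costs k²·cIn + cIn·cOut, giving a ratio of 1/cOut + 1/k². A quick illustrative sketch (not code from the app):

```dart
// Cost ratio of a depthwise separable conv vs a standard conv.
// Standard:  k*k * cIn * cOut multiply-adds per output pixel
// Separable: k*k * cIn  +  cIn * cOut
// Ratio = 1/cOut + 1/(k*k); for k = 3 and large cOut this approaches
// 1/9, hence the commonly quoted 8-9x reduction.
double separableCostRatio(int k, int cOut) => 1 / cOut + 1 / (k * k);

void main() {
  // A typical 3x3 layer with 256 output channels:
  // 1/256 + 1/9 ≈ 0.115, i.e. roughly 8.7x fewer multiply-adds.
  print(separableCostRatio(3, 256));
}
```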

The 80 COCO Classes

The model was trained on the COCO dataset and can recognise 80 everyday categories including:

| Category | Examples | Typical Use-Case |
| --- | --- | --- |
| People & vehicles | person, car, bus, truck, bicycle | Traffic analysis, pedestrian detection |
| Animals | dog, cat, bird, elephant, horse | Wildlife monitoring, pet apps |
| Household objects | chair, couch, bed, dining table, toilet | Home automation, AR furniture |
| Electronics | laptop, tv, phone, keyboard, mouse | Desk organiser, asset tracking |
| Kitchen items | bottle, cup, fork, knife, banana, apple | Recipe apps, food logging |
| Sports & outdoor | sports ball, kite, skateboard, surfboard | Sports tracking, activity apps |

App Architecture

The app is built on strict unidirectional data flow. A camera frame enters as a ProcessFrame event, flows through the BLoC, gets processed in a background isolate, and exits as a DetectionRunning state containing bounding boxes ready to paint.
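The event and state classes referenced throughout the rest of this guide are not listed in one place, so here is a minimal sketch consistent with how they are used later; any field not implied by those usages is an assumption:

```dart
// Events entering the BLoC (names match their usage later in this guide).
sealed class DetectionEvent {
  const DetectionEvent();
}

class InitializeDetector extends DetectionEvent {
  const InitializeDetector();
}

class StartDetection extends DetectionEvent {
  const StartDetection();
}

class ProcessFrame extends DetectionEvent {
  final CameraImage image;
  const ProcessFrame(this.image);
}

// States leaving the BLoC.
sealed class DetectionState {
  const DetectionState();
}

class DetectionLoading extends DetectionState {
  final String message;
  const DetectionLoading({required this.message});
}

class DetectionRunning extends DetectionState {
  final CameraController cameraController;
  final DetectionResult result;
  final DetectionConfig config;
  final double fps;

  const DetectionRunning({
    required this.cameraController,
    required this.result,
    required this.config,
    this.fps = 0,
  });

  DetectionRunning copyWith({DetectionResult? result, double? fps}) =>
      DetectionRunning(
        cameraController: cameraController,
        result: result ?? this.result,
        config: config,
        fps: fps ?? this.fps,
      );
}

class DetectionError extends DetectionState {
  final String message;
  final Object error;
  const DetectionError({required this.message, required this.error});
}
```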

Step 1 — Create the Flutter Project

Terminal

flutter create object_detection_app

cd object_detection_app

mkdir -p assets/models assets/labels

Step 2 — Download the TFLite Model

Download SSD MobileNet V1 quantized from the TensorFlow Lite model zoo. This is the uint8-quantized version — smaller and faster than float32, with negligible accuracy loss.

Terminal

# Download SSD MobileNet V1 quantized (2018 release, 4.3 MB)

wget https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip

unzip coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip

# Copy into your Flutter asset folders

cp detect.tflite assets/models/ssd_mobilenet_v1.tflite

cp labelmap.txt assets/labels/labelmap.txt
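Once the tflite_flutter dependency from Step 3 is in place, it is worth sanity-checking that the model loads and matches the shapes this guide assumes. A small diagnostic sketch (the function name and printed expectations are ours, not part of the app):

```dart
import 'package:tflite_flutter/tflite_flutter.dart';

Future<void> inspectModel() async {
  final interpreter = await Interpreter.fromAsset(
    'assets/models/ssd_mobilenet_v1.tflite',
  );

  // The quantized SSD MobileNet V1 should report input [1, 300, 300, 3]
  // with type uint8.
  print(interpreter.getInputTensor(0).shape);
  print(interpreter.getInputTensor(0).type);

  // And four outputs: boxes [1, N, 4], classes [1, N], scores [1, N],
  // and a detection count [1].
  for (final tensor in interpreter.getOutputTensors()) {
    print('${tensor.name}: ${tensor.shape}');
  }

  interpreter.close();
}
```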

Step 3 — pubspec.yaml

pubspec.yaml

dependencies:
  flutter:
    sdk: flutter

  # State management
  flutter_bloc: ^8.1.6
  bloc: ^8.1.4
  equatable: ^2.0.5

  # Camera
  camera: ^0.10.5+9

  # On-device ML
  tflite_flutter: ^0.10.4

  # Permissions
  permission_handler: ^11.3.1

  # UI polish
  flutter_animate: ^4.5.0
  google_fonts: ^6.2.1
  gap: ^3.0.1

flutter:
  uses-material-design: true
  assets:
    - assets/models/
    - assets/labels/

Step 4 — Platform Permissions

Android — AndroidManifest.xml

android/app/src/main/AndroidManifest.xml

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.FLASHLIGHT" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"
    android:maxSdkVersion="28" />
<uses-feature android:name="android.hardware.camera" android:required="true" />
<uses-feature android:name="android.hardware.camera.autofocus"
    android:required="false" />

Android — build.gradle

android/app/build.gradle

android {
    compileSdkVersion 34
    ndkVersion "25.1.8937393" // required by tflite_flutter

    defaultConfig {
        minSdkVersion 21
        targetSdkVersion 34
        ndk {
            // Restrict packaged ABIs to those tflite_flutter ships binaries for
            abiFilters "arm64-v8a", "armeabi-v7a", "x86_64"
        }
    }
}

dependencies {
    // Optional: GPU delegate for ~3x speedup
    implementation "org.tensorflow:tensorflow-lite-gpu:2.14.0"
}

iOS — Info.plist

ios/Runner/Info.plist

<key>NSCameraUsageDescription</key>
<string>Used for real-time on-device object detection.</string>
<key>NSPhotoLibraryAddUsageDescription</key>
<string>Saves detection snapshots to your photo library.</string>

Data Models

Good architecture starts with well-defined data models. We use Equatable for value equality, which is essential for BLoC’s change detection.

DetectedObject

lib/models/detection_model.dart

// lib/models/detection_model.dart
class DetectedObject extends Equatable {
  final String label;       // "person", "bottle", "car", …
  final double confidence;  // 0.0–1.0
  final Rect boundingBox;   // normalised [0.0, 1.0] coordinates
  final Color color;        // assigned per class index

  const DetectedObject({
    required this.label,
    required this.confidence,
    required this.boundingBox,
    required this.color,
  });

  String get confidencePercent =>
      "${(confidence * 100).toStringAsFixed(1)}%";

  @override
  List<Object?> get props => [label, confidence, boundingBox];
}

DetectionResult & DetectionConfig

lib/models/detection_model.dart

// Wraps a full frame's detections with timing metadata
class DetectionResult extends Equatable {
  final List<DetectedObject> objects;
  final Duration inferenceTime;
  final DateTime timestamp;

  const DetectionResult({
    required this.objects,
    required this.inferenceTime,
    required this.timestamp,
  });

  int get objectCount => objects.length;
  String get inferenceTimeMs => "${inferenceTime.inMilliseconds}ms";

  static DetectionResult empty() => DetectionResult(
        objects: const [],
        inferenceTime: Duration.zero,
        timestamp: DateTime.now(),
      );

  @override
  List<Object?> get props => [objects, inferenceTime, timestamp];
}

// User-configurable thresholds
class DetectionConfig {
  final double confidenceThreshold; // default 0.5
  final double iouThreshold;        // default 0.5
  final int maxDetections;          // default 10
  final int inputSize;              // default 300

  const DetectionConfig({
    this.confidenceThreshold = 0.5,
    this.iouThreshold = 0.5,
    this.maxDetections = 10,
    this.inputSize = 300,
  });
}

The Detection BLoC

Initialization — Loading the Model

The InitializeDetector event drives a sequential loading sequence: TFLite model → labels → camera. Each step emits a DetectionLoading state with a descriptive message so the UI can show progress.

Future<void> _onInitialize(
  InitializeDetector event,
  Emitter<DetectionState> emit,
) async {
  emit(const DetectionLoading(message: "Loading ML model…"));
  try {
    // Load interpreter with 4 threads for faster CPU inference
    _interpreter = await Interpreter.fromAsset(
      "assets/models/ssd_mobilenet_v1.tflite",
      options: InterpreterOptions()..threads = 4,
    );

    emit(const DetectionLoading(message: "Loading labels…"));
    _labels = await LabelUtils.loadLabels("assets/labels/labelmap.txt");

    emit(const DetectionLoading(message: "Setting up camera…"));
    _cameras = await availableCameras();
    await _initCamera(_cameras[0]);

    emit(DetectionRunning(
      cameraController: _cameraController!,
      result: DetectionResult.empty(),
      config: _config,
    ));
    add(const StartDetection()); // auto-start
  } catch (e) {
    emit(DetectionError(message: "Failed to initialize: $e", error: e));
  }
}

Frame Processing — The Inference Pipeline

This is the heart of the app. When a camera frame arrives, we check if we’re already processing one (the _isDetecting guard). If not, we dispatch the frame to a background isolate via compute() so the UI thread is never blocked.

Future<void> _onProcessFrame(
  ProcessFrame event,
  Emitter<DetectionState> emit,
) async {
  if (_interpreter == null) return;
  if (_isDetecting) return; // skip this frame – previous still processing
  if (state is! DetectionRunning) return;

  _isDetecting = true;
  final s = state as DetectionRunning;
  final stopwatch = Stopwatch()..start();

  try {
    // Run inference off the main thread
    final result = await compute(
      _runInference,
      _InferenceInput(
        cameraImage: event.image,
        interpreterAddress: _interpreter!.address,
        inputSize: _config.inputSize,
        confidenceThreshold: _config.confidenceThreshold,
        labels: _labels,
      ),
    );

    stopwatch.stop();
    _updateFps();

    if (!isClosed) {
      emit(s.copyWith(
        result: DetectionResult(
          objects: result,
          inferenceTime: stopwatch.elapsed,
          timestamp: DateTime.now(),
        ),
        fps: _currentFps,
      ));
    }
  } catch (e) {
    debugPrint("Inference error: $e");
  } finally {
    _isDetecting = false;
  }
}

Isolate Inference — The Technical Core

The _runInference function runs inside a Dart isolate spawned by compute(). It cannot capture variables from the enclosing scope, so we pass everything it needs through the _InferenceInput data class. The interpreter is reconstructed from a memory address rather than passing the object directly.
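Because compute() can only pass message-sendable data, _InferenceInput is a plain data carrier. Its exact definition is not shown in the article; a sketch with fields inferred from the call site above:

```dart
// Plain data carrier for the isolate entry point. Everything the isolate
// needs travels through this object; the interpreter itself is rebuilt
// from its native address on the other side.
class _InferenceInput {
  final CameraImage cameraImage;
  final int interpreterAddress;     // from _interpreter!.address
  final int inputSize;              // e.g. 300 for SSD MobileNet V1
  final double confidenceThreshold;
  final List<String> labels;

  const _InferenceInput({
    required this.cameraImage,
    required this.interpreterAddress,
    required this.inputSize,
    required this.confidenceThreshold,
    required this.labels,
  });
}
```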

Step 1 — Detect Model Type

final interpreter = Interpreter.fromAddress(input.interpreterAddress);

// Inspect the input tensor to detect uint8 (quantized) vs float32
final isQuantized =
    interpreter.getInputTensor(0).type == TensorType.uint8;

// This matters enormously:
// – uint8 model expects raw pixel bytes: [0, 255]
// – float32 model expects normalised: [0.0, 1.0]
// Sending uint8 data to a float32 model → completely wrong outputs

Step 2 — Build Input Tensor

// Convert camera frame: YUV420 -> RGB uint8

final rgbBytes =
    ImageUtils.convertYUV420ToRGB(input.cameraImage, input.inputSize);

dynamic inputTensor;
if (isQuantized) {
  // Quantized: feed raw uint8 bytes directly
  inputTensor = rgbBytes.reshape([1, input.inputSize, input.inputSize, 3]);
} else {
  // Float32: normalise to [0.0, 1.0]
  final floatPixels = Float32List(input.inputSize * input.inputSize * 3);
  for (int i = 0; i < rgbBytes.length; i++) {
    floatPixels[i] = rgbBytes[i] / 255.0;
  }
  inputTensor =
      floatPixels.reshape([1, input.inputSize, input.inputSize, 3]);
}

Step 3 — Run the Model

// Query the actual tensor shape from the model (do not hardcode "10")
final numDetections = interpreter.getOutputTensor(0).shape[1];

final outputBoxes = List.generate(
    1, (_) => List.generate(numDetections, (_) => List.filled(4, 0.0)));
final outputClasses =
    List.generate(1, (_) => List.filled(numDetections, 0.0));
final outputScores =
    List.generate(1, (_) => List.filled(numDetections, 0.0));
final outputCount = List.filled(1, 0.0);

interpreter.runForMultipleInputs([inputTensor], {
  0: outputBoxes,
  1: outputClasses,
  2: outputScores,
  3: outputCount,
});

Step 4 — Parse Detections with Label Fix

final detections = <DetectedObject>[];
final count = outputCount[0].toInt().clamp(0, numDetections);

for (int i = 0; i < count; i++) {
  final score = outputScores[0][i];
  if (score < input.confidenceThreshold) continue;

  final rawClassIndex = outputClasses[0][i].toInt();

  // Safe label lookup – handle "???" dummy entries gracefully
  String label;
  if (rawClassIndex < input.labels.length) {
    label = input.labels[rawClassIndex];
    if (label == "???" && rawClassIndex + 1 < input.labels.length) {
      label = input.labels[rawClassIndex + 1]; // shift past dummy
    }
  } else {
    label = "unknown";
  }

  // SSD box order: [top, left, bottom, right] – NOT [x, y, w, h]
  final box = outputBoxes[0][i];
  final rect = Rect.fromLTRB(
    box[1].clamp(0.0, 1.0).toDouble(), // left
    box[0].clamp(0.0, 1.0).toDouble(), // top
    box[3].clamp(0.0, 1.0).toDouble(), // right
    box[2].clamp(0.0, 1.0).toDouble(), // bottom
  );

  detections.add(DetectedObject(
    label: label,
    confidence: score,
    boundingBox: rect,
    color: colors[rawClassIndex % colors.length],
  ));
}

return NMSUtils.applyNMS(detections, 0.5);

YUV420 to RGB Conversion

The camera delivers frames in YUV420 format — a colour encoding where Y is luminance and U/V are chroma channels sampled at half resolution. TFLite needs RGB. Getting this conversion wrong is the most common cause of garbage detections.

Why the Coefficients Matter

The conversion from YUV to RGB uses the BT.601 full-range standard. Using incorrect coefficients produces a colour-shifted image that looks normal to human eyes but confuses the neural network significantly.

// CORRECT: BT.601 full-range (what this guide uses)
R = Y + 1.402 × (V − 128)
G = Y − 0.34414 × (U − 128) − 0.71414 × (V − 128)
B = Y + 1.772 × (U − 128)

// WRONG: old incorrect coefficients seen in many tutorials
// R = Y + 1.370705 × Vd   ← wrong
// G = Y − 0.698001 × Vd − 0.337633 × Ud   ← wrong
// B = Y + 1.732446 × Ud   ← wrong
// The error is ~2–5% per channel – invisible to humans but
// enough to drop detection accuracy by 10–20 percentage points

Handling NV12 and I420 Plane Layouts

On Android, the camera typically delivers I420 (three separate planes with uvPixelStride = 1). On iOS it delivers NV12/NV21 (interleaved UV, uvPixelStride = 2). The uvPixelStride field handles this transparently:

final int uvPixelStride = uPlane.bytesPerPixel ?? 1;

// uvIndex calculation handles both I420 and NV12/NV21:
final int uvIndex = uvRow * uvRowStride + uvCol * uvPixelStride;

// For I420: uvPixelStride = 1, U and V are separate planes
// For NV12: uvPixelStride = 2, U and V interleaved (UVUVUV…)
// For NV21: uvPixelStride = 2, V and U interleaved (VUVUVU…)
// (swap uPlane/vPlane references for NV21)

// Always mask with 0xFF to handle signed byte values on Android:
final int yVal = yBytes[yIndex] & 0xFF;
final int uVal = (uBytes[uvIndex] & 0xFF) - 128;
final int vVal = (vBytes[uvIndex] & 0xFF) - 128;
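Putting the coefficients and the stride handling together, the ImageUtils.convertYUV420ToRGB helper referenced in Step 2 could be sketched as below. The nearest-neighbour resize straight to the model's input size is our assumption; a production version might crop to a square first:

```dart
import 'dart:typed_data';

import 'package:camera/camera.dart';

Uint8List convertYUV420ToRGB(CameraImage image, int outSize) {
  final yPlane = image.planes[0];
  final uPlane = image.planes[1];
  final vPlane = image.planes[2];
  final uvRowStride = uPlane.bytesPerRow;
  final uvPixelStride = uPlane.bytesPerPixel ?? 1;

  final rgb = Uint8List(outSize * outSize * 3);
  var out = 0;
  for (var oy = 0; oy < outSize; oy++) {
    final y = oy * image.height ~/ outSize; // nearest-neighbour sample row
    for (var ox = 0; ox < outSize; ox++) {
      final x = ox * image.width ~/ outSize;
      final yIndex = y * yPlane.bytesPerRow + x;
      final uvIndex = (y ~/ 2) * uvRowStride + (x ~/ 2) * uvPixelStride;

      // Mask with 0xFF, then centre the chroma channels.
      final yVal = yPlane.bytes[yIndex] & 0xFF;
      final uVal = (uPlane.bytes[uvIndex] & 0xFF) - 128;
      final vVal = (vPlane.bytes[uvIndex] & 0xFF) - 128;

      // BT.601 full-range conversion from the section above.
      rgb[out++] = (yVal + 1.402 * vVal).round().clamp(0, 255).toInt();
      rgb[out++] = (yVal - 0.34414 * uVal - 0.71414 * vVal)
          .round()
          .clamp(0, 255)
          .toInt();
      rgb[out++] = (yVal + 1.772 * uVal).round().clamp(0, 255).toInt();
    }
  }
  return rgb;
}
```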

Non-Maximum Suppression

SSD produces multiple overlapping boxes for the same object. Non-Maximum Suppression (NMS) is the post-processing step that reduces these to a single best box per object. Without NMS, you would see five boxes around every coffee cup.
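The NMSUtils.applyNMS call used at the end of the inference function can be implemented as a greedy pass over the detections, highest confidence first. This is a sketch of the standard algorithm; suppressing only within the same class is our choice:

```dart
import 'dart:ui' show Rect;

// Intersection-over-Union of two rectangles, in [0, 1].
double iou(Rect a, Rect b) {
  final inter = a.intersect(b);
  if (inter.width <= 0 || inter.height <= 0) return 0;
  final interArea = inter.width * inter.height;
  final unionArea = a.width * a.height + b.width * b.height - interArea;
  return unionArea <= 0 ? 0 : interArea / unionArea;
}

List<DetectedObject> applyNMS(
    List<DetectedObject> input, double iouThreshold) {
  // Highest-confidence boxes win; any same-class box that overlaps a
  // kept box beyond the threshold is treated as a duplicate.
  final sorted = [...input]
    ..sort((a, b) => b.confidence.compareTo(a.confidence));
  final kept = <DetectedObject>[];
  for (final candidate in sorted) {
    final isDuplicate = kept.any((k) =>
        k.label == candidate.label &&
        iou(k.boundingBox, candidate.boundingBox) > iouThreshold);
    if (!isDuplicate) kept.add(candidate);
  }
  return kept;
}
```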

CustomPainter — Bounding Boxes

The BoundingBoxPainter is a CustomPainter that overlays bounding boxes directly on the camera preview. The model outputs normalised coordinates in the range [0, 1]. We scale these to canvas pixels in the paint() method — no pre-processing needed.

Coordinate Scaling

Rect _scaleRect(Rect normalised, Size canvasSize) {
  // normalised: left/top/right/bottom all in [0, 1]
  double left = normalised.left * canvasSize.width;
  double top = normalised.top * canvasSize.height;
  double right = normalised.right * canvasSize.width;
  double bottom = normalised.bottom * canvasSize.height;

  // Mirror horizontally for front camera
  if (isFrontCamera) {
    final tmp = left;
    left = canvasSize.width - right;
    right = canvasSize.width - tmp;
  }

  return Rect.fromLTRB(
    left.clamp(0, canvasSize.width).toDouble(),
    top.clamp(0, canvasSize.height).toDouble(),
    right.clamp(0, canvasSize.width).toDouble(),
    bottom.clamp(0, canvasSize.height).toDouble(),
  );
}

Drawing Corner Accents

Instead of a plain rectangle, we draw corner brackets. This gives the UI a professional AR feel and keeps the interior of the box visible:

void _drawCorners(Canvas canvas, Rect rect, Color color) {
  const len = 14.0; // corner bracket length in pixels
  final paint = Paint()
    ..color = color
    ..style = PaintingStyle.stroke
    ..strokeWidth = 3.5
    ..strokeCap = StrokeCap.round;

  // Top-left corner
  canvas.drawLine(rect.topLeft, rect.topLeft + const Offset(len, 0), paint);
  canvas.drawLine(rect.topLeft, rect.topLeft + const Offset(0, len), paint);

  // Top-right corner
  canvas.drawLine(rect.topRight, rect.topRight + const Offset(-len, 0), paint);
  canvas.drawLine(rect.topRight, rect.topRight + const Offset(0, len), paint);

  // … (repeat for bottomLeft and bottomRight)
}
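For completeness, a sketch of the painter class that ties _scaleRect and _drawCorners together; the label-chip styling and the simple shouldRepaint check are our assumptions:

```dart
class BoundingBoxPainter extends CustomPainter {
  final List<DetectedObject> objects;
  final bool isFrontCamera;

  const BoundingBoxPainter({
    required this.objects,
    this.isFrontCamera = false,
  });

  @override
  void paint(Canvas canvas, Size size) {
    for (final obj in objects) {
      final rect = _scaleRect(obj.boundingBox, size);
      _drawCorners(canvas, rect, obj.color);

      // Label and confidence above the top-left corner.
      final tp = TextPainter(
        text: TextSpan(
          text: '${obj.label} ${obj.confidencePercent}',
          style: TextStyle(color: obj.color, fontSize: 12),
        ),
        textDirection: TextDirection.ltr,
      )..layout();
      tp.paint(canvas, rect.topLeft + const Offset(0, -16));
    }
  }

  @override
  bool shouldRepaint(BoundingBoxPainter oldDelegate) =>
      oldDelegate.objects != objects;
}
```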

Common Bugs and How to Fix Them

These are the bugs that virtually every developer hits when building their first TFLite object detection app:

• Label off-by-one: the quantized SSD labelmap begins with a "???" placeholder entry, so a naive lookup shifts every label by one. Skip the dummy entry as shown in Step 4.

• Wrong YUV coefficients: subtly wrong BT.601 constants produce an image that looks normal to you but degrades detection accuracy. Use the full-range coefficients from the conversion section.

• Missing (or wrong) normalisation: float32 models need pixels scaled to [0.0, 1.0], while quantized models need raw [0, 255] bytes. Swapping them produces garbage scores.

• Wrong box order: SSD outputs [top, left, bottom, right], not [x, y, w, h]. Mixing these up draws boxes in the wrong places.

• Main-thread inference: running the interpreter on the UI thread freezes the preview. Dispatch frames to an isolate with compute().

Conclusion

Real-time on-device object detection is no longer a research project. It is a production-ready Flutter feature you can ship today in an app that fits in an 8 MB package, runs offline, never transmits a single frame to a server, and detects 80 categories of everyday objects in real time.

The combination of TensorFlow Lite for neural inference, BLoC for predictable state management, and Dart isolates for background processing gives you a system that is fast, testable, and maintainable. The five bugs covered in this guide — label off-by-one, wrong YUV coefficients, missing normalisation, wrong box order, and main-thread inference — are the exact issues you’ll encounter, now with clear solutions.

The architecture is deliberately model-agnostic. Swapping SSD MobileNet for YOLOv8 or EfficientDet only requires changing the inference function — the BLoC events, states, UI, and CustomPainter remain unchanged. Build once, swap models freely.

References

• "Object detection and tracking", ML Kit, Google for Developers (developers.google.com)

• "How do I do flutter object detection?", discuss.ai.google.dev

• "Implementing Flutter Real-Time Object Detection With TensorFlow Lite": https://www.dhiwise.com/post/implementing-flutter-real-time-object-detection-with-tensorflow-lite

Feel free to connect with us, and read more articles from FlutterDevs.com.

FlutterDevs is a team of Flutter developers who build high-quality, functionally rich apps. Hire a Flutter developer for your cross-platform Flutter mobile app project, hourly or full-time, as per your requirement! For any Flutter-related queries, you can connect with us on Facebook, GitHub, Twitter, and LinkedIn.

We welcome feedback and hope that you share what you’re working on using #FlutterDevs. We truly enjoy seeing how you use Flutter to build beautiful, interactive web experiences.


Need help building production-grade Flutter apps? FlutterDevs helps teams ship faster with solid architecture, better UX, and practical AI features. Reach us at support@flutterdevs.com.
