Real-Time Object Detection in Flutter Using On-Device ML
Introduction
Who This Guide Is For
What You Will Learn
Why On-Device ML?
SSD MobileNet V1 — Architecture in Brief
The 80 COCO Classes
App Architecture
Conclusion
References
Introduction
Imagine pointing your phone at a cluttered desk and watching it instantly draw labelled boxes around your coffee cup, laptop, keys, and phone — no internet connection, no server, no API fees. That experience is now achievable in production Flutter apps, and this guide walks you through every line of code required to make it happen.
We combine three powerful technologies: TensorFlow Lite for on-device neural inference, the BLoC pattern for clean, testable state management, and Flutter’s CustomPainter for pixel-perfect bounding box rendering. The result is a fully offline, privacy-preserving detector that recognises 80 everyday object categories at real-time frame rates.
What You Will Build
A Flutter camera app that: streams live YUV420 frames → converts to RGB → runs SSD MobileNet V1 TFLite inference in a background Dart isolate → applies Non-Maximum Suppression → emits BLoC states → renders bounding boxes via CustomPainter. All 100% on-device, zero network calls.
Who This Guide Is For
This article is aimed at Flutter developers with basic Dart knowledge who want to go beyond simple widgets and build production-quality ML-powered applications. No prior machine learning experience is assumed.
What You Will Learn
• On-device ML fundamentals — how TFLite works, why it is the right choice for mobile
• BLoC architecture for ML — events, states, and the full data pipeline
• YUV420 colour space conversion — the right BT.601 coefficients and why they matter
• Isolate-based inference — keeping the UI at 60fps while the model runs
• CustomPainter bounding boxes — scaling normalised coordinates, drawing corner accents
• Non-Maximum Suppression — eliminating duplicate detections with IoU
• Common bugs and fixes — label off-by-one, wrong normalisation, misaligned boxes
Why On-Device ML?
Before writing a single line of code, it is worth understanding why we run the model on the device rather than calling a cloud vision API. The trade-offs are significant:
• Latency — cloud: 200–800 ms round-trip; on-device: 50–120 ms on CPU, 15–40 ms with the GPU delegate
• Privacy — cloud: every frame leaves the device; on-device: no pixel is ever transmitted
• Cost — cloud: charged per request (~600 requests/min at 10 fps); on-device: zero variable cost
• Offline — cloud: fails without connectivity; on-device: works in a tunnel, airplane, or basement
• Model size — cloud: full-size model; on-device: quantized ~4 MB model that fits in an app bundle
• Accuracy — cloud: higher (larger models); on-device: very good for 80-class detection at production quality
For most real-time camera use-cases, on-device wins on every dimension that matters to users. The SSD MobileNet V1 quantized model we use here is 4 MB, achieves 22+ mAP on COCO, and runs comfortably in real time on any phone released after 2019.
Understanding the Model
SSD MobileNet V1 — Architecture in Brief
Single Shot MultiBox Detector (SSD) is a one-stage object detection architecture. Unlike two-stage detectors (e.g. Faster R-CNN) that first propose regions then classify them, SSD predicts bounding boxes and class probabilities in a single forward pass — making it ideal for real-time mobile applications.
MobileNet V1 is the backbone feature extractor. It replaces standard convolutions with depthwise separable convolutions that reduce computation by 8–9× with minimal accuracy loss — perfectly matched to mobile hardware.
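The 8–9× figure falls straight out of the arithmetic. Here is a quick check in Python (for brevity; the notation follows the MobileNet paper: Dk = kernel size, M = input channels, N = output channels, Df = feature-map side), counting multiply-accumulates for a typical layer:

```python
# Multiply-accumulate cost of a standard conv layer vs a depthwise
# separable one. Dk = kernel size, M = input channels,
# N = output channels, Df = feature-map width/height.
def standard_conv_cost(dk, m, n, df):
    return dk * dk * m * n * df * df

def depthwise_separable_cost(dk, m, n, df):
    depthwise = dk * dk * m * df * df  # one Dk x Dk filter per input channel
    pointwise = m * n * df * df        # 1x1 conv mixing channels
    return depthwise + pointwise

# Typical MobileNet layer: 3x3 kernels, 256 -> 256 channels, 14x14 maps
std = standard_conv_cost(3, 256, 256, 14)
sep = depthwise_separable_cost(3, 256, 256, 14)
print(round(std / sep, 2))  # 8.69 — the ratio is 1 / (1/N + 1/Dk**2)
```

For 3×3 kernels the ratio approaches 9 as the channel count grows, which is where the commonly quoted 8–9× saving comes from.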
The 80 COCO Classes
The model was trained on the COCO dataset and can recognise 80 everyday categories including:
• People & vehicles — person, car, bus, truck, bicycle (traffic analysis, pedestrian detection)
• Animals — dog, cat, bird, elephant, horse (wildlife monitoring, pet apps)
• Household objects — chair, couch, bed, dining table, toilet (home automation, AR furniture)
• Electronics — laptop, tv, cell phone, keyboard, mouse (desk organisers, asset tracking)
• Kitchen items — bottle, cup, fork, knife, banana, apple (recipe apps, food logging)
• Sports & outdoor — sports ball, kite, skateboard, surfboard (sports tracking, activity apps)
App Architecture
The app is built on strict unidirectional data flow. A camera frame enters as a ProcessFrame event, flows through the BLoC, gets processed in a background isolate, and exits as a DetectionRunning state containing bounding boxes ready to paint.
Step 1 — Create the Flutter Project
Terminal
flutter create object_detection_app
cd object_detection_app
mkdir -p assets/models assets/labels
Step 2 — Download the TFLite Model
Download SSD MobileNet V1 quantized from the TensorFlow Lite model zoo. This is the uint8-quantized version — smaller and faster than float32, with negligible accuracy loss.
Terminal
# Download SSD MobileNet V1 quantized (2018 release, 4.3 MB)
wget https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
unzip coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
# Copy into your Flutter asset folders
cp detect.tflite assets/models/ssd_mobilenet_v1.tflite
cp labelmap.txt assets/labels/labelmap.txt
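It helps to see what uint8 quantization means numerically before feeding the model. A quantized tensor stores bytes related to real values by `real = scale × (q − zero_point)`. The sketch below (in Python for illustration; the scale and zero point here are made-up example values, not the parameters baked into this model) shows the round trip:

```python
# Uint8 quantization maps real values onto bytes:
#   real_value = scale * (quantized_value - zero_point)
# scale/zero_point below are illustrative only.
def quantize(real, scale, zero_point):
    q = round(real / scale) + zero_point
    return max(0, min(255, q))  # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)

scale, zero_point = 0.125, 128      # example values, powers of two for exactness
q = quantize(1.5, scale, zero_point)
print(q)                             # 140
print(dequantize(q, scale, zero_point))  # 1.5 — lossless for this value
```

Values that don't land exactly on a quantization step lose a little precision, which is why a quantized model trades a fraction of mAP for a 4× smaller file and faster integer arithmetic.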
Step 3 — pubspec.yaml
pubspec.yaml
dependencies:
  flutter:
    sdk: flutter

  # State management
  flutter_bloc: ^8.1.6
  bloc: ^8.1.4
  equatable: ^2.0.5

  # Camera
  camera: ^0.10.5+9

  # On-device ML
  tflite_flutter: ^0.10.4

  # Permissions
  permission_handler: ^11.3.1

  # UI polish
  flutter_animate: ^4.5.0
  google_fonts: ^6.2.1
  gap: ^3.0.1

flutter:
  uses-material-design: true
  assets:
    - assets/models/
    - assets/labels/
Step 4 — Platform Permissions
Android — AndroidManifest.xml
android/app/src/main/AndroidManifest.xml
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.FLASHLIGHT" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"
    android:maxSdkVersion="28" />
<uses-feature android:name="android.hardware.camera" android:required="true" />
<uses-feature android:name="android.hardware.camera.autofocus"
    android:required="false" />
Android — build.gradle
android/app/build.gradle
android {
    compileSdkVersion 34
    ndkVersion "25.1.8937393" // required by tflite_flutter

    defaultConfig {
        minSdkVersion 21
        targetSdkVersion 34
        ndk {
            abiFilters "arm64-v8a", "armeabi-v7a", "x86_64"
        }
    }
}

dependencies {
    // Optional: GPU delegate for ~3x speedup
    implementation "org.tensorflow:tensorflow-lite-gpu:2.14.0"
}
iOS — Info.plist
ios/Runner/Info.plist
<key>NSCameraUsageDescription</key>
<string>Used for real-time on-device object detection.</string>
<key>NSPhotoLibraryAddUsageDescription</key>
<string>Saves detection snapshots to your photo library.</string>
Data Models
Good architecture starts with well-defined data models. We use Equatable for value equality, which is essential for BLoC’s change detection.
DetectedObject
lib/models/detection_model.dart
// lib/models/detection_model.dart
class DetectedObject extends Equatable {
  final String label;       // "person", "bottle", "car", …
  final double confidence;  // 0.0–1.0
  final Rect boundingBox;   // normalised [0.0, 1.0] coordinates
  final Color color;        // assigned per class index

  const DetectedObject({
    required this.label,
    required this.confidence,
    required this.boundingBox,
    required this.color,
  });

  String get confidencePercent =>
      "${(confidence * 100).toStringAsFixed(1)}%";

  @override
  List<Object?> get props => [label, confidence, boundingBox];
}
DetectionResult & DetectionConfig
lib/models/detection_model.dart
// Wraps a full frame's detections with timing metadata
class DetectionResult extends Equatable {
  final List<DetectedObject> objects;
  final Duration inferenceTime;
  final DateTime timestamp;

  const DetectionResult({
    required this.objects,
    required this.inferenceTime,
    required this.timestamp,
  });

  int get objectCount => objects.length;
  String get inferenceTimeMs => "${inferenceTime.inMilliseconds}ms";

  static DetectionResult empty() => DetectionResult(
        objects: const [],
        inferenceTime: Duration.zero,
        timestamp: DateTime.now(),
      );

  // timestamp deliberately excluded so identical results compare equal
  @override
  List<Object?> get props => [objects, inferenceTime];
}

// User-configurable thresholds
class DetectionConfig {
  final double confidenceThreshold; // default 0.5
  final double iouThreshold;        // default 0.5
  final int maxDetections;          // default 10
  final int inputSize;              // default 300

  const DetectionConfig({
    this.confidenceThreshold = 0.5,
    this.iouThreshold = 0.5,
    this.maxDetections = 10,
    this.inputSize = 300,
  });
}
The Detection BLoC
Initialization — Loading the Model
The InitializeDetector event drives a sequential loading pipeline: TFLite model → labels → camera. Each step emits a DetectionLoading state with a descriptive message so the UI can show progress.
Future<void> _onInitialize(
  InitializeDetector event,
  Emitter<DetectionState> emit,
) async {
  emit(const DetectionLoading(message: "Loading ML model…"));
  try {
    // Load interpreter with 4 threads for faster CPU inference
    _interpreter = await Interpreter.fromAsset(
      "assets/models/ssd_mobilenet_v1.tflite",
      options: InterpreterOptions()..threads = 4,
    );

    emit(const DetectionLoading(message: "Loading labels…"));
    _labels = await LabelUtils.loadLabels("assets/labels/labelmap.txt");

    emit(const DetectionLoading(message: "Setting up camera…"));
    _cameras = await availableCameras();
    await _initCamera(_cameras[0]);

    emit(DetectionRunning(
      cameraController: _cameraController!,
      result: DetectionResult.empty(),
      config: _config,
    ));
    add(const StartDetection()); // auto-start
  } catch (e) {
    emit(DetectionError(message: "Failed to initialize: $e", error: e));
  }
}
Frame Processing — The Inference Pipeline
This is the heart of the app. When a camera frame arrives, we check if we’re already processing one (the _isDetecting guard). If not, we dispatch the frame to a background isolate via compute() so the UI thread is never blocked.
Future<void> _onProcessFrame(
  ProcessFrame event,
  Emitter<DetectionState> emit,
) async {
  if (_interpreter == null) return;
  if (_isDetecting) return; // skip this frame – previous still processing
  if (state is! DetectionRunning) return;

  _isDetecting = true;
  final s = state as DetectionRunning;
  final stopwatch = Stopwatch()..start();

  try {
    // Run inference off the main thread
    final result = await compute(
      _runInference,
      _InferenceInput(
        cameraImage: event.image,
        interpreterAddress: _interpreter!.address,
        inputSize: _config.inputSize,
        confidenceThreshold: _config.confidenceThreshold,
        labels: _labels,
      ),
    );
    stopwatch.stop();
    _updateFps();

    if (!isClosed) {
      emit(s.copyWith(
        result: DetectionResult(
          objects: result,
          inferenceTime: stopwatch.elapsed,
          timestamp: DateTime.now(),
        ),
        fps: _currentFps,
      ));
    }
  } catch (e) {
    debugPrint("Inference error: $e");
  } finally {
    _isDetecting = false;
  }
}
Isolate Inference — The Technical Core
The _runInference function runs inside a Dart isolate spawned by compute(). It cannot capture variables from the enclosing scope, so we pass everything it needs through the _InferenceInput data class. The interpreter is reconstructed from a memory address rather than passing the object directly.
Step 1 — Detect Model Type
final interpreter = Interpreter.fromAddress(input.interpreterAddress);

// Inspect the input tensor to detect uint8 (quantized) vs float32
final isQuantized =
    interpreter.getInputTensor(0).type == TensorType.uint8;

// This matters enormously:
// – uint8 model expects raw pixel bytes: [0, 255]
// – float32 model expects normalised: [0.0, 1.0]
// Sending uint8 data to a float32 model → completely wrong outputs
Step 2 — Build Input Tensor
// Convert camera frame: YUV420 -> RGB uint8
final rgbBytes =
    ImageUtils.convertYUV420ToRGB(input.cameraImage, input.inputSize);

dynamic inputTensor;
if (isQuantized) {
  // Quantized: feed raw uint8 bytes directly
  inputTensor =
      rgbBytes.reshape([1, input.inputSize, input.inputSize, 3]);
} else {
  // Float32: normalise to [0.0, 1.0]
  final floatPixels =
      Float32List(input.inputSize * input.inputSize * 3);
  for (int i = 0; i < rgbBytes.length; i++) {
    floatPixels[i] = rgbBytes[i] / 255.0;
  }
  inputTensor =
      floatPixels.reshape([1, input.inputSize, input.inputSize, 3]);
}
Step 3 — Run the Model
// Query the actual tensor shape from the model (do not hardcode "10")
final numDetections = interpreter.getOutputTensor(0).shape[1];

final outputBoxes = List.generate(
    1, (_) => List.generate(numDetections, (_) => List.filled(4, 0.0)));
final outputClasses =
    List.generate(1, (_) => List.filled(numDetections, 0.0));
final outputScores =
    List.generate(1, (_) => List.filled(numDetections, 0.0));
final outputCount = List.filled(1, 0.0);

interpreter.runForMultipleInputs([inputTensor], {
  0: outputBoxes,
  1: outputClasses,
  2: outputScores,
  3: outputCount,
});
Step 4 — Parse Detections with Label Fix
final count = outputCount[0].toInt().clamp(0, numDetections);
for (int i = 0; i < count; i++) {
  final score = outputScores[0][i];
  if (score < input.confidenceThreshold) continue;

  final rawClassIndex = outputClasses[0][i].toInt();

  // Safe label lookup – handle "???" dummy entries gracefully
  String label;
  if (rawClassIndex < input.labels.length) {
    label = input.labels[rawClassIndex];
    if (label == "???" && rawClassIndex + 1 < input.labels.length) {
      label = input.labels[rawClassIndex + 1]; // shift past the dummy entry
    }
  } else {
    label = "unknown";
  }

  // SSD box order: [top, left, bottom, right] – NOT [x, y, w, h]
  final box = outputBoxes[0][i];
  final rect = Rect.fromLTRB(
    box[1].clamp(0.0, 1.0), // left
    box[0].clamp(0.0, 1.0), // top
    box[3].clamp(0.0, 1.0), // right
    box[2].clamp(0.0, 1.0), // bottom
  );

  detections.add(DetectedObject(
    label: label,
    confidence: score,
    boundingBox: rect,
    color: colors[rawClassIndex % colors.length],
  ));
}

return NMSUtils.applyNMS(detections, 0.5);
YUV420 to RGB Conversion
The camera delivers frames in YUV420 format — a colour encoding where Y is luminance and U/V are chroma channels sampled at half resolution. TFLite needs RGB. Getting this conversion wrong is the most common cause of garbage detections.
Why the Coefficients Matter
The conversion from YUV to RGB uses the BT.601 full-range standard. Using incorrect coefficients produces a colour-shifted image that looks normal to human eyes but confuses the neural network significantly.
// CORRECT: BT.601 full-range (what this guide uses)
R = Y + 1.402 × (V − 128)
G = Y − 0.34414 × (U − 128) − 0.71414 × (V − 128)
B = Y + 1.772 × (U − 128)

// WRONG: old incorrect coefficients seen in many tutorials
// R = Y + 1.370705 × Vd ← wrong
// G = Y − 0.698001 × Vd − 0.337633 × Ud ← wrong
// B = Y + 1.732446 × Ud ← wrong
// The error is ~2–5% per channel – invisible to humans but
// enough to drop detection accuracy by 10–20 percentage points
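To make the coefficients concrete, here is the correct conversion as a small, runnable sketch (Python for brevity; the Dart conversion uses the same arithmetic). Two sanity checks catch most coefficient mistakes: neutral chroma must produce pure grey, and a saturated-red YUV triple must come back as red:

```python
def clamp8(x):
    """Round and clamp to the 0–255 byte range."""
    return max(0, min(255, int(round(x))))

def yuv_to_rgb(y, u, v):
    """BT.601 full-range YUV -> RGB, matching the formulas above."""
    r = y + 1.402 * (v - 128)
    g = y - 0.34414 * (u - 128) - 0.71414 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp8(r), clamp8(g), clamp8(b)

# Neutral chroma (U = V = 128) must give a pure grey pixel
print(yuv_to_rgb(90, 128, 128))  # (90, 90, 90)
# Full-range encoding of pure red is roughly Y=76, U=85, V=255
print(yuv_to_rgb(76, 85, 255))   # (254, 0, 0)
```

If either check fails after you port the formulas, the coefficients (or the − 128 chroma offsets) were transcribed wrong.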
Handling NV12 and I420 Plane Layouts
Depending on the platform and device, frames may arrive as planar I420 (three separate U/V planes, uvPixelStride = 1) or as semi-planar NV12/NV21 (interleaved UV, uvPixelStride = 2); Android devices vary, and iOS typically delivers NV12. Reading the plane's uvPixelStride field handles both layouts transparently:
final int uvPixelStride = uPlane.bytesPerPixel ?? 1;

// uvIndex calculation handles both I420 and NV12/NV21:
final int uvIndex = uvRow * uvRowStride + uvCol * uvPixelStride;

// For I420: uvPixelStride = 1, U and V are separate planes
// For NV12: uvPixelStride = 2, U and V interleaved (UVUVUV…)
// For NV21: uvPixelStride = 2, V and U interleaved (VUVUVU…)
// (swap uPlane/vPlane references for NV21)

// Always mask with 0xFF to handle signed byte values on Android:
final int yVal = yBytes[yIndex] & 0xFF;
final int uVal = (uBytes[uvIndex] & 0xFF) - 128;
final int vVal = (vBytes[uvIndex] & 0xFF) - 128;
Non-Maximum Suppression
SSD produces multiple overlapping boxes for the same object. Non-Maximum Suppression (NMS) is the post-processing step that reduces these to a single best box per object. Without NMS, you would see five boxes around every coffee cup.
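The greedy procedure is simple: sort detections by score, keep the best, and drop anything that overlaps a kept box beyond an IoU threshold. A minimal sketch (in Python for illustration; the function and variable names are mine, not the app's NMSUtils API):

```python
def iou(a, b):
    """Intersection-over-Union for boxes given as (left, top, right, bottom)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """detections: list of (score, box). Greedily keep the highest-scoring
    boxes; drop any box overlapping a kept one above the threshold."""
    keep = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in keep):
            keep.append((score, box))
    return keep

# Two near-duplicate boxes around one object plus one distinct box:
dets = [
    (0.9, (0.10, 0.10, 0.50, 0.50)),
    (0.8, (0.12, 0.11, 0.52, 0.51)),  # overlaps the first heavily
    (0.7, (0.60, 0.60, 0.90, 0.90)),
]
print(len(nms(dets)))  # 2 — the duplicate is suppressed
```

The 0.8-scoring box has an IoU of about 0.86 with the 0.9 box, so it is discarded; the distant 0.7 box survives. Per-class NMS (only suppressing boxes with the same label) is a common refinement.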
CustomPainter — Bounding Boxes
The BoundingBoxPainter is a CustomPainter that overlays bounding boxes directly on the camera preview. The model outputs normalised coordinates in the range [0, 1]. We scale these to canvas pixels in the paint() method — no pre-processing needed.
Coordinate Scaling
Rect _scaleRect(Rect normalised, Size canvasSize) {
  // normalised: left/top/right/bottom all in [0, 1]
  double left = normalised.left * canvasSize.width;
  double top = normalised.top * canvasSize.height;
  double right = normalised.right * canvasSize.width;
  double bottom = normalised.bottom * canvasSize.height;

  // Mirror horizontally for the front camera
  if (isFrontCamera) {
    final tmp = left;
    left = canvasSize.width - right;
    right = canvasSize.width - tmp;
  }

  return Rect.fromLTRB(
    left.clamp(0, canvasSize.width),
    top.clamp(0, canvasSize.height),
    right.clamp(0, canvasSize.width),
    bottom.clamp(0, canvasSize.height),
  );
}
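The mirror step is the part that is easiest to get backwards. A quick numeric check of the same arithmetic (Python for brevity; the function name is mine) shows that mirroring swaps left and right around the canvas centre while preserving the box width:

```python
def scale_and_mirror(norm, canvas_w, canvas_h, front_camera):
    """norm = (left, top, right, bottom) in [0, 1]; returns canvas pixels,
    mirrored horizontally for the front camera."""
    left, top = norm[0] * canvas_w, norm[1] * canvas_h
    right, bottom = norm[2] * canvas_w, norm[3] * canvas_h
    if front_camera:
        # Note: the NEW left comes from the OLD right edge
        left, right = canvas_w - right, canvas_w - left
    return (left, top, right, bottom)

box = scale_and_mirror((0.25, 0.2, 0.5, 0.6), 400, 800, front_camera=True)
print(box)  # (200.0, 160.0, 300.0, 480.0) — width 100 preserved, left < right
```

If you subtract the old left instead of the old right, left ends up greater than right and the rect collapses when clamped: the classic "boxes vanish on the selfie camera" bug.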
Drawing Corner Accents
Instead of a plain rectangle, we draw corner brackets. This gives the UI a professional AR feel and keeps the interior of the box visible:
void _drawCorners(Canvas canvas, Rect rect, Color color) {const len = 14.0; // corner bracket length in pixelsfinal paint = Paint()..color = color..style = PaintingStyle.stroke..strokeWidth = 3.5..strokeCap = StrokeCap.round;// Top-left cornercanvas.drawLine(rect.topLeft,rect.topLeft + const Offset(len, 0), paint);canvas.drawLine(rect.topLeft,rect.topLeft + const Offset(0, len), paint);// Top-right cornercanvas.drawLine(rect.topRight,rect.topRight + const Offset(-len, 0), paint);canvas.drawLine(rect.topRight,rect.topRight + const Offset(0, len), paint);// … (repeat for bottomLeft and bottomRight)}
Common Bugs and How to Fix Them
These are the bugs that virtually every developer hits when building their first TFLite object detection app:
• Label off-by-one — the labelmap starts with a "???" background entry, so class indices must be shifted past it
• Wrong YUV coefficients — anything other than the BT.601 full-range values silently degrades accuracy
• Missing normalisation — feeding raw uint8 pixels to a float32 model (or vice versa) produces garbage outputs
• Wrong box order — SSD outputs [top, left, bottom, right], not [x, y, w, h]
• Main-thread inference — running the model outside an isolate freezes the UI
Conclusion
Real-time on-device object detection is no longer a research project. It is a production-ready Flutter feature you can ship today in an app that fits in an 8 MB package, runs offline, never transmits a single frame to a server, and detects 80 categories of everyday objects in real time.
The combination of TensorFlow Lite for neural inference, BLoC for predictable state management, and Dart isolates for background processing gives you a system that is fast, testable, and maintainable. The five bugs covered in this guide — label off-by-one, wrong YUV coefficients, missing normalisation, wrong box order, and main-thread inference — are the exact issues you’ll encounter, now with clear solutions.
The architecture is deliberately model-agnostic. Swapping SSD MobileNet for YOLOv8 or EfficientDet only requires changing the inference function — the BLoC events, states, UI, and CustomPainter remain unchanged. Build once, swap models freely.
References
Object detection and tracking | ML Kit | Google for Developers — developers.google.com
How do I do flutter object detection? — discuss.ai.google.dev
Implementing Flutter Real-Time Object Detection with TensorFlow Lite — https://www.dhiwise.com/post/implementing-flutter-real-time-object-detection-with-tensorflow-lite
Need help building production-grade Flutter apps? FlutterDevs helps teams ship faster with solid architecture, better UX, and practical AI features. Reach us at support@flutterdevs.com.