Face Detection in Flutter using ML Kit (No Backend Required)
A production-grade guide to building real-time on-device face detection in Flutter using clean architecture and GetX.
Introduction
Face detection is one of the most widely used computer vision capabilities in modern mobile applications. From camera filters and augmented reality to identity verification and accessibility tools, detecting faces in real time is a foundational building block for many advanced features.
While many tutorials demonstrate how to detect a face in Flutter, most stop once the demo works on an emulator. Real applications require much more: stable camera streaming, correct coordinate transformations, efficient frame processing, and a maintainable architecture that can evolve with the application.
In this guide we will build a production-grade Flutter application that performs real-time face detection entirely on the device using Google ML Kit.
The application runs without:
• a backend server
• API keys for inference
• network connectivity
All processing occurs locally on the device.
Along the way we will explore several production concerns:
Structuring ML features with clean architecture
Managing camera streams and throttling frame processing
Mapping camera coordinates to screen coordinates correctly
Avoiding common state-management pitfalls with GetX
Debugging camera applications on physical devices
All code examples in this article were tested on a Samsung Galaxy M13 running Android 13, which revealed several edge cases that do not appear in emulators.
Project Demo
The final application performs real-time face detection directly from the device camera.
Features included in the demo:
Live camera preview
Real-time face detection
Bounding boxes over detected faces
Facial landmarks and contours
Smile and eye-open probability classification
Face tracking across frames
Front and rear camera switching
Because detection runs entirely on-device, the feature works offline with extremely low latency.
Typical performance on a mid-range Android device:
| Metric | Observed Value |
| --- | --- |
| Detection latency | 10–30 ms |
| Camera preview | 30 fps |
| Detection pipeline | 5–8 fps |
| Network usage | 0 |
Why On-Device Face Detection?
A traditional approach to computer vision involves sending camera frames to a cloud API for processing.
This architecture introduces several drawbacks:
Latency — every frame must travel to a server and back.
Offline failure — the feature stops working without connectivity.
Privacy concerns — captured images are transmitted to third-party infrastructure.
Google ML Kit provides an alternative: on-device machine learning inference.
The face detection model runs locally within the mobile application process. This eliminates network overhead and ensures that user data never leaves the device.
| Approach | Latency | Offline | Privacy |
| --- | --- | --- | --- |
| Cloud API | 80–400 ms | No | Images leave device |
| ML Kit On-Device | 10–30 ms | Yes | Images stay in memory |
For camera-based experiences that require real-time interaction, on-device inference is the only practical solution.
Application Architecture
The application follows a three-layer clean architecture where dependencies always flow inward.
```
Presentation Layer
│
├── FaceDetectionScreen
├── FaceDetectionController (GetX)
└── FaceOverlayPainter

Domain Layer
│
├── Entities
├── Repository Interfaces
└── Use Cases

Data Layer
│
├── CameraDataSource
├── FaceDetectorDataSource
└── FaceDetectionRepositoryImpl
```
Each layer has a clearly defined responsibility:
Data Layer
Responsible for interacting with external frameworks and services. This is the only layer that imports ML Kit and the camera libraries.
Domain Layer
Contains pure Dart business logic, including entities, repository interfaces, and use cases. The domain layer has no dependency on Flutter or platform APIs.
Presentation Layer
Responsible for user interaction, UI rendering, and state management through GetX.
This separation ensures the detection pipeline can be tested independently of camera hardware.
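As a minimal sketch of this separation (the names below are illustrative, not the exact ones from the repository), the domain layer can define a framework-free entity and a repository abstraction that the data layer implements:

```dart
// Domain layer: pure Dart, no Flutter or ML Kit imports.
// FaceEntity and FaceDetectionRepository are illustrative names.
class FaceEntity {
  const FaceEntity({
    this.trackingId,
    required this.left,
    required this.top,
    required this.width,
    required this.height,
    this.smilingProbability,
  });

  final int? trackingId;
  final double left, top, width, height;
  final double? smilingProbability;
}

abstract class FaceDetectionRepository {
  /// Detects faces in a raw camera frame and returns framework-free
  /// entities. The concrete implementation in the data layer is the
  /// only place that touches ML Kit types.
  Future<List<FaceEntity>> detectFaces(List<int> frameBytes);
}
```

Because the controller and use cases depend only on this interface, the detection pipeline can be exercised in unit tests with a fake implementation.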
Setting Up ML Kit
Dependencies
Add the following packages to your pubspec.yaml.
```yaml
dependencies:
  google_mlkit_face_detection: ^0.11.0
  camera: ^0.10.5
  permission_handler: ^11.3.0
  get: ^4.6.6
  get_it: ^7.6.7
```
Android Configuration
ML Kit requires a minimum SDK version of 21 and camera permission.
AndroidManifest.xml
```xml
<uses-permission android:name="android.permission.CAMERA" />
```
build.gradle
```groovy
android {
    defaultConfig {
        minSdkVersion 21
    }
}
```
iOS Configuration
Add camera permission to Info.plist.
```xml
<key>NSCameraUsageDescription</key>
<string>Camera access is required to detect faces on this device.</string>
```
Initializing the Face Detector
The ML Kit detector should be created once and reused across frames.
```dart
final detector = FaceDetector(
  options: FaceDetectorOptions(
    enableLandmarks: true,
    enableClassification: true,
    enableTracking: true,
    minFaceSize: 0.1,
    performanceMode: FaceDetectorMode.fast,
  ),
);
```
Enabling tracking allows ML Kit to maintain a stable ID for each detected face across frames.
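A brief sketch of how the tracking ID can be consumed (assuming an `inputImage` already built from a camera frame, as shown later in this article):

```dart
final faces = await detector.processImage(inputImage);
for (final face in faces) {
  // trackingId stays stable for the same face across consecutive frames,
  // which makes it possible to associate state (labels, history) with a face.
  debugPrint('face=${face.trackingId} '
      'smile=${face.smilingProbability?.toStringAsFixed(2)}');
}
```

Note that `trackingId` and `smilingProbability` are nullable; they are only populated when tracking and classification are enabled in the detector options.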
Camera Setup
The camera stream provides raw frames that are processed by the detection pipeline.
One critical detail is selecting the correct image format depending on the platform.
```dart
CameraController(
  camera,
  ResolutionPreset.medium,
  enableAudio: false,
  imageFormatGroup: Platform.isAndroid
      ? ImageFormatGroup.nv21
      : ImageFormatGroup.bgra8888,
);
```
Using an incorrect format results in zero detections with no visible error, making it one of the most confusing bugs during development.
Converting Camera Frames to InputImage
The camera frame must be converted into an ML Kit InputImage.
```dart
InputImage _convertCameraImage(CameraImage image) {
  // Concatenate all plane bytes into a single buffer. With nv21 (Android)
  // and bgra8888 (iOS) the frame arrives in a single plane, so simple
  // concatenation is safe.
  final WriteBuffer allBytes = WriteBuffer();
  for (final plane in image.planes) {
    allBytes.putUint8List(plane.bytes);
  }
  final bytes = allBytes.done().buffer.asUint8List();

  final Size imageSize = Size(
    image.width.toDouble(),
    image.height.toDouble(),
  );

  // The sensor orientation tells ML Kit how to rotate the buffer
  // before running detection.
  final camera = cameras[currentCameraIndex];
  final rotation = InputImageRotationValue.fromRawValue(
        camera.sensorOrientation,
      ) ??
      InputImageRotation.rotation0deg;

  final format = InputImageFormatValue.fromRawValue(image.format.raw) ??
      InputImageFormat.nv21;

  final metadata = InputImageMetadata(
    size: imageSize,
    rotation: rotation,
    format: format,
    bytesPerRow: image.planes.first.bytesPerRow,
  );

  return InputImage.fromBytes(bytes: bytes, metadata: metadata);
}
```
Frame Throttling
Camera streams often deliver frames at 30 frames per second.
Running detection on every frame can overload mid-range devices.
A simple flag ensures only one detection runs at a time.
```dart
if (_isProcessing) return;
_isProcessing = true;
try {
  await detectFaces(image);
} finally {
  _isProcessing = false; // release the flag even if detection throws
}
```
This maintains smooth camera preview while keeping inference latency under control.
The Coordinate Transform Problem
Face coordinates returned by ML Kit are expressed in image space, not screen space.
The overlay must therefore transform coordinates from the camera image buffer to the device display.
Three operations are required:
Rotate coordinates based on the camera sensor orientation.
Scale coordinates to match the screen dimensions.
Mirror the x-axis when using the front camera.
Failing to apply these transformations correctly leads to bounding boxes appearing offset, stretched, or mirrored.
Handling this mapping correctly is essential for building reliable camera overlays.
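The scale-and-mirror part of this mapping can be sketched as a pure function (a simplification: it assumes the image size has already been swapped for 90/270-degree sensor rotations, so its width and height match the preview orientation):

```dart
import 'dart:ui';

/// Maps a bounding box from camera-image space to screen space.
/// Assumes imageSize is already rotation-corrected to match the preview.
Rect mapToScreen(
  Rect box,
  Size imageSize,
  Size screenSize, {
  required bool isFrontCamera,
}) {
  final scaleX = screenSize.width / imageSize.width;
  final scaleY = screenSize.height / imageSize.height;

  double left = box.left * scaleX;
  double right = box.right * scaleX;

  // Mirror the x-axis for the front camera so the overlay matches
  // the mirrored preview the user sees.
  if (isFrontCamera) {
    final mirroredLeft = screenSize.width - right;
    final mirroredRight = screenSize.width - left;
    left = mirroredLeft;
    right = mirroredRight;
  }

  return Rect.fromLTRB(left, box.top * scaleY, right, box.bottom * scaleY);
}
```

Keeping this transform in one function makes it easy to unit-test with known inputs before wiring it into the painter.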
State Management with GetX
GetX provides a lightweight approach to managing application state.
However, several pitfalls can cause runtime issues.
Observable Values
Reactive variables must be wrapped using Rx types.
final rawImageSize = Rxn<Size>();
Assignments update the .value property:
rawImageSize.value = Size(width, height);
Using Obx Correctly
Obx widgets rebuild automatically when observables change.
```dart
Obx(() {
  final imgSize = controller.rawImageSize.value;
  if (imgSize == null) return const SizedBox.shrink();
  return CustomPaint(
    painter: FaceOverlayPainter(imageSize: imgSize),
  );
})
```
Async Cleanup
Camera streams must be stopped before disposing the controller.
```dart
@override
Future<void> onClose() async {
  await cameraController?.stopImageStream();
  await cameraController?.dispose();
  super.onClose();
}
```
Failing to await cleanup often causes platform exceptions during hot reload or navigation.
Drawing the Face Overlay
The overlay is rendered using a CustomPainter.
Performance considerations are important when drawing overlays on every frame.
Paint objects should be reused instead of created inside the paint() method to avoid unnecessary garbage collection.
The shouldRepaint method must also compare actual face data rather than list references to ensure updates occur correctly.
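A minimal sketch of a painter following both rules (the coordinate mapping is omitted for brevity, and the class shape is illustrative rather than the repository's exact implementation):

```dart
import 'package:flutter/foundation.dart';
import 'package:flutter/material.dart';
import 'package:google_mlkit_face_detection/google_mlkit_face_detection.dart';

class FaceOverlayPainter extends CustomPainter {
  FaceOverlayPainter({required this.faces, required this.imageSize});

  final List<Face> faces;
  final Size imageSize;

  // Reused across paint() calls to avoid per-frame allocations.
  static final Paint _boxPaint = Paint()
    ..style = PaintingStyle.stroke
    ..strokeWidth = 2.0
    ..color = Colors.greenAccent;

  @override
  void paint(Canvas canvas, Size size) {
    for (final face in faces) {
      // Image-to-screen coordinate mapping omitted for brevity.
      canvas.drawRect(face.boundingBox, _boxPaint);
    }
  }

  @override
  bool shouldRepaint(FaceOverlayPainter old) =>
      // Compare actual face data, not list identity: the camera stream
      // produces a fresh list every frame, so reference comparison would
      // repaint constantly (or, if cached, never).
      !listEquals(
        old.faces.map((f) => f.boundingBox).toList(),
        faces.map((f) => f.boundingBox).toList(),
      );
}
```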
Performance on Real Devices
Testing on a Samsung Galaxy M13 (Exynos 850) produced the following measurements:
MetricValueSingle face detection12–18 msThree faces detection20–28 msDetection pipeline5–8 fpsCamera preview30 fps
Because detection runs on a background thread internally, the UI thread remains responsive during inference.
For mid-range hardware, ResolutionPreset.medium provides the best balance between image clarity and detection speed.
Testing the Detection Pipeline
The architecture allows the face detection pipeline to be tested without camera hardware.
A mock repository can simulate detection results.
```dart
class MockFaceDetectionRepository extends Mock
    implements FaceDetectionRepository {}
```
Unit tests can verify controller behavior using predetermined detection outputs.
This significantly improves maintainability and confidence when refactoring.
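As a sketch of such a test (assuming the mocktail package; the controller API and `FakeFrame` are illustrative names, not taken from the repository):

```dart
import 'package:flutter_test/flutter_test.dart';
import 'package:mocktail/mocktail.dart';

void main() {
  test('controller exposes faces returned by the repository', () async {
    final repo = MockFaceDetectionRepository();
    // Predetermined detection output: one fake face entity.
    when(() => repo.detectFaces(any()))
        .thenAnswer((_) async => [fakeFaceEntity]);

    final controller = FaceDetectionController(repository: repo);
    await controller.processFrame(fakeFrame);

    expect(controller.faces, isNotEmpty);
    verify(() => repo.detectFaces(any())).called(1);
  });
}
```

No camera hardware or ML Kit runtime is involved, so these tests run in plain `flutter test` on any machine.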
Conclusion
On-device face detection is now practical on modern mobile hardware. Using ML Kit, Flutter applications can implement real-time computer vision features without relying on external infrastructure.
However, building a reliable camera feature requires careful attention to several details: camera image formats, sensor orientation, coordinate transforms, state management, and asynchronous lifecycle handling.
By structuring the application around clean architecture principles and isolating ML dependencies in the data layer, the resulting system becomes easier to maintain, test, and extend.
The patterns presented in this article provide a solid foundation for integrating advanced computer vision capabilities into Flutter applications.
Future Improvements
Possible extensions for this project include:
Face mesh rendering for augmented reality effects
Face recognition using embedding models
Real-time emotion detection
GPU-accelerated inference pipelines
Recording annotated video streams
These additions would transform the demo into a fully featured real-time computer vision toolkit for Flutter applications.
Source Code
The full source code accompanying this article is available in the project repository:
https://github.com/RitutoshAeologic/face_detection
Thanks for reading this article. If I got something wrong, let me know in the comments; I would love to improve. Clap if this article helped you.
Feel free to connect with us, and read more articles from FlutterDevs.com.
FlutterDevs is a team of Flutter developers who build high-quality and functionally rich apps. Hire a Flutter developer for your cross-platform Flutter mobile app project hourly or full-time as per your requirement! For any Flutter-related queries, you can connect with us on Facebook, GitHub, Twitter, and LinkedIn.
We welcome feedback and hope that you share what you’re working on using #FlutterDevs. We truly enjoy seeing how you use Flutter to build beautiful, interactive web experiences.
Need help building production-grade Flutter apps? FlutterDevs helps teams ship faster with solid architecture, better UX, and practical AI features. Reach us at support@flutterdevs.com.


