Introducing Faster Objects More Objects aka FOMO

Peter Ing
8 min read · Mar 29, 2022


Object detection has been one of the cornerstones of Computer Vision and a subject of ongoing investigation for decades. Not too long ago, Deep Learning and Neural Networks were not practical or feasible for most Computer Vision applications, and techniques that relied on digital signal processing and feature-based methods dominated the field.

Efforts such as those by Viola and Jones spurred the development of many popular algorithms including Haar Cascades, SIFT, SURF and HOG, many of which found their way into OpenCV as reference implementations that became the de facto basis for Object Detection applications for a while.

Deep Learning began to blossom in the early 2010s, and over the last decade we have seen it enable more accurate and sometimes unbelievable results, starting with Image Classification and then progressing to Object Detection and beyond. Much like with the earlier approaches, particular algorithms, or more specifically in the case of Deep Learning, model architectures, became popular for Deep Learning based Object Detection.

There are numerous options, but if you have explored the topic you may have heard of or even used the likes of You Only Look Once (YOLO) or the variants of Region Based Convolutional Neural Networks (R-CNNs). You would most likely also have come across the Single Shot Detector (SSD) and architectures such as MobileNet and ResNet, amongst others.

All of these approaches rely on Convolutional Neural Networks, which are computationally very expensive and require hardware acceleration for any kind of real-time usage. If you have any experience working with Object Detection, the thought of running real-time object detection on an embedded system, specifically a microcontroller, may seem implausible.

A clear distinction needs to be made at this point between a microcontroller and an Edge Device or Single Board Computer such as a Raspberry Pi, Jetson Nano or Google Coral. These are more capable Edge Devices with acceleration and larger amounts of memory (gigabytes), making it possible to run Object Detection inference and even training directly on them. Microcontrollers, on the other hand, are constrained, with only a few kilobytes to a few megabytes of memory and usually no operating system. Microcontrollers are the domain of tinyML.

tinyML has brought the world the possibility of running Deep Learning directly on constrained microcontrollers, something that seems as unbelievable as it is real. tinyML has shown that it's possible to do a lot of Deep Learning on signals directly on the devices that capture them. In terms of Computer Vision, image classification has been possible with tinyML, but running robust multi-object detection in real time on microcontrollers has not been feasible. All of that changes now with FOMO.

Edge Impulse has launched Faster Objects More Objects, aka FOMO, which is more than just a new feature. Remember that name, because it will come to be known as the go-to approach for doing Object Detection on MCUs.

The name Faster Objects More Objects reflects the purpose of FOMO: it implements real-time multi-object detection. You are able to detect objects in milliseconds, and it scales to multiple objects without impacting performance, so if you have lots of objects in your scene it will perform just as well as with a single object, all on a constrained microcontroller. For more details on the inner workings of FOMO, check out the official announcement: https://www.edgeimpulse.com/blog/announcing-fomo-faster-objects-more-objects

The best part is that it's available for you to use right now with boards such as the Arduino Portenta, OpenMV, Himax WE-I and Sony Spresense. If you have the Arduino TinyML Kit you can also use it with the Arduino Nano 33 BLE Sense and its camera add-on, and more boards will be supported in the future. If you don't have a devkit, you can also run it directly in a web browser thanks to WASM support.

At this point you are probably wondering how to get started with FOMO… If you are a seasoned Edge Impulse user, I won't go through the details of the Object Detection workflow here, but will instead dive into the specifics of using FOMO (constrained object detection) compared to the normal Object Detection workflow for Linux devices.

If you are new to Edge Impulse, check out my earlier article that takes you through the end-to-end process of building Object Detection for Edge (Linux) devices on Edge Impulse here. Once you are done with that you will be up to speed to continue with this tutorial.

The official Edge Impulse documentation has also been updated: https://docs.edgeimpulse.com/docs/fomo-object-detection-for-constrained-devices

First things first: you need to start an Object Detection project, much like before.

Data Acquisition

Data acquisition and labelling is performed in the same way as with normal object detection, by drawing bounding boxes and applying labels to the training and test sets. Aim for around 80–100 images per class for good performance, with as much background variation as possible to ensure good background suppression. Avoid plain white backgrounds, as these will prevent your model from learning to distinguish objects from backgrounds, which you don't need to explicitly label. You may need to experiment with this number, as with all other parameters, and Data Augmentation, described below, will also help by effectively increasing your training set size.

With FOMO you are also able to have multiple objects of different classes in the same training image, but it's important to make sure bounding boxes do not overlap and that there is a bit of space between objects. Where possible, pick training images where the objects are as far apart as possible so there is no overlap. AI assisted labelling can also streamline the labelling process; watch this video for an overview of that feature:

https://www.youtube.com/watch?v=wnHrpTbCUYc

Impulse Design

Next, when creating your impulse, your input image size needs to be a multiple of 8, and the resolution needs to be lower than the 320x320 size used for unconstrained object detection. The multiple-of-8 requirement comes from FOMO's output being a heat map at 1/8th of the input resolution, so each output cell corresponds to an 8x8 block of pixels (a 96x96 input gives a 12x12 output grid). The MobileNet model used in standard object detection has an input layer specifically sized for 320x320 images.

For FOMO, good options are 160x160 or 96x96 (the default); selecting these will cause the Studio to automatically resize the images for you as part of the feature generation process.

Before generating the features, note one fundamental difference between FOMO and standard Object Detection: FOMO uses Grayscale features. This helps save memory and generally doesn't have much impact on your application, but it does mean you need to make sure you feed grayscale images into your classifier at run time.

Once you have saved the Color Depth setting as Grayscale, you can generate your features as usual.
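If your camera delivers colour frames at run time, you will need to convert them to grayscale before inference. The snippet below is a minimal, generic sketch of one way to do this, not part of the Edge Impulse SDK; it assumes an interleaved RGB888 frame buffer (adjust to whatever your camera driver actually delivers) and uses integer-scaled BT.601 luma weights to keep the maths MCU-friendly.

```cpp
// Generic helper (not an Edge Impulse SDK function): convert an interleaved
// RGB888 frame to 8-bit grayscale so the pixels match a model trained on
// grayscale features. The buffer layout here is an assumption.
#include <stdint.h>
#include <stddef.h>

void rgb888_to_grayscale(const uint8_t *rgb, uint8_t *gray, size_t pixel_count) {
    for (size_t i = 0; i < pixel_count; i++) {
        uint8_t r = rgb[3 * i + 0];
        uint8_t g = rgb[3 * i + 1];
        uint8_t b = rgb[3 * i + 2];
        // 77/256 ≈ 0.299, 150/256 ≈ 0.587, 29/256 ≈ 0.114 (BT.601 luma weights)
        gray[i] = (uint8_t)((r * 77 + g * 150 + b * 29) >> 8);
    }
}
```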

The Object Detection learning block also has some key differences with the addition of FOMO:

The default Learning Rate for FOMO is 0.001, lower than the normal object detection default of 0.1. Learning Rate determines how much the weights are updated during backpropagation on each training cycle, it has a fundamental impact on how well FOMO performs, and it tends to need to be lower than for normal object detection.

Experiment with different values to see how your model performs, but keep in mind that you have to wait for training to complete each time; the default value of 0.001 is a good starting point. This ties in directly with the number of epochs, set to 60 by default, which is also a good starting point, but as you lower your learning rate you may need to increase it until you see the model converge. You should see this happen around 100 cycles, but again you need to experiment, as is the nature of Deep Learning. Training takes longer than with other types of projects and can take 20–30 minutes or more. Note that the standard plan on Edge Impulse has a 20 minute limit, so keep this in mind if you get out-of-time errors during training.

Data Augmentation should be checked: this handy feature transforms the training images on the fly during training and runs the additional transformed images through the training process, effectively amplifying your training set. One of the transformations is rotation, so you don't need to label multiple rotated copies of the same image, as this feature has the same net effect.

Finally, the Neural Network Architecture section has the option to select between normal Object Detection (MobileNetV2 SSD FPN-Lite 320x320) and two FOMO (constrained object detection) options. You will notice one has 0.1 at the end and the other 0.35; this is not a minor version number but the width parameter of the network. You should experiment with both options while building your model. If you change models you will need to retrain.

Those are the most significant things to look out for and the rest is up to you in terms of designing a good dataset and experimenting with it.

Once you are done building your model, head over to the Deployment page and test it on one of the supported boards or even in your browser, or build the model into your own custom applications.

When using FOMO, one key thing to take note of is that the detector outputs the anchor points of object centroids rather than full bounding boxes.

If you are coming over from Data Science and want a quick way to jump in, a good devkit for exploring FOMO in custom applications is the Arduino Portenta H7 + Vision Shield.

With the Arduino Library deployment option you can easily build custom sketches incorporating FOMO without having to worry too much about the lower-level details of the embedded platform, while leveraging the rich high-level Arduino APIs on a powerful platform.
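As a rough illustration of what consuming FOMO output looks like in a custom sketch, the example below follows the pattern of the example code the Arduino Library deployment generates. The include name and the camera capture callback are placeholders for whatever your project produces, and the result fields (bounding_boxes, bounding_boxes_count) are assumed to match the generated example for your board, so treat this as a sketch rather than drop-in code.

```cpp
/* Illustrative only: the include below is a placeholder for the header your
   Edge Impulse project generates, and get_camera_data() is a stub standing in
   for real camera capture. Check the example sketch generated for your board
   for the exact API. */
#include <your_project_inferencing.h>  // placeholder name

// Stub: a real implementation would copy grayscale pixel values from the
// camera frame buffer into out_ptr, starting at `offset`, for `length` pixels.
static int get_camera_data(size_t offset, size_t length, float *out_ptr) {
    for (size_t i = 0; i < length; i++) {
        out_ptr[i] = 0.0f;  // dummy pixel data
    }
    return 0;
}

void setup() {
    Serial.begin(115200);
}

void loop() {
    // Wrap the frame in a signal the classifier can pull pixels from.
    ei::signal_t signal;
    signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT;
    signal.get_data = &get_camera_data;

    ei_impulse_result_t result = { 0 };
    if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
        return;
    }

    // FOMO reports each detection as a small box centred on the object, so the
    // centre of that box is the centroid anchor point, not a full object extent.
    for (size_t i = 0; i < result.bounding_boxes_count; i++) {
        auto bb = result.bounding_boxes[i];
        if (bb.value == 0) {
            continue;  // empty result slots have zero confidence
        }
        float cx = bb.x + bb.width / 2.0f;
        float cy = bb.y + bb.height / 2.0f;
        Serial.print(bb.label);
        Serial.print(" (");
        Serial.print(bb.value);
        Serial.print(") at centroid x=");
        Serial.print(cx);
        Serial.print(" y=");
        Serial.println(cy);
    }

    delay(200);  // pace the loop; a real sketch would capture a fresh frame each pass
}
```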

Real-time object detection on microcontrollers opens up a whole new set of potential use cases in IoT and Embedded applications, limited only by how far your camera can see.

For a preview of what's possible, check out the link below:

https://www.youtube.com/watch?v=o2-o3wEmxaU
