Beginner's guide to Object Detection with Edge Impulse

Peter Ing
22 min read · Apr 17, 2021

Edge Impulse is a user-friendly machine learning development platform that makes it easy for anyone, even without background knowledge, to get started building and deploying machine learning applications while learning along the way. By following a set of straightforward steps it's possible to train and test models in minimal time and with little effort.

The initial scope of Edge Impulse was embedded systems, and as of April 2021 this was expanded to include support for Linux-based single-board computers, starting with hardware-accelerated inference on the Raspberry Pi 4 and the Nvidia Jetson Nano. This has opened up the potential for the platform to become the go-to development environment for building production-class machine learning applications across the entire spectrum of edge hardware. It also makes the platform an ideal choice for rapid experimentation with Object Detection.

The primary means of using the platform is a web-based user interface known as the Edge Impulse Studio, which provides a visual workflow where the processing pipeline is referred to as an Impulse. There is also an HTTP API available should you prefer to forgo the UI and build your own workflows on top of the underlying features of the platform.

I will show you how to use just the Studio if you are starting from scratch with unlabelled data, as well as how to use the API if you already have labelled training images, specifically in the Pascal VOC format. We will go through an Object Detection workflow along the way. First, let's jump into the Studio.

The Edge Impulse Studio

When you first log in you are greeted with the Dashboard, which you can navigate back to at any time. The Dashboard shows the general steps to quickly guide you along the path to building your own model.

Within the Studio, users design what's termed an Impulse to solve common tasks by making use of the available function blocks, with the structure in place to ensure you are as productive as possible. An Impulse is a no-code overlay on top of the learning framework, in this case TensorFlow Lite, streamlining the process while making sure everything just fits together. No code, no fuss, no setting up environments or moving data around. This is just the tip of the iceberg, as there are a lot more ingredients baked into the underlying platform.

Studio Dashboard

Edge Impulse has a wealth of information and many excellent tutorials on its documentation portal (https://docs.edgeimpulse.com/docs) should you wish to know more. For now we won't delve into what everything means; instead we will go through the steps for getting Object Detection done from within the Studio, and make use of the API to bring in bounding box information from existing label data.

Object Detection

This is made easy within Edge Impulse by the Object Detection block that was added to the image processing options on April 12. Prior to this, the standard neural network Transfer Learning block supported single-object detection, whereas the Object Detection block brings true multi-object detection to Edge Impulse, giving those who are already familiar with building object detection workflows a new, easy-to-use way to be more productive.

The typical Object Detection deep learning workflow consists of some variation of the following sequence of logical steps:

  1. Choose object classes to be detected
  2. Label bounding boxes and annotate images
  3. Preprocess image data / feature extraction
  4. Choose a Model Architecture and Deep Learning Framework
  5. Setup training environment
  6. Setup hyperparameters and start training
  7. Test and Validate the model and tune and retrain as required
  8. Deploy

Implementing the process on your own can become time-consuming and tedious, as data often needs to be moved between different tools and applications, and this can also require a fair amount of custom code to manage the process and debug the data handling. Then there is finding ways to stay within free-tier limits, figuring out how to optimize for inference, and so on. Sure, there are some readily available Jupyter Notebooks aimed at helping you go through the process relatively quickly, but these still require a lot of fiddling and manual intervention, and it's easy to mess up and break the process.

If this is your first time experimenting with object detection I do encourage you to try all of the steps separately outside of Edge Impulse using the traditional approaches, to get an understanding of the process and also an appreciation of what Edge Impulse brings to the table.

Let's now go through those steps in Edge Impulse.

1. Choose object classes to be detected

The classes I am choosing for this project are an apple and a mug, purely for illustrative purposes, so we will build an object detector that can distinguish between these two classes. The labels we need to predict are therefore "mug" or "apple", and we want a model that will analyze an image frame and ideally highlight all apples and mugs in the scene, together with the probability of each object belonging to one of those classes. Object detection should not be confused with image classification, where the goal is to classify an image as a whole into classes. An object detector outputs the coordinates of an object together with the probability of that object belonging to a particular class. So keep in mind that when we talk about the output features of an object detector we are not just talking about the classification into categories but also locations in an image. This is typically visualized as boxes drawn around detected objects with text labels attached; these boxes are known as bounding boxes, something we will discuss in a bit more detail shortly.
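
To make that concrete, here is a small, hypothetical example of what a detector's output for a single frame might look like. This is a generic illustration of the idea (label, confidence score and box coordinates), not Edge Impulse's exact output format:

```python
# Hypothetical detector output for one frame (generic illustration only,
# not Edge Impulse's exact schema): each detection has a label, a score
# and a bounding box given in pixel coordinates.
detections = [
    {"label": "apple", "score": 0.91, "x": 34, "y": 80, "width": 120, "height": 118},
    {"label": "mug", "score": 0.76, "x": 210, "y": 55, "width": 95, "height": 140},
]

for d in detections:
    print(f'{d["label"]} ({d["score"]:.0%}) at x={d["x"]}, y={d["y"]}, '
          f'w={d["width"]}, h={d["height"]}')
```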

As you follow along, please feel free to try other examples of objects readily available to you. The data set I have used is located here if you want to duplicate what I have done.

In terms of data collection, Edge Impulse gives you many options: you could upload files, or collect live data by directly connecting a development board with a camera, a Raspberry Pi or even a mobile phone! For the purposes of this guide, however, we are going to take the more traditional approach of working with image files to get a feel for the workflow.

The first thing you need to do is create a new project after signing in. This is easily done by clicking on your user icon in the top right of the window and selecting “Create new project”. This project will house everything you need in one place and you can move between different projects as you experiment.

You should see the wizard below automatically appear where you need to select “Images” and then “Classify multiple objects”.

Project Wizard

Select “Image” as I have highlighted above then select “Classify multiple objects (object detection)” from the wizard.

Images Wizard

Alternatively you could go to the Project info block on the right hand side and select the following.

Configuring Project for Object detection

Don't worry about the Latency calculations which you can leave as is for now.

For this demo I went online and sourced royalty-free images to use for training; I needed a variety of pictures of different mugs and apples taken from different angles in different lighting. The aim with all detection problems is to find images that are a good representation of the variation in what you are trying to detect. There is a large body of knowledge out there on how to approach this, which I again encourage you to research further.

Once you have the images, what then needs to be done is to label them, i.e. indicate the bounding boxes on the images where the objects are, together with their labels (classes). Through the training process the neural network will learn which types of features in the images are associated with which classes, and will be able to use that to recognize unseen examples of mugs and apples if everything works out well. Fortunately we don't need to worry about how all of that works, even less so with Edge Impulse.

Before proceeding, there is some terminology that needs to be clarified. When you source images with the objects you want to train on, these objects are located on a background, and we need a way to segment the objects out of the image for training purposes. This is achieved by determining the coordinates of the bounding box that surrounds the object and associating the target label with those coordinates. That is best illustrated with a hypothetical empty scene that contains one object, illustrated as a hexagon below.

Bounding Box example

The blue box indicates the bounding box and is represented by the (x, y) coordinates of its top-left corner plus its width and height, all in pixels. Note that the origin of the coordinate system is the upper-left corner, as is standard in image processing. The target label then needs to be associated with the bounding box.

Since image formats don't have a way to encode this information, it has to be stored as metadata in a separate file and associated with the image. There are numerous approaches to this, and I have chosen the Pascal VOC format, which is widely supported. Pascal VOC has a metadata file for each image, where the file has the same name as the image but is an XML file. Within this XML file the coordinates of the bounding box and the labels are stored, as well as other relevant information that we don't need to concern ourselves with.
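
For reference, a trimmed-down Pascal VOC annotation for one of the mug photos might look like the example below. Note that Pascal VOC stores the box as its minimum and maximum corner coordinates (xmin/ymin/xmax/ymax) rather than the top-left corner plus width and height described above; the file name and pixel values here are made up.

```xml
<annotation>
    <folder>images</folder>
    <filename>mug01.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>mug</name>
        <bndbox>
            <xmin>120</xmin>
            <ymin>85</ymin>
            <xmax>310</xmax>
            <ymax>300</ymax>
        </bndbox>
    </object>
</annotation>
```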

2. Label bounding boxes and annotate images

The labelling process is done using labelling software that allows you to visually draw the bounding box and indicate the label. There are both online and offline labelling tools available and the good news is that Edge Impulse includes a labelling tool built into the Studio.

You may already have used an external labelling tool and not want to repeat this process, especially for large data sets. As long as your annotations have been exported or converted to Pascal VOC, I will show you how you can bring them into your project within Edge Impulse using the API.

Labelling within the Studio

Once you have the images ready you can simply go to "Data Acquisition" in the navigation menu and then proceed to "Upload data".

The process is quite intuitive at this point: you choose the files and a label for the files selected.

Upload data

You need to provide a label for each image that indicates the target object type in the image.

One approach you could use is to separate the files into different folders, then upload each folder at the same time and enter a label manually. In our case you could select all the mug images, enter the label as above and upload, after which you could do the same thing for the apple photos. Alternatively you could embed the labels in the image file name, so long as the text precedes the file extension. You can also let Edge Impulse automatically split the data between a training and test set, or you can control that yourself. Whatever you do here you can make changes later, but it makes things easier if you do as much up front as possible. Typically you would split your data 80% training and 20% test and try to keep the classes balanced between the training and test data. If you make changes and add or remove data, it's suggested you run "Rebalance dataset" on the Dashboard to let Edge Impulse split your data for you again and keep an 80/20 split between the training and test data. This is quite convenient and saves time. Note that 80/20 is a good option that has become somewhat of a standard, although as with all things machine learning there are many other approaches.
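
If you prefer to prepare things on disk before uploading, the sketch below (my own illustration, not an Edge Impulse tool) shows one way to embed the label in each file name and split the images roughly 80/20. The "raw/<label>" input folders and "dataset/<subset>" output layout are assumptions for this example.

```python
# Sketch of preparing an 80/20 split with the label embedded in the file name.
# The "raw/<label>" input folders and "dataset/<subset>" output layout are
# assumptions for this example, not something Edge Impulse requires.
import random
import shutil
from pathlib import Path

random.seed(42)

for label in ["mug", "apple"]:
    files = sorted(Path("raw", label).glob("*.jpg"))
    random.shuffle(files)
    split = int(len(files) * 0.8)  # 80% for training, 20% for testing
    for i, src in enumerate(files):
        subset = "training" if i < split else "testing"
        dest_dir = Path("dataset", subset)
        dest_dir.mkdir(parents=True, exist_ok=True)
        # e.g. "mug.03.jpg" -- the label precedes the rest of the file name
        shutil.copy(src, dest_dir / f"{label}.{i:02d}.jpg")
```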

The data set I chose was intentionally small for illustrative purposes and consists of 12 photos of each class. I was also curious to see how far I could push the detector with such limited data. If in doubt the more data the better but make sure it separates well (more on that later).

https://github.com/aiot-africa/edge_impulse-objectdetection-annotation_importer/tree/main/images
Training dataset

Below you can see, by means of a pie chart, the split in your data between the label classes. In this case it is showing that the training data is split evenly between the two labels, so there are 12 images of each class.

Label split

At this point, if you have worked with object detection before, you might be wondering about the labelling or annotating of the images, i.e. where you select the area of interest (the bounding box) and generate metadata files to tell the model where the objects are located in the image. The good news is that the Studio allows you to do that right in the project by going to the "Labelling Queue" and manually adding the annotations.

Adding labels

When you click Save label, you will be prompted with a dialog to set the label of the bounding box you just drew, and you can add more than one bounding box with different labels this way. You can proceed to do this for all the images in the Studio. If you make a mistake drawing the bounding box, you can select "Edit labels" from the context menu accessed from the three dots next to each sample in the Collected data list.

Editing labels

You can then modify the bounding box or even add in more. You will see the labels update in the “labels” column once you save it.

Editing bounding boxes

Once you are done you can proceed to step 3.

Importing Labelled Data

Edge Impulse supports importing pre-labelled data using a JSON format. A file called "bounding_boxes.labels" needs to be placed in the same folder as the images being uploaded, so that when you upload the image data set this file is part of the same batch of files. The Studio will then parse the JSON to extract the label data and import it for you, so that once you are done uploading you will see the bounding boxes already added as you browse through your data.

Details can be found at: https://docs.edgeimpulse.com/docs/cli-uploader#bounding-boxes and below is an example based on the dataset used for this guide to show the JSON structure in action.

example of bounding_boxes.labels

The "boundingBoxes" value is a JSON object where each key is a file name and the associated value is an array of objects grouping the bounding box and label info. This allows multiple bounding boxes to be added to a single image as part of the value array for that image. An example with two bounding boxes in a single image is shown below to illustrate how the JSON is structured in that case:

Multiple Labels in a single file
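
If the screenshot above is hard to read, here is a minimal hand-written example of the same structure. The field names follow the format documented at the link above; the file name and pixel values are invented.

```json
{
  "version": 1,
  "type": "bounding-box-labels",
  "boundingBoxes": {
    "kitchen_scene.jpg": [
      { "label": "mug", "x": 24, "y": 60, "width": 110, "height": 145 },
      { "label": "apple", "x": 205, "y": 98, "width": 90, "height": 88 }
    ]
  }
}
```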

The "bounding_boxes.labels" file is also how you can move data between projects using the Export tool within the Studio, which could prove handy when, for example, you want to start a fresh project after you have already labelled your data within the Studio and don't want to go through the process again. The process is to simply select the "Export" tab from the Dashboard:

Exporting raw data

This creates a zipped download of your dataset, split into training and test folders according to how it is balanced within the Studio. The "bounding_boxes.labels" file is also placed into each folder:

Export example

If you start a new project then you can just upload all the files from these folders as shown earlier.

A note on the export: you will see two export types that apply to image files, namely "uploader compatible" and "original file names".

Export types

When you choose "original file names" the export uses your original file name as the start of each exported file name, whereas "uploader compatible" prepends the data label you applied to the files when uploading, before drawing bounding boxes in the Studio. This could be useful for managing your data outside of the Studio.

The following section details how to import data labelled in other formats before we move on to the next step; feel free to skip this if it doesn't apply to you.

Labelling and importing Pascal VOC

You may, however, have already done this step outside of Edge Impulse in a labelling tool and not want to redo it all, or you may just want to do this outside of the Studio for whatever reason; no problem. The labelling software I used was LabelImg, an open source tool written in Python that runs locally on your machine and exports CreateML, YOLO and of course Pascal VOC formats. I went with Pascal VOC due to it being so common and widely used.

You may also have data already labelled in other formats such as YOLO, which will require you to convert it to Pascal VOC. Fortunately, Pascal VOC is a very common format and there are many resources available for converting to it.

In order to import Pascal VOC data you can use the following script: https://github.com/aiot-africa/edge_impulse-objectdetection-annotation_importer

To use this script it needs to be located in a directory that contains one subdirectory called "images" with the image files and another called "labels" with the label files, which you can easily set up in LabelImg.

Folder structure for importing Pascal VOC

This script makes use of the API, and for this you need a way to authenticate against your account and link to your project. Open the file using your favourite code editor, insert the Project ID and API key as well as the size of the training and test sets, and save.

Your API key is located on the "Keys" tab at the top. If your browser doesn't show the full string and you are using Chrome, hit F12 to open the Chrome Developer Tools, then use the inspector (shown) to inspect the text and reveal the HTML from which you can copy the full API key string.

The Project ID is the 5-digit number (xxxxx) in the URL: https://studio.edgeimpulse.com/studio/xxxxx

When you are done, run the script using the command "python ei_annotation_import.py". If you get no errors it will have successfully queried each image in the project and then pushed the associated bounding boxes and labels from the Pascal VOC files back to Edge Impulse.

Successful script execution

When you are done and go back to your Collected data view you will see it has magically updated for you.
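
If you are curious what the conversion involves under the hood, the essence of it is reading Pascal VOC's corner coordinates and turning them into the top-left/width/height form described earlier. Below is a rough, stand-alone sketch of that mapping; it is not the code from the linked repository, and the file name is hypothetical.

```python
# Rough sketch of mapping one Pascal VOC annotation to the x/y/width/height
# bounding box form used by Edge Impulse. Illustrative only; this is not the
# code from the linked repository.
import xml.etree.ElementTree as ET

def voc_to_boxes(xml_path):
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        xmin = int(float(bb.find("xmin").text))
        ymin = int(float(bb.find("ymin").text))
        xmax = int(float(bb.find("xmax").text))
        ymax = int(float(bb.find("ymax").text))
        boxes.append({
            "label": obj.find("name").text,
            "x": xmin,
            "y": ymin,
            "width": xmax - xmin,
            "height": ymax - ymin,
        })
    return boxes

print(voc_to_boxes("labels/mug01.xml"))  # hypothetical annotation file
```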

Now you are ready to move on to step 3.

3. Preprocess image data/feature extraction

Now it's time to head to the Impulse design screen and create an Impulse that looks like the one below by clicking on the empty "add a processing block" fields. Be sure to click "Save Impulse", and you will see an "Image" and an "Object Detection" option appear under Impulse Design, where you can tweak the parameters of the feature extraction and model respectively.

Object detection Impulse

The Impulse shows you the general structure of how the TensorFlow Lite framework is orchestrated by the platform, starting with preprocessing the training images that get passed to the model, through to the final output features, which in this case are two labels, one for each class. This replaces a lot of custom code, Jupyter Notebooks and so on, and everything just fits together.

Deep neural networks are composed of layers of neurons which, once designed, have a fixed structure, in that the number of input, hidden and output neurons doesn't change during the training and inference processes. The magic is in how the links (weights and biases) between them are modified through the training process. If the training images are all different sizes they need to be scaled to one size before being fed into the neural network during training.

You will see that the first block gives you some predefined defaults such as an image size of 320x320 and a crop method. Leave these at the defaults so that you don't distort your input images. The 320x320 size is the optimal size for the predetermined architecture provided; here the Edge Impulse team has gone to great lengths to make design decisions on your behalf that give a good object detection system. For a 3-channel (RGB) image this gives an input layer size of 320 x 320 x 3 = 307,200 input neurons.
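
As a quick sanity check of that arithmetic, here is a tiny sketch using Pillow. Edge Impulse performs this resizing for you in the Image block, and the file name is hypothetical.

```python
# Resize an image to the 320x320 input size and confirm the input layer size.
# Edge Impulse does this preprocessing for you; this is only a sanity check.
from PIL import Image

img = Image.open("apple.01.jpg")  # hypothetical file name
img = img.convert("RGB").resize((320, 320))

width, height = img.size
channels = 3
print(width * height * channels)  # 320 * 320 * 3 = 307200
```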

Analyzing the training set

Feature extraction helps compress the image into a set of representative features. This is known as dimensionality reduction, which is actually a form of unsupervised machine learning, and it helps you see how the training data will influence the final trained model. You can use it to see how the data separates between classes, in other words how the images of each class are similar to other images of the same class yet different from the other classes.
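
To build some intuition about what reducing a set of images to a few representative dimensions means, here is a stand-alone analogy using PCA from scikit-learn on random stand-in data. This is not what Edge Impulse runs internally; it only illustrates the idea of projecting many pixel values per image down to one 3D point per image.

```python
# Analogy only: project flattened images down to 3 dimensions with PCA.
# Edge Impulse's feature generation is not this exact procedure.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for 24 images resized to 320x320x3 and flattened to vectors.
images = rng.random((24, 320 * 320 * 3)).astype(np.float32)

points_3d = PCA(n_components=3).fit_transform(images)
print(points_3d.shape)  # (24, 3): one 3D point per image, ready to plot
```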

Edge Impulse has a handy tool that allows you to generate features. First you need to select whether you want to do this for full colour (RGB) or grayscale versions of the images in the collected data. The Parameters tab allows you to change only this one parameter, selecting between RGB and Grayscale under colour depth. Leave this as RGB so we can generate a set of features related to the full colour image, since we ideally want to run the detector with a colour camera.

Feature extraction — Parameters

The "Generate features" tab is where you click the button of the same name, and after a few moments, depending on the size of the data set, your features will be generated behind the scenes.

How them apples

Here is where you start to see how well your model will perform, by means of the feature explorer, which shows your feature set reduced to 3 dimensions, with each image represented as a data point in this 3D space in an interactive view that you can rotate. You can clearly see at this point whether your different classes of images separate well, which gives you an early indication of how your model will differentiate between the classes.

What I found with my data set was that the apples and mugs separated quite well, except for the 2 apples I highlighted with the black arrows. Any guess as to why? The feature explorer allows you to click on each data point and view the associated image, and these two images are the green apples, whereas the rest of the apples are red and clustered more closely together. Just think about that for a second.

I would recommend finding more training images and adding them in the data acquisition step and regenerating features till you are satisfied that your data separates well, otherwise continue to see how it all works out. I was happy with the results above and proceeded to the next step.

4. Choose a model architecture and Deep Learning Framework

Once again Edge Impulse makes some sound design decisions on your behalf. In terms of the model architecture, the object detection block at launch uses the MobileNetV2 SSD architecture. In the world of object detection there are a few architectures that are popular due to their performance, and you will most likely encounter YOLO, SSD and others. SSD is arguably the better option for real-time object detection, and that is about as much as you need to worry about for this step. If model architectures are your thing and you want to delve deeper there is the option of using the Keras expert mode, but we won't be doing that and will instead stick with the choices that have been made for us. Similarly, with the deep learning framework the choice has been made for us, and this is TensorFlow Lite. Again, you don't have to worry too much about the details at this point if you don't want to.

5. Setup training environment

There is no need to worry about things like Jupyter Notebooks, training environments and so on, as the Studio is your environment; let's move on.

6. Setup hyperparameters and start training

The only things you need to consider are the hyperparameters, which are shown as training settings accessible by selecting "Object detection" under "Impulse Design".

Neural Network configuration

The first option is the number of training cycles (epochs), which you can change; do take into account that increasing this increases training time. The learning rate refers to how fast the model weights are changed during backpropagation, and this parameter can greatly influence your training time and stability, so if in doubt leave the first two settings at their defaults. The "Score threshold" allows you to set the probability threshold for a successful detection. For example, if you set it to 0.7 then bounding boxes will only be shown where the probability of being a class is 70% or greater. This can be useful for eliminating false positives, but in our case the training set is small and the model won't have very high precision, so leaving this at 0.5, or 50%, should give us a reasonable working model. For this demo I left everything at the default values and then clicked the "Start training" button. Once done you can view the results of the training.
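
The effect of the score threshold is easy to picture: any detection whose confidence falls below the threshold is simply dropped. Here is a tiny sketch of that idea with made-up detection values:

```python
# What the score threshold effectively does: discard low-confidence detections.
# The detection values below are made up for illustration.
SCORE_THRESHOLD = 0.5

detections = [
    {"label": "mug", "score": 0.83, "x": 12, "y": 40, "width": 90, "height": 120},
    {"label": "apple", "score": 0.37, "x": 150, "y": 60, "width": 70, "height": 68},
]

kept = [d for d in detections if d["score"] >= SCORE_THRESHOLD]
print(kept)  # only the mug survives a 0.5 threshold
```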

int8

You can also view the precision score and memory usage for the int8 vs float32 optimizations. This refers to whether fixed-point (8-bit integer) or floating-point numbers are used for the model, with int8 being the optimized choice for embedded systems. The differences in model storage size and precision are notable in this case.

float32
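
For context, this is roughly what int8 post-training quantization looks like in plain TensorFlow Lite on a toy model. Edge Impulse handles this for you, so the snippet only illustrates where the int8 vs float32 trade-off comes from.

```python
# Toy example of TensorFlow Lite post-training quantization; Edge Impulse
# does the equivalent for you when it produces the int8 model.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])

def representative_data():
    # A handful of calibration samples used to pick the int8 scaling factors.
    for _ in range(10):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]    # enable quantization
converter.representative_dataset = representative_data  # calibration data
tflite_int8 = converter.convert()

print(len(tflite_int8), "bytes")  # smaller than the float32 export of the same model
```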

7. Test and Validate the model and tune and retrain as required

Now that you have your model trained, it's time to test it to see whether you are happy with its performance or not. The Live classification tool described in the next section is one way to do quick testing, but to do proper testing you need to run the model on the test data. Recall that during the data acquisition stage you split your data between training and test sets using the 80/20 split; it's now time to use that 20% to test the model. This is important to verify that your model can classify previously unseen data and hasn't overfit to the training set.

Test results

This is done by simply clicking the "Classify all" button for the test data. I was satisfied with the results above, which showed that the model was 71% accurate on the test data. I advise you to take the time to experiment with the hyperparameters and retrain the model, adjusting parameters in the Impulse and training again to see how this affects accuracy. Also try adjusting the training data set by adding more images that separate well and removing images that don't.

8. Deploy

A useful tool is the Live classification tool, which allows you to quickly test your model, but you first need to connect a device using the Devices menu.

Adding a device

There are many options available; I chose Mobile Phone and scanned the QR code, which added it to the available devices.

Then on the phone I selected "Switch to classification mode", which initiated the deployment of the model to the phone's browser and took a few moments.

Once that is done you will need to give it permission to access your camera, then you can classify by tapping "Classify".

The processing time was unfortunately very slow when running inference this way, but it allowed me to easily test the model.

Mobile phone classifier

I also tested the model using a Raspberry Pi 4, and it was able to perform inference in an average of 430 ms, which gave a good real-time response.

Real-time inference on RPi4

The edge-impulse-linux-runner command provides you with classification information in real time, showing the actual probability results for each class at greater precision, whereas the bounding box overlays round off to two digits.
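
If you prefer scripting the inference yourself on the Pi instead of using the runner's built-in overlay, the Edge Impulse Linux Python SDK exposes the same model. The sketch below is based on the SDK's published examples; the model and image paths are hypothetical, and method details may differ between SDK versions, so check the SDK documentation for the version you have.

```python
# Sketch based on the Edge Impulse Linux Python SDK examples; paths are
# hypothetical and details may vary with your SDK version.
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

with ImageImpulseRunner("modelfile.eim") as runner:  # the downloaded .eim model
    runner.init()
    img = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
    features, cropped = runner.get_features_from_image(img)
    result = runner.classify(features)["result"]
    for bb in result.get("bounding_boxes", []):
        print(f'{bb["label"]} ({bb["value"]:.2f}): '
              f'x={bb["x"]} y={bb["y"]} w={bb["width"]} h={bb["height"]}')
```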

Multiobject Detection

Interestingly, the detector had no trouble recognizing different mugs, but I did find that it struggled with apples. If you step back, think about this and go back to the data set, it's easy to see that mugs, regardless of the pattern or markings, have a lot of features that remain consistent across the range of mug examples. This includes the general shape and proportions as well as the presence of the handle. Any mug used for testing would therefore have a high probability of being consistent with the data set. Apples, on the other hand, being grown naturally, exhibit a much larger variation in not only shape but also texture and pattern. The apples I had were all red but perhaps of a different variety from the ones in the sample images, which caused the detector to struggle with the apples from the batch I had.

Apple detection issues

We can see how important it is to select the right dataset and consider the nature of the objects you want to detect. Here is where you need to really experiment and develop a feel for what works and what doesn't.

In summary, we have gone through the process of building an object detector with Edge Impulse, and whether you are new to deep learning or not, it's hard not to appreciate how straightforward the process is and how productive you can be using Edge Impulse. You can focus on the problems you want to solve instead of struggling with how to solve them. I hope that if you are reading this you are thinking about the kinds of object detectors you can build and the potential problems you can solve in your business or community, and that you now feel more empowered when it comes to machine learning.

Have fun.
