Advanced Aerial Imagery Analysis with Deep Neural Networks, Explained in 5 Minutes

It is no secret that when dealing with aerial images, the best state-of-the-art results are achieved with deep learning models, which come at the cost of their complexity. At the same time, thanks to open data, we can explore even the most sophisticated techniques in a creative way.

At nam.R we are working hard to build a Digital Twin of France, and to achieve that we use many sources of information, one of the richest being aerial images. For us humans, “reading” images is easy, but teaching computers to do it is sometimes a real challenge (and fun!). In this post, we will show how we extract a rich description of buildings’ roofs from aerial images, in particular by detecting their slopes. In Computer Vision jargon this task is called “object segmentation”, and to do it we chose a deep learning approach.

One of the current state-of-the-art segmentation models is Mask R-CNN, published by researchers from Facebook, and we used this architecture implemented with the Keras framework.

To sum up, our deep learning model should be able to analyze aerial images and detect roof slopes. This can help us understand the solar energy potential of a roof, and ultimately contribute to nam.R’s vision: accelerating the ecological transition.

What Data Do We Need?

First of all, we need to define what kind of data is suitable for this task. We have a choice between satellite and aerial images. The main difference between them, in the context of our work, is the image resolution. Openly accessible satellite images have a resolution of several meters per pixel, while one can find aerial images with resolutions around 15-20 cm per pixel.

Because we want to detect fine details in the images, such as roof slopes and ridges, we went with aerial images.

Training a deep learning model to detect roof slopes is a “supervised learning” task, so we need not only the images but also the labels of the slopes. We therefore created some labels ourselves. This is not a very exciting task, but it is a necessary step to train a decent model.

This way we obtained two types of data to train the machine learning model: images of roofs and the labels for roof slopes.

A training “image-label” pair looks like this:

To train a good model we need as much data as possible. Of course, we can label more roofs by hand, but it is also possible to generate new samples simply by applying some basic transformations to the original images and labels (“data augmentation”). This could be a rotation, a vertical or horizontal flip, and so on.

This way we can obtain a big enough dataset to train our deep learning model.
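To make the idea concrete, here is a minimal sketch of this kind of augmentation, assuming the image and its label mask are NumPy arrays (the `augment` helper below is illustrative, not our actual pipeline):

```python
import numpy as np

def augment(image, mask):
    """Generate extra training pairs by rotating and flipping an image
    together with its label mask (illustrative helper only)."""
    pairs = []
    for k in range(4):  # rotations of 0, 90, 180 and 270 degrees
        rot_img, rot_mask = np.rot90(image, k), np.rot90(mask, k)
        pairs.append((rot_img, rot_mask))
        pairs.append((np.fliplr(rot_img), np.fliplr(rot_mask)))  # horizontal flip
    return pairs
```

The key point is that the image and its label must always be transformed together, so the mask still lines up with the roofs.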

A Deep Learning Model That Fits Our Goal

Over the last few years, many high-performance deep neural networks have been developed and have achieved impressive results on object detection tasks. We chose Mask R-CNN, a high-performance object segmentation network released in 2017, and adapted Matterport’s implementation to be compatible with our aerial images and label data.
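As a rough idea of what such an adaptation looks like with Matterport’s `mrcnn` package, one can subclass its `Config` class and build the model in training mode (the class name and hyperparameter values below are purely illustrative):

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class RoofConfig(Config):
    """Illustrative configuration for roof-slope segmentation."""
    NAME = "roof_slopes"
    NUM_CLASSES = 1 + 1      # background + roof slope
    IMAGES_PER_GPU = 2
    STEPS_PER_EPOCH = 100

config = RoofConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")
```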

During training, the model takes images and their corresponding labels and adjusts its internal parameters to detect roof slopes on any new image. Because our dataset is quite small, a couple of hours of training already produces decent results.
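Continuing the sketch above, and assuming a training and a validation dataset prepared as `mrcnn.utils.Dataset` subclasses, the training step could look roughly like this (the weight file, datasets and epoch count are placeholders); a common practice is to start from weights pre-trained on COCO and fine-tune only the network “heads” first:

```python
# Start from COCO pre-trained weights and fine-tune the "heads" layers
# (paths, datasets and epoch count are placeholders, not our actual setup).
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=10,
            layers="heads")
```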

One of the indicators that our model is learning is the value of its loss function. The loss function is the criterion the model tries to minimize; it is usually an average of the error between the real labels and the predicted ones.
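To illustrate the “average error” idea on a single mask, here is a toy example (Mask R-CNN’s actual loss combines several terms, for classification, box regression and masks):

```python
import numpy as np

def mean_pixel_error(true_mask, pred_mask, eps=1e-7):
    """Toy loss: average binary cross-entropy between a ground-truth mask
    and a predicted probability mask (illustration only)."""
    pred_mask = np.clip(pred_mask, eps, 1 - eps)
    bce = -(true_mask * np.log(pred_mask) + (1 - true_mask) * np.log(1 - pred_mask))
    return bce.mean()
```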

Below are the loss function values for our model, which decrease steadily. This means the model is learning to detect roof slopes well:

Detecting Roof Slopes on New Images

During the prediction phase, the model reads only an aerial image and predicts the contours of the roof slopes in it. Because the prediction step does not require complex calculations, it is possible, for example, to copy the trained model to a production server and use it to analyze images in real time.
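With Matterport’s API, running the trained model on a new image could look like this (the weights file and the `aerial_image` array are placeholders):

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class RoofInferenceConfig(Config):
    """Illustrative inference-time settings: one image at a time."""
    NAME = "roof_slopes"
    NUM_CLASSES = 1 + 1
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=RoofInferenceConfig(),
                          model_dir="logs/")
model.load_weights("mask_rcnn_roof_slopes.h5", by_name=True)  # placeholder path

results = model.detect([aerial_image], verbose=0)  # aerial_image: an RGB NumPy array
r = results[0]
# r["masks"]  -> one boolean mask per detected roof slope
# r["scores"] -> a confidence score for each detection
```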

We can see that the predicted labels for roof slopes are quite accurate, but, as always, there is some room for improvement. From the image above we can see that roof slope detection doesn’t work well on roofs with uncommon materials such as metal.

One can imagine several ways to improve this model. For example, we can add more training samples for this type of roof material, or apply more data transformations to generate new samples.

But for the purpose of our exploration of deep learning in advanced aerial image analysis, this is already a great result.

This post was just one example of a deep learning model used by nam.R to make France’s Digital Twin richer and closer to reality. We will share our other techniques in future posts.

Stay tuned!
