we use
original data
to create value

Our mission is to produce original, actionable data delivered in tailor-made platforms.

in a data-driven world

We are a company committed to a major revolution: using data as a raw material to bring about radical change.

in a global open-data movement

unique opportunities and new challenges arise every day

our mission is to produce original data

producing unique data is our way of enriching this huge pool of opportunities

by developing artificial intelligence for good

creating original data requires permanent investment in artificial intelligence

and building efficient and robust digital infrastructures

our objective is to build infrastructures from original data

In the energy, renovation, construction, insurance, green finance and culture fields, we gather technical and scientific partners in public and private projects to build digital platforms that enable environmentally friendly economic development.

our expertise

we produce original data

We collect available, non-personal data to generate a unique data library of original data using our AI tools.

Open data is an unprecedented opportunity for value creation. Building unique datasets and making them actionable allows new opportunities to emerge.

producing
original data

using our AI tools and from images, texts and structured data, we create original data sets

connecting
heterogeneous datasets

through our relationship engine and the creation of datasets, the data we produce becomes relevant, categorised by business sectors, with geo-tracking at the right level of granularity

making data actionable

Whether integrated into our clients' information systems or delivered through tailored platforms, data becomes a tool for making decisions or building new opportunities.

Our datasets are actionable according to the constraints and needs of our clients and partners.

business opportunity
sheets and clusters

we deliver opportunity sheets and cluster sheets directly in the tools of our clients through tailored solutions and interfaces.

building
tailored platforms

we develop platforms that let our partners use our databases and AI tools on their own to generate their own sheets and clusters and create value, in their current projects or in new ones.

platforms

Our dedicated platforms bring complementary key players together around relevant data, generating economies of scale and enabling new and better services. The platform thus becomes a genuine asset: a shared infrastructure dedicated to exploiting the relevant data and creating value from it.

We accelerate exchange, reuse and valorisation of data through tailored platforms.

team

  • Grégory Labrousse

    CEO

    President of nam.R, serial entrepreneur, specialised in environment and cost reduction consulting. Founder of a group specialised in green finance, agro-ecology and eco-tourism.

  • Pierre Lescure

    Co-Founder & Board Member

    Author of the 2013 Digital Economy report, which underlined, among other things, the importance of open data, Pierre was involved in the creation of many companies, keeping in mind to always be at the forefront of the technologies involved (Molotov.tv, Canal+). Pierre is also a board member of the Kudelski cybersecurity group and the Lagardère group, and president of the Cannes Film Festival.

  • Lila Tretikov

    Co-Founder & Board Member

    Lila is Chief Executive Officer of the Terrawatt initiative, the “armed wing” of the International Solar Alliance, launched with the support of the President of the French Republic. Former President of the Wikimedia Foundation, where she initiated Wikipedia’s ambitious missions in digitalisation and artificial intelligence, Lila is a recognized expert in machine learning, with an impressive career (Chief Product Officer of SugarCRM, CEO of Raskspace...). Her name featured in Forbes list of The World's 100 Most Powerful Women (2014) and in San Francisco Chronicle’s “21 Most Powerful Women in Bay Area Technology”. She was awarded a Stevie Award for Woman in Business, and is a member of the Young Leaders of the World Economic Forum.

  • Emmanuel Bacry

    Co-Founder & Board member

    CNRS Research Director at Paris Dauphine University. Professor and head of the “Data Science & Big Data” Initiative at Ecole Polytechnique. In charge of the "Big Data" processing of the French Social Security database.

  • Pierre-Alain De Mallerey

    Board member

    A former student of the ENA (promotion Senghor) and Ecole Polytechnique, an Inspector of finance, Pierre-Alain was successively ministerial advisor and managing director of MutRé. A recognized data specialist and president of insurance broker Santiane, he joined nam.R’s board in 2017.

  • Éric Euvrad

    Board member

    Director and President of the Audit Committee of the Atari group, he began his career at Arthur Andersen where he participated in the development of the “Restructuring” practice. He then joined Lucien Deveaux for the takeover of the Bidermann Group, where he directed the turnaround, before launching an Internet start-up, which he sold in 2002. He then took over Gigastore, a non-food discount chain, through an LBO, and ran it until its sale in 2008. Éric manages a consulting firm specialising in mutation phases and co-leads a training group.

  • Raoul Saada

    Senior Strategic Deployment Finance Sector

    A graduate from Polytechnique, an expert in financial and restructuring intermediation, Raoul participated in the creation of the equity broker Finacor and the implementation of the EuroMTN market. A specialist of green finance, he has been actively involved in many structuring transactions in the sector.

  • Sebastián Sachetti

    Senior Strategic Deployment Public & Heritage Sector

    A graduate of the Ecole Nationale d’Administration, Sebastián is an outstanding polyglot (speaking French, English, Italian, Spanish and Portuguese) who has successively held leading positions in the private and public sectors, combining the worlds of finance, culture and IT. Responsible for the design and implementation of the Pass Culture before joining our team, Sebastián is currently developing the new nam.R platforms.

  • Nicolas Berthelot

    Lead Data Strategy

    A recognized and passionate expert, involved in processing socio-economic databases since his early childhood, Nicolas joined nam.R at the outset to build its Data Library, a project that is unique in its size and innovation. A graduate of Sciences Po in Paris, Nicolas is one of the first in France to graduate in Data Strategy (Sorbonne University - UPMC). At nam.R, he leads the "Data Sourcing" and "Data Strategy" teams.

  • Louis Petros

    Lead Knowledge Strategy

    Passionate about economics, strategy and public affairs, Louis joined nam.R at the launch of the project, after working at EDF Trading in London, the French National Assembly and the French Ministry of Defense. A graduate in Political Science at the IEP in Strasbourg and holder of a Master's degree from the London School of Economics, Louis is responsible for the teams in nam.R's "Solutions" department, where he identifies and builds the solutions in which nam.R is involved.

  • Gaël Grasset

    Lead Product

    A Data Scientist with a background in sociology, Gaël joined nam.R at the beginning of the project after graduating in statistical engineering from ENSAE and earning a Master's degree in sociology from Sciences Po Paris. Data Scientist, manager, product manager... Gaël has always been at the heart of nam.R's evolution, thanks to his technical expertise in Artificial Intelligence, which he developed at Oscaro. He is in charge of nam.R's “Product” teams.

  • Servane Khaouli

    HR Business Partner

    A graduate in history and project management, with many experiences in culture, communication and SME management, Servane has been involved with nam.R since its creation. At the heart of her responsibilities, combining human resources and office management, Servane supports the development of the various departments.

  • Alexandre Bacchus

    Data Scientist

    Holder of a doctorate in electrical engineering and a recognized expert in artificial intelligence, particularly in energy, Alexandre has worked successively on innovation and data projects at EDF and Enedis. Trained in agile project management, he joined nam.R's data science teams to help build the Digital Twin and organise team management.

  • Duccio Piovani

    Data Scientist

    After obtaining his PhD in complex systems at the prestigious Complexity Center of Imperial College London, Duccio worked successively as a data scientist and researcher at the Center for Advanced Spatial Analysis at Imperial College. Duccio joined nam.R to design nam.R's proprietary algorithms.

  • Charles Hutin-Persillon

    Data Strategist

    A graduate of the IEP of Grenoble, Charles is pursuing a research career in Social Science, Political Science and International Relations. Passionate about environmental issues and the use of statistics, he has gradually turned to Data to become, in September 2017, Data Strategist at nam.R.

  • Aymeric Flegeo

    Data Engineer

    Aymeric graduated in Data Science at Télécom Paris, trained in Data Engineering at Renault, before joining nam.R. He joined the technical team in 2017 as a Data Engineer.

  • Guillaume Larcher

    Developer Product

    A graduate of the prestigious Master MVA program at ENS Paris-Saclay and Ecole Centrale of Lille, Guillaume joined nam.R in April 2017 for his final year internship. A jack of all trades, specialised in computer vision and development, he is working - after several months spent setting up image recognition algorithms - with the Product team to develop nam.R products.

  • Vincent De Chillaz

    Data Analyst - Quality

    An engineer from the Ecole Centrale of Lyon and a specialist in energy and climate, Vincent spent several years working for the renowned Carbone 4 firm, after working at Schlumberger, Vinci and Citeo. Lecturer at ESTP and passionate about data projects, Vincent joined nam.R as Product owner for the Digital Twin.

  • Alexander Usoltsev

    Computer Vision Scientist

    Alexander holds a Master's degree in Biometrics and first worked as a Research Engineer before joining nam.R in early 2018. As part of the Computer Vision team, Alexander is working on satellite images that he segments to identify different objects of interest to the start-up, particularly regarding solar energy.

  • And also...

    Discover the other great people at nam.R !

Clément Perny

Data Scientist

Clément is a final year Data Science Master student at Grenoble ENSIMAG engineering school and is currently working at nam.R as part of a NLP internship at the end of his studies. He works with the Data Science team on several projects.

Dina Khattab

Data Engineer

A Master 2 student in Data Science at Sorbonne University, Dina joined nam.R as part of a final year internship as a Data scientist. Dina has found in nam.R’s project and its use of open data in the context of the ecological transition a real opportunity to work on useful and concrete projects with the help of data science tools and technologies.

Hermes Martinez

Data Scientist

A graduate in language science from Paris Diderot University, Hermès is passionate about recent advances in Natural Language Processing. He puts his passion and expertise at the service of the nam.R project by manipulating unstructured textual data.

François Andrieux

Data Scientist

A graduate engineer from the ESIEA (École supérieure d'informatique, électronique, automatique), François is a true machine learning enthusiast, a passion he nurtures in a blog praised by amateurs, in parallel with his role as OpenClassRoom tutor. He joined nam.R in 2018 as a Data Scientist.

Paul-Louis Barbier

Full-stack Developer

A graduate in computer science and information systems, Paul-Louis was a Backend developer and Lead Dev in a startup before joining nam.R in May 2018. He is in charge of developing the Data Library for the Data Strategy team.

Sébastien Ohleyer

Computer Vision Scientist

A graduate of the Ecole Centrale of Lille in Data Analysis and holder of a Research Master in Mathematics from the University of Lille 1, Sébastien then joined the Master MVA of the Ecole Normale Supérieure Paris-Saclay. It is within the framework of this Master's degree that he joined nam.R as a Computer Vision Scientist.

Bastien Hell

Computer Vision Scientist

After graduating from an engineering school, Bastien joined the Institut national de l'information géographique et forestière (IGN) to carry out image processing and deep learning applied to geographical information. In early 2018, he joined nam.R as a Computer Vision Scientist.

Frédéric Maison

Office Manager

Frédéric is a development manager for Geo PLC. Frédéric uses his experience in office management to support nam.R's growth and the sustainability of its structure.

Florentin Fromont

HR Administrator

Florentin holds a Master's degree in HR Management and Sustainable Performance and joined nam.R in February 2018. He is the HR assistant. His role is to ensure the administrative management of HR, set up a follow-up of personnel files, and intervene upstream in the recruitment process.

Jules Robial

Art Director

A graduate of the Métiers d'Art at the Estienne school of art in Paris, Jules has a background in graphic design and typography. Working at nam.R since the end of 2017, Jules has been in charge of the startup's entire visual identity, across all its graphic and communication media, in order to develop the startup's image.

Adèle Bayart

Community Manager

After four years of studying communication and strategy at EFAP, Adèle had her first experience in community management in the luxury sector. She joined the nam.R project at the end of 2017 as Community Manager. As such, she’s responsible for communication on social networks and developing the visibility of the startup online.

Valentine Lambolez

Data Engineer

Valentine holds a Master's degree in Statistics and Socio-Economic Informatics from Université Lumière - Lyon II, where she specialised in Data Engineering. She joined Deepki, where she was in charge of the development and optimal implementation of customer-specific statistical models. She joined nam.R in 2018 where she is working as a Data Engineer.

Frédéric Kingue-Makongue

Data Strategist

A graduate of the CNAM and the University of Bordeaux, specialised in digital archiving, Frédéric strengthens the Data Strategy team thanks to his expertise in ontology construction, document set structuring and massive data corpus analysis.

Corentin Louison

Business Analyst

Currently in his first Master year of Science, International strategy & influence at SKEMA BS, Corentin joined nam.R in July 2018 for a 6-month gap year internship in the Solution division and is participating in technological survey and ecosystem missions.

Juliette Cocault

Business Analyst

After obtaining a Bachelor in Commerce from McGill University in Canada, Juliette is pursuing her studies at EDHEC. As part of her Master's degree, Juliette joined nam.R for a 6-month internship in Strategy. Within nam.R, she is working on the company’s strategic monitoring, the implementation of an automatic monitoring tool, and the development of performance indicators for energy retrofitting.

Hassen El Golli

System Administrator

Hassen graduated from Epitech in 2012 and has held a wide range of developer and system administration roles, first as an R&D engineer for Vianeos, then as a freelancer on several ambitious projects. He found a home at nam.R while looking for a full-time opportunity to apply his knowledge to projects focused on sustainable development and the ecological transition.

Mélisande Teng

Data Scientist intern – Computer Vision

Mélisande has nearly completed her degree in applied mathematics at CentraleSupelec Paris and her master's in MVA at ENS Paris-Saclay. Prior to this, she pursued an MSc in social entrepreneurship management at ESSEC. After participating in a Data Science for Social Good Europe project predicting the risk level associated with a child not receiving both doses of the MMR vaccine (Croatia) she joined nam.R as a Computer Vision intern working on Google Street View image analysis.

partners

For the development of our artificial intelligence tools and the definition of business rules applicable to various sectors, we rely on private and public structures such as research laboratories and business experts.

news

Advanced Aerial Imagery Analysis with Deep Neural Networks Explained in 5 Minutes.

It is no secret that, when dealing with aerial images, the best state-of-the-art results are achieved with deep learning models, which come at the cost of complexity. At the same time, thanks to Open Data, we can explore even the most sophisticated techniques in creative ways.

Developing a Complex Computer Vision System, a Case Study: Roofs Equipped with Solar Panels.

At nam.R we are working hard to build the Digital Twin of France. To this end, we use many sources of information, such as aerial images. Extracting useful information from unstructured data such as images? Sounds like a job for the computer vision team!

Merging Geo-Spatial Data on Twin Polygons.

One of the most important aspects of our work at nam.R is to find, clean, aggregate and organise large datasets of geo-localised data found in Open Data portals. Very often the same geographical area or object is described in many different datasets, each containing a different piece of information.

Report to the government “Les données géographiques souveraines” (sovereign geographic data): ambitious proposals within a clarified framework

The recent report on geographic data offers a new angle of analysis on the geographic data produced in France. First of all, the report reminds us that geographic data are extremely varied in their subjects (describing the territory in its physical aspects, natural and artificial, visible and invisible), but share one particularity: they all derive their usefulness directly from their geolocated component.

A Data Library for making the most of external data

Open data from both the private and public sectors, social network data, private data platforms: the web is an infinite source of external data.

pictures

Vivatech 2018

Paris, France - 05/2018

  

Emmanuel Macron was at #vivatech2018. Grégory Labrousse was able to present nam.R to him, a company that perfectly embodies the values of a startup serving the public interest.

Vivatech 2018

Paris, France - 05/2018

  

A stand in nam.R's colours, serving as the stage for the pitches over the 3 days of #vivatech2018. #DIY

AI for Good 2018 – ITU

Geneva, Switzerland - 05/2018

  

#AIforgood2018 conference

Data Science Summer School

Paris, France - 07/2018

  

Duccio and Sacha are at the Data Science Summer School all week to present nam.R's projects. #DS3

GeoDataDays 2018

Le Havre, France - 07/2018

  

@namr_france attended #GeoDataDays2018, organised by @afigeo_asso_fr and @DecryptaGeo: a major gathering of producers and users of geographic data. #opendata #geodata

Web Summit 2017

Lisboa, Portugal

  

President Hollande talks with Grégory Labrousse about the Open Data revolution. François Hollande was one of Open Data’s pioneers alongside President Barack Obama.

Web Summit 2017

Lisboa, Portugal

  

Meeting with Professor Mohan Munasinghe, Nobel Peace Prize laureate, known for his contribution to defining the Sustainable Development Goals. nam.R is a key actor in the SDG working groups.

Data Science Summer School

Paris, France - 07/2018

  

Grégory meets Yann LeCun at the Data Science Summer School. An opportunity to present nam.R's work to put #datascience at the service of the ecological transition. #iaforgood

Hackathon DataEnergie 2017

Paris, France - 06/2017

  

nam.R is a winner of the Hackathon DataEnergie 2017, organised by @rte_france @GRTgaz @enedis @GRDF @Etalab and @LIBERTE_LL. #hackathon #victory

Websummit 2017

Lisbon, Portugal - 11/2017

  

The full nam.R team (or almost!) gathered in Lisbon for #websummit2017.

Websummit 2017

Lisbon, Portugal - 11/2017

  

The nam.R stand at #websummit2017, a place to discover nam.R's ambitious project and its #digitaltwin.

Websummit 2017

Lisbon, Portugal - 11/2017

  

Meeting @enigma_data, a major #opendata company in the United States. We were particularly curious about the huge data repository they have built: Enigma Public. #opendata

Websummit 2017

Lisbon, Portugal - 11/2017

  

The nam.R stand under construction: the project comes to life at #websummit2017.

Dreamforce 2017

San Francisco, U.S.A - 11/2017

  

nam.R was at #dreamforce2017, Salesforce's big annual gathering. On the programme: every variation on the prefix "my": #myEinstein, #mySalesforce, #myTrailhead!

World Efficiency Solutions 2017

Paris, France - 12/2017

  

@GrassetGael presents nam.R at #WorldEfficiencySolutions2017.

World Efficiency Solutions 2017

Paris, France - 12/2017

  

The nam.R stand at #WorldEfficiencySolutions2017, a companion event to the #OnePlanetSummit!

World Efficiency Solutions 2017

Paris, France - 12/2017

  

@LouisPetros is at the stand to present nam.R's project for the #transitionécologique. #WorldEfficiencySolutions2017

World Efficiency Solutions 2017

Paris, France - 12/2017

  

@g_labrousse and @LouisPetros present nam.R's project for the #transitionécologique to minister @brunepoirson. #WorldEfficiencySolutions2017

Big Data Paris 2018

Paris, France - 03/2018

  

@GrassetGael represents @namr_france in the finals of the Trophées Big Data at @BigDataParis

Big Data Paris 2018

Paris, France - 03/2018

  

@Nicolas_data presents the #DataLibrary and its use at #BigDataParis2018. #opendata #satellite #aerialimagerey

Big Data Paris 2018

Paris, France - 03/2018

  

@g_labrousse presents the nam.R project to a large audience gathered at #BigDataParis2018

Big Data Paris 2018

Paris, France - 03/2018

  

Lila Tretikov, CEO of Terrawatt Initiative & co-founder of nam.R, at Big Data Paris

Observatoire de l’Open Data

Paris, France - 04/2018

  

nam.R is proud to have taken part in creating the Observatoire de l’Open Data in partnership with @OpenDataFrance, @Etalab, @caissedesdepots, @sciencespo. #opendata

Vivatech 2018

Paris, France - 05/2018

  

The nam.R team is out in force at #vivatech2018. Many people curious about #DigitalTwin, #ODD and #AIforgood came to discuss @namr_france's projects.

AI for Good 2018 – ITU

Geneva, Switzerland - 05/2018

  

Gaël at the International Telecommunication Union's conference on #IAforGood

Meet-up Green Tech Verte

Paris, France - 05/2018

  

@g_labrousse presents nam.R at the GreenTechVerte Meet-up

Comité21 Conference

Paris, France - 06/2018

  

#anthropocene & #ia conference by @Comite21, with @Bettina_Laville & @LouisPetros of @namr_france

Toulouse Space Show 2018

Toulouse, France - 06/2018

  

@Charles_data and @GuillaumeLarch are at #ToulouseSpaceShow2018, where they can discuss our use of satellite data with experts in the sector, producers and users alike. #satellite #newspace #opendata

Data Science Summer School

Paris, France - 07/2018

  

Duccio and Sacha present the use of their shape matching algorithm at the Data Science Summer School. #DS3 #datascience #geomatics

contact

Contact Us

4 rue Foucault, 75116 Paris
(0033) (0)1 85 800 801
contact@namr.com

Advanced Aerial Imagery Analysis with Deep Neural Networks Explained in 5 Minutes.

It is no secret that, when dealing with aerial images, the best state-of-the-art results are achieved with deep learning models, which come at the cost of complexity. At the same time, thanks to Open Data, we can explore even the most sophisticated techniques in creative ways.

At nam.R we are working hard to build a Digital Twin of France, and to achieve that we use many sources of information, one of the richest being aerial images. For us humans, “reading” images is easy, but teaching computers to do it is sometimes a real challenge (and fun!). In this post, we show how we extract a rich description of building roofs from aerial images, in particular by detecting their slopes. In Computer Vision jargon this task is called “object segmentation”, and we chose a deep learning approach to tackle it.

One of the current state-of-the-art segmentation models is Mask R-CNN, published by researchers from Facebook. We used an implementation of this architecture built on the Keras framework.

To sum up, our deep learning model should be able to analyze aerial images and detect roof slopes. This helps us understand the solar energy potential of each roof and, ultimately, advances nam.R’s vision: accelerating the ecological transition.

What Data Do We Need?

First of all, we need to define what kind of data is suitable for this task. We have a choice between satellite and aerial images. The main difference between them, in the context of our work, is the image resolution. Openly accessible satellite images have a resolution of several meters per pixel, while one can find aerial images with resolutions around 15-20 cm per pixel.

Because we would like to find some fine details on the images, roof slopes and ridges, we have gone with aerial images.

Training a deep learning model to detect roof slopes is a “supervised learning” task, so we need not only the images but also labels for the slopes. We therefore created some labels ourselves. This is not a very exciting task, but it is a necessary step in training a decent model.

This way we obtained two types of data to train the machine learning model: images of roofs and the labels for roof slopes.

A training “image-label” pair looks like this:

To train a good model we need as much data as possible. Of course, we can label more roofs by hand, but it is also possible to generate new samples simply by applying basic transformations to the original images and labels (“data augmentation”): a rotation, a vertical or horizontal flip, and so on.

This way we can obtain a big enough dataset to train our deep learning model.
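As a minimal sketch of this kind of augmentation (not our production code), the snippet below applies the same rotation or flip to an image and to its label mask so that the two stay aligned. It assumes images are NumPy arrays of shape (height, width, channels) and labels are arrays of shape (height, width); the function name `augment_pair` is ours, chosen for illustration.

```python
import numpy as np


def augment_pair(image, label):
    """Return transformed copies of an (image, label) pair.

    Each transform is applied identically to the image and its label,
    so every pixel of the label still describes the same pixel of the image.
    """
    transforms = [
        lambda a: a,                    # original sample
        lambda a: np.rot90(a, k=1),     # rotation by 90 degrees
        lambda a: np.rot90(a, k=2),     # rotation by 180 degrees
        lambda a: np.rot90(a, k=3),     # rotation by 270 degrees
        lambda a: np.flip(a, axis=0),   # vertical flip (rows reversed)
        lambda a: np.flip(a, axis=1),   # horizontal flip (columns reversed)
    ]
    return [(t(image), t(label)) for t in transforms]


# Toy example: a 2x3 label and a 3-channel image built from it.
label = np.arange(6).reshape(2, 3)
image = np.stack([label, label, label], axis=-1)
pairs = augment_pair(image, label)
```

One labelled roof yields six training samples here; arbitrary-angle rotations would need interpolation and are left out of this sketch.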

Deep Learning Model Which Fits Our Goal

Over the last few years, many high-performance deep neural networks have been developed and have achieved impressive results on object detection tasks. We chose Mask R-CNN, a high-performance object segmentation network released in 2017. We adapted Matterport’s implementation to be compatible with our aerial images and label data source.

During the training, the model takes images and corresponding labels and learns its internal parameters to detect roof slopes on any new image. Because our dataset is quite small, a couple of hours of training already produce decent results.

One indicator of how well our model is learning is the value of its loss function. The loss function is the criterion the model tries to minimize, usually an average of the error between the real label and the predicted one.

Below are the loss function values for our model, which decrease steadily. This means that the model is learning how to detect roof slopes:
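To make "an average of the error between the real label and the predicted one" concrete, here is a simplified sketch of one such criterion: a per-pixel binary cross-entropy averaged over a mask. The full Mask R-CNN loss adds classification and bounding-box terms to a mask term of this kind, so this is an illustration rather than the exact quantity plotted; the function name and toy values are assumptions.

```python
import numpy as np


def mean_binary_cross_entropy(true_mask, predicted_prob, eps=1e-7):
    """Average per-pixel binary cross-entropy between a 0/1 mask and predicted probabilities."""
    true_mask = np.asarray(true_mask, dtype=float)
    # Clip probabilities so that log(0) never occurs.
    p = np.clip(np.asarray(predicted_prob, dtype=float), eps, 1.0 - eps)
    per_pixel = -(true_mask * np.log(p) + (1.0 - true_mask) * np.log(1.0 - p))
    return float(per_pixel.mean())


true_mask = np.array([[1, 0], [0, 1]])
confident = np.array([[0.9, 0.1], [0.1, 0.9]])   # close to the real label
undecided = np.full((2, 2), 0.5)                 # no information at all

loss_confident = mean_binary_cross_entropy(true_mask, confident)
loss_undecided = mean_binary_cross_entropy(true_mask, undecided)
```

A prediction closer to the real label gives a lower value, which is why a decreasing curve indicates that the model is learning.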

Detecting Roof Slopes on a New Image

During the prediction phase, the model reads only the aerial image and predicts the contours of the roof slopes in it. Because the prediction step does not require complex calculations, it is possible, for example, to copy the trained model to a production server and use it to analyze images in real time.

We can see that the predicted labels for roof slopes are quite accurate but, as always, there is room for improvement. In the image above, roof slope detection does not work well on roofs made of uncommon materials such as metal.

One can imagine several ways to improve this model. For example, we can try to add more training samples for this type of roof material or do more data transformations to generate new samples.

But for the goal of our exploration of deep learning in advanced aerial image analysis, it is already a great result.

This post was just one example of a deep learning model used by nam.R to make France’s Digital Twin richer and closer to reality. We will share our other techniques in future posts.

Stay tuned!

Developing a Complex Computer Vision System, a Case Study: Roofs Equipped with Solar Panels.

At nam.R we are working hard to build the Digital Twin of France. To this end, we use many sources of information, such as aerial images. Extracting useful information from unstructured data such as images? Sounds like a job for the computer vision team!

To detect all solar panels on the roofs of French buildings, we used aerial imagery and the known outlines of the buildings, through a pipeline consisting of a solar panel outline detector and a filtering algorithm.

We took inspiration from projects built around the state-of-the-art object segmentation deep learning algorithm known as Mask-RCNN. This algorithm is the latest in a family of algorithms developed by Ross Girshick et al., in direct continuation of RCNN, Fast-RCNN and Faster-RCNN.

The chosen pipeline consists of two complementary parts:

– an object detector, more specifically an instance segmentation algorithm, meant to detect solar panels and extract their contours;

– a filtering algorithm that takes all detections and filters out those that don’t match our business rules.

While the filtering algorithm can easily be developed from the expert rules we chose to consider (the size of the detected solar panels and their position relative to the roof in question), the deep learning model depends directly on the data we feed it.

The first part of the project was, accordingly, to generate a dataset of roofs equipped with solar panels, and the matching labels. There are multiple existing tools for image annotations (VGG VIA, MIT LabelMe,…) that can be used as is. We chose the VGG Image Annotator. After a few (hundreds of) clicks, we ended up with a dataset we’re quite proud of.

Only then were we able to train the Mask-RCNN model to detect solar panels. The first version of our model wasn’t performing all that well, and it was necessary to add more data to the training stage. We did so through semi-supervised learning with automatic labelling.

This technique consists of using the model to compute more labels, which are then checked by human operators and used as training data for a new, more robust version of the model. Checking whether the proposed labels were right and correcting the wrong ones was much simpler than labelling hundreds of images by hand. Basically, we used our first model as a replacement for crowdsourcing! After a few loops we had gathered more data and matching labels, and were able to train a model with acceptable performance.

We transformed the raw output of the model into polygons in the same format and projection as our building polygons using geometric algorithms (Marching Squares, Douglas-Peucker) and geographic transformations. This enables us to directly filter out the potential false positives. We found out that roof windows, glass roofs and blue awning fabric were likely to be mistaken for solar panels due to their similar visual textures.
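To illustrate the simplification step mentioned above, here is a minimal pure-Python sketch of the Douglas-Peucker algorithm. It is not nam.R's implementation: the function names, the tolerance value and the sample contour are assumptions chosen for the example, and a production pipeline would more likely rely on a geometry library.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify an open polyline: drop vertices closer than `tolerance`
    to the chord joining the two ends of each sub-section."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical noisy contour, as could come out of a pixel mask.
contour = [(0, 0), (1, 0.05), (2, 0), (2.05, 1), (2, 2)]
print(douglas_peucker(contour, tolerance=0.1))  # [(0, 0), (2, 0), (2, 2)]
```

The tolerance is expressed in the units of the coordinates, so it has to be chosen after the contour has been projected into the same system as the building polygons.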

Fortunately, most false positives can be filtered out using the information of the building shape and position, but also expert rules concerning the minimal surfaces for solar panels.
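The filtering step can be sketched as follows. This is only an assumed version of the rules: the 1 m² minimum surface, the "every vertex inside the building outline" test and all names are hypothetical, since the post does not give the actual thresholds.

```python
MIN_PANEL_AREA_M2 = 1.0  # hypothetical threshold, not taken from the post


def polygon_area(polygon):
    """Area by the shoelace formula; vertices as (x, y), ring not closed."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def contains(polygon, point):
    """Ray-casting point-in-polygon test (boundary points are ambiguous)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def keep_detection(panel, building, min_area=MIN_PANEL_AREA_M2):
    """Keep a detection only if it is large enough and lies on the building."""
    if polygon_area(panel) < min_area:
        return False
    return all(contains(building, vertex) for vertex in panel)


building = [(0, 0), (10, 0), (10, 10), (0, 10)]
detections = {
    "panel": [(2, 2), (4, 2), (4, 3), (2, 3)],                    # 2 m², on the roof
    "roof_window": [(5, 5), (5.5, 5), (5.5, 5.5), (5, 5.5)],      # too small
    "awning": [(12, 2), (14, 2), (14, 4), (12, 4)],               # off the building
}
kept = [name for name, poly in detections.items() if keep_detection(poly, building)]
print(kept)  # ['panel']
```

Both tests assume coordinates in a metric projection, so that areas come out in square metres.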

The first conclusion we drew is that combining machine learning and expert rules can yield a reliable framework, harnessing the power of machine learning algorithms and the robustness of business rules.

The second one was the use of our first imperfect model to help us label more data. Real data but synthetic labels, a great example of human-machine cooperation, isn’t it?

Finally, there are many ways to measure the performance of this kind of pipeline. The deep learning model itself can be evaluated using metrics such as mean average precision, but we were mainly interested in the performance of the whole flow. Thus, we chose metrics that are less image-centric and more oriented towards information retrieval: precision, recall and overall accuracy. We added a geometric metric that indicates how well our predicted panels match the actual ones, the Intersection over Union (IoU).
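For reference, these four metrics can be computed as in the sketch below. The counts and the tiny masks are made-up illustration values, not nam.R's test set, and the IoU is computed here on binary pixel masks, which is an assumption: it could equally be computed on polygons.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Classification metrics from true/false positive/negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


def mask_iou(predicted, actual):
    """Intersection over Union of two equally sized binary masks
    given as nested lists of 0/1 values."""
    intersection = 0
    union = 0
    for row_p, row_a in zip(predicted, actual):
        for p, a in zip(row_p, row_a):
            if p and a:
                intersection += 1
            if p or a:
                union += 1
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical counts: roofs with/without panels, detected or not.
print(precision_recall_accuracy(tp=45, fp=5, fn=15, tn=35))  # (0.9, 0.75, 0.8)

predicted = [[1, 1, 0],
             [1, 1, 0]]
actual = [[0, 1, 1],
          [0, 1, 1]]
print(mask_iou(predicted, actual))  # 2 shared pixels out of 6 covered
```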

We achieved 96% overall algorithm accuracy and 84% IoU on our test set, values we’re quite proud of.

The predictions of solar panels were integrated in nam.R’s Digital Twin and the information is already put to good use !

Merging Geo-Spatial Data on Twin Polygons.

One of the most important aspects of our work at nam.R is to find, clean, aggregate and organise large datasets of geo-localised data found in Open Data portals. Very often the same geographical area or object is described in many different datasets, each containing a different piece of information. To build a rich description of the object, the real challenge is to join these pieces together. But in the absence of a clear and coherent name or index, this process can become quite difficult and noisy.

For example, when trying to aggregate information on a building, we may find its price per square meter in one file, while another file contains information on its heating system or material. Buildings, however, usually don’t have names, and are often indexed differently depending on the source, or institution, behind the dataset. This means that to merge the various sources correctly, one has to be creative.

Luckily, more often than not, geographical objects come with the coordinates that describe their geometry, and these can be used to merge the data on twin polygons. By twin polygons we mean polygons describing the same object in different datasets.

That said, it can be surprising to see how many small differences are found in the coordinates of the same object coming from different sources: the angles, the number of vertices, the length of the edges and of course the geolocation are often slightly different. Moreover, a single building in one dataset can easily be represented as several buildings in another. In the figure above we can see two tricky examples. All this implies that a unique definition of twin polygons does not exist, and therefore we leave this task to an algorithm.

Machine Learning Approach: A Shape Matching Algorithm.

The Machine Learning approach we chose consists of training a Random Forest algorithm to give a similarity score for two polygons. To do this, we started by describing polygons through a number of geometrical features: compactness, PCA orientation, eccentricity, convexity and more. A polygon is therefore interpreted as a vector of numbers, as in the figure above. This allowed us to quantify the difference between any given pair of polygons as the difference of these vectors.
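As an illustration of this description step, the sketch below computes two of the features named above for a polygon and takes the difference of the resulting vectors. The exact definitions are assumptions (compactness as the Polsby-Popper ratio 4πA/P², convexity as area over convex hull area); the post does not say which formulas were used, and PCA orientation and eccentricity are left out for brevity.

```python
import math


def area(polygon):
    """Shoelace area; vertices as (x, y), ring not closed."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1]
            - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    return abs(s) / 2.0


def perimeter(polygon):
    n = len(polygon)
    return sum(math.dist(polygon[i], polygon[(i + 1) % n]) for i in range(n))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        hull = []
        for p in sequence:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]

    return half_hull(pts) + half_hull(reversed(pts))


def features(polygon):
    """Feature vector (compactness, convexity), both in (0, 1]."""
    a = area(polygon)
    compactness = 4 * math.pi * a / perimeter(polygon) ** 2
    convexity = a / area(convex_hull(polygon))
    return (compactness, convexity)


def feature_difference(poly_a, poly_b):
    """Element-wise absolute difference of the two feature vectors."""
    return tuple(abs(x - y) for x, y in zip(features(poly_a), features(poly_b)))


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(features(square))   # compactness = pi / 4, convexity = 1.0
print(feature_difference(square, l_shape))
```

Vectors of this kind, computed for each pair of candidate polygons, are what a classifier such as a Random Forest would take as input.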

To train our Random Forest algorithm, we hand-labeled more than 20,000 pairs of polygons, coming from spatial intersections of real datasets, as same or different.

The algorithm then learnt to give a similarity score based on the geometrical features of two polygons. This can be interpreted as the probability of the two polygons being the same one. By setting a threshold at t = 0.7 on the similarity score we obtained the following performances on our training set.

We considered these values good enough, and therefore started using this algorithm to put some order in the wildness of the geospatial data universe. May the merging begin !

Report to the Government “Les données géographiques souveraines” (Sovereign Geographic Data): ambitious proposals within a clarified framework

Report to the Government “Les données géographiques souveraines”, Valéria FAURE-MUNTIAN, Member of Parliament for the Loire, July 2018: Available here

Sovereign geographic data as a ‘new’ framework for thinking about a national data strategy

The recent report on geographic data offers a new angle of analysis on the geographic data produced in France. First, the report reminds us that geographic data cover an extremely wide range of subjects (description of the territory in its physical aspects, natural and man-made, visible and invisible) but share one particularity: they all derive their usefulness directly from their geolocated component. The report proposes an interesting distinction between “base” data (“socle”: a base map agnostic of the uses that will be made of it) and “business” data (“métier”: geographic data produced to meet the needs of a specific mission).

The report also has the great merit of proposing a very clear definition of what makes geographic data sovereign. Sovereignty stems from the independence with which the data is produced (the report does not rule out production by a private third party, but insists that the public authorities must be able to control production technically, from acquisition through to storage of the information) and from the authority carried by the geographic data disseminated by the State. On authority, the report draws largely on the work of the Administrateur général des données (Chief Data Officer) in the report on reference data: the authority of a dataset cannot be decreed; the data must become authoritative, that is, impose itself through its quality and reusability (in this respect, one may be surprised to find no mention of the FAIR principles in this report). A substantial part of the report relies on this dimension of authority to support several proposals.

The report’s key proposals

Create a single point of access to geographic data under the responsibility of the IGN

Creating a one-stop shop for data is a gamble that has been attempted several times, through projects such as the Géoportail, the Géocatalogue and geo.data.gouv.fr. Each of these access points is clearly very specific, and many users (nam.R included) report the numerous barriers to entry that each of these platforms presents. This summer’s new version of geo.data.gouv.fr will probably improve the usability of this platform, which is gradually establishing itself as a reference. Designating the IGN to run a new portal may be an interesting choice given the resources and skills available within the Institute; nevertheless, one may have reservations about its ability to establish a new space that will be authoritative.

Bring the DINSIC into the CNIG (as secretary) and give it an ex officio seat on the IGN’s board of directors

The DINSIC, which reports to the Prime Minister, could take an increasingly important place in the governance of the IGN and the CNIG. This is justified by the fact that the mission of geographic data producers is meant to be increasingly interministerial, rather than the preserve of the ministries in charge of sustainable development and agriculture alone.

Transfer the DGFiP’s topographic mission to the IGN

Although this would initially be only an assessment of the technical, organisational, legal and financial impacts of transferring the DGFiP’s topographic missions to the IGN, the decision could prove beneficial with a view to completing the RPCU.

Mandate the IGN to play a more active role in delivering the PCRS

The Plan de Corps de Rue Simplifié (PCRS) is a fantastic project launched in 2011. Afigéo even stated at the Géo Data Days (3-4 July 2018, Le Havre) that it is being emulated abroad thanks to the high quality of its specifications. Energising its delivery through a clearer mandate for the IGN as lead would help bring this superb project to fruition. The IGN already stated at the Géo Data Days that it would position itself as an active player on the PCRS 1.

Establish the principle that sovereign geographic data are made available free of charge, and impose the Licence Ouverte as the sole reference

It is perhaps on this last proposal that the report was most eagerly awaited. It is ambitious: it proposes not only to guarantee free access to sovereign geographic data, but to do so under conditions that allow the freest possible reuse (which the Licence Ouverte permits more broadly than the ODbL or other Share Alike licences). These principles of free and open access to sovereign geographic data will certainly enable more projects that draw on high-quality geographic data. They will also have the virtue of strengthening the authority of the data, insofar as they would more readily become the reference data for a domain. Nevertheless, the risk of a decline in the quality of sovereign geographic data should not be dismissed if shortfalls were to arise: reuse, which underpins the authority of sovereign data, can only be effective if the quality of the data and metadata is maintained or improved.

A Data Library for making the most of external data

Open data from the private and public sectors, social network data, private data platforms: the web is an endless source of external data. All kinds of formats are published: data tables, geolocated data, APIs, images and text. How can all this value be harnessed within Big Data processes?

Before presenting the solution developed by nam.R, it is important to better understand the company’s activities. nam.R is a data producer whose distinctive feature is that it uses only external data in its Data Science processes. This founding principle has the advantage of not making nam.R dependent on data from partners who could impose exclusivity on the use of their data. nam.R has developed expertise in data covering every sector of the ecological transition: development of renewable energy, energy efficiency operations, smart grids, short supply chains… Its Data Science teams work with geolocated data as well as images and text data in order to build, at the finest level of granularity, information that a wide variety of stakeholders can act on.

Since external data is the only data source nam.R uses, it is necessary to exploit its full richness. This is why the start-up set out to build the largest possible structured knowledge base.

The first requirement for this base was that it cover all sources of structured data in France exhaustively. To that end, substantial work to inventory open data and closed data sources was crucial, and it calls for constant monitoring. From this list, nam.R was able to develop scrapers that crawl the pages of these websites daily. These scrapers download all the datasets and retrieve all the available metadata in structured form.

The second requirement was to harmonise the information available on each database so that they can all be queried on an equal footing. This required developing miners. The miners complement the scrapers because they go through the downloaded files themselves. They extract a good deal of information about each file, such as the number of records, the number of variables, the header and type of each column, and (soon) even one or more topics derived from Natural Language Processing.
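To make the role of a miner more concrete, here is a minimal sketch that profiles a CSV file using only the Python standard library. It is an assumed illustration, not nam.R's code: the function names, the three-way type inference and the sample data are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Return 'integer', 'float', 'text' or 'empty' for a column of strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def mine_csv(stream):
    """Profile a CSV stream: record count, variable count, header, column types.
    Assumes the first row is the header and every row has the same length."""
    reader = csv.reader(stream)
    header = next(reader)
    rows = list(reader)
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return {
        "n_records": len(rows),
        "n_variables": len(header),
        "header": header,
        "types": {name: infer_type(col) for name, col in zip(header, columns)},
    }


# Hypothetical downloaded file, held in memory for the example.
sample = io.StringIO("id,city,surface\n1,Paris,45.5\n2,Lyon,60\n")
profile = mine_csv(sample)
print(profile["n_records"], profile["n_variables"])  # 2 3
print(profile["types"])  # {'id': 'integer', 'city': 'text', 'surface': 'float'}
```

Storing the same profile for every dataset, whatever its origin, is what makes the sources comparable and queryable in a uniform way.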

Finally, the third requirement is to set up a smooth pipeline for integrating external data into Machine Learning processing. The robustness of the pipeline rests on its ability to adapt when the source data is updated. Alerted by the scrapers, the Data Scientist can update the databases they process upstream of the flow. In the short term, the Data Library will be able to score the changes resulting from a dataset update. If the schema remains consistent and the number of records has not increased tenfold, the dataset can be updated automatically.
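The automatic-update rule described above could look like the following sketch. The profile dictionaries, the function name and the handling of an empty previous version are assumptions made for the example; only the two conditions (consistent schema, record count not multiplied by ten) come from the text.

```python
def can_auto_update(old_profile, new_profile, max_growth=10):
    """Decide whether a refreshed dataset may replace the old one without
    human review: same schema, and record count not multiplied by
    `max_growth` or more."""
    same_schema = (
        old_profile["header"] == new_profile["header"]
        and old_profile["types"] == new_profile["types"]
    )
    if not same_schema:
        return False
    if old_profile["n_records"] == 0:
        # Assumption: with no baseline, leave the decision to a human.
        return False
    return new_profile["n_records"] < max_growth * old_profile["n_records"]


# Hypothetical profiles of two successive versions of a dataset.
previous = {"header": ["id", "city"],
            "types": {"id": "integer", "city": "text"},
            "n_records": 100}
refreshed = dict(previous, n_records=120)
exploded = dict(previous, n_records=1500)
renamed = dict(previous, header=["id", "town"])

print(can_auto_update(previous, refreshed))  # True
print(can_auto_update(previous, exploded))   # False
print(can_auto_update(previous, renamed))    # False
```

Any update that fails the check would be handed to the Data Scientist, who is alerted by the scrapers as described above.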

The opportunities opened up by open data and the proliferation of data marketplaces must be seized with new tools. nam.R’s Data Library aims to rise to these challenges. Still under development, this Data Library already has many internal uses. The first uses open to the public will come in February as part of the open data observatory that nam.R is developing in partnership with the association OpenData France, Etalab and the Cour des Comptes.