A Data Library to strengthen external data value

Private and public open data, social network data, private data platforms... The web is an infinite source of external data. It comes in a variety of formats: data tables, geolocated data, APIs, images and text. How can organisations take advantage of all the value to be gained from big data processing?

Before diving into nam.R's solutions, it helps to understand what the company does. nam.R is a data producer that uses only external data in its data science processes. This unique founding principle has one important advantage: no reliance on partners who enforce data exclusivity or protections restricting data use. nam.R has extensive expertise with data in every sector of the ecological transition: renewable energy development, energy efficiency operations, smart grids, short supply chains… Its data science teams exploit not only geolocated data but also images and textual corpora to build a remarkably fine mesh of actionable information for a wide variety of actors.

Because external data is nam.R's only source of data, the company focuses on exploiting it to the fullest extent. This is why the start-up has tasked itself with building the widest possible structured knowledge base.

The first requirement of this database was that it be comprehensive, drawing from every structured data source in France. Exhaustive research into open and closed data sources was crucial, and monitoring efforts are ongoing. nam.R developed scrapers that browse the pages of these sources on a daily basis. The scrapers download available datasets and retrieve the metadata in a structured way.
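The scraping step described above can be sketched minimally. The listing format, the `dataset` CSS class, and the function names below are hypothetical illustrations, not nam.R's actual implementation; this stdlib-only sketch shows how a scraper might pull dataset links and titles from a source's listing page in a structured way:

```python
from html.parser import HTMLParser


class DatasetListingParser(HTMLParser):
    """Collect dataset download links and their titles from a listing page."""

    def __init__(self):
        super().__init__()
        self.datasets = []
        self._current = None  # metadata record being built

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Hypothetical convention: dataset links carry class="dataset".
        if tag == "a" and attrs.get("class") == "dataset":
            self._current = {"url": attrs.get("href"), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.datasets.append(self._current)
            self._current = None


def scrape_listing(html: str) -> list:
    """Return one metadata record per dataset found on the page."""
    parser = DatasetListingParser()
    parser.feed(html)
    return parser.datasets
```

In a daily-run scraper, each record's `url` would then be fetched and the file handed to the downstream profiling tools.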

The second requirement was to harmonize the information available on each of the databases so that they can all be queried uniformly. This meant developing data mining tools that complete the work of the scrapers by browsing the downloaded files. These tools extract a vast array of information from each file: number of records, number of variables, column headers and types; soon they will also infer one or more themes per dataset using a Natural Language Processing algorithm.
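A minimal sketch of this profiling step, assuming CSV input (the function name and the two-way numeric/text type inference are illustrative simplifications, not nam.R's actual tooling):

```python
import csv
import io


def profile_csv(text: str) -> dict:
    """Extract structural metadata from a CSV file: record count,
    variable count, and column headers with inferred types."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    rows = list(reader)

    def infer_type(values):
        # Crude inference: a column is numeric if every non-empty
        # value parses as a float, otherwise it is text.
        def is_num(v):
            try:
                float(v)
                return True
            except ValueError:
                return False

        return "numeric" if all(is_num(v) for v in values if v) else "text"

    types = [infer_type([row[i] for row in rows]) for i in range(len(headers))]
    return {
        "n_records": len(rows),
        "n_variables": len(headers),
        "columns": dict(zip(headers, types)),
    }
```

Storing this profile alongside every downloaded file is what makes a heterogeneous collection of sources queryable through one schema.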

Finally, the third requirement was to set up a fluid pipeline integrating external data into machine learning processes. The robustness of the pipeline rests on its ability to adapt to source data updates. Upon receiving an alert from the scrapers, the data scientist can update the databases upstream of the flow. In the short term, the Data Library will be able to score the changes resulting from dataset updates: if the schema remains consistent and the number of records has not increased tenfold, the dataset will be updated automatically.
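The automatic-update rule described above can be expressed as a simple check on the stored profiles. This is a hedged sketch of that decision step (the function name and profile dictionary shape are assumptions carried over from the profiling example, not nam.R's actual code):

```python
def can_auto_update(old_meta: dict, new_meta: dict) -> bool:
    """Decide whether a refreshed dataset can be integrated without
    human review: same schema, and record count not grown tenfold."""
    same_schema = old_meta["columns"] == new_meta["columns"]
    old_n = old_meta["n_records"]
    reasonable_growth = (new_meta["n_records"] < 10 * old_n) if old_n else False
    return same_schema and reasonable_growth
```

Updates failing either condition would instead raise an alert for a data scientist to review upstream of the flow.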

The open data movement and the multiplication of data marketplaces both present opportunities that can only be seized with new tools. The nam.R Data Library is equal to the challenge. Although the library is still in development, it already fulfils several internal functions. Its first public trial run will be in February as part of the open data observatory co-developed by nam.R, OpenData France, Etalab and the Cour des Comptes.
