IDC estimates that the volume of the digital universe will double every two years, growing from 4,400 billion gigabytes to 44,000 billion gigabytes in 2020. The first sequencing of human DNA took 10 years. Today, specialized companies can do it in five days.
Most states have grasped the importance of the subject and are equipping themselves to enter the big data race. Public research programs are receiving substantial funding. The United States endowed the Big Data Research and Development Initiative with a budget of $200 million. France has released 25 million euros for big data research. Big data is also one of the priorities of the European Union's 7th Framework Programme for Research and Development.
Scientific projects give a major boost to research in data processing, and astrophysics programs are a good example. During the eight years of the Sloan Digital Sky Survey (2000-2008), 140 terabytes of images were collected. Once in service in 2020, the Large Synoptic Survey Telescope (LSST) will take five days to collect the same volume. According to LSST astronomers, current technologies would need more than 10 years to analyze the images and data produced by the program. The main lines of research will therefore concern the storage, exploitation and sharing of this information, as well as collaboration among the entities involved.
In France, the CNRS Mastodons project began work in 2012. It supports interdisciplinary projects that study the algorithms, methodologies and infrastructures needed to store, process, analyze, visualize and also protect big data.
CASD, a French solution for sharing data between private stakeholders and researchers
Led by Kamel Gadouche, this project for sharing data for research purposes was born in 2010. Its objective was to open access to INSEE data more widely to researchers, and also to open up data from certain ministries and private stakeholders, particularly in banking, insurance and energy. Because it handles sensitive and confidential data, it had to meet very high security constraints. Its leader also faced two challenges: on the one hand, convincing private data-producing companies to share their data under maximum security, and on the other hand, convincing researchers to use the program by making it ergonomic and simple to use. Both challenges were met, as RTE, Generali and La Poste quickly became contributors. Designed before the big data surge, CASD has been able to process big data since 2013, thanks to the addition of tools such as Hadoop and Spark.
In practical terms, CASD is a secure space for consulting and processing data. Publication of results is subject to strict rules and mutual agreements between the data producers (who decide the conditions of publication) and the users of the data. The CNIL is also involved in the process.
CASD is accessed through a box paired with a personalized biometric smart card for each user. “Its installation must meet strict security requirements and a demanding infrastructure, contractually binding each stakeholder. A ‘bubble’ system completely isolates the box and its user, operating in a closed circuit, with no contact with the outside from the moment the user enters the platform.” (Source: Big Data directory 2015-2016) This technical choice was motivated by a calculation of maintenance costs: a software solution presented too many risks and indirect costs. The solution ensures a very high level of security, and the 350 boxes (for 1,000 users) require only 4 technicians dedicated to their maintenance.
CASD is thus a French success story that is being exported: it is working with the European Union to create a common infrastructure for data sharing.