Big Data: opportunities and challenges


Data availability, as well as the opportunities for integrating information from multiple sources, is rapidly growing in each of the EMbeDS research areas. For instance:

  • Innovation and the 4th Industrial Revolution: contemporary databases on patents and scientific publications comprise millions of observations and hundreds of variables. Combined with firm-level microdata, these allow for detailed empirical analyses and characterizations of the technological paradigms and trajectories of Industry 4.0.
  • Industrial Dynamics, Firms Competitiveness and Financial Markets: several sources of micro-data (including online “scraping”) can be integrated and triangulated to produce very detailed datasets on firms’ behavior, which can then be combined with data from traditional sources and official statistics. Further, one can generate data that capture the interactions of different economic agents as networks (firms’ alliances, international trade, etc.). Finally, massive amounts of data are available on financial transactions (prices and quantities), both for securities and for commodities.
  • Economies as Evolving Complex Systems: big empirical data and data produced by massive simulations are used to validate and calibrate, at both the micro- and macro-level, Agent-Based Models (ABMs) that reproduce the evolution of economic systems while accounting for a large number of details and variables. ABMs require specialized computational capabilities, in terms of both hardware and software, as well as sophisticated statistical approaches to investigate their structural and emergent properties.
  • Social and Environmental Sustainability: changes in climate, in the availability of natural resources and in the quality of ecosystems can be investigated by integrating and analyzing data with high temporal frequency and spatial resolution on environmental parameters, data on the environmental impacts of raw materials and production processes, data on the economic impacts of catastrophic meteorological events, etc. Concerning healthcare, we collect an enormous amount of data on the functioning and usage of Italian regional and national healthcare systems, and are augmenting these data through novel survey technologies that measure the experiences and outcomes of patients and users.

… and Challenges

Exploiting these opportunities to spur major research progress requires sophisticated modeling approaches and state-of-the-art statistical and computational tools to gather, integrate and analyze big data. We work on many such approaches and tools, including non-linear dynamical systems; stochastic models; non-parametric and semi-parametric statistics; models and statistical methods for networks; machine learning techniques for emulating and validating ABMs; techniques for the computational assessment of statistical inferences (e.g., resampling, random permutation and perturbation schemes, Monte Carlo simulations); and methods for the analysis of high-dimensional and structured data (e.g., dimension reduction, feature selection and feature screening algorithms).

We also note that using big data for research purposes in many EMbeDS areas requires special care regarding privacy, data protection and data security. The legal and ethical aspects of these endeavors fall within the competencies of the DIRPOLIS Institute of the Sant’Anna School.