Big data: opportunities and challenges
Opportunities...
The availability of massive data and the opportunities to integrate information from multiple sources are growing rapidly in each of L'EMbeDS' domain research areas. For instance:
- We use massive databases on patents and scientific publications, and combine them with firm-level microdata in empirical analyses to characterize technological paradigms and trajectories.
- We integrate several sources of microdata (including web scraping) to produce large datasets on firms’ behavior and combine them with data from traditional sources and official statistics. We also use data on the interactions of different economic agents within networks (firms’ alliances, international trade, etc.) and massive data on financial transactions (prices and quantities), both for securities and for commodities.
- We use empirical data and data produced by large simulations to validate and calibrate, at the micro- and macro-level, Agent-Based Models (ABMs) that reproduce the evolution of economic systems. ABMs require specialized computational capabilities, as well as sophisticated statistical approaches to design simulation experiments and investigate structural and emergent properties.
- We investigate the impact of changes in climate, availability of natural resources and quality of ecosystems by integrating multiple sources of high temporal frequency and spatial resolution data on environmental parameters, environmental impacts of raw materials and production processes, economic impacts of catastrophic meteorological events, etc.
- We collect and analyze massive data on the Italian regional and national healthcare systems, and are augmenting these data through novel surveying technologies to measure patients’ and users’ experiences and outcomes.
- We collaborate with several Italian courts, building pipelines to pseudonymize texts, annotate and query legal materials, and construct tools to simplify access to legal records, automate case law research and assist judicial decisions.
… and Challenges
Exploiting these opportunities requires sophisticated modeling approaches and state-of-the-art statistical and computational tools. We work on many such approaches and tools, including non-linear dynamical systems; stochastic models; non-parametric and semi-parametric statistics; models and statistical methods for networks; machine learning techniques for emulating and validating ABMs; techniques for the computational assessment of statistical inferences and stability (e.g., resampling, augmentation and perturbation schemes); methods for the analysis of high-dimensional and structured data (e.g., dimension reduction, feature selection and feature screening algorithms, functional data analysis); and methods for causal inference.
Exploiting these opportunities also requires rigorous approaches to all legal and ethical aspects of data privacy, data protection, and data safety, as well as regulatory innovation and governance frameworks for data and data-driven technologies. We work on EU data spaces, adjudication & decision-making, human-centered AI & fundamental rights, digital markets & platforms, open science & open culture, cybersecurity, and the digitalization of public administration and society.