Can big data be official?

At the Renyi Hour on November 13th 2014, Frederic Udina gave a talk on big data and official statistics. Apart from being a professor at UPF and BGSE, Frederic is Director of IDESCAT, the statistical institute of Catalonia.

Frederic Udina presenting to BGSE Data Science students

In his talk, Frederic compared “traditional” official statistics, which are slow to produce but have well-defined privacy limits and access rights, with “big data”, which is fast to produce but volatile and has fuzzy privacy limits. He highlighted the tension between these two worlds, focusing on the need for official statistics to become easier to collect, organise and customise to the needs of the end user. In particular, he identified an opportunity for IDESCAT (and other statistical institutes) to integrate officially collected information with alternative information sources, such as:

  • Administrative data
  • Data freely available in society
  • Data from private companies

Frederic outlined IDESCAT’s plan to move away from the current data generation system (the ‘stovepipe model’), which is slow, expensive and inefficient because it does not reuse information already collected, towards a fully integrated model (‘Plataforma Cerdà’) in which any new information must be integrated with existing data.

The Renyi Hour crowd

Frederic noted that data is becoming increasingly important in society, and that official statistical institutions are beginning to recognise this. In particular, he discussed the Royal Statistical Society’s Data Manifesto, in which the RSS notes that data is:

  • A key tool for better, informed policy-making
  • A way to strengthen democracy and trust
  • A driver of prosperity

The Royal Statistical Society Data Manifesto

Frederic also stressed the importance of confidentiality and privacy when making data available. While it is desirable for some data to be freely available to the public, confidentiality and privacy should always be protected. At the same time, it is important to strike the right balance between access and privacy: personal sensitive data must be protected, but important information should not be prevented from being used in ways that may ultimately help wider society. Personal health records are a classic example of this.

Frederic concluded his talk with some examples of national statistical authorities combining official statistics with widely available information to carry out interesting new analyses. These include:

  • Producing origin/destination matrices between territorial units (usually municipalities) for work or study trips, using mobile phone trajectories (ISTAT, Statistics New Zealand)
  • Using Google Trends for labour market estimation and prediction, monthly forecasts and small-area estimation (ISTAT)
  • Measuring the use of ICT in firms, using web scraping and text mining techniques
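
To make the first of these examples a little more concrete, here is a minimal Python sketch of how origin/destination counts could be tallied from trajectories. It is only an illustration, not the method used by any statistical institute: the function name `od_matrix`, the device identifiers and the sample trips are all made up, and it assumes each trajectory has already been reduced to an ordered list of municipalities.

```python
from collections import Counter


def od_matrix(trajectories):
    """Count trips between the first and last municipality of each trajectory.

    `trajectories` maps a (hypothetical) device id to the ordered list of
    municipalities in which that device was observed.
    """
    counts = Counter()
    for units in trajectories.values():
        if len(units) < 2:
            continue  # a single observation does not describe a trip
        origin, destination = units[0], units[-1]
        if origin != destination:
            counts[(origin, destination)] += 1
    return counts


# Made-up sample data, for illustration only.
trajectories = {
    "device_a": ["Girona", "Barcelona"],
    "device_b": ["Girona", "Sabadell", "Barcelona"],
    "device_c": ["Tarragona", "Reus"],
    "device_d": ["Lleida"],
}

od = od_matrix(trajectories)
for (origin, destination), trips in sorted(od.items()):
    print(f"{origin} -> {destination}: {trips}")
```

A real pipeline would need far more than this, for instance deciding which locations count as home, work or study, and aggregating the results so that no individual can be identified, which ties back to the privacy concerns raised in the talk.
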
Lunch with Frederic after his talk
