Thursday, March 10, 2011

The Nobility of Primary Data Collection

Many of our colleagues here are involved in collecting data of varying depth, ranging from detailed childhood trials to student surveys to longitudinal ageing surveys. Most of my published papers have been based on surveys that I designed and helped to create. That said, I have also worked with a number of secondary data files, so I have at least formed a sense of the relative costs and benefits of collecting data.

One discussion that comes up from time to time, particularly among colleagues in psychology and epidemiology, is whether researchers have sufficient incentives to collect and create data. Put plainly, one school of thought holds that people who use secondary data without being involved in designing and collecting it are free-riding on the efforts of others. A related view is that data from major studies should only be released after the researchers responsible for creating the data have written the core papers. I know people who think this lag should be very long, to ensure that actually collecting data brings career benefits.

Another view, closer to my own, is that the dissemination and use of secondary data is healthy. In particular, large, publicly available datasets such as HRS, ELSA and SHARE provide a common scientific infrastructure that allows researchers to focus on analysis and modelling. I have seen many situations where PhD students from other disciplines automatically think of collecting new data when they could get further by analysing existing, publicly available secondary data. The rapid dissemination of secondary data also provides accountability for people's results, and many journals now require data and code to be made available as part of the publication process.

The LISS panel in the Netherlands comes close to the ideal. A professional team of researchers, backed by a scientific advisory board, takes applications from academic researchers and hosts their surveys or experiments on a nationally representative panel free of charge, provided they agree to make the data fully available to other researchers. This allows full sharing and reuse of the data, and ensures that people's results are fully open to public scrutiny and replication. Because all of the collection is conducted by one team, there is considerable scope for specialisation and for developing advanced rigour in sampling and data collection.

