Teresa Scassa - Blog

Monday, 19 December 2016 08:52

Open licensing of real time data

Municipalities are under growing pressure to become “smart”. In other words, they are expected to reap the benefits of sophisticated data analytics carried out on more and better data collected via sensors embedded throughout the urban environment. As municipalities embrace smart cities technology, a growing number of the new sensors will capture data in real time. Municipalities are also increasingly making their data open to developers and civil society alike. If municipal governments decide to make real-time data available as open data, what should an open real-time data license look like? This is a question Alexandra Diebel and I explore in a new paper just published in the Journal of e-Democracy.

Our paper looks at how ten North American public transit authorities (6 in the U.S. and 4 in Canada) currently make real-time GPS public transit data available as open data. We examine the licenses used by these transit authorities both for static transit data (timetables, route data) and for real-time GPS data (for example, data about where transit vehicles are along their routes in real time). Our research reveals differences in how these types of data are licensed, even when both types of data are referred to as “open” data.

There is no complete consensus on the essential characteristics of open data. Nevertheless, most definitions require that to be open, data must be: (1) made available in a reusable format; (2) prepared according to certain standards; and (3) available under an open license with minimal restrictions or conditions imposed on reuse. In our paper, we focus on the third element – open licensing. To date, most of what has been written about open licensing in general, and the licensing of open data in particular, has focused on the licensing of static data. Static data sets are typically downloaded through an open data portal in a one-time operation (although static data sets may still be periodically updated). By contrast, real-time data must be accessed on an ongoing basis and often at fairly short intervals, such as every few seconds.

The need to access data from a host server at frequent intervals places a greater demand on the resources of the data custodian – in this case often cash-strapped municipalities or public agencies. The frequent access required may also present security challenges, as servers may be vulnerable to distributed denial-of-service attacks. In addition, where municipal governments or their agencies have negotiated with private sector companies for the hardware and software to collect and process real-time data, the contracts with those companies may require certain terms and conditions to find their way into open licenses. Each of these factors may have implications for how real-time data is made available as open data. The greater commercial value of real-time data may also motivate some public agencies to alter how they make such data available to the public.

While our paper focuses on real-time GPS public transit data, similar issues will likely arise in a variety of other contexts where ‘open’ real-time data are at issue. We consider how real-time data is licensed, and we identify additional terms and conditions that are imposed on users of ‘open’ real-time data. While some of these terms and conditions might be explained by the particular exigencies of real-time data (such as requirements to register for the API to access the data), others are more difficult to explain. Our paper concludes with some recommendations for the development of a standard for open real-time data licensing.

This paper forms part of ongoing research carried out under Geothink, a partnership grant project funded by the Social Sciences and Humanities Research Council of Canada.


A 2016 European Commission report titled Survey report: data management in Citizen Science projects provides interesting insights into how such projects manage the data they collect. Proper management is, of course, essential to ensure that the collected data can be used and reused by project leaders as well as by other downstream users. It is relevant as well to the protection of the privacy of citizen participants. The authors of this report surveyed a large number of citizen science projects. From the 121 responses received they distilled findings that explore the diversity of the citizen science projects, and that reveal a troubling lack of thorough data management practices. A significant shortcoming for many projects was the lack of appropriate data licences to govern reuse of either raw or aggregate data collected.

There has been growing pressure on those carrying out research using public resources to make the fruits of the research – including the research data – publicly available for consultation, verification or reuse. But doing so is not as simple as a binary open/closed choice. There are a number of different questions that researchers must address: Should the raw data be made open or only the aggregate data? Should it be immediately available or available only after an embargo period? Is all data suitable for release or should some be protected for public policy reasons (such as protecting privacy)? And what, if any, terms and conditions should be imposed on reuse?

The authors of the EC report, Sven Schade and Chrysi Tsinaraki, found that overall there was a relatively high level of data sharing from citizen science projects. Significantly, 38% of the respondents to their survey provided access to their raw data; 37% provided access to aggregate data and 30% provided access to both. One interesting observation in this respect was that 68% of those respondents who provided access to their raw data also included within this dataset personal identifiers of citizen contributors to the project. Such data might be advertently collected, as where individuals provide personal information with their data uploads. In some cases, the scope of personal information might be significant. Contributions to a project might include geolocation information and geodemographic details. Schade and Tsinaraki asked respondents about their practices when it came to obtaining informed consent to data collection from project participants; they found that 25% of respondents did not obtain such consent whereas 53% relied upon a generic terms of use document to obtain consent. It was not entirely clear whether the consent being sought related to privacy issues or to obtaining any necessary rights to use or disseminate the data being collected (which might, for example, include copyright protected photographs). In any event, the results of the survey suggest that there is a significant lack of attention to both privacy and IP rights issues in citizen science projects.

On the issue of data licensing, Schade and Tsinaraki found that the conditions imposed on reuse by different projects varied. A majority of those who made data available believed that the data was in the public domain, while others imposed conditions such as non-commercial or share-alike restrictions. When asked which licence they used to achieve these goals, 32 out of 56 respondents indicated that they used one of the commonly available template licences such as Creative Commons or Open Data Commons. A surprising number of respondents indicated that no particular licence was used. While data released in this way might be presumed to be “open”, the usefulness of the data might well be hampered by a lack of clarity regarding the scope of permitted reuse.

In addition to providing access to data, the authors of the Report asked whether citizen science researchers allowed open access to research results (presumably in the form of published papers and other output). While the overwhelming majority of projects indicated that they used open access options (ranging from public domain dedication to open access with conditions), Schade and Tsinaraki also found that 14 of the projects they considered used licences with terms that were not consistent with the reuse conditions that the researchers had identified. Clearly there is a need for greater support for projects in developing or choosing appropriate licences.

Although many of the projects indicated that they provided access to their data, the duration of that access was less certain. The authors found that 42% of projects intended to guarantee access to their data only within the lifespan of the project. The authors also found that 40% of projects that provide data access do not provide comprehensive metadata along with the data. This would certainly limit the value of the data for reuse. Both these issues are important in the context of citizen science projects, which are often grant-funded and temporally limited. The ability to archive and preserve research data and to make it available for meaningful access and reuse should be part of researchers’ data management plans, and is something which should be supported by research institutions and funding agencies.

Overall, the Report provides data that suggests that the burgeoning field of citizen science needs more support when it comes to all aspects of data management. Proper data management practices will help citizen science researchers to meet their own objectives, to share their data effectively and appropriately, and to protect the rights and interests of participants.

Note: In 2015 I drafted a report, with Haewon Chung, for the Wilson Center Commons Lab titled Managing Intellectual Property Rights in Citizen Science. This report addresses many licensing issues related to the collection, sharing and reuse of citizen science data and outputs. It is available under a Creative Commons Licence.

