In a recent blog post I wrote about the issues raised by the mapping of public information. That post was prompted by the creation, by the Journal News of New York State, of a map featuring the names and addresses of all gun permit holders in two counties. The map prompted outrage, although it merely represented data made available to the newspaper in response to an access to information request.
A recent development in the story highlights another issue both with open data and with the mapping of public information. The Journal News reports that a substantial amount of the posted information was inaccurate. Apparently this was because one of the two counties at issue did not require permit renewals, and its records therefore contained a significant amount of outdated information. In fact, the data for this county was only about 25% accurate. The other county required renewals every five years, which made its data more current, though not entirely up to date.
The open data movement promises significant social and economic benefits. Making government data freely available in appropriate formats for reuse is meant to increase government transparency and accountability, and to provide individuals and the private sector with raw data for research or innovation. Many already use such information to create useful apps, or to develop information maps that place government data in an interactive and accessible geographic context.
One of the challenges, however, is ensuring that the data sets provided by government are accurate, complete and fit for the purpose to which they are put. Not only must governments ensure that they are providing current data and appropriate updates, they must also include the metadata necessary for users to understand the scope and limitations of the data set.
Where the data includes personal information (including home addresses), it would seem that the onus on governments should be even heavier to ensure that the information being provided is current, or that the limitations of the data set are clearly identified. Of course, there is also an onus on the party using the information to ensure that it understands the limits of the data set.