A recent decision of the Federal Court of Canada exposes the tensions between access to information and privacy in our data society. It also provides important insights into how reidentification risk should be assessed when government agencies or departments respond to requests for datasets with the potential to reveal personal information.
The case involved a challenge by two journalists to Health Canada’s refusal to disclose certain data elements in a dataset of persons permitted to grow medical marijuana for personal use under the licensing scheme that existed before the legalization of cannabis. [See journalist Molly Hayes’ report on the story here]. Health Canada had agreed to provide the first character of the Forward Sortation Area (FSA) of the postal codes of licensed premises but declined to provide the second and third characters or the names of the cities in which licensed production took place. At issue was whether these location data constituted “personal information” – which the government cannot disclose under s. 19(1) of the Access to Information Act (ATIA). A second issue was the degree of effort required of a government department or agency to maximize the release of information in a privacy-protective way. Essentially, this case is about “the appropriate analytical approach to measuring privacy risks in relation to the release of information from structured datasets that contain personal information” (at para 2).
The licensing scheme was available to those who wished to grow their own marijuana for medical purposes or to anyone seeking to be a “designated producer” for a person in need of medical marijuana. Part of the licence application required the disclosure of the medical condition that justified the use of medical marijuana. Where a personal supply of medical marijuana is grown at the user’s home, location information could easily be linked to that individual. Both parties agreed that the last three characters in a six-character postal code would make it too easy to identify individuals. The dispute concerned the first three characters – the FSA. The first character represents a postal district. For example, Ontario, Canada’s largest province, has five postal districts. The second character indicates whether an area within the district is urban or rural. The third character identifies either a “specific rural region, an entire medium-sized city, or a section of a major city” (at para 12). FSAs differ in size; StatCan data from 2016 indicated that populations in FSAs ranged from no inhabitants to over 130,000.
Information about medical marijuana and its production in a rapidly evolving public policy context is a subject in which there is a public interest. In fact, Health Canada proactively publishes some data on its own website regarding the production and use of medical marijuana. Yet, even where a government department or agency publishes data, members of the public can use the ATI system to request different or more specific data. This is what happened in this case.
In his decision, Justice Pentney emphasized that both access to information and the protection of privacy are fundamental rights. The right of access to government information, however, does not include a right to access the personal information of third parties. Personal information is defined in the ATIA as “information about an identifiable individual” (s. 3). This means that all that is required for information to be considered personal is that it can be used – alone or in combination with other information – to identify a specific individual. Justice Pentney reaffirmed that the test for personal information from Gordon v. Canada (Health) remains definitive. Information is personal information “where there is a serious possibility that an individual could be identified through the use of that information, alone or in combination with other available information.” (Gordon, at para 34, emphasis added). More recently, the Federal Court has defined a “serious possibility” as “a possibility that is greater than speculation or a ‘mere possibility', but does not need to reach the level of ‘more likely than not’” (Public Safety, at para 53).
Geographic information is strongly linked to reidentification. A street address is, in many cases, clearly personal information. However, city, town or even province of residence would only be personal information if it can be used in combination with other available data to link to a specific individual. In Gordon, the Federal Court upheld a decision to not release province of residence data for those who had suffered reported adverse drug reactions because these data could be combined with other available data (including obituary notices and even the observations of ‘nosy neighbors’) to identify specific individuals.
The Information Commissioner argued that to meet the ‘serious possibility’ test, Health Canada should be able to concretely demonstrate identifiability by connecting the dots between the data and specific individuals. Justice Pentney disagreed, noting that in the case before him, the expert opinion combined with evidence about other available data and the highly sensitive nature of the information at issue made proof of actual linkages unnecessary. However, he cautioned that “in future cases, the failure to engage in such an exercise might well tip the balance in favour of disclosure” (at para 133).
Justice Pentney also ruled that, because the proceeding before the Federal Court is a hearing de novo, he was not limited to considering the data that were available at the time of the ATIP request. A court can take into account data made available after the request and even after the decision of the Information Commissioner. This makes sense. The rapidly growing availability of new datasets as well as new tools for the analysis and dissemination of data demand a timelier assessment of identifiability. Nevertheless, any pending or possible future ATI requests would be irrelevant to assessing reidentification risk, since these would be hypothetical. Justice Pentney noted: “The fact that a more complete mosaic may be created by future releases is both true and irrelevant, because Health Canada has an ongoing obligation to assess the risks, and if at some future point it concludes that the accumulation of information released created a serious risk, it could refuse to disclose the information that tipped the balance” (at para 112).
The court ultimately agreed with Health Canada that disclosing anything beyond the first character of the FSA could lead to the identification of some individuals within the dataset, and thus would amount to personal information. Health Canada had identified three categories of other available data: data that it had proactively published on its own website; StatCan data about population counts and FSAs; and publicly available data that included data released in response to previous ATIP requests relating to medical marijuana. In this latter category the court noted that there had been a considerable number of prior requests that provided various categories of data, including “type of license, medical condition (with rare conditions removed), dosage, and the issue date of the licence” (at para 64). Other released data included the licensee’s “year of birth, dosage, sex, medical condition (rare conditions removed), and province (city removed)” (at para 64). Once released, these data are in the public domain, and can contribute to a “mosaic effect” which allows data to be combined in ways that might ultimately identify specific individuals. Health Canada had provided evidence of an interactive map of Canada published on the internet that showed the licensing of medical marijuana by FSA between 2001 and 2007. Justice Pentney noted that “[a]n Edmonton Journal article about the interactive map provided a link to a database that allowed users to search by medical condition, postal code, doctor’s speciality, daily dosage, and allowed storage of marijuana” (at para 66). He stated: “the existence of evidence demonstrating that connections among disparate pieces of relevant information have previously been made and that the results have been made available to the public is a relevant consideration in applying the serious possibility test” (at para 109). Justice Pentney observed that members of the public might already have knowledge (such as the age, gender or address) of persons they know who consume marijuana that they might combine with other released data to learn about the person’s underlying medical condition. Further, he notes that “the pattern of requests and the existence of the interactive map show a certain motivation to glean more information about the administration of the licensing regime” (at para 144).
Health Canada had commissioned Dr Khaled El Emam to produce and expert report. Dr. El Emam determined that “there are a number of FSAs that are high risk if either three or two characters of the FSA are released, there are no high-risk FSAs if only the first character is released” (at para 80). Relying on this evidence, Justice Pentney concluded that “releasing more than the first character of an FSA creates a significantly greater risk of reidentification” (at para 157). This risk would meet the “serious possibility” threshold, and therefore the information amounts to “personal information” and cannot be disclosed under the legislation.
The Information Commissioner raised issues about the quality of other available data, suggesting that incomplete and outdated datasets would be less likely to create reidentification risk. For example, since cannabis laws had changed, there are now many more people cultivating marijuana for personal use. This would make it harder to connect the knowledge that a particular person was cultivating marijuana with other data that might lead to the disclosure of a medical condition. Justice Pentney was unconvinced since the quantities of marijuana required for ongoing medical use might exceed the general personal use amounts, and thus would still require a licence, creating continuity in the medical cannabis licensing data before and after the legalization of cannabis. He noted: “The key point is not that the data is statistically comparable for the purposes of scientific or social science research. Rather, the question is whether there is a significant possibility that this data can be combined to identify particular individuals.” (at para 118) Justice Pentney therefore distinguishes between the issue of data quality from a data science perspective and data quality from the perspective of someone seeking to identify specific individuals. He stated: “the fact that the datasets may not be exactly comparable might be a problem for a statistician or social scientist, but it is not an impediment to a motivated user seeking to identify a person who was licensed for personal production or a designated producer under the medical marijuana licensing regime” (at para 119).
Justice Pentney emphasized the relationship between sensitivity of information and reidentification risk, noting that “the type of personal information in question is a central concern for this type of analysis” (at para 107). This is because “the disclosure of some particularly sensitive types of personal information can be expected to have particularly devastating consequences” (at para 107). With highly sensitive information, it is important to reduce reidentification risk, which means limiting disclosure “as much as is feasible” (at para 108).
Justice Pentney also dealt with a further argument that Health Canada should not be able to apply the same risk assessment to all the FSA data; rather, it should assess reidentification risk based on the size of the area identified by the different FSA characters. The legislation allows for severance of information from disclosed records, and the journalists argued that Health Canada could have used severance to reduce the risk of reidentification while releasing more data where the risks were acceptably low. Health Canada responded that to do a more fine-grained analysis of the reidentification risk by FSA would impose an undue burden because of the complexity of the task. In its submissions as intervenor in the case, the Office of the Privacy Commissioner suggested that other techniques could be used to perturb the data so as to significantly lower the risk of reidentification. Such techniques are used, for example, where data are anonymized.
Justice Pentney noted that the effort required by a government department or agency was a matter of proportionality. Here, the data at issue were highly sensitive. The already-disclosed first character of the FSA provided general location information about the licences. Given these facts, “[t]he question is whether a further narrowing of the lens would bring significant benefits, given the effort that doing so would require” (at para 181). He concluded that it would not, noting the lack of in-house expertise at Health Canada to carry out such a complex task. Regarding the suggestion of the Privacy Commissioner that anonymization techniques should be applied, he found that while this is not precluded by the ATIA, it was a complex task that, on the facts before him, went beyond what the law requires in terms of severance.
This is an interesting and important decision. First, it reaffirms the test for ‘personal information’ in a more complex data society context than the earlier jurisprudence. Second, it makes clear that the sensitivity of the information at issue is a crucial factor that will influence an assessment not just of the reidentification risk, but of tolerance for the level of risk involved. This is entirely appropriate. Not only is personal health information highly sensitive, at the time these data were collected, licensing was an important means of gaining access to medical marijuana for people suffering from serious and ongoing medical issues. Their sharing of data with the government was driven by their need and vulnerability. Failure to robustly protect these data would enhance vulnerability. The decision also clarifies the evidentiary burden on government to demonstrate reidentification risk – something that will vary according to the sensitivity of the data. It highlights the dynamic and iterative nature of reidentification risk assessment as the risk will change as more data are made available.
Indirectly, the decision also casts light on the challenges of using the ATI system to access data and perhaps a need to overhaul that system to provide better access to high-quality public-sector information for research and other purposes. Although Health Canada has engaged in proactive disclosure (interestingly, such disclosures were a factor in assessing the ‘other available data’ that could lead to reidentification in this case), more should be done by governments (both federal and provincial) to support and ensure proactive disclosure that better meets the needs of data users while properly protecting privacy. Done properly, this would require an investment in capacity and infrastructure, as well as legislative reform.