Towards Trans-Atlantic Interoperability: Scientific Research and Privacy Under the EU Data Protection Regulation
Written By: Natalie Kim
Edited By: Alex Shank
Introduction

Amidst heated debate and unprecedented lobbying in Brussels, European Union lawmakers are drafting a General Data Protection Regulation (“DPR”) to replace the 1995 Data Protection Directive, which has been criticized as technologically outdated and cumbersome to follow. If enacted, the DPR will be among the toughest data protection laws in existence. Even if it is not, the DPR signifies a growing rift between EU and U.S. data protection ideals.

This rift is an interoperability problem. Defined as the capacity to achieve effective compatibility across different systems, interoperability has outgrown its technological origins to encompass broader social, political and legal frameworks.[1] Interoperability among legal frameworks for data protection has become crucial with the emergence of two trends: (1) a rise in the volume, speed and scope of sensitive data exchanged, and (2) a weakening of data’s geographic ties, as data is often transferred across multiple countries. Without interoperability, privacy safeguards are compromised, the risk that privacy loopholes will be exploited increases, and the cross-jurisdictional operations of multinational actors are hampered. Interoperability in data protection thus protects both civil liberties and economic interests.

Big data developments have played a major role in driving the first trend. While data-driven research has existed for centuries,[2] exponential increases in data storage and processing capacity have pushed analytics to the forefront. Proponents trumpet big data’s untapped innovative potential: a McKinsey report estimates that big data could save up to $300 billion annually in healthcare costs through improved administrative efficiency, cross-system coordination and reduced fraud, and Mayer-Schönberger and Cukier claim that big data will transform traditional hypothesis-driven, sample-based research into discovery founded on unforeseen combinations of aggregate data.[3] The flood of often-sensitive consumer data, such as social security numbers and medical records, also brings its share of problems, including rampant cybersecurity breaches and the indiscriminate sale of consumer data to untrustworthy parties. Re-identification of supposedly anonymous data subjects has also become all too easy.[4] The NSA-Verizon revelations show the potential for government abuse of such a vast trove of data.

The DPR’s wide territorial scope exemplifies the second trend. As set out in Article 3, the DPR covers EU data controllers and processors, as well as any processing of EU residents’ personal data, even if the controller is located outside the Union. Non-European entities dealing with European clients’ data — including most tech companies, multinational corporations and international research organizations — would therefore be affected by the DPR’s enactment.

This comment recommends ways to mitigate some of the most significant obstacles to effective interoperability between the U.S. and EU data protection frameworks as they pertain to scientific research. The recommendations, which are not meant to be exhaustive, are: (a) closing the researcher-status loophole; (b) placing appropriate emphasis on notice-and-consent; (c) adopting better de-identification standards; and (d) closing gaps in the chain of liability.

The Law

“Historical, statistical and scientific research” is regulated by Article 83 of the DPR.
Article 83(1) permits personal data processing only if (a) “these purposes cannot be otherwise fulfilled” or (b) “identifiable data . . . is kept separately from the other information as long as these purposes can be fulfilled in this manner.” Article 83(2) allows personal data publication only if (a) “the data subject has given consent”; (b) “the publication of personal data is necessary . . . insofar as the interests or the fundamental rights or freedoms of the data subject do not override these interests”; or (c) the “subject has made the data public.” These provisions are much clearer than the 1995 Directive’s Articles 13(2) and 32(3), which contained rudimentary exceptions to data access and notice-and-consent requirements for research but largely left implementation to each member state’s discretion.

The U.S. lacks an overarching privacy law and instead relies on a system of industry-specific statutes. The Health Insurance Portability and Accountability Act (“HIPAA”) governs some, though not all,[5] of the data transfers for scientific research that implicate privacy rights. HIPAA defines research as “a systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge,”[6] and allows “covered entities” to use “protected health information” (“PHI”) without data subject authorization as long as a “waiver of HIPAA Research Authorization” is obtained.[7] HIPAA permits such waivers upon a showing that: (1) there is minimal risk to privacy; (2) the research could not be properly conducted without the PHI; and (3) the research could not practicably be conducted without the waiver.[8] In practice, academic researchers and data brokers alike obtain such waivers with ease.

Policy Recommendations

A. Closing the Researcher-Status Loophole

Neither the DPR nor HIPAA currently provides adequate safeguards against corporate exploitation of researcher status. DPR Article 83 makes no distinction among actors, containing only a catch-all category of “bodies conducting historical, statistical or scientific research.” And while HIPAA allows only “covered entities” to access health data, in practice it is remarkably easy to qualify as a covered entity — one subset category, the “health care clearinghouse,” includes most processors of health data.[9] A tiered system that grants different levels of access to different types of processors would better distinguish among data-processing entities, reducing loophole exploitation and improving interoperability.

The ambiguities in the DPR’s and HIPAA’s classification schemes constitute an interoperability problem. Beyond their potential to confuse, the broad categorizations ignore the increasing complexity and diversification of the data-processing market. A spectrum of hybrid actors, including data brokers, de-identification services, analytics software companies and accrediting organizations, can both help and harm through research. Large tech companies like Google can release publicly beneficial research findings using capabilities and information inaccessible to non-profit actors; third-party data analytics firms, by contrast, may appropriate publicly available data and sell it for behavioral advertising, insurance and credit determinations. A tiered system that distinguishes among research actors and their purposes could greatly reduce the discrepancies that currently prevent greater interoperability.
Although a full discussion of the ways to implement a tiered standard is beyond the scope of this comment, future legislation could consider (a) the nature of the research institution, (b) compatible use, and (c) appropriate data protection measures, especially where third-party processing of personally identifiable information (PII) is involved. One major benefit of a tiered system is that it better accounts for the increasing complexity of the data-processing and research markets. A certification process would also foster greater security and accountability for PII. Many hybrid actors are new and specific to particular industries or regions, and their risks and benefits for data security are only beginning to be addressed. Better-defined criteria and certification procedures for research bodies would mitigate the current confusion and the exploitation of loopholes by unaccountable, purely commercial actors.

Closing the loophole comes with its own drawbacks. First, if research categories are not tiered correctly, the system may reduce innovation by preventing beneficial research, especially by hybrid actors who fall in the unhappy middle. Based on such concerns, a coalition of research, non-profit and academic groups supports maintaining broad protections for research uses of data. Second, although a tiered system can avoid stifling publicly beneficial research by maintaining broad permission for research subject to appropriate safeguards, it is less effective at avoiding the drawbacks of added cost and inefficiency.

B. Appropriate Emphasis on Notice-and-Consent

Enforcing notice-and-consent through terms of service has been a centerpiece of information disclosure law on both sides of the Atlantic, but the EU and U.S. have taken opposite routes with the advent of big data. The DPR seeks stringent consent requirements through Article 83(2)(a), which requires consent for data publication.[10] While there are currently ways for researchers to circumvent this strict requirement,[11] amendments suggested by the LIBE Committee may narrow them considerably. Wary of hindering beneficial research with oppressive consent requirements, the U.S. took the opposite extreme: HIPAA’s 2002 amendment removed the consent requirement entirely. Notice simplification and an emphasis on effective tracking, rather than piecemeal consent, are necessary first steps toward meaningful data subject control.

This diametric opposition is an interoperability problem, and it reflects policymakers’ struggle to preserve the power of consent over one’s information in an environment where doing so is increasingly infeasible. Increased digitization and processing capacity mean that the consequences of privacy breaches are more serious than ever. At the same time, the scale and speed of data transfers effectively eliminate consent as a realistic safeguard for downstream data uses. For some types of research, such as longitudinal studies, obtaining consent at every stage is prohibitively difficult, and valuable innovation is lost as a result. Notice-and-consent frameworks that vary across countries also invite confusion about compliance and, at worst, stimulate a race to the bottom in which organizations follow the country with the lowest standards.

Privacy policy simplification would greatly improve notice.
Proponents of “layered notices” — shortened, plain-English versions of disclosures, with more complex “layers” accessible with a click — claim that layered notices would increase consumers’ awareness of and responsibility for the policies they sign[12] and reduce data processors’ inadvertent violations of their own policies.[13]

Given the current impossibility of obtaining consent in every context, an important first step toward regaining consumer data control is to focus on meaningful data tracking. Applications such as Blue Button+, which grew out of a system first developed by the Department of Veterans Affairs in 2010, allow consumers to track their health data. These incipient attempts should be encouraged through increased public awareness and participation. Too often, a lack of public awareness results in the quiet gutting of privacy rights, such as the limited disclosure requirements of the relatively unpublicized HITECH Act.[14] Institutionalized, national data-tracking frameworks may eventually resemble what FTC Commissioner Julie Brill calls “Reclaim Your Name”: a system in which data brokers would disclose data uses and consumers could track and opt out of uses they do not approve of. Some companies, such as Acxiom, have already announced such systems. These systems would be supplemented by “cradle-to-grave” accountability for data, reporting requirements and the appointment of data protection officers (or “algorithmists,” as Mayer-Schönberger and Cukier call them).[15] Other experts, such as David Navetta, envision a do-not-collect option for consumers as a big-data analog to the existing do-not-track model.

Effective tracking is the first step in addressing the loss of consumer consent. Emphasizing tracking mitigates the consent problem by allowing beneficial research to proceed while giving consumers a degree of control, including at least after-the-fact opt-out rights. This approach works best when obtaining consent is impractical and data subjects are unlikely to dispute the use of their data, as with Canada’s ARTEMIS system, which uses aggregate patient data to predict the onset of potentially fatal infections in newborns. By improving overall transparency and accountability for data processors, effective consumer data tracking both protects consumers’ rights and reinforces law enforcement’s efforts to minimize privacy breaches.

Improved notice-and-consent efforts currently face major hurdles. The first is technological: a centralized directory may itself be vulnerable to security breaches, and opting out is an incomplete remedy because even gaps in large data sets can sometimes be used to identify the missing individuals.[16] The second is political: data brokers and tech companies derive substantial revenue from the influx of consumer data, and these commercial and research bodies will lobby heavily against any disruption of their revenue stream. Given their political clout, any meaningful change will likely come only after a long, uphill political battle.

C. Better De-Identification Standards

Another prong of data protection regulation in both the EU and the U.S. is de-identification, the process of removing PII from an aggregated data set. The DPR distinguishes three degrees of de-identification: (1) full anonymization (PII completely removed), (2) partial anonymization, or pseudonymization (PII partially removed),[17] and (3) personally identifiable information (PII visible).
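To make the middle category concrete, the short Python sketch below illustrates key-coding, one of the pseudonymization techniques mentioned in note 17. The field names, record contents and key handling are invented for illustration and are not drawn from any regulation or standard; the point is simply that whoever holds the key can re-link the records, which is what separates pseudonymization from full anonymization.

```python
import hmac
import hashlib

# Illustrative key; a controller would keep it separately from the research
# data set, consistent with the separation idea in Article 83(1)(b).
SECRET_KEY = b"held-separately-by-the-data-controller"

def pseudonymize(record, key=SECRET_KEY):
    """Replace the direct identifier with a keyed hash (key-coding).

    The identifier cannot be read off the output, but anyone holding the key
    can re-derive the same pseudonym and re-link records, which is why this
    is partial anonymization (pseudonymization) rather than full anonymization.
    """
    pseudonym = hmac.new(key, record["patient_id"].encode(), hashlib.sha256).hexdigest()
    out = {k: v for k, v in record.items() if k != "patient_id"}
    out["pseudonym"] = pseudonym
    return out

record = {"patient_id": "A-1042", "zip": "02139", "birth_year": 1961, "diagnosis": "J45"}
print(pseudonymize(record))
# Dropping the pseudonym column as well (and generalizing zip/birth_year) would
# push the record toward full anonymization; leaving patient_id visible would
# keep it as ordinary personally identifiable information.
```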
While data collected for non-research purposes is typically subject to stringent anonymization requirements under DPR Article 5(e),[18] research data are exempt so long as the research purposes “cannot be otherwise fulfilled.”[19] Superficially, U.S. law appears similar. The FTC permits the use of PII as long as it is effectively de-identified, and HIPAA likewise allows publication of PII if it is properly de-identified, though it does not distinguish between research and non-research purposes.[20] Aiming for pseudonymization rather than complete anonymization achieves the right balance between technological practicality and consumer control.

Differences in interpretation and enforcement make these discrepancies in de-identification standards an interoperability problem. EU authorities are likely to interpret and enforce tougher de-identification standards than their U.S. counterparts; in the U.S., there has been very little enforcement of the stiff penalties HIPAA provides. Discrepancies in de-identification standards may leave weak points in composite data sets, allowing for re-identification.[21] Opting out may not be enough, as even gaps in data can be used to identify the nonparticipants.[22] With some data sets, no technical expertise is needed to re-identify the associated individuals.[23] Even more problematic, re-identification makes subsequent security breaches easier.[24]

While technological barriers currently put complete anonymization out of reach, better pseudonymization practices are a reasonable goal. Fortunately, re-identification is among the better-recognized obstacles[25] to data protection, and many solutions have been proposed. The UK Information Commissioner’s Office advocates a “motivated intruder” test to assess the likelihood of re-identification for a given data set; a trusted third party could assist if the data controller lacks the technical capacity to run the test. The FTC is increasingly bringing data re-identification within its purview over deceptive trade practices, recommending in a recent report that companies publicly commit to keeping data in de-identified form.[26] Currently pending in the District of Arizona is FTC v. Wyndham, perhaps the first fully litigated FTC data protection case, in which a two-year data security breach exposed hundreds of thousands of consumers’ credit card records. In a highly unusual move, Wyndham refused to settle. If the FTC prevails, the case could signal an era of more aggressive FTC enforcement against data protection failures. Continued investment in de-identification technologies is essential for both heightened data protection and global interoperability.[27]

Better enforcement of, and a greater emphasis on, realistic pseudonymization has numerous advantages. In the short run, pseudonymization is more achievable, and it relieves research organizations of the impracticable (and at times unnecessary) obligation to achieve complete anonymization. Better pseudonymization also reduces the need for data controllers and subjects to rely on explicit notice-and-consent, which is not always feasible.

Technological limitations account for most of pseudonymization’s drawbacks. One entity’s “motivated intruder” test could differ in scope and capacity from another’s, and ultimately no such test may stop the most serious threats. While some research groups claim that key-less pseudonymization sufficiently eliminates these weaknesses, this is unlikely given the ease of re-identification and the difficulties posed by certain data sets, such as genetic data. Pseudonymization may also reduce the value of data sets that depend on identifiable data.
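To show why re-identification requires so little technical expertise, the minimal sketch below performs a toy linkage attack: it joins a “de-identified” research extract against a public record on shared quasi-identifiers. All of the data, field names and the link() helper are invented for illustration; real incidents, such as the AOL search-log case described in note 23, follow the same pattern at much larger scale.

```python
# A "de-identified" research extract: direct identifiers removed,
# but quasi-identifiers (ZIP code, birth year, sex) retained.
research_rows = [
    {"zip": "02139", "birth_year": 1961, "sex": "F", "diagnosis": "J45"},
    {"zip": "02139", "birth_year": 1984, "sex": "M", "diagnosis": "E11"},
]

# A public data set (e.g., a voter roll) that names individuals and
# happens to share the same quasi-identifiers.
public_rows = [
    {"name": "Jane Roe", "zip": "02139", "birth_year": 1961, "sex": "F"},
    {"name": "John Doe", "zip": "02139", "birth_year": 1984, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link(research, public, keys=QUASI_IDENTIFIERS):
    """Re-identify research records by joining on quasi-identifiers."""
    index = {tuple(p[k] for k in keys): p["name"] for p in public}
    for r in research:
        name = index.get(tuple(r[k] for k in keys))
        if name is not None:
            yield {**r, "re_identified_as": name}

for match in link(research_rows, public_rows):
    print(match)
# Any research record that is unique on its quasi-identifiers is re-linked
# to a named individual -- no specialized tooling or expertise required.
```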
D. Chain of Liability

Currently, data protection provisions regarding third-party data processors are largely left to individual contracting. The DPR obligates data processors to be bound under contract or Binding Corporate Rules (“BCR”). If processors engage in data uses unforeseen by the data controller, they share liability as “joint data controllers.”[28] For outbound data transfers, the data controller must provide additional guarantees if the importing country lacks adequate data protection practices.[29] In the U.S., the FTC adds a “reasonable effort to anonymize” obligation for data controllers to prevent third-party re-identification.[30] While HIPAA imposes similar contractual obligations, any third party can access information held by a covered entity for research purposes if it enters into a data use agreement containing certain compulsory provisions.[31] Appropriate continuation and standardization of these minimal contracting provisions would reduce interoperability problems and third-party misuse of PII.

Differing regulatory standards for third-party liability are an interoperability problem. When organizations transfer large volumes of data under many separate contracts, a lack of standardization may mean that third-party breaches are neither easily preventable nor even identifiable.[32] Ad hoc contracting reflects an outdated paradigm of data transfers in which data collection involved only the collector and the subject. The growing complexity introduced by new actors and an increasing division of labor in data processing now requires firmer safeguards against security lapses at these second- and third-order steps.

Mandating minimal contracting standards and liability provisions for third parties and subcontractors would improve accountability and strengthen the data subject’s rights in the event of a breach or unwanted disclosure.[33] At a minimum, such mandates should account for the differences among industries and research models. Effective data tracking is also immensely helpful here, not only in supplementing traditional notice-and-consent frameworks but also in allowing easier identification and resolution of security breaches. The effectiveness of contractual implementation could be monitored through independent auditing, privacy impact assessments, and data protection officers.[34]

Minimal contracting standards allow greater flexibility for industry-specific self-regulation, which is critical given the patchwork nature of the U.S. system. Under Article 25(6) of the 1995 Data Protection Directive, the EU often relied on safe harbors, typically bilateral agreements that allow entities from an entire country to bypass certain provisions once the country passes an “adequacy test.” This often resulted in weak enforcement and little accountability for individual entities in those countries.
For this reason, the DPR moves away from bilateral safe harbor arrangements and toward industry-specific BCR contracting, heightening accountability while reducing the burden on research organizations faced with incompatible regulations.[35]

The main difficulty with minimal contractual provisions is striking an optimal balance between data protection and business interests. For example, burdening controllers of outbound data with too many administrative hurdles may discourage international research and development. As the physical location of data becomes less and less meaningful, jurisdictional issues could prolong data breach litigation, and in some cases enforcement may be difficult or infeasible. Whether to leave the distribution of liability to individual contracting or to designate statutory minimums will also be a heated question: whichever body is given the power to allocate liability and damages will have great influence on how incentives are structured in the data market.

Conclusion

Regardless of whether the DPR is enacted, the current transatlantic rifts in data protection systems will continue to hamper effective interoperability. However, policies such as minimum contracting standards that close third-party liability loopholes, strengthened de-identification techniques, and enhanced data tracking measures can mitigate the problem. Effective interoperability benefits not just consumer privacy, but also continued innovation, revenue generation and better cross-jurisdictional coordination.
[1] What is Interoperability?, Network Centric Operations Industry Consortium, https://www.ncoic.org/technology/what_is_interoperability (last visited on June 24, 2013).
[2] Viktor Mayer-Schönberger & Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think 21–23 (Houghton Mifflin Harcourt 2013).
[3] Id. at 55.
[4] Id. at 8–9.
[5] See, e.g., Fair Credit Reporting Act, 15 U.S.C. § 1681 et seq. (2003) (governing disclosures of consumer credit data); Gramm-Leach-Bliley Act, Pub. L. 106–102 (1999) (governing financial institutions).
[6] 45 C.F.R. 164.501 (2002).
[7] 45 C.F.R. 164.512(c)(2)(ii) (2002).
[8] Id.
[9] 45 C.F.R. 106.102 (2002). See infra note 16.
[10] General Data Protection Regulation, European Commission, 2012/0011, Article 83(2)(a). Article 7 requires written consent to use the specific data to be disclosed, with the option to withdraw consent at any time. General Data Protection Regulation, Article 7.
[11] See, e.g., Id. Article 83(2)(b) (“the publication of personal data is necessary to present research findings . . . insofar as the interests or the fundamental rights or freedoms of the data subject do not override those interests”) and Article 83(1)(a) (“these purposes cannot be otherwise fulfilled”).
[12] Christine Porter, De-Identified Data and Third Party Data Mining: the Risk of Re-Identification of Personal Information, 5 Shidler J.L. Com. & Tech. 3, 16 (2008), available at http://digital.law.washington.edu/dspace-law/bitstream/handle/1773.1/417/vol5_no1_art3.pdf.
[13] Id. at 14.
[14] Disclosure requirements are limited to electronically stored information, up to three years before the date of the disclosure request. Health Information Technology for Economic and Clinical Health Act, Pub.L. 111–5, § 13405 (c)(1)(B) (2009).
[15] Commissioner Julie Brill, Reclaim Your Name, 23rd Computers Freedom and Privacy Conference Keynote address, Washington, D.C. (June 26, 2013), transcript available at http://www.ftc.gov/speeches/brill/130626computersfreedom.pdf (last visited July 8, 2013).
[16] Challenges and Opportunities with Big Data, Computing Research Ass’n, http://www.cra.org/ccc/files/docs/init/bigdatawhitepaper.pdf (last visited June 28, 2013).
[17] Various techniques (e.g., key-coding, rotating salts, encryption keys, and the introduction of “noise”) are currently used to reduce the risk of re-identification. Article 29 Working Party, Opinion 03/2013 on Purpose Limitation, at 31, 00569/13/EN, http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2013/wp203_en.pdf.
[18] This principle also informs other data subject rights such as the right of access (15), right to be forgotten (17), and right to object (19–20). General Data Protection Regulation, Article 5(e), 15, 17, 19–20.
[19] Id. Article 83(1)(a). While (1)(b) requires separate safekeeping of PII from other data, this too can be waived if the research purpose cannot be otherwise fulfilled.
[20] 45 C.F.R. 164.502(d) (2008). Other laws such as the Gramm-Leach-Bliley Act explicitly state that anonymized data is not covered by the statute. 16 C.F.R. 313.3(o)(2)(ii)(B) (2008).
[21] Khaled El Emam et al., A Systematic Review of Re-identification Attacks on Health Databases, PLoS ONE 6(12) (2011), http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0028071 (finding that the re-identification rates were dominated by smaller studies that had not followed proper de-identification methods).
[22] Cf. Challenges and Opportunities with Big Data, supra note 16.
[23] After AOL accidentally released its search records in 2006, N.Y. Times reporters were able to re-identify an individual, Thelma Arnold, from her past searches. Porter, supra note 12, at 9.
[24] Id. at 12.
[25] Article 29 Working Party, supra note 17, at 32.
[26] Fed. Trade Comm’n, Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers (Mar. 2012), at 21.
[27] See, e.g., Data Protection Officer Conference 2013, http://www.ico.org.uk/conference2013; IAPP Global Privacy Summit 2013, https://www.privacyassociation.org/events_and_programs/global_privacy_summit_2013.
[28] General Data Protection Regulation, Article 24.
[29] Article 29 Working Party, Explanatory Document on the Processor Binding Corporate Rules, at 4, 00658/13/EN, available at http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2013/wp204_en.pdf.
[30] Fed. Trade Comm’n, supra note 26, at 21.
[31] HIPAA Privacy Rule and its Impacts on Research, Nat’l Institutes of Health, http://privacyruleandresearch.nih.gov/pr_08.asp (last visited July 11, 2013) (describing how non-covered entities can still access PII in a limited data set with a data use agreement).
[32] Among the biggest data security breaches are those in third party or group databases, such as the Epsilon data breach. See Taylor Armerding, The 15 Worst Data Breaches of the 21st Century, CSO (Feb. 15, 2012), http://www.csoonline.com/article/700263/the-15-worst-data-security-breaches-of-the-21st-century.
[33] Article 29 Working Party, Opinion 05/2012 on Cloud Computing, at 21, 01037/12/EN, available at http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2012/wp196_en.pdf.
[34] Article 29 Working Party, Explanatory Document on the Processor Binding Corporate Rules, at 13.
[35] General Data Protection Regulation, Article 42–43.