In the New Zealand context, data is seen as taonga (something sacred, precious, or significant) (Whaanga et al. 2017). A taonga should be actively cared for in a manner that preserves its integrity and value. Health data is used in most health and disability research studies, as well as QI projects. Some of this data is prospectively collected for the purpose of research, but a growing proportion of data is collected through routine processes, for example through healthcare procedures or interaction with health agencies.
Data exists in both analogue (paper) and digital (electronic) formats. Increasing digitisation means data is being collected both from ‘traditional’ sources, such as administrative data and electronic health records, and from novel sources, such as apps, fitness trackers, cellular phones and social media (Internet of Things, IoT). Digital infrastructures allow person-level linkages between healthcare and non-healthcare data, giving unique insight into the social determinants of health. This section adopts a broad definition of these data sources and is intended to encompass all sources and types of data described in these Standards as ‘Health Data’. All researchers who hold and use data within this broader context should adhere to these Standards.
The life cycle of health data includes collection, use, analysis, publication, storage, curation, and destruction (Figure 12.1). This chapter provides ethical guidance on collecting new data from participants and/or individuals and accessing and reusing data that has already been collected (for example, from clinical records, other research projects or administrative data[1]).
Figure 12.1 – The life cycle of health data
Storing data often involves elements of security, governance and management, privacy, consent, and curation. Data can be used in a variety of ways: to explore concepts or answer the specific questions that prompted the collection of data in the first place; or, to explore concepts or answer questions formulated after the collection of data – this latter concept is referred to as “secondary use”. Data may be used for future studies and projects, including those which are unspecified, and data use may also occur through databanks (Data Registries). Lastly, data is disposed of: this disposal can take the form of destruction, or as is more often the case, either time-limited or indefinite archiving (for example, for regulatory compliance purposes).
Beyond these ethical standards, researchers must comply with current relevant standards for data governance and security.[2] At present, these include (1) the “Digital, data and technology services – minimum requirements”; (2) “HISO 10029: 2015 Health Information Security Framework”; and (3) “HISO 10064:2017 Health Information Governance Guidelines”, the latter of which highlights some key elements of data quality, privacy, privacy breach, and secondary use of data that are relevant to these standards. It is the obligation of researchers to ensure that they are up to date with current data privacy, governance, and security standards in New Zealand.
The following standards apply to both new data collection and re-use of existing data.
Māori data refers to data produced by Māori or that describes Māori and the environments they have relationships with. Māori data includes but is not limited to:
12.1 Māori should be involved in decisions about the primary collection, analysis, and interpretation of Māori data in research contexts.
12.2 Decisions about governance and access to data for secondary purposes should be consistent with the Māori Data Sovereignty principles, developed by Te Mana Raraunga[3] below. While these principles were developed for Māori data, their application to all health data is recommended, and reflects good practice.
There are a number of different levels of data identifiability and terms used to describe them.
12.3 Researchers must accurately describe the identifiability of data to obtain meaningful informed consent and to determine the ethical risk of their studies. HISO 10064:2017[4] describes the levels of data identifiability (see Table 12.1).
Table 12.1 – Levels of data identifiability
Direct identifiers | Indirect identifiers |
---|---|
|
|
De-identified data | Anonymised data |
---|---|
|
A precise definition of anonymised data has become more difficult because methods to re-identify data are rapidly evolving. Researchers should assume that all data is potentially re-identifiable and maintain governance and guardianship to this standard. A minimal operational standard of anonymity should:
|
For the purposes of these Standards, data should be stored, utilised, and disposed of on the assumption that it is potentially re-identifiable.[5]
12.4 Researchers must identify and assess risks related to re-identification and implement measures to mitigate those risks through de-identification and obfuscation of data.
12.4.a Data analysis involving data integration and linking may heighten risks of re-identification. Such a risk is greater if the study relates to a population in a small geographical area, or to individuals with unique characteristics, where a large number of variables relate to an individual.
12.4.b Researchers should give special consideration to the question of whether data being retained for future use needs to be kept identifiable.
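As an illustration only, one common way to assess the re-identification risk described in 12.4.a is to count how many records share each combination of indirect identifiers (a k-anonymity check). The sketch below is not a method prescribed by these Standards; the field names, the sample records and the threshold `k` are hypothetical assumptions chosen for the example.

```python
from collections import Counter


def group_sizes(records, indirect_identifiers):
    """Count how many records share each combination of indirect identifiers."""
    return Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )


def smallest_group_size(records, indirect_identifiers):
    """Return the size of the smallest group (the 'k' in k-anonymity)."""
    sizes = group_sizes(records, indirect_identifiers)
    return min(sizes.values()) if sizes else 0


def records_below_threshold(records, indirect_identifiers, k):
    """Return records whose combination of values is shared by fewer than k records."""
    sizes = group_sizes(records, indirect_identifiers)
    return [
        record
        for record in records
        if sizes[tuple(record[field] for field in indirect_identifiers)] < k
    ]


# Hypothetical, made-up records (not real data).
sample_records = [
    {"age_band": "30-39", "region": "A", "sex": "F"},
    {"age_band": "30-39", "region": "A", "sex": "F"},
    {"age_band": "30-39", "region": "A", "sex": "M"},
    {"age_band": "80-89", "region": "B", "sex": "M"},
]

# Using more variables per individual can only keep or shrink the smallest group.
k_two_fields = smallest_group_size(sample_records, ["age_band", "region"])
k_three_fields = smallest_group_size(sample_records, ["age_band", "region", "sex"])
at_risk = records_below_threshold(sample_records, ["age_band", "region"], k=2)
```

In this sketch, records in small groups (for example, a single person in a small geographical area) are the ones that may need further generalisation or suppression before release. A check of this kind does not replace the broader risk assessment required by 12.4.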
12.5 Whenever studies using re-identifiable data reveal information that affects the health and wellbeing of participants and/or individuals (see ‘Returning results and incidental findings’), researchers must consider how to make that information available to the participants and/or individuals, if the participants and/or individuals have consented to receiving such information.
Health data can generate benefits for individuals and the public both now and in the future. In some cases, it may be unethical not to use data because it may deny these benefits, and a failure to use it may also cause harm. Researchers must identify the possible benefits and risks of harm of data use, carefully balance them against each other, and consider how to minimise and mitigate any harms of data use.
The nature, degree, and likelihood of benefits resulting from studies is dependent on context, which researchers must consider every time they propose to use health data.
The nature, degree, and likelihood of possible harms resulting from studies also depends on context, which researchers must also consider every time they propose to use health data.
Table 12.2 lists some of the main types of potential harms from the use of health data.
Type of harm | Indicator |
---|---|
Physical harm |
|
Social harm |
|
Economic harm |
|
Psychological or emotional harm |
|
Legal harm |
|
Privacy harm |
|
Interpretation harm |
|
In light of these potential harms, the following general standards apply to the use of health data. More detailed standards also apply for some aspects, for example, the storage and protection of health data.
12.6 Researchers must justify health data use, recognising the ethical tension between respect for individuals or groups (according to principles such as privacy, confidentiality, dignity and autonomy) and beneficence (the advantages of generating new knowledge).
12.7 Researchers must identify the possible benefits and risks of harm of health data use, carefully balance them against each other, and consider how to minimise and mitigate any harms of data use.
12.7.a Studies involving health data should seek to minimise risks and maximise benefits. This applies to both prospectively collected data and previously collected data being used for a secondary purpose.
The principles of privacy and confidentiality apply to all health data at all points of the data lifecycle.[6]
12.8 Researchers must record and respect restrictions that participants and/or individuals place on the use of their health data.
12.9 Researchers must protect participants’ and/or individuals’ health data and must only use and disclose it to people authorised by those participants and/or individuals, unless:
12.10 A plan for managing unauthorised disclosure should be in place that complies with HISO 10064:2017 Health Information Governance Guidelines[7] and the Privacy Act,[8] and adheres to organisational policies and procedures. The plan should include steps to reduce the risk of accidental disclosure and data breach, a process for informing participants and/or individuals, and mitigation steps to limit the impact of any accidental disclosure or data breach.
Data can be stored in analogue or digital form. Regardless of the form of storage, health data storage must meet the following standards:
12.11 Health data should be stored in a secure manner. Examples of secure storage include locked file cabinets in locked rooms, password-protected databases on computers in locked rooms, and password-protected databases accessed via password-protected computers.
12.12 Researchers should weigh the benefits and risks of keeping identifiers on stored data.
12.12.a In some cases, there will be good reasons to maintain an identifier, or a link to an identifier (e.g. to maintain participant and/or individual safety, or to re-use the data).
12.13 Data should not be stored for longer than is required for the purposes for which the information may lawfully be used, but should be kept for at least the minimum period required by New Zealand law (currently 10 years for health data that relates to an identifiable individual).
Robust policies, processes, and procedures must be in place to manage data throughout its life cycle. This requires high-quality, transparent data governance and data management. Appropriate governance and management are especially important in cases where the consent requirement for data use has been waived, where there is data linking, or where unspecified future use is intended. Māori control of Māori data is the primary goal of Māori data sovereignty; it is advanced by improving Māori/iwi access to data for governance decision-making and by ensuring Māori/iwi involvement in the governance of data.
Data can be primarily collected by a researcher, but in the modern healthcare environment organisations are often the primary data source. This creates a tiered structure of overlapping responsibilities of data guardianship between, on the one hand, organisations that create, store, and allow access to data and, on the other hand, individual researchers who use this data, who may work within or outside the data source organisation.
12.14 Organisations must establish proportional, appropriate and robust data governance and data management policies and procedures during the life cycle of data.
12.14.a Relevant current standards to which organisations must adhere include the “Digital, data and technology services – minimum requirements[9]”, “HISO 10029: 2015 Health Information Security Framework[10]”, and “HISO 10064:2017 Health Information Governance Guidelines.[11]”
12.14.b This latter document highlights some key elements of data quality, privacy, privacy breach, and secondary use of data that are relevant to these standards.
12.14.c Issues an organisational data governance committee should consider include:
12.15 Researchers and/or institutions utilising data must establish proportional, appropriate and robust data governance and data management processes during the life cycle of data. These processes should complement organisational governance and management structures, but do not supersede those requirements.
12.15.a Researchers and/or institutions must describe these frameworks in their protocol or associated documents, and they should include:
The New Zealand government has adopted a Cloud First policy, requiring agencies to accelerate their adoption of public cloud services as it pertains to digital data.[14] This adoption is on a case-by-case basis following risk assessment. For research data, storage in both digital and analogue form is the norm. Data storage security and privacy principles should apply to both analogue and digital data storage.
Researchers should be aware that digitally-transmitted and stored data may pass through jurisdictions outside New Zealand and be stored in facilities outside New Zealand. Researchers should be aware that data stored outside New Zealand is governed by local standards of data security and privacy protection, which may vary depending on local legal standards. The cloud risk assessment process and tools provided by digital.govt.nz[15] provide principles of data security and access control that should be considered by researchers using overseas or cloud-based data storage.
Identifiable data may be sent overseas for the purposes of research if the person from whom the data was collected has consented to it or if a waiver of consent is granted.
12.16 Researchers should consider whether it is possible, appropriate and practical to seek consent to store data overseas.
12.16.a If consent is being sought, researchers must ensure participants and/or individuals understand that privacy protections in other countries may be different to those offered in New Zealand and that there may be no New Zealand representation on overseas organisations which make decisions about data use.
Non-identifiable data may be sent overseas without consent where, due to the nature of the information, it is not possible, appropriate or practical to seek consent.
12.17 Generally, the risk of sending non-identifiable data overseas is lower than sending identifiable data; however, in this case researchers should consider the fact that:
These standards are about collecting new information from individuals or communities.
12.18 Researchers must collect new data from participants and/or individuals in a manner that is lawful and fair, and that does not intrude to an unreasonable extent on the personal affairs of participants and/or individuals (Rule 4, HIPC).
12.19 Researchers should pay attention to the preferences of participants and/or individuals (e.g. they may wish to have whānau or family members present) and to their cultural sensitivities.
12.20 Determining whether a particular means of collection is unreasonably intrusive may depend on the context and sensitivity of the information. For example:
12.20.a information may be particularly sensitive where it relates to a person’s sexual life, ethnicity or HIV status; diseases or conditions carrying social stigma; mental health; life expectancy; or addiction.
12.20.b privacy may be at risk if the physical environment at the time of collection is a prison, rest home, school, educational institution, hospital or place of employment.
12.21 People collecting data must be suitably trained, experienced and culturally knowledgeable. If they are new researchers, they must be supported by a suitably trained, experienced and culturally knowledgeable person.
12.22 When collecting new data from participants and/or individuals, researchers must ensure that they are informed of, and consent to, the collection and use of their new data for the study.
12.22.a Note that this applies to the collection of new data. If a researcher is accessing identifiable or re-identifiable data that has already been collected, a waiver may be required.
12.23 Researchers must only collect data necessary for the specified purposes of their study.
12.24 Researchers must obtain consent from a participant and/or individual from whom data has been collected in a study (‘the original study’) to use that data for future studies.
12.25 Protocols, by design, should specify the category into which the future unspecified research falls, and should provide adequate rationale as to how the risks and benefits justify the proposed future unspecified research. Additionally, protocols should specify whether results arising from future research will be made available to the participant and/or individual.
Increasingly, data that has been collected for a specific purpose, for example clinical care or administrative data collected through government agencies, is re-used and/or linked for health research. These standards are about re-using data. For a further resource on the ethical secondary use of existing data, see Stats NZ’s Five Safes framework.
Table 12.3 below summarises key Māori concepts and related questions that help assess the level of sensitivity of the data, the corresponding requirement to consult on re-use, and the appropriate level of data management (Hudson et al. 2017).[16]
12.26 Taking into account the table below, researchers should carefully consider whether they should undertake robust, active and ongoing engagement with relevant communities and stakeholders to establish whether the proposed data use is acceptable.
12.27 Any such engagement should be transparent and fair, done in good faith, be truthful, and consistent with the concepts and practice of the Te Ara Tika principles.
Concept | Characteristic | Assessment Question |
---|---|---|
Tapu | Level of sensitivity |
|
Noa | Level of accessibility |
|
Tika | Level of value |
|
Pono | Level of trust |
|
Mauri | Level of originality |
|
Wairua | Nature of the application |
|
Whakapapa | Level of relationship |
|
Pukenga | Level of expertise |
|
Kaitiaki | Level of authority |
|
Wānanga | Level of responsibility |
|
Gaining informed consent to use previously collected identifiable data (including linking) should always be the default starting point. Where researchers propose to use identifiable data without specific consent for a study or project (e.g. where data was collected for care, or the proposed data use is not consistent with the scope of the original research consent), they must:
12.28 Satisfy national data standards and local data governance requirements.
12.29 Justify to an ethics committee that the nature, degree and likelihood of possible benefits (including to participants and/or individuals and the value of the research to the public) outweigh the nature, degree and likelihood of possible harms (including to any participant and/or individual, other individuals, whānau, hapū, iwi, Māori communities and any other groups or communities). In determining whether to grant a waiver of consent, an ethics committee may also have regard to the following factors:
12.29.a There are scientific, practical, or ethical reasons why consent cannot be obtained.
12.29.b Appropriate data governance plans are in place.
12.29.c The researchers have identified whether consultation is required, and if required they have undertaken appropriate consultation with cultural or other relevant groups, and those consulted support the proposed use.
12.30 When considering a waiver, researchers should identify if there is any known or likely reason to expect that the participant and/or individual(s) would not have consented if they had been asked.
12.30.a It should be understood that a waiver of consent is not a waiver of responsibility, e.g. should there be an actionable incidental finding then it should be disclosed to the participant and/or individual.[17]
Data-linking is a technique for connecting pieces of information that are thought to relate to the same person, family, place or event. If these different pieces of information can be connected to a person in a way that does not breach their privacy or cause harm, linking them can create a rich resource for research to answer complex questions and improve health outcomes (Data Linkage Western Australia 2019).
When data sets are linked, the risks of identification and adverse public reaction are likely to be greater, especially when the different data sources (which may apply to individual people, households or organisations) were designed and collected without the intention of using them together. The process may give rise to concerns that the combined data produces a detailed picture of individuals that they did not consent to when they supplied the data. Privacy is a major consideration in data linkage work.
12.31 Researchers involved in data-linking must weigh the potential benefits of their research against the risk that individuals will be identifiable within their results. See ‘Benefits and harms from data use’ and ‘Research benefits and harms’.
12.32 Researchers must either seek consent from participants and/or individuals or obtain a waiver from a local data governance committee or an ethics committee for research that involves data-linking with identifiable and re-identifiable data.
12.33 Consent from participants and/or individuals or a waiver from an ethics committee is not required for use of linked non-identifiable data, but researchers should be aware of the type and size of data sets being linked, and how these factors increase the risk of identification.
12.33.a Data linked by a third party at the request of a researcher, but provided in a non-identifiable format, is a way of controlling risk of re-identification in research involving linkage.
12.33.b Use of linked data that has been rendered non-identifiable presents lower risks than linked identifiable or re-identifiable data; however, risks in relation to interpretation harms and re-identification remain, and researchers must consider them.
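The arrangement in 12.33.a can be sketched in code. The example below is illustrative only and is not a procedure prescribed by these Standards: it assumes a hypothetical trusted third party that holds a secret key, links two data sets on a shared direct identifier, and then replaces that identifier with a keyed hash (HMAC-SHA256) before returning the linked records. The field name `person_id`, the sample rows and the key are made-up assumptions.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a keyed pseudonym; without the key it cannot be recomputed from the identifier."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def link_and_strip(dataset_a, dataset_b, id_field, secret_key):
    """Link rows that share the same identifier, then drop the identifier itself."""
    rows_b_by_id = {row[id_field]: row for row in dataset_b}
    linked = []
    for row_a in dataset_a:
        row_b = rows_b_by_id.get(row_a[id_field])
        if row_b is None:
            continue  # no match in the second data set
        merged = {**row_a, **row_b}
        # Remove the direct identifier and keep only the keyed pseudonym.
        merged["link_key"] = pseudonym(merged.pop(id_field), secret_key)
        linked.append(merged)
    return linked


# Hypothetical, made-up rows (not real data).
clinical_rows = [
    {"person_id": "P001", "diagnosis_code": "X1"},
    {"person_id": "P002", "diagnosis_code": "X2"},
]
housing_rows = [
    {"person_id": "P001", "housing_type": "rental"},
]

# The key stays with the (hypothetical) linking party and is never given to researchers.
demo_key = b"demo-key-held-by-linking-party"
linked_rows = link_and_strip(clinical_rows, housing_rows, "person_id", demo_key)
```

As 12.33.b notes, removing the direct identifier lowers but does not eliminate risk: the remaining variables may still act as indirect identifiers, so the linked output needs its own re-identification assessment.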
12.34 Researchers must respect any conditions concerning data-linking expressed within the existing consent of participants and/or individuals. In the absence of direct participant and/or individual consent, a waiver must be sought from an ethics committee.
12.35 The amount of data that is linked should be fit-for-purpose. Researchers must be able to justify re-use of requested data.
12.36 Researchers should be aware that if their research includes data linkage the methods by which that data was collected may result in systematic biases. This in turn may have implications for the validity of the research results.
12.36.a Researchers should consider these limitations when designing their research and mitigate the impacts of these biases where possible. These limitations should also be acknowledged when reporting research results.
12.37 Researchers should account for the destruction of any linked data. If an explicit destruction plan is not specified, then the rationale for archiving should be provided. Any long-term data storage must adhere to local data governance, national standards, and law as applicable.
12.37.a In considering how long to hold linked data, researchers must balance the advantages of retention (the robustness of the data linkage, the ability to validate the linkage, and the benefits of re-use of data) against the protection of privacy.
12.37.b Researchers should be prepared to provide local data governance committees (for example, a research office at a DHB) or ethics committees with a detailed plan of linked data storage, an accounting of the risks of storage, and plans to mitigate the risk of storage.
12.38 Researchers must work within established organisational governance structures, as well as develop specific data management plans that ensure the data is being accessed and linked in an appropriate and responsible manner.
12.39 Researchers must address the privacy risks of linking data by analysing the primary and secondary uses of the data, considering not just re-identification risks but also inference risks.
12.39.a Analysis should take into account not only whether a person can be directly associated with a particular attribute, but also the extent to which attributes that may be revealed or inferred depend on an individual’s data and the potential harm that may result. In addition, it should take into account the potential uses and analysis of the data, which in turn affect data governance and management.
The term ‘databanks’ in these Standards encompasses a wide range of data types and methodologies, from registries[18] to databanks.[19]
Databanks provide a major resource for many public health and epidemiological research activities, ranging from disease prevention to resource allocation. Researchers can use them to significantly accelerate understanding of health; diseases; and the effectiveness, efficiency, safety and quality of preventive, diagnostic and therapeutic interventions.
However, databanks raise issues of dignity, autonomy, privacy, confidentiality and discrimination. Researchers should address these issues in accordance with the following general principles.
Government agencies may establish mandatory registries and databanks (e.g. the New Zealand Cancer Registry) in which participants and/or individuals are obliged to provide data rather than volunteering or consenting to do so. Research using such registries and databanks may be mandated (e.g. one of the purposes of the New Zealand Cancer Registry is to provide a basis for cancer survival studies and research programmes) and may not require ethical review or a waiver of consent.
However, for research studies that use identifiable or re-identifiable data from such databanks or registries and combine it with other data (e.g. data collected from participants and/or individuals via questionnaires), researchers must obtain the consent of participants and/or individuals or, if it is not practical to do so, seek a waiver of consent.
12.40 When planning to contact people because their data is included in a databank, researchers must bear in mind that some people may be unaware that their data was submitted to a databank or may be unfamiliar with the process by which researchers gain access to such data.
12.41 Researchers must seek a waiver or obtain the consent of participants and/or individuals to submit their data to databanks, paying particular attention to the parameters of consented future uses. Researchers must respect any conditions that participants and/or individuals have placed on the use of their data stored in databanks.
12.41.a In limited circumstances, researchers may use identifiable data stored in databanks without consent; in these circumstances, they must first justify such use to an ethics committee and receive approval.
12.42 Databanks must have a governance structure in place to protect the rights, dignity, autonomy, privacy and confidentiality of participants and/or individuals and their communities.
12.43 Researchers should make relevant information on the governance of databanks available to the public.
12.44 Robust governance of databanks is important to maintain the public’s trust in research that uses data from them. Some databanks may have distributed governance arrangements, where different parties are responsible for different aspects of governance. A databank’s governance structure, policy and principles must address:
[1] Administrative data can be defined as data that is collected by government agencies or private organisations in the course of conducting their business or services
[3] See the Te Mana Raraunga website for more information.
[5] See Reidentification Risks in HIPAA Safe Harbor Data: A study of data from one environmental health study.
[6] See also the SIA’s information on data protection and use and data.govt.nz’s Data Confidentiality Principles.
[12] hardware/software/system policies, named individuals with accountability, standards (or near to) e.g. ISO27001, researcher registration, user training etc
[16] “He Matapihi ki te Mana Raraunga” - Conceptualising Big Data through a Māori lens, 2017.
[18] Registries have organised systems that use observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition or exposure, and that serve one or more predetermined scientific, clinical or policy purposes. Such registries are variously described as patient registries, clinical registries, clinical data registries, disease registries and outcomes registries.
[19] Health databanks have organised systems for collecting, organising and storing health information. Databanks may pursue a specific, focused research agenda, collecting data for a limited time to answer a specific research question. Alternatively, they may collect data over an indefinite time to answer a variety of existing and emerging research questions. See further CIOMS and WHO 2016; WMA 2006; and NHMRC 2018, Chapter 3.2.