12. Health data

Introduction

In the New Zealand context, data is seen as taonga (something sacred, precious, or significant) (Whaanga et al. 2017). A taonga should be actively cared for in a manner that preserves its integrity and value. Health data is used in most health and disability research studies, as well as QI projects. Some of this data is prospectively collected for the purpose of research, but a growing proportion of data is collected through routine processes, for example through healthcare procedures or interaction with health agencies.

Data exists in both analogue (paper) and digital (electronic) formats. Increasing digitisation means data is being collected both from ‘traditional’ sources, such as administrative data and electronic health records, and from novel sources such as apps, fitness trackers, cellular phones and social media (Internet of Things, IoT). Digital infrastructures now allow person-level linkages between healthcare and non-healthcare data, giving unique insight into the social determinants of health. This section adopts a broad definition of these data sources and is intended to encompass all sources and types of data described in these Standards as ‘Health Data’. These standards should be adhered to by all researchers who hold and use data within this broader context.

The life cycle of health data includes collection, use, analysis, publication, storage, curation, and destruction (Figure 12.1). This chapter provides ethical guidance on collecting new data from participants and/or individuals and accessing and reusing data that has already been collected (for example, from clinical records, other research projects or administrative data[1]).

Figure 12.1 – The life cycle of health data

The life cycle includes 4 stages: store, use, dispose and collect. Store includes research specific, healthcare organisation and healthcare linked. Use includes primary research analysis and secondary use. Dispose includes destruction and archiving. Collect includes research specific, routine healthcare, internet of things and administrative.

Storing data often involves elements of security, governance and management, privacy, consent, and curation. Data can be used in a variety of ways: to explore concepts or answer the specific questions that prompted the collection of data in the first place; or, to explore concepts or answer questions formulated after the collection of data – this latter concept is referred to as “secondary use”. Data may be used for future studies and projects, including those which are unspecified, and data use may also occur through databanks (Data Registries). Lastly, data is disposed of: this disposal can take the form of destruction, or as is more often the case, either time-limited or indefinite archiving (for example, for regulatory compliance purposes).

Beyond these ethical standards, researchers must comply with current relevant standards for data governance and security.[2] At present, these include (1) the “Digital, data and technology services – minimum requirements”; (2) “HISO 10029:2015 Health Information Security Framework”; and (3) “HISO 10064:2017 Health Information Governance Guidelines”, the latter of which highlights some key elements of data quality, privacy, privacy breach, and secondary use of data that are relevant to these standards. It is the obligation of researchers to ensure that they are up to date with current data privacy, governance, and security standards in New Zealand.

General considerations for data collection and re-use of existing data

The following standards apply to both new data collection and re-use of existing data.

Māori data

Māori data refers to data produced by Māori or that describes Māori and the environments they have relationships with. Māori data includes but is not limited to:

  • data from organisations and businesses
  • data about Māori that is used to describe or compare Māori collectives
  • data about Te Ao Māori that emerges from research.

12.1 Māori should be involved in decisions about the primary collection, analysis, and interpretation of Māori data in research contexts.

12.2 Decisions about governance and access to data for secondary purposes should be consistent with the Māori Data Sovereignty principles developed by Te Mana Raraunga[3] and set out below. While these principles were developed for Māori data, their application to all health data is recommended, and reflects good practice.

Māori Data Sovereignty principles

Rangatiratanga | Authority
  • Control
    Māori have an inherent right to exercise control over Māori data and Māori data ecosystems. This right includes, but is not limited to, the creation, collection, access, analysis, interpretation, management, security, dissemination, use and reuse of Māori data.
  • Jurisdiction
    Decisions about the physical and virtual storage of Māori data shall enhance control for current and future generations. Whenever possible, Māori data shall be stored in Aotearoa New Zealand.
  • Self-determination
    Māori have the right to data that is relevant and empowers sustainable self-determination and effective self-governance.
Whakapapa | Relationships
  • Context
    All data has a whakapapa (genealogy). Accurate metadata should, at minimum, provide information about the provenance of the data, the purpose(s) for its collection, the context of its collection, and the parties involved.
  • Data disaggregation
    The ability to disaggregate Māori data increases its relevance for Māori communities and iwi. Māori data shall be collected and coded using categories that prioritise Māori needs and aspirations.
  • Future use
    Current decision-making over data can have long-term consequences, good and bad, for future generations of Māori. A key goal of Māori data governance should be to protect against future harm.
Whanaungatanga | Obligations
  • Balancing rights
    Individuals’ rights (including privacy rights), risks, and benefits in relation to data need to be balanced with those of the groups of which they are a part. In some contexts, collective Māori rights will prevail over those of individuals.
  • Accountabilities
    Individuals and organisations responsible for the creation, collection, analysis, management, access, security, or dissemination of Māori data are accountable to the communities, groups, and individuals from whom data derives.
Kotahitanga | Collective benefit
  • Benefit
    Data ecosystems shall be designed and function in ways that enable Māori to derive individual and collective benefit.
  • Build capacity
    Māori Data Sovereignty requires the development of a Māori workforce to enable the creation, collection, management, security, governance and application of data.
  • Connect
    Connections between Māori and other Indigenous peoples shall be supported to enable the sharing of strategies, resources and ideas in relation to data, and the attainment of common goals.
Manaakitanga | Reciprocity
  • Respect
    The collection, use and interpretation of data shall uphold the dignity of Māori communities, groups and individuals. Data analysis that stigmatises or blames Māori can result in collective and individual harm and should be actively avoided.
  • Consent
    Free, prior and informed consent (FPIC) shall underpin the collection and use of all data from or about Māori. Less defined types of consent shall be balanced by stronger governance arrangements.
Kaitiakitanga | Guardianship
  • Guardianship
    Māori data shall be stored and transferred in such a way that it enables and reinforces the capacity of Māori to exercise kaitiakitanga over Māori data.
  • Ethics
    Tikanga, kawa (protocols) and mātauranga (knowledge) shall underpin the protection, access and use of Māori data.
  • Restrictions
    Māori shall decide which Māori data shall be controlled (tapu) or open (noa) access.

Data identifiability

There are a number of different levels of data identifiability and terms used to describe them.

12.3 Researchers must accurately describe the identifiability of data to obtain meaningful informed consent and to determine the ethical risk of their studies. HISO 10064:2017[4] describes the levels of data identifiability (see Table 12.1).

Table 12.1 – Levels of data identifiability

Identifiable data
Data from which it can reasonably be assumed that it is possible to identify a specific individual involved in the study.
Direct identifiers
  • NHI
  • Name
  • Street address
  • Phone number
  • Online identity (e.g., email, twitter name)
  • Identification numbers (e.g., community services card, driver’s licence)
Indirect identifiers
  • Date of birth
  • Identification of relatives
  • Identification of employers
  • Clinical notes
  • Any other direct or indirect identifiers that carry significant risk of re-identification

Non-identifiable data
There are two levels of non-identifiable data: de-identified data and anonymised data.
De-identified data
  • The fields listed under the definition of identifiable data are excluded, and
  • Fields that might be used for deliberate re-identification are included, such as:
    • encrypted NHI or study codes
    • year of birth or age in years at a given date
    • event dates
    • gender
    • ethnicity (Level 2 as defined by Statistics New Zealand)
    • mesh block or suburb
    • deprivation index
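As an illustration of the de-identified level described above, the following minimal sketch removes direct identifiers, replaces the NHI with a study code and coarsens date of birth to year of birth. It is a sketch under stated assumptions, not a prescribed method: the field names, the `SECRET_KEY` value and the use of a keyed hash (HMAC-SHA-256) as a stand-in for an ‘encrypted NHI’ are all hypothetical choices, and the sample record is made up.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret key. In practice it would be held by the data custodian
# and never stored alongside the data set.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical field names for the direct identifiers to be removed.
DIRECT_IDENTIFIERS = {"nhi", "name", "street_address", "phone"}


def pseudonymise_nhi(nhi: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of the NHI, used here as a study code."""
    return hmac.new(key, nhi.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record: dict) -> dict:
    """Drop direct identifiers, add a study code, keep only the year of birth."""
    out = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "date_of_birth"
    }
    out["study_code"] = pseudonymise_nhi(record["nhi"])
    out["year_of_birth"] = record["date_of_birth"].year
    return out


# Made-up record for illustration only.
record = {
    "nhi": "ZZZ0016",
    "name": "Example Person",
    "street_address": "1 Example Street",
    "phone": "000 0000",
    "date_of_birth": date(1980, 5, 17),
    "gender": "F",
    "deprivation_index": 7,
}
print(de_identify(record))
```

Because the same key always produces the same study code for a given NHI, records for one person can still be linked, which is why data at this level remains potentially re-identifiable and needs continued governance.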

A precise definition of anonymised data has become more difficult because methods to re-identify data are rapidly evolving. Researchers should assume that all data is potentially re-identifiable and maintain governance and guardianship to this standard.

A minimal operational standard of anonymity should:

  • Exclude fields listed under the definition of identifiable or de-identified data, and
  • Obfuscate data to minimise re-identification risk, including but not limited to the following measures:
    • disclosure of the bare minimum data set for purpose
    • use of 5–10-year bands rather than dates
    • aggregation of ethnicity data (Level 1 as defined by Statistics New Zealand)
    • blurring of geographic data (by area unit or city)
    • exclusion of low-frequency characteristics useful for re-identification (e.g., rare medical conditions)
    • strong consideration of more technical assessments or approaches such as k-anonymity ≥5, federated learning, differential privacy.
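Two of the measures above, age banding and a k-anonymity check, can be sketched as follows. This is a simplified illustration, not a complete disclosure-control method: the field names and sample records are made up, and the choice of which fields count as quasi-identifiers is an assumption that would need to be made for each data set.

```python
from collections import Counter

K_THRESHOLD = 5  # the "k-anonymity ≥5" figure mentioned in the list above


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def smallest_group(records, quasi_identifiers) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values(), default=0)  # 0 for an empty data set


def is_k_anonymous(records, quasi_identifiers, k: int = K_THRESHOLD) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


# Made-up records for illustration only.
raw = [{"age": a, "ethnicity_level1": "A", "area": "City A"} for a in (31, 33, 35, 36, 38)]
raw += [{"age": a, "ethnicity_level1": "B", "area": "City A"} for a in (40, 41, 42, 44, 47, 49)]

banded = [{**r, "age_band": age_band(r["age"])} for r in raw]

# Exact ages: every record is unique, so the check fails.
print(is_k_anonymous(raw, ["age", "ethnicity_level1", "area"]))  # prints False
# Ten-year bands: the groups have 5 and 6 records, so the check passes.
print(is_k_anonymous(banded, ["age_band", "ethnicity_level1", "area"]))  # prints True
```

Passing such a check does not make data anonymous on its own; it covers only the fields named as quasi-identifiers and says nothing about linkage with other data sets.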

Re-identification

For the purposes of these Standards, data should be stored, utilised, and disposed of on the assumption that it is potentially re-identifiable.[5]

12.4 Researchers must identify and assess risks related to re-identification and implement measures to mitigate those risks through de-identification of data and obfuscation.

12.4.a Data analysis involving data integration and linking may heighten risks of re-identification. Such a risk is greater if the study relates to a population in a small geographical area, or to individuals with unique characteristics, where a large number of variables relate to an individual.

12.4.b Researchers should give special consideration to the question of whether data being retained for future use needs to be kept identifiable.

12.5 Whenever studies using re-identifiable data reveal information that affects the health and wellbeing of participants and/or individuals (see ‘Returning results and incidental findings’), researchers must consider how to make that information available to the participants and/or individuals, if the participants and/or individuals have consented to receiving such information.

Benefits and harms from data use

Health data can generate benefits for individuals and the public both now and in the future. In some cases, it may be unethical not to use data because it may deny these benefits, and a failure to use it may also cause harm. Researchers must identify the possible benefits and risks of harm of data use, carefully balance them against each other, and consider how to minimise and mitigate any harms of data use.

The nature, degree, and likelihood of benefits resulting from studies is dependent on context, which researchers must consider every time they propose to use health data.

The nature, degree, and likelihood of possible harms resulting from studies also depends on context, which researchers must also consider every time they propose to use health data.

Table 12.2 lists some of the main types of potential harms from the use of health data.

Table 12.2 – Some potential harms from use of health data

Type of harm | Indicator
Physical harm
  • Public attacks, spouse/partner abuse, domestic violence, delayed or inadequate treatment
Social harm
  • Discrimination, cultural harm, community discrimination, isolation, inability to access care or exclusion from care
Economic harm
  • Loss of employment or revenue, loss of health care services, loss of insurance, increased insurance premiums, increased health care costs, limited career options, loss of life resources, forced relocation
Psychological or emotional harm
  • Distress, trauma, stigma
Legal harm
  • Arrest, prosecution, expulsion, loss of insurance
Privacy harm
  • Participants and/or individuals not accessing services because they believe their privacy is at risk
Interpretation harm
  • Inappropriate conclusions, apophenia (reporting patterns that are not there), implied causality rather than correlation, unrecognised data-quality issues, digital misrepresentation (e.g. algorithmic bias)

In light of these potential harms, the following general standards apply to the use of health data. More detailed standards also apply for some aspects, for example, the storage and protection of health data.

12.6 Researchers must justify health data use, recognising the ethical tension between respect for individuals or groups (according to principles such as privacy, confidentiality, dignity and autonomy) and beneficence (the advantages of generating new knowledge).

12.7 Researchers must identify the possible benefits and risks of harm of health data use, carefully balance them against each other, and consider how to minimise and mitigate any harms of data use.

12.7.a Studies involving health data should seek to minimise risks and maximise benefits. This applies to both prospectively collected data and previously collected data being used for a secondary purpose.

Privacy and confidentiality

The principles of privacy and confidentiality apply to all health data at all points of the data lifecycle.[6]

12.8 Researchers must record and respect restrictions that participants and/or individuals place on the use of their health data.

12.9 Researchers must protect participants’ and/or individuals’ health data and must only use and disclose it to people authorised by those participants and/or individuals, unless:

  • disclosure of the data is required by law, or
  • the researchers believe, on reasonable grounds, there is a serious and imminent threat to public health, public safety or the life or health of an individual (Rules 10(1)(d) and 11(2)(d), HIPC).

12.10 A plan for responding to unauthorised disclosure should be in place that is compliant with HISO 10064:2017 Health Information Governance Guidelines[7] and the Privacy Act,[8] and that adheres to organisational policies and procedures. This plan should include steps to reduce the risk of accidental disclosure and data breach, a process for informing affected participants and/or individuals, and mitigation steps to limit the impact of accidental disclosure and data breach.

Storage, governance and management of data

Data can be stored in analogue or digital form. Regardless of the form of storage, health data storage must meet the following standards:

12.11 Health data should be stored in a secure manner. Examples of secure storage include locked file cabinets in locked rooms, password-protected databases located on computers in locked rooms, and password-protected databases accessed via password-protected computers.

12.12 Researchers should weigh the benefits and risks of keeping identifiers on stored data.

12.12.a In some cases, there will be good reasons to maintain an identifier, or a link to an identifier (e.g. to maintain participant and/or individual safety, or to re-use the data).

12.13 Data should not be stored longer than is required for the purposes for which the information may lawfully be used, but should be stored for at least the minimum period required by New Zealand law (currently 10 years for health data that relates to an identifiable individual).
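The retention arithmetic in 12.13 can be sketched as a simple date calculation. This is an illustration only: the 10-year figure is taken from the standard above, while the event that starts the retention period (here a hypothetical `last_event` date) and the function names are assumptions that should be checked against the applicable legal requirement.

```python
from datetime import date

MIN_RETENTION_YEARS = 10  # figure quoted in standard 12.13


def earliest_disposal_date(last_event: date, years: int = MIN_RETENTION_YEARS) -> date:
    """Earliest date on which the minimum retention period has elapsed."""
    try:
        return last_event.replace(year=last_event.year + years)
    except ValueError:
        # 29 February with a non-leap target year: roll forward to 1 March.
        return date(last_event.year + years, 3, 1)


def minimum_retention_met(last_event: date, today: date) -> bool:
    """True once the minimum retention period has passed for this record."""
    return today >= earliest_disposal_date(last_event)


print(earliest_disposal_date(date(2015, 3, 1)))  # prints 2025-03-01
print(minimum_retention_met(date(2015, 3, 1), date(2024, 12, 31)))  # prints False
```

Meeting the minimum period only shows that disposal is no longer prevented by the retention requirement; whether data should then be destroyed or archived remains a governance decision.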

Robust policies, processes, and procedures must be in place to manage data throughout its life cycle. This requires high-quality, transparent data governance and data management. Appropriate governance and management are especially important in cases where the consent requirement for data use has been waived, where there is data linking, or where unspecified future use is intended. The primary goal of Māori data sovereignty is Māori control of Māori data, achieved by improving Māori/iwi access to data for governance decision-making and ensuring Māori/iwi involvement in the governance of data.

Data can be primarily collected by a researcher, but in the modern healthcare environment organisations are often the primary data source. This creates a tiered structure of overlapping responsibilities of data guardianship between, on the one hand, organisations that create, store, and allow access to data and, on the other hand, individual researchers who use this data, who may work within or outside the data source organisation.

Organisational Guardianship

12.14 Organisations must establish proportional, appropriate and robust data governance and data management policies and procedures during the life cycle of data.

12.14.a Relevant current standards to which organisations must adhere include the “Digital, data and technology services – minimum requirements”,[9] “HISO 10029:2015 Health Information Security Framework”,[10] and “HISO 10064:2017 Health Information Governance Guidelines”.[11]

12.14.b This latter document highlights some key elements of data quality, privacy, privacy breach, and secondary use of data that are relevant to these standards.

12.14.c Issues an organisational data governance committee should consider include:

  • the quality and reliability of data
  • whether there is a social licence for secondary data use, i.e. the ability of an organisation to use and share data in a legitimate and acceptable way, based on the trust that individuals have
  • details of the form (i.e., identifiable, de-identified or anonymised) in which health data will be collected, accessed, used and stored at the different stages of use, and measures proposed to remove identifying details
  • policies for who will access health data and under what conditions
  • policies for how consent will be sought for data collection and use, including secondary use. If data collection and use are unconsented, policies for seeking a waiver of consent from an ethics committee
  • how Māori rights and interests in relation to data will be recognised, and how Māori will be involved in the governance of Māori data
  • how the privacy and confidentiality of health data will be protected, including any circumstances in which it may not be possible to protect it, and any circumstances that may result in unauthorised disclosure of health data. Organisations should aspire to best-in-class data security and accountability[12]
  • procedures for dealing with any breaches of privacy and confidentiality, including unauthorised disclosure of health data; measures that will be taken to notify those affected by the disclosure; and measures that will be taken to mitigate any harm caused by unauthorised disclosure
  • named accountability for complying with requirements regarding the privacy and confidentiality of health data
  • policies for how researchers accessing and using health data will be held accountable for complying with requirements regarding the privacy and confidentiality of that data
  • procedures for the return of results, including incidental findings
  • transparent policies for commercial use of health data and proposals for benefit sharing, including intellectual property issues
  • whether health data will be transferred to other countries, and whether, in those countries, it will be subject to laws providing comparable safeguards to those available in New Zealand and whether there will be New Zealand representation on overseas governance committees[13]
  • whether health data will be transferred to other institutions such as databanks and registries, and in that context who will access it, how it will be used (e.g., future uses and linking) and how privacy and confidentiality will be protected and whether there will be New Zealand representation on overseas governance committees
  • what measures will be adopted to ensure transparency across all aspects of the data life cycle.

Researcher Data Guardianship

12.15 Researchers and/or institutions utilising data must establish proportional, appropriate and robust data governance and data management processes during the life cycle of data. These processes should complement organisational governance and management structures, but do not supersede those requirements.

12.15.a Researchers and/or institutions must describe these frameworks in their protocol or associated documents, and they should include:

  • the purposes of the data collection, and how data will be collected and by whom, including any training required for data collectors
  • the proposed uses of health data, including any future uses, linking and other analytics that may result in harm to the participant and/or individuals or others, such as their families, whānau, communities and groups
  • details of the form (i.e., identifiable, de-identified or anonymised) in which health data will be collected, accessed, used and stored during the data life cycle and measures proposed to remove identifying details
  • who will access the health data and under what conditions
  • plans for how consent will be sought for data collection and use, including secondary use. If data collection and use are unconsented, plans for seeking a waiver of consent from an organisational data governance committee or an ethics committee
  • how Māori rights and interests in relation to data will be recognised, and how Māori will be involved in the governance of Māori data
  • the length of time health data will be retained
  • how the privacy and confidentiality of health data will be protected, including any circumstances in which it may not be possible to protect it, and any circumstances that may result in unauthorised disclosure of such data
  • procedures compliant with organisational policies and procedures for dealing with any breaches of privacy and confidentiality, including unauthorised disclosure of health data; measures that will be taken to notify those affected by the disclosure; and measures that will be taken to mitigate any harm caused by unauthorised disclosure
  • named accountability for complying with requirements regarding the privacy and confidentiality of health data
  • procedures for the return of results, including incidental findings
  • transparent plans for commercial use of health data and proposals for benefit sharing, including intellectual property issues
  • whether health data will be transferred to other countries, and whether, in those countries, it will be subject to laws providing comparable safeguards to those available in New Zealand and whether there will be New Zealand representation on overseas governance committees
  • whether health data will be transferred to other institutions such as databanks and registries, and in that context, who will access it, how it will be used (e.g. future uses and linking) and how privacy and confidentiality will be protected and whether there will be New Zealand representation on overseas governance committees
  • participants’ and/or individuals’ rights to correct their data
  • procedures for withdrawing participants’ and/or individuals’ data
  • details of proposed approaches for community engagement
  • what measures will be adopted to ensure transparency across all aspects of the data life cycle.

Sending and/or Storing Data Overseas

The New Zealand government has adopted a Cloud First policy, requiring agencies to accelerate their adoption of public cloud services for digital data.[14] This adoption is on a case-by-case basis following risk assessment. In the case of research data, storage of both digital and analogue data is the norm. Data storage security and privacy principles should apply to both analogue and digital data storage.

Researchers should be aware that digitally-transmitted and stored data may pass through jurisdictions outside New Zealand and be stored in facilities outside New Zealand. Researchers should be aware that data stored outside New Zealand is governed by local standards of data security and privacy protection, which may vary depending on local legal standards. The cloud risk assessment process and tools provided by digital.govt.nz[15] provide principles of data security and access control that should be considered by researchers using overseas or cloud-based data storage.

Identifiable data may be sent overseas for the purposes of research if the person from whom the data was collected has consented to it or if a waiver of consent is granted.

12.16 Researchers should consider whether it is possible, appropriate and practical to seek consent to store data overseas.

12.16.a If consent is being sought, researchers must ensure participants and/or individuals understand that privacy protections in other countries may be different to those offered in New Zealand and that there may be no New Zealand representation on overseas organisations which make decisions about data use.

Non-identifiable data may be sent overseas without consent where, due to the nature of the information, it is not possible, appropriate or practical to seek consent.

12.17 Generally, the risk of sending non-identifiable data overseas is lower than sending identifiable data; however, in this case researchers should consider the fact that:

  • other countries may have lower levels of data protection than New Zealand, and
  • overseas researchers are unlikely to work with data in a way that is culturally appropriate for the New Zealand context, or to have connections with, or an understanding of, the communities that the data relates to. For example, they may not be aware of the importance of avoiding a deficit model when discussing health data related to Māori, Pacific peoples and other groups.

Directly-collected new data

These standards are about collecting new information from individuals or communities.

12.18 Researchers must collect new data from participants and/or individuals in a manner that is lawful and fair, and that does not intrude to an unreasonable extent on the personal affairs of participants and/or individuals (Rule 4, HIPC).

12.19 Researchers should pay attention to participants’ and/or individuals’ preferences (e.g. they may wish to have whānau or family members present) and cultural sensitivities.

12.20 Determining whether a particular means of collection is unreasonably intrusive may depend on the context and sensitivity of the information. For example:

12.20.a information may be particularly sensitive where it relates to a person’s sexual life, ethnicity or HIV status; diseases or conditions carrying social stigma; mental health; life expectancy; or addiction.

12.20.b privacy may be at risk if the physical environment at the time of collection is a prison, rest home, school, educational institution, hospital or place of employment.

12.21 People collecting data must be suitably trained, experienced and culturally knowledgeable. If they are new researchers, they must be supported by a suitably trained, experienced and culturally knowledgeable person.

12.22 When collecting new data from participants and/or individuals, researchers must ensure that those participants and/or individuals are informed of, and consent to, the collection and use of their new data for the study.

12.22.a Note that this applies to the collection of new data. If a researcher is accessing identifiable or re-identifiable data that has already been collected, a waiver may be required.

12.23 Researchers must only collect data necessary for the specified purposes of their study.

12.24 Researchers must obtain consent from a participant and/or individual from whom data has been collected in a study (‘the original study’) to use that data for future studies.

12.25 Protocols, by design, should specify the category under which the future unspecified research falls, and should provide adequate rationale as to how the risks and benefits justify the proposed future unspecified research. Additionally, protocols should specify whether results arising from future research will be made available to the participant and/or individual.

Re-use of existing data

Increasingly, data that has been collected for a specific purpose, for example clinical care or administrative data through government agencies, is re-used and/or linked for health research. These standards are about re-using data. For a further resource on the ethical secondary use of existing data, see Stats NZ’s Five Safes framework.

Determining sensitivity, level of consultation and level of data management

Table 12.3 below summarises key Māori concepts relevant to questions that help assess the level of sensitivity of the data, the corresponding requirement to consult for re-use, and the appropriate level of data management (Hudson et al. 2017).[16]

12.26 Taking into account the table below, researchers should carefully consider whether they should undertake robust, active and ongoing engagement with relevant communities and stakeholders to establish whether the proposed data use is acceptable.

12.27 Any such engagement should be transparent and fair, done in good faith, be truthful, and consistent with the concepts and practice of the Te Ara Tika principles.

Table 12.3 – Assess and determine data sensitivity

Concept | Characteristic | Assessment question
Tapu | Level of sensitivity
  • How sensitive is the data?
Noa | Level of accessibility
  • How accessible should this data be?
Tika | Level of value
  • How does the use of this data add value to the community?
Pono | Level of trust
  • Will the community support this use of the data?
Mauri | Level of originality
  • How unique is the data?
Wairua | Nature of the application
  • Is the data being used in the same spirit as its original use?
Whakapapa | Level of relationship
  • Does the user have an existing relationship with the data?
Pukenga | Level of expertise
  • Does the user have the expertise and experience to use data in a culturally appropriate manner?
Kaitiaki | Level of authority
  • Will the data be protected from inappropriate use?
Wānanga | Level of responsibility
  • Does the institution have the necessary infrastructure to ensure the use of the data in a culturally appropriate and ethical manner?

Waiver of consent for secondary re-use of identifiable health data

Gaining informed consent to use previously collected identifiable data (including linking) should always be the default starting point. Where researchers propose to use identifiable data without specific consent for a study or project (e.g. where data was collected for care, or the proposed data use is not consistent with the scope of the original research consent), they must:

12.28 Satisfy national data standards and local data governance requirements.

12.29 Justify to an ethics committee that the nature, degree and likelihood of possible benefits (including to participants and/or individuals and the value of the research to the public) outweigh the nature, degree and likelihood of possible harms (including to any participant and/or individual, other individuals, whānau, hapū, iwi, Māori communities and any other groups or communities). In determining whether to grant a waiver of consent, an ethics committee may also have regard to the following factors:

12.29.a There are scientific, practical, or ethical reasons why consent cannot be obtained.

12.29.b Appropriate data governance plans are in place.

12.29.c The researchers have identified whether consultation is required, and if required they have undertaken appropriate consultation with cultural or other relevant groups, and those consulted support the proposed use.

12.30 When considering a waiver, researchers should identify if there is any known or likely reason to expect that the participant and/or individual(s) would not have consented if they had been asked.

12.30.a It should be understood that a waiver of consent is not a waiver of responsibility, e.g. should there be an actionable incidental finding then it should be disclosed to the participant and/or individual.[17]

Data-linking

Data-linking is a technique for connecting pieces of information that are thought to relate to the same person, family, place or event. If these different pieces of information can be connected to a person in a way that does not breach their privacy or cause harm, linking them can create a rich resource for research to answer complex questions and improve health outcomes (Data Linkage Western Australia 2019).

When data sets are linked, the risks of identification and adverse public reaction are likely to be greater, especially when the different data sources (which may apply to individual people, households or organisations) may have been designed and collected without the intention of using them together. The process may give rise to concerns that the combined format produces a detailed picture of individuals that they did not consent to when they supplied the data. Privacy is a major consideration in data linkage work.

12.31 Researchers involved in data-linking must weigh the potential benefits of their research against the risk that individuals will be identifiable within their results. See ‘Benefits and harms from data use’ and ‘Research benefits and harms’.

12.32 Researchers must either seek consent from participants and/or individuals or obtain a waiver from a local data governance committee or an ethics committee for research that involves data-linking with identifiable and re-identifiable data.

12.33 Consent from participants and/or individuals or a waiver from an ethics committee is not required for use of linked non-identifiable data, but researchers should be aware of the type and size of data sets being linked, and how these factors increase the risk of identification.

12.33.a Data linked by a third party at the request of a researcher, but provided in a non-identifiable format, is a way of controlling risk of re-identification in research involving linkage.

12.33.b Use of linked data that has been rendered non-identifiable presents lower risks than linked identifiable or re-identifiable data; however, risks in relation to interpretation harms and re-identification remain, and researchers must consider them.
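As an illustration of the third-party arrangement described in 12.33.a, the following Python sketch shows one way a linking party might join two sources and return a file without the direct identifier. It is a minimal sketch under stated assumptions, not a method prescribed by these Standards: the function name link_and_strip, the field names person_id and study_id, and the sample records are all hypothetical.

```python
import secrets


def link_and_strip(clinical, survey, id_field="person_id"):
    """Join two lists of records on id_field, then remove that field.

    Hypothetical helper: the linking party runs this and passes only
    the returned rows to the researcher.
    """
    survey_by_id = {row[id_field]: row for row in survey}
    linked = []
    for row in clinical:
        match = survey_by_id.get(row[id_field])
        if match is None:
            continue  # keep only people present in both sources
        merged = {**row, **match}
        del merged[id_field]  # drop the direct identifier
        # Random study ID, not derived from the identifier, so it
        # cannot be reversed to recover the original value.
        merged["study_id"] = secrets.token_hex(8)
        linked.append(merged)
    return linked


# Made-up sample records for illustration only.
clinical = [
    {"person_id": "A001", "diagnosis": "asthma"},
    {"person_id": "A002", "diagnosis": "diabetes"},
]
survey = [
    {"person_id": "A002", "smoker": False},
    {"person_id": "A003", "smoker": True},
]
linked = link_and_strip(clinical, survey)
```

Removing the direct identifier does not by itself make the output non-identifiable: as 12.33.b notes, combinations of the remaining fields may still allow re-identification, so the result still needs to be assessed.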

12.34 Researchers must respect any conditions concerning data-linking expressed within participants and/or individuals’ existing consent. In the absence of direct participant and/or individual consent, a waiver must be sought from an ethics committee.

12.35 The amount of data that is linked should be fit-for-purpose. Researchers must be able to justify re-use of requested data.

12.36 Researchers should be aware that, if their research includes data linkage, the methods by which that data was collected may result in systematic biases. This in turn may have implications for the validity of the research results.

12.36.a Researchers should consider these limitations when designing their research and mitigate the impacts of these biases where possible. The limitations should also be acknowledged when reporting research results.

12.37 Researchers should account for the destruction of any linked data. If an explicit destruction plan is not specified, then the rationale for archiving should be provided. Any long-term data storage must adhere to local data governance, national standards, and law as applicable.

12.37.a In considering how long to hold linked data, researchers must balance the advantages of retaining it (the robustness of the data linkage, the ability to validate the linkage, and the benefits of re-use of the data) against the protection of privacy.

12.37.b Researchers should be prepared to provide local data governance committees (for example, a research office at a DHB) or ethics committees with a detailed plan of linked data storage, an accounting of the risks of storage, and plans to mitigate the risk of storage.

12.38 Researchers must work within established organisational governance structures, as well as develop specific data management plans that ensure the data is being accessed and linked in an appropriate and responsible manner.

12.39 Researchers must address the privacy risks of linking data by analysing the primary and secondary uses of the data, considering not just re-identification risks but also inference risks.

12.39.a Analysis should take into account not only whether a person can be directly associated with a particular attribute, but also the extent to which attributes that may be revealed or inferred depend on an individual’s data and the potential harm that may result. In addition, it should take into account the potential uses and analysis of the data, which in turn affect data governance and management.
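The distinction in 12.39 between re-identification risk and inference risk can be sketched with a simple check on a linked table. The Python example below is a hedged illustration only: the function name disclosure_risks, the field names, the threshold k and the sample rows are all assumptions, and the Standards do not prescribe this or any other particular test. It groups records by a set of indirectly identifying fields, then flags groups that are small (people may be singled out) and groups in which everyone shares the same sensitive value (the value can be inferred for any member without identifying them).

```python
from collections import defaultdict


def disclosure_risks(rows, quasi_identifiers, sensitive, k=5):
    """Return (small_groups, uniform_groups) for a list of records.

    small_groups: groups with fewer than k records.
    uniform_groups: groups whose sensitive values are all identical.
    The threshold k is an illustrative assumption.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    small = {key for key, vals in groups.items() if len(vals) < k}
    uniform = {key for key, vals in groups.items() if len(set(vals)) == 1}
    return small, uniform


# Made-up sample rows for illustration only.
rows = [
    {"age_band": "40-49", "region": "North", "condition": "diabetes"},
    {"age_band": "40-49", "region": "North", "condition": "diabetes"},
    {"age_band": "40-49", "region": "North", "condition": "diabetes"},
    {"age_band": "50-59", "region": "South", "condition": "asthma"},
    {"age_band": "50-59", "region": "South", "condition": "diabetes"},
    {"age_band": "60-69", "region": "South", "condition": "asthma"},
]
small, uniform = disclosure_risks(rows, ["age_band", "region"], "condition", k=3)
```

In this sample the ("40-49", "North") group meets the size threshold but is uniform, so membership of the group alone reveals the condition. This is the kind of inference risk that remains even when no individual can be singled out.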

Databanks (registries)

The term ‘databanks’ in these Standards encompasses a wide range of data types and methodologies, from registries[18] to databanks.[19]

Databanks provide a major resource for many public health and epidemiological research activities, ranging from disease prevention to resource allocation. Researchers can use them to significantly accelerate understanding of health; diseases; and the effectiveness, efficiency, safety and quality of preventive, diagnostic and therapeutic interventions.

However, databanks raise issues of dignity, autonomy, privacy, confidentiality and discrimination. Researchers should address these issues in accordance with the following general principles.

  • Research using databanks should benefit society, particularly in terms of public health objectives.
  • Researchers have ethical and legal obligations to respect the dignity, autonomy, privacy and confidentiality of individuals when using data from databanks.

Government agencies may establish mandatory registries and databanks (e.g. the New Zealand Cancer Registry) in which participants and/or individuals are obliged to provide data rather than volunteering or consenting to do so. Research using such registries and databanks may be mandated (e.g. one of the purposes of the New Zealand Cancer Registry is to provide a basis for cancer survival studies and research programmes) and may not require ethical review or a waiver of consent.

However, for research studies that use identifiable or re-identifiable data from such databanks or registries and combine it with other data (e.g. data collected from participants and/or individuals via questionnaires), researchers must obtain participants and/or individuals’ consent or, if it is not practical to do so, seek a waiver of consent.

12.40 When planning to contact people because their data is included in a databank, researchers must bear in mind that some people may be unaware that their data was submitted to a databank or may be unfamiliar with the process by which researchers gain access to such data.

12.41 Researchers must seek a waiver or obtain participants and/or individuals’ consent to submit their data to databanks, paying particular attention to the parameters of consented future uses. Researchers must respect any conditions that participants and/or individuals have placed on the use of their data stored in databanks.

12.41.a In limited circumstances, researchers may use identifiable data stored in databanks without consent; in these circumstances, they must first justify such use to an ethics committee and receive approval.

12.42 Databanks must have a governance structure in place to protect the rights, dignity, autonomy, privacy and confidentiality of participants and/or individuals and their communities.

12.43 Researchers should make relevant information on the governance of databanks available to the public.

Governance, policy, and principles of databanks

12.44 Robust governance of databanks is important to maintain the public’s trust in research that uses data from them. Some databanks may have distributed governance arrangements, where different parties are responsible for different aspects of governance. A databank’s governance structure, policy and principles must address:

  • the purpose of the databank
  • in broad terms, the types of research for which the databank may be used, and the types that are not permitted, or are permitted only after individuals have re-consented
  • procedures for obtaining consent from participants and/or individuals for submitting data into the databank and using data stored in the databank, including the documentation of restrictions on future uses of participants and/or individuals’ data, conditions on the identifiability of data, and other issues (e.g. intellectual property rights), to ensure they are traceable and respected
  • criteria for determining when researchers may use participants and/or individuals’ data without consent and the procedures that they must follow in this case
  • procedures for participants and/or individuals’ withdrawal of consent, and circumstances under which it is not possible for participants and/or individuals to withdraw consent
  • criteria for determining when participants and/or individuals need to be re-contacted, and procedures researchers must follow in this situation
  • criteria for determining who may access and use participants and/or individuals’ data and under what circumstances
  • methods for ensuring researchers and others accessing and using participants and/or individuals’ data will be held accountable for unauthorised access to, or inappropriate or unauthorised use of, participants and/or individuals’ data
  • measures for the physical and electronic protection of participants and/or individuals’ data
  • procedures for quality control of data collection
  • procedures for research involving data-linking, including maintenance of the confidentiality of the link between collected data and personal identifiers
  • mechanisms for keeping participants and/or individuals informed of research outcomes
  • procedures for participatory engagement with patient groups or the wider community
  • methods for ensuring the transparency of the databank’s operations
  • procedures for allowing participants and/or individuals to request corrections to mistakes and omissions in their data
  • arrangements for the storage, disposal and destruction of participants and/or individuals’ data (unless data is stored indefinitely, which requires an ethical justification)
  • the person or people who are responsible for the governance of the databank
  • arrangements for dealing with participants and/or individuals’ data if the databank has a change of ownership or closes
  • arrangements for protecting the privacy, rights and welfare of participants and/or individuals whose data is stored in the databank
  • procedures to be followed if a researcher is considering reconstructing pre-existing data into a format that suggests it will become a new databank (in this case, the researcher should attempt to identify custodians of the original data and seek advice about governance issues from these custodians)
  • procedures for receiving and addressing enquiries and complaints.

[1] Administrative data can be defined as data that is collected by government agencies or private organisations in the course of conducting their business or services.

[3] See the Te Mana Raraunga website for more information.

[12] Hardware/software/system policies, named individuals with accountability, standards (or near to) e.g. ISO27001, researcher registration, user training etc.

[16] “He Matapihi ki te Mana Raraunga” – Conceptualising Big Data through a Māori lens, 2017.

[18] Registries have organised systems that use observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition or exposure, and that serve one or more predetermined scientific, clinical or policy purposes. Such registries are variously described as patient registries, clinical registries, clinical data registries, disease registries and outcomes registries.

[19] Health databanks have organised systems for collecting, organising and storing health information. Databanks may pursue a specific, focused research agenda, collecting data for a limited time to answer a specific research question. Alternatively, they may collect data over an indefinite time to answer a variety of existing and emerging research questions. See further CIOMS and WHO 2016; WMA 2006; and NHMRC 2018, Chapter 3.2.