Healthcare Data in Research

Last update: December 2024
Contact: Aurélie Halsband

I. Introduction

As part of medical treatments such as check-ups or acute therapies, data regarding the medical treatment of individuals is collected and stored in patient files and other formats. Until now, this data has been used for medical research only to a very limited extent, if at all. However, data on the diagnosis and treatment of patients promises great potential for biomedical findings and thus also for improved patient care.

1. The potential of treatment data for medical research

The particular potential of using clinical data for medical research lies firstly in the fact that data already collected in the treatment context can be reused for research purposes. Furthermore, the data collected in the course of actual treatment is a more realistic reflection of healthcare practice than the data collected in clinical studies. This proximity to everyday medical practice is important because more detailed insights into patient care also offer more insights into possible approaches for improving it. For example, data from healthcare generally provides a more accurate representation of the various social groups, as women, older people, children and people with rare diseases are often underrepresented in clinical studies.

In addition, treatment data can be used to realise a range of study types, some of which are very different. The spectrum ranges from studies on the efficacy of drugs to studies on the retrospective assessment of the success of treatment trials, the early detection of chains of infection in hospitals and epidemiological studies on the distribution of diseases in certain population groups. The hopes associated with this multitude of possible studies are directed towards the possibility of more personalised medicine, a general improvement in prevention options and a reduction in side effects in existing therapies.

The intention to exploit this potential has also grown in recent years because the possibilities for evaluating large amounts of data have been greatly expanded by the further development of information technologies and artificial intelligence. However, many applications that use artificial intelligence for this purpose require extensive data sets from everyday treatment for training purposes.

Nonetheless, it is important to distinguish between scientific projects in medical research that involve analysing large datasets, such as treatment data, and clinical studies. The difference between these types of studies can be determined, for example, by the significance of a scientific hypothesis in the respective studies. For example, drug studies are designed to test in several phases whether a certain active ingredient can be used to cure or alleviate a certain disease. Here, the hypothesis being tested is whether the active substance is tolerable, safe and effective, also in comparison to other drugs already in use. Data-rich research, on the other hand, is often geared towards identifying correlations in large data sets. Only in a later step are these correlations examined with a hypothesis on possible causal relationships. In this way, data-based studies can prepare the testing of hypotheses in drug trials. Furthermore, broad access to treatment data promises to improve the planning of clinical trials, for example by allowing the number and spatial distribution of potential test subjects to be estimated more accurately in advance. Clinical trials and medical research on clinical (treatment) data can therefore complement each other, yet each contributes differently to the overall gain in knowledge in medicine (see also Part II, The demarcation of medical treatment and medical research).

2. Challenges of working with treatment data

Treatment data, or clinical data, is part of the overarching area of so-called health data. Clinical data originates from medical practices, clinics or health insurance companies and is collected and archived as part of medical examinations. Health data can also come from non-clinical sources such as fitness apps. The focus of discussions and efforts on the secondary use of health data for research is primarily on clinical data from practices, clinics and health insurance companies.

Although the secondary use of treatment data for medical research involves the use of data that has already been collected, this data cannot always be used directly for research. In order to be able to use it, so-called data work is required. Data is not a raw material that can be processed depending on the purpose, but rather information that has been collected from certain social contexts with a view to specific objectives. Data on patient care must also be stored, networked, and made available in specific ways in order to be used for research purposes. Such data work requires an appropriate technical and organisational framework.

Technical preconditions

At the technical level, a major challenge is to ensure the interoperability of data systems. For example, databases, software and other systems of the data-collecting institutions, i.e. clinics and medical practices in particular, must be set up in such a way that data can be exchanged among them and passed on to the various research institutions without falsification or major intermediate steps.

Furthermore, data must be as complete as possible, as information gaps can lead to distortions in interpretation. To gain broad medical knowledge, the harmonisation and standardisation of data collection and storage required for this must in the long term also be implemented across national borders, which entails additional regulatory challenges.

Finally, it is essential to ensure that data acquired in a treatment context is valid for research purposes. To this end, the original context of data collection—i.e., the treatment context—must be clearly traceable in research projects. Furthermore, the treatment context must be considered when integrating the data into research projects to avoid distortions.

Organisational preconditions

The technical challenge is closely linked to the organisational challenge of ensuring that this systematic approach is integrated into patient care processes. In addition to the technical compatibility of the systems, it must also be ensured that healthcare staff systematically record and store the data in parallel with regular patient care. Previous filing methods for patient treatment information, such as doctor's letters, cannot be integrated directly into these systems but must be stored in other formats.

Furthermore, additional data that is not directly relevant to treatment must be collected for use in medical research. For example, information on the treatment of severe courses of disease in specialist clinics must reflect the specialisation of the clinic and the associated higher number of patients with severe courses of disease in order not to distort the overall distribution of these cases in society. The associated additional effort for medical and nursing staff must not come at the expense of the quality of patient treatment (see Part II, Healthcare professions, patient welfare and medical research). From the patient's perspective, it is also crucial that the treatment data stored and ultimately used is protected against misuse (see Part II, Patient autonomy and medical research).

The sometimes fragmented legal basis for the secondary use of treatment data for research represents a further hurdle: among other things, the legal standards on data protection and informed consent at state, federal and EU levels must be reconciled.

With regard to the goal of (further) using patient data in medical research in an interoperable, secure and low-threshold manner, comprehensive technical, regulatory and organisational requirements must be created. In Germany, this goal is being funded by the state and pursued with the so-called Medical Informatics Initiative (MII) and with the establishment of a National Research Data Infrastructure (NFDI).

II. Ethical aspects

An overarching line of conflict in the ethical discourse on the secondary use of treatment data for medical research lies in the contradiction between the promotion of patient welfare on the one hand and the common good on the other. Although medical research and patient care both aim to achieve the overarching goal of improving patient care, they differ in terms of their internal goals.

Therefore, the distinction between medical care and treatment on the one hand and medical research on the other will be examined first. Essential aspects of a potential trade-off between patient welfare and public welfare are then examined in detail, focusing on patients' willingness to release treatment data for medical research and the extent to which sensitive patient data can be protected against unauthorised disclosure and misuse. This consideration is supplemented by possible collisions between the professional ethos of medical and nursing staff, which is primarily focused on patient welfare, and the requirement to collect treatment data in such a way that it can also contribute to research and the common good. Although it can be assumed that the patient's well-being is paramount, freedom of research and the gain of knowledge are also highly protected interests. A further section explains how ethical considerations could help to guide the creation of an ethical-legal governance structure for the research use of treatment data in the interests of both patients and the community.

1. The demarcation of medical treatment and medical research

Both in ethics and in law, the area of medical treatment has so far been strongly separated from that of medical research. Taking the contrast between patient welfare and the common good as a starting point, the difference between treatment and medical research has been described as follows: Patient care is directly concerned with improving or restoring the welfare of an individual person undergoing medical treatment. Medical research, on the other hand, is only indirectly aimed at improving the well-being of the individual, as it is initially aimed directly at gaining medical knowledge. Only in a further step can newly acquired knowledge be used to improve the well-being of individuals.

This distinction is also widely cited as the reason why clinical trials are subject to authorisation by an ethics committee, but not medical treatments. Since the former does not directly aim to benefit the individual test subjects, but does expose some test subjects to risks, ethical standards must be upheld separately here. However, in response to the increasing interlocking of healthcare and medical research, particularly in the context of the widespread analysis of patient data in the context of medical research, this distinction is being called into question more and more.

While, e.g., the attainment of generalisable knowledge and a systematic investigation geared towards this was previously attributed solely to the field of medical research, the treatment of patients also increasingly appears to exhibit these characteristics. For example, the systematic collection of data on patients and their treatment is increasingly becoming an integral part of medical practice and should therefore not be attributed exclusively to the field of research. This systematic collection and analysis of data from treatment is not only carried out in research, but also for health policy measures or health insurance companies. Data on the medical treatment of individual patients is therefore increasingly embedded in larger contexts beyond the individual patient's well-being.

Medical practice is also directly linked to the production of generalisable findings, as it constantly integrates findings from studies and in turn provides data to research, which ideally investigates it further scientifically. The assumption that research is geared solely towards improving the common good, while medical practice focuses exclusively on the well-being of individual patients, therefore rests on a sharp contrast. Especially with regard to the secondary use of treatment data in research, this contrast can less and less often be identified in practice, where the two contexts tend to be intertwined.

2. Patient autonomy and medical research

Another facet of the trade-off between research orientated towards the common good on the one hand and medical treatment orientated towards the individual good on the other can be seen in the debate on the obligations and rights of patients when their medical data is passed on to research.

Patient autonomy, data security and medical research

Treatment data is usually sensitive information about a person's health. This data is sensitive because it can allow conclusions to be drawn about a person's medical disposition or lifestyle, for example, which in turn could be misused by third parties. The data’s security is therefore of particular importance with regard to patient autonomy.

The potential for damage arises from possible breaches of data protection. A distinction can be made between three forms of damage. First, damage can arise (psychologically) from the mere knowledge that one's own health data has been disclosed without authorisation. Second, damage can result from the publication of health data and the associated stigmatisation, for example as a result of a published diagnosis of a mental illness. Third, damage can take the form of disadvantages or discrimination by institutions such as an employer or insurance company that deny a certain resource or the pursuit of an activity, such as employment, as a result of access to personal health data. In one past case of data misuse in Denmark, for example, data on mental health was passed on between state institutions without authorisation, so that applications to join the military or to obtain a driving licence were delayed or rejected. Unauthorised use can include unwanted personalised advertising, but also more serious forms of misuse of data such as discrimination, stigmatisation, blackmail or identity theft.

The severity of the possible damage can also be measured according to the stigmatisation potential of the respective health data: For example, the unauthorised disclosure and possible misuse of information about a mental disorder harbours greater potential for damage to a person than the unwanted release and evaluation of billing data, for example.

When weighing up the release of treatment data for research against patient autonomy, it is important to consider how high the actual risks of such data protection violations are, to what extent they can be effectively reduced by suitable framework conditions, such as supervisory bodies (see The regulation of research on treatment data) and how much the possible damage should be weighted in comparison to the potential of the research described. Particularly with regard to the research potential, it remains to be considered to what extent the often personalised storage of health data in hospitals, for example, does not harbour a similar risk of possible data protection violations, although in this context it is largely accepted by society and widely practised (see Autonomy versus duties of assistance and public welfare).

Variants of informed consent

The protection of patients' control over the collection, use and disclosure of their clinical data can be traced back to the overarching principle of respecting the autonomy of individuals. Derivatives of autonomy are the institution of medical confidentiality and the fundamental right to informational self-determination. They take into account the sensitivity of treatment data and the person's right to dispose of this data and its possible use. In the special context of medical research, the institution of informed consent is in itself a central expression of the respect for personal autonomy.

Respect for the autonomy of test subjects and patients is ensured in the context of both treatment and medical research on humans by, among other things, the requirement of informed consent. In research on treatment data, both contexts coincide: Data is collected on patients, which is then included in a study and thus becomes subject data. However, informed consent procedures established in medical research to date can only be partially applied to research on treatment data. In contrast to clinical studies, in which the subjects involved are informed about a specific study and can give or withhold their consent, treatment data is often used several times and over longer periods to address ever new questions. Informed consent for each individual study is made more difficult because at the time the data is released for research, it is often not yet clear in which study the data will be examined and for what purpose.

In response to this, different variants of informed consent have been developed and discussed. In contrast to study-specific informed consent, so-called broad consent does not require detailed information about the specific subsequent use of the data in a study; only general, exemplary forms of use and objectives are explained and consent is then requested. One difficulty with this model is that broad consent is only partially “informed”. One way to mitigate this difficulty is to create so-called fiduciary structures, i.e. neutral bodies that represent the interests of the test subjects in the longer term and oversee the use of the treatment data.

The dynamic consent variant, on the other hand, is based on constantly updated consent for the further use of treatment data in a new study. It requires appropriate technologies, such as apps, which individuals can use to repeatedly consent to the use of their treatment data for each new study or to refrain from doing so. Particular difficulties with this type of consent are the technological requirements and the fact that data donors may find it tedious to have to go through repeated explanations about studies and consent procedures.

Another variant, the so-called dissent solution, initially assumes the consent of the individuals to the use of their treatment data. This model is linked to the possibility for individuals to object to the use of their data at a low threshold. The dissent solution enables broad access to treatment data and requires little effort on the part of patients. From an ethical perspective, however, there is debate as to whether this solution sufficiently respects autonomy, which is usually taken into account in medicine through active consent.

An overarching challenge of providing information in the course of all forms of informed consent in research on treatment data is to distinguish the benefits of this use for research from those for the individual. It is particularly important to prevent the so-called therapeutic misconception, in which individuals wrongly assume that participation in a scientific study will or could generate direct benefits for themselves. With research on treatment data in particular, this is rarely the case.

Autonomy versus duties of assistance and public welfare

While variants of informed consent shed light on the conditions under which the use of treatment data for medical research by the respective patients could be possible while preserving their autonomy, another line of discussion focuses on the importance of the common good. The idea of “data donation” or even “data altruism” is contrasted here with a possible obligation to provide the data. From this perspective, individual patients, as members of a community of solidarity, have an obligation to contribute to the common good, in this case to medical progress. Whether concerns regarding the protection of personal data on the part of patients are sufficient to weaken or undermine the obligation to contribute is controversial.

Advocates of an obligation to provide treatment data for medical research point out, for example, that treatment data is recorded anyway and is therefore already exposed to the (low) risk of misuse. In addition, the risk of unauthorised data use is considered to be low overall. The transfer of treatment data to research institutions under appropriate data protection precautions therefore does not increase the risk to such an extent that a fundamentally different handling of this data would be justified. Furthermore, the potential yield of medical research and the contribution of the individual data sets must be given particular weight. It is not merely a matter of foregoing a selective improvement in healthcare in order to protect personal data. Rather, the price of not using health data is very high for the general public, as research on it could identify treatment errors that frequently occur in everyday clinical practice and thus avert damage on a large scale.

3. Healthcare professions, patient welfare and medical research

The fundamental tension that can arise between promoting patient welfare and supporting medical research is also reflected in potential role conflicts for healthcare professionals. Healthcare professionals are first and foremost committed to patient welfare. At the same time, however, good healthcare is also based on maintaining and, at best, improving the quality of care through medical research. Even if one therefore agrees with the “primacy of patient welfare”, healthcare professionals are at least secondarily committed to promoting medical research.

One possible conflict between the use of treatment data for medical research and the promotion of patient welfare is the workload for medical and nursing staff when collecting the relevant data. It must be ensured that the additional effort required to collect this additional data does not have negative effects on patient care, such as less time to talk to patients.

Another risk is, for example, the shift in focus from patients' interests to the “logic” of data collection for research. In response to these challenges, suitable regulations are required for the appropriate recognition and remuneration of this additional service, as well as the provision of an appropriate technological infrastructure, such as easy-to-use software for entering data. Staff must also be appropriately trained to enable the subsequent utilisation of treatment data.

In addition to collecting data that is meaningful for research purposes, healthcare staff are also responsible for providing information on the use of treatment data for research purposes and then obtaining and documenting consent. Above all, this includes explaining the opportunities and risks of sharing data, such as the opportunity to contribute to a specific finding and the risk of data leaks. In this role, doctors in particular may find themselves confronted with additional expectations from patients, e.g. with regard to the quality of the information provided, as the primary addressee in the event of any data protection violations and generally as service providers for questions relating to research. The relationship of trust, which is a prerequisite for a good relationship between doctors and patients, can be compromised in this way.

However, since the overall aim of medical practice and medical research on treatment data is to improve patient care, these conflicts of interest tend to arise at the implementation level and may be prevented or mitigated by appropriate regulations and framework conditions.

4. The regulation of research on treatment data

Exploiting the potential of medical research on treatment data outlined above requires an institutional structure that minimises the associated potential for harm. Based on the legal regulation of clinical research on humans in particular, a governance structure would have to specify which types of research projects on treatment data require the vote of a research ethics committee. Since research on treatment data falls somewhere between general quality controls of healthcare, which typically do not require an ethics committee review, and research on human subjects, which does require an ethical review, a clear classification and regulation seem necessary here. Providing ethical oversight for this type of research, which both ensures patient welfare and supports high-level research, will likely require the implementation of more dynamic models of ethical review.

In line with the principles of respect for autonomy, beneficence, nonmaleficence and justice, which are widely recognised in biomedical ethics, further requirements for a governance structure that should frame research on treatment data have been discussed. In terms of beneficence and the orientation of healthcare towards the individual patient's well-being, the framework conditions for treatment data research must be designed in such a way that medical and nursing staff are not so overburdened by data collection that the quality of care for individuals suffers as a result. At the same time, the need to enable research on treatment data to improve healthcare, and to create the necessary framework conditions for this, can also be derived from the principle of beneficence.

With regard to autonomy, the previous sections have already emphasised the need to identify suitable forms of informed consent for research on one's own treatment data. This also includes the creation of fiduciary structures that protect the autonomy of individuals, acting in their name, in the event of subsequent use of their treatment data.

In order to protect autonomy and uphold the principle of justice, a structure must also be established that imposes sanctions in the event of actual misuse of data and can counteract a loss of trust in society in the long term.

Suggested citation

German Reference Centre for Ethics in the Life Sciences (2025): In Focus: Healthcare Data in Research. URL https://www.drze.de/en/research-publications/in-focus/healthcare-data-in-research [date of access]