The Use of Real-world Evidence in Rare Diseases and Patient Identification

Real-world evidence (RWE) on treatment and prescribing patterns, along with a wide range of outcomes data, is widely published in the scientific literature.

RWE can also be used for post-approval studies, establishing standards of care, and reporting treatment patterns in the real-world setting. This information is incredibly valuable for patients, clinicians, and pharmaceutical researchers.

RWE has also arrived in regulatory approval. In December 2021, the FDA published draft guidelines on using RWE to help support approval of a new indication for an already approved drug, or to help support post-approval study requirements, with the promise of further guidelines to follow.

The important question is, how can RWE help in the rare disease sphere?

Rare diseases, by definition, affect fewer than 5 people in 10,000 according to European regulatory guidelines for orphan disease designation, or fewer than 200,000 people in the United States according to the Orphan Drug Act of 1983. The NIH estimates there are approximately 7,000 rare diseases in the US.

Rare disease publications are typically limited to case studies or studies with small patient numbers because these patients are difficult to find. In the US, claims datasets with large patient numbers over expansive geographical areas are a goldmine for rare disease research. We have seen RWE from claims datasets as well as electronic health records (EHR) used to provide insights into the epidemiology, treatments, and outcomes of rare diseases.

However, there are challenges when carrying out research using RWE datasets, and more so for rare diseases, especially those with no ICD or SNOMED code. So how do we identify these rare disease patients?

Patient Identification and ICD Coding

Claims datasets have limitations, particularly because everything is coded primarily for billing purposes, not for outcomes research. A challenge for all claims and EHR datasets is that if a disease does not have a specific ICD-10 or SNOMED code, its data cannot be found easily, because the patients will be coded under an umbrella code, typically along the lines of “other disorders of…”.

This umbrella code effect occurs in many UK EHR datasets, where ICD-10 codes are routinely reported only to the 4th character and not the 5th. An example is alpha-1-antitrypsin deficiency (AATD), which has the ICD-10 code E88.01. In systems that report only 4 characters, AATD patients would come under E88.0, which also includes plasminogen deficiency and “other disorders of plasma-protein metabolism, not elsewhere classified”. To identify the AATD patients, we need to use other disease-specific information that might be coded in the RWE dataset, such as what is known about the patients’ characteristics, treatments, procedures, or genetic testing.
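To make the umbrella code effect concrete, here is a minimal sketch of what happens when codes are cut to 4 characters. The patient rows are invented for illustration, the helper names (`truncate_icd10`, `group_by_code`) are our own, and the sibling codes E88.02 and E88.09 reflect our understanding of the ICD-10-CM subdivisions of E88.0 rather than anything taken from a specific dataset.

```python
from collections import defaultdict

# Hypothetical rows: (patient_id, full ICD-10-CM code). Invented for illustration.
records = [
    ("p1", "E88.01"),  # alpha-1-antitrypsin deficiency (AATD)
    ("p2", "E88.02"),  # assumed sibling code: plasminogen deficiency
    ("p3", "E88.09"),  # assumed sibling code: other disorders, not elsewhere classified
    ("p4", "E88.01"),  # AATD
]


def truncate_icd10(code: str, n_chars: int = 4) -> str:
    """Keep the first n_chars alphanumeric characters of an ICD-10 code."""
    compact = code.replace(".", "")[:n_chars]
    # Re-insert the dot after the 3-character category when a subdivision remains.
    return compact if len(compact) <= 3 else f"{compact[:3]}.{compact[3:]}"


def group_by_code(rows, n_chars):
    """Group patient ids by their (possibly truncated) code."""
    groups = defaultdict(list)
    for patient_id, code in rows:
        groups[truncate_icd10(code, n_chars)].append(patient_id)
    return dict(groups)


full = group_by_code(records, n_chars=5)       # AATD patients are separable
truncated = group_by_code(records, n_chars=4)  # everyone collapses into E88.0

print(full)
print(truncated)
```

With 5 characters the two AATD patients sit under their own code. With 4 characters all four patients share E88.0, so the code alone can no longer tell them apart.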

Patient Identification: An Algorithm-Based Approach

Patients with a particular rare disease without an assigned code might be identified if there is a procedure specific to their condition or an indicated treatment that would differentiate them from the other diseases captured by the broader code grouping. Otherwise, published studies may indicate other characteristics such as a specific mutation, the typical age range at diagnosis, or specific biomarkers.
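A rule of this kind can be sketched in a few lines. The example below assumes AATD hidden under the umbrella code E88.0. The `Patient` structure and the treatment and procedure labels are hypothetical placeholders chosen for illustration, not a validated case definition.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    diagnosis_codes: set = field(default_factory=set)
    treatments: set = field(default_factory=set)
    procedures: set = field(default_factory=set)


# Assumptions for illustration only: the umbrella code and the labels that
# would set the disease of interest apart from others under the same code.
UMBRELLA_CODE = "E88.0"
SPECIFIC_TREATMENTS = {"augmentation_therapy"}
SPECIFIC_PROCEDURES = {"serpina1_genotyping"}


def is_probable_case(p: Patient) -> bool:
    """Umbrella code plus at least one disease-specific treatment or procedure."""
    if UMBRELLA_CODE not in p.diagnosis_codes:
        return False
    has_treatment = bool(p.treatments & SPECIFIC_TREATMENTS)
    has_procedure = bool(p.procedures & SPECIFIC_PROCEDURES)
    return has_treatment or has_procedure


patients = [
    Patient("a1", {"E88.0"}, {"augmentation_therapy"}),
    Patient("a2", {"E88.0"}),
    Patient("a3", {"E88.0", "J44.9"}, set(), {"serpina1_genotyping"}),
    Patient("a4", {"J44.9"}, {"augmentation_therapy"}),
]

probable = [p.patient_id for p in patients if is_probable_case(p)]
print(probable)
```

In practice the sets of qualifying treatments and procedures would come from the published literature and clinical input, and the rule would need validating against patients with a confirmed diagnosis.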

Other methods involve building AI-based algorithms from the characteristics of patients with a confirmed diagnosis and then running these algorithms across the dataset to estimate the number of other probable patients. These algorithms can be designed to adapt as more cases are diagnosed or as awareness of the condition increases. They can also be applied in healthcare settings to aid clinicians and patients by speeding up the diagnosis of conditions that can otherwise take months or years to confirm.
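As a rough illustration of the idea, the sketch below learns feature weights from confirmed patients and uses them to rank undiagnosed patients. It uses a simple naive Bayes-style score as a stand-in for whatever model a real project would choose. The feature names, the toy records, and the `train` and `score` helpers are all hypothetical.

```python
import math

# Each record is the set of features observed for one patient (hypothetical labels).
FEATURES = ["early_emphysema", "liver_disease", "low_aat_level"]

confirmed = [
    {"early_emphysema", "low_aat_level"},
    {"low_aat_level", "liver_disease"},
    {"early_emphysema", "low_aat_level", "liver_disease"},
]
background = [set(), {"liver_disease"}, set(), {"early_emphysema"}]


def train(cases, controls, features):
    """Per-feature log-likelihood ratios (present, absent) with add-one smoothing."""
    weights = {}
    for f in features:
        p_case = (sum(f in r for r in cases) + 1) / (len(cases) + 2)
        p_ctrl = (sum(f in r for r in controls) + 1) / (len(controls) + 2)
        weights[f] = (
            math.log(p_case / p_ctrl),
            math.log((1 - p_case) / (1 - p_ctrl)),
        )
    return weights


def score(record, weights):
    """Sum of log-likelihood ratios; higher means more similar to confirmed cases."""
    return sum(
        present if f in record else absent
        for f, (present, absent) in weights.items()
    )


weights = train(confirmed, background, FEATURES)

undiagnosed = {
    "u1": {"low_aat_level", "early_emphysema"},
    "u2": {"liver_disease"},
    "u3": set(),
}
scores = {pid: score(rec, weights) for pid, rec in undiagnosed.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Re-running `train` as newly confirmed patients are added is one simple way such a score could adapt over time. Any threshold for calling a patient "probable" would need clinical validation.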

The Future of RWE

RWE-based research is limited by the available codes. Where the codes fall short, free-text searches of EHR data, or natural language processing (NLP) to pick out disease-defining terminology, can be used to identify patients with the rare disease of interest.
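A free-text search can start as simply as a pattern match over clinical notes. The sketch below is a deliberately crude example using Python's standard `re` module. The notes are invented, the term list is an assumption, and the negation check only catches a negation word that appears before the term in the same sentence.

```python
import re

# Disease-defining terms (assumed list for illustration).
TERM = re.compile(
    r"\b(?:alpha[- ]1[- ]antitrypsin deficiency|AATD)\b", re.IGNORECASE
)
# A negation cue anywhere in the sentence text that precedes the term.
NEGATION = re.compile(
    r"\b(?:no|not|denies|ruled out|negative for)\b[^.]*$", re.IGNORECASE
)


def mentions_disease(note: str) -> bool:
    """True if any sentence mentions the disease without a preceding negation."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        match = TERM.search(sentence)
        if match and not NEGATION.search(sentence[: match.start()]):
            return True
    return False


# Invented notes, keyed by a hypothetical note id.
notes = {
    "n1": "Patient with confirmed alpha-1 antitrypsin deficiency, on augmentation therapy.",
    "n2": "COPD exacerbation. No evidence of AATD.",
    "n3": "Family history of diabetes.",
    "n4": "Genotype PiZZ; diagnosis: Alpha-1-Antitrypsin Deficiency.",
}

flagged = [note_id for note_id, text in notes.items() if mentions_disease(text)]
print(flagged)
```

A production approach would more likely rely on a dedicated clinical NLP pipeline that handles abbreviations, misspellings, family history, and negation after the term, none of which this sketch attempts.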

Increased granularity of coding is essential for rare diseases. Future versions of the ICD coding system will have enhanced granularity, with ICD-11 coming into effect on the 1st of January 2022 (WHO, 2022). 

However, patients won’t be retrospectively assigned an ICD-11 code. The lag in implementing ICD-11 coding throughout RWE datasets, coupled with the time taken to generate longitudinal cohorts, means it could take a couple of years before these changes are reflected in research.

Publicly available disease registries typically have a one- to two-year reporting lag to allow for data collection and cleaning before publication, so we can expect a delay of up to four years for some hospital databases, such as the Healthcare Cost and Utilization Project (US), Hospital Episode Statistics (HES, UK), and the Federal Statistical Office of Germany.

Other RWE sources, such as claims and hospital EHR datasets, will implement these coding systems sooner; the only delay will be the time needed to update their recording systems to support ICD-11. We expect that 2022 will see the emergence of a new, expanded landscape for rare disease RWE research.

At RwHealth, we work with clients to address these challenges, bringing together the various sources of data to identify the patients most likely to benefit from a new medicine and continually monitoring their outcomes as a therapy area evolves and new treatments become available. This live view of the treatment landscape massively reduces the time and cost of large-scale late-phase clinical studies.

To learn more about the DSP and how we can best support you, email
