Cultivating an Analytics Culture in the Treatment of Rare Diseases


Rare diseases are so called because they affect a very small percentage of the population. In the US, a condition is considered rare if it affects fewer than 200,000 people, although there is currently no global consensus on the cut-off number.

According to the World Economic Forum, there are more than 7,000 rare diseases, and it is estimated that 50% of those afflicted with rare diseases are children. Estimates place the number of those affected by rare diseases at a staggering 350 million people worldwide, with millions more having specialized conditions, not to mention the families and loved ones of patients, whose lives are severely impacted as a result. Most rare diseases, sometimes called ‘orphan’ diseases, are genetic and therefore present throughout a patient’s lifetime.


Challenges in Diagnosis and Treatment of Rare Diseases

Diagnosis and treatment of rare diseases continue to be a challenge due to basic economics. The rarity of these ‘orphan’ diseases discourages adequate investment of funds and resources, given profitability concerns. In June 2019, according to the Tufts Center for the Study of Drug Development, the cost of developing just one drug from preclinical testing to market approval was estimated to be $2.6 billion. Artificial intelligence (AI) and machine learning (ML) could help circumvent this issue by streamlining the process of drug discovery and development. According to Insider Intelligence, AI could curb drug discovery costs in general by as much as 70%.

The cost of drug discovery and development aside, rare diseases have an added challenge: an accurate diagnosis. Based on research conducted in 2014 for Global Genes, a patient with a rare disease will visit more than seven physicians on average, and it will take almost five years from symptom onset to reach an accurate diagnosis. Misdiagnosis and a lack of knowledge remain significant challenges for people with rare diseases.

From the physician’s perspective, the patient may present symptoms associated with more common conditions, and the physician may never have seen a specific rare disease before, or may not even be aware that it exists.

From the researcher’s perspective, the fundamental challenge is gaining access to enough patient data. Patients are difficult to find, information is siloed, and complications arise due to perceived privacy issues and data ownership. Limitations such as informed consent agreements, researchers who expect to control data for their work, and variance across health systems and payer structures further complicate matters.

As a result, the onus falls on parents and caregivers, who are building patient advocacy groups, technology platforms, biotech companies, and data models. Still, 85% or more of those afflicted have no access to any kind of coordinated care or structured support.

Fig. 1. Facts and statistics on rare diseases provided by Global Genes

How AI and ML Can Address Rare Disease Challenges

There is an ever-increasing amount of data being collected in biomedicine with a need to rapidly and efficiently collect, analyze, and characterize all the information. AI tools can quickly and accurately store and analyze oceans of data from multiple sources. AI, especially deep learning techniques, is already being applied to basic research, diagnosis, drug discovery, and clinical trials.

Essentially, AI is the theory and development of computer systems that store and process vast amounts of data to execute tasks that would typically require human intelligence, such as facial recognition, speech recognition, decision-making, and analytics. Machine learning (ML) - a subset of AI - is where computers are programmed to learn from experience, and then adjust their processing according to new information or data.





AI and ML are like bloodhounds when it comes to sniffing out trends and patterns in data.

This means they can identify indicators and red flags that humans might easily miss, or that are buried in amounts of data too enormous for anyone to have time to sift through.


The rare disease space, which is severely underrepresented, is in a unique position to take full advantage of the improvements in complex patient diagnostics and care that AI and analytics make possible.


AI can also play a significant role in providing patients with a powerful platform to build networks and support communities, and create channels to accumulate valuable insights, experiences, and knowledge in a real-world setting.


Using AI and ML to Diagnose and Treat Rare Diseases

There are two major initiatives which currently play a significant role in addressing rare disease challenges, and each relies heavily on AI technologies.

Research-Focused Initiative

The first is a research-centered initiative started by the NIH, called Rare Diseases Clinical Research Network (RDCRN). With the aim to advance medical research on rare diseases, it provides support for clinical studies and facilitates collaboration, study enrollment, and data sharing. It is designed to promote highly collaborative, multisite, patient-centric, translational and clinical research. Through this network, scientists from various disciplines at multiple clinical sites from all over the world can work with patients and advocacy groups to study over 280 rare diseases. Collaborative activities, such as multisite longitudinal studies and clinical trials, as well as data management that facilitates high quality data standards, collection, storage and sharing of data, all rest largely on AI technology’s shoulders.

Genomic analysis using next generation sequencing (NGS) and other “omics technologies” has taken the diagnosis and molecular understanding of rare diseases to the next level. The vast amounts of data gathered through these technologies need to be identified, analyzed, and integrated. For this reason, the need to automate tasks that currently require human intervention is on the rise. Additionally, electronic health records (EHR) and biomedical literature contain extensive clinical information that can significantly increase the likelihood of identifying data trends.

AI technologies have been developed to analyze diverse arrays of ‘big’ data, ranging from genomic analyses, and individual clinical phenotypes in EHRs, to large and multiparametric patient cohort analysis, towards mitigating the unique challenges associated with rare diseases.

Patient-Focused Initiative

The second initiative is a patient-centered 501(c)(3) organization called the National Organization for Rare Disorders (NORD) with the motto, ‘Alone we are rare. Together we are strong’.

It is a patient advocacy organization with more than 300 patient organization members, citing a commitment to identify, treat, and cure rare diseases. It has information for patients and their families, patient organizations, and clinicians and researchers. Figure 2 (below) provides an overview of some of the information and resources provided by NORD with the help of AI tools and platforms.

Fig. 2. An overview of information and resources provided by NORD.

Small patient populations dispersed throughout the globe, a poor understanding of the natural history of a rare disease and its progression, unclear clinical endpoints, and challenges with patient enrollment and retention are all inherent issues for rare diseases.

To address these special needs, NORD created a Natural Histories Patient Registry Platform which utilizes a cloud-based modern design that is mobile friendly. It allows patients and organizations to inform and shape medical research of rare diseases by providing high quality data captured by these customized registries. These registries are used to collect the patient data needed to define the natural progression of the disease, which can contribute significantly to better diagnosis, treatment, and product development.

This AI-powered platform helps in the identification of patients for clinical trials, builds connections among patients, supports research goals with inputs from FDA, NIH, and experts in the field, and helps develop standards of care for patients. It allows patients to tell their story, participate in a community, and contribute to a cure.

The Need to Cultivate an Analytics Culture is Now

As the move towards patient-centric medicine grows, the need to cultivate an analytics culture in the treatment of diseases, especially rare diseases, is pressing. And the role that AI will play in this is significant.

In their publication titled ‘Rare diseases 2030: how augmented AI will support diagnosis and treatment of rare diseases in the future’, Hirsch et al. envision the development of an integrated medical system that will combine inputs from patients and physicians, medical data repositories such as EHRs, and disease-specific experts, to create a seamless, easy-to-use, ethical platform that both patients and physicians can trust (Fig. 3).

Such a system will take the patient from symptom assessment and diagnosis, all the way through to expert consultation and customized treatment, while continuously recording pertinent medical data and monitoring patient progress.

Insights from these individual cases can then be analyzed in conjunction with other related cases, checked by an expert panel and returned into the system’s knowledge base to enable deep learning that improves with the knowledge added by every case.


Fig. 3. How augmented AI will support diagnosis and treatment of rare diseases in the future. Information sourced from a publication in PubMed. In the age of AI, patients with rare diseases will have access to applications and software tools that will analyze their symptoms, obtain a diagnosis, and search for a medical expert in the area. Physicians will have access to data repositories that provide ontology-based up-to-date knowledge on rare diseases. These repositories will serve as a benchmark for hybrid intelligent systems regulated by risk-managing methodologies. ML algorithms will analyze these data sets to point out interesting patterns to experts who will explore the biomedical basis of disease causation, and work collaboratively to provide their knowledge back into the system. The AI machinery will then use this proven knowledge to feed back its observations to the experts to support data-driven decisions.

Cultivating an analytics culture in treatment of rare diseases is therefore essential.

SEDGE: Using AI and ML to Accelerate an Analytics Culture

While Hirsch’s article proposed that the use of AI in the treatment and diagnosis of rare disease is still years away, SEDGE brings the power of analytics into the here and now. Using AI and ML to sort through multiple data sources of various data types, SEDGE is the easy-to-use solution that gives decision-makers the insights they need to act faster and with greater confidence.

SEDGE Analyses of Rare Diseases

Here are some examples of how data scientists could use an AI tool like SEDGE to focus their work, understand their data, and help leaders make data-backed decisions.

Geographical Location of Top Ten Rare Diseases

To determine the geographical locations of the top 10 rare diseases, the data scientists at SEDGE used data found on PubMed as their data source. More than twenty thousand articles were reviewed, and data on the geographical location of patients with rare diseases was extracted from about 700 selected articles using manual curation.

The data was then fed into SEDGE AI data visualization to show the continental distribution of the top ten rare diseases. The contrasting colors of the bar stacks help elucidate the predominance of some diseases in particular locations.
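As a rough illustration of the tallying step behind a stacked bar chart like this, the sketch below counts curated patient-location records per disease and continent. The record format, the `tally_by_continent` name, and the sample values are assumptions made for illustration only; they are not SEDGE's implementation or the curated PubMed data.

```python
from collections import Counter, defaultdict

def tally_by_continent(records):
    """Count records per disease and continent.

    `records` is an iterable of (disease, continent) pairs, one per
    curated patient-location mention (a hypothetical format).
    """
    table = defaultdict(Counter)
    for disease, continent in records:
        table[disease][continent] += 1
    # Return diseases in alphabetical order, one stack per disease.
    return {disease: dict(table[disease]) for disease in sorted(table)}

# Made-up placeholder records, not real data.
sample = [
    ("Disease B", "Europe"),
    ("Disease A", "Asia"),
    ("Disease A", "Europe"),
    ("Disease A", "Europe"),
]
counts = tally_by_continent(sample)
```

Each inner dictionary corresponds to one bar, and each continent count to one colored segment of that bar.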

Fig. 4. Continental distribution of the top ten rare diseases in alphabetical order. Note: The data used is from publications in PubMed and has a bias, as the publications were predominantly from North America, Europe, and Japan.

Research on Rare Diseases is Limited

To gauge how much research is being done on rare diseases, SEDGE data scientists first needed a complete list of known rare diseases. For this they referenced the NORD website (7000+ currently listed). Then, on the argument that the number of articles published on a disease reflects the extent of research being conducted on it, a PubMed search was conducted to check for publications on each rare disease.

The resultant graph reveals astonishingly low results. Nearly half of the listed rare diseases (47%) have fewer than 100 articles published (combining the 2nd, 3rd, and 4th bars), while almost 10% have no articles at all. The largest single group of rare diseases appears to fall within the 100 - 500 articles range. In fact, once you look for rare diseases with a high volume of published research, the numbers start to dwindle considerably.


Fig. 5. The number of PubMed publications for each rare disease listed with NORD. The complete list of rare diseases was downloaded from the NORD website, manually curated to remove any redundancy, and then each disease name was checked for the number of PubMed publications.
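The binning behind a chart of this kind can be sketched as follows. This is a minimal sketch, not the code SEDGE used: the bin edges are assumed (the article does not list the exact ranges), the function names are hypothetical, and the sample counts are placeholders rather than real PubMed results.

```python
from collections import Counter

# Assumed bin edges for illustration; each entry is (label, inclusive upper bound).
BINS = [
    ("0", 0),
    ("1-9", 9),
    ("10-49", 49),
    ("50-99", 99),
    ("100-500", 500),
]
OVERFLOW_LABEL = ">500"

def bin_label(count):
    """Return the label of the first bin whose upper bound covers `count`."""
    for label, upper in BINS:
        if count <= upper:
            return label
    return OVERFLOW_LABEL

def share_per_bin(publication_counts):
    """Map each bin label to the percentage of diseases that fall in it."""
    total = len(publication_counts)
    if total == 0:
        return {}
    tally = Counter(bin_label(c) for c in publication_counts.values())
    labels = [label for label, _ in BINS] + [OVERFLOW_LABEL]
    return {label: 100 * tally[label] / total for label in labels}

# Made-up counts keyed by placeholder disease names, not real results.
sample_counts = {"Disease A": 0, "Disease B": 7, "Disease C": 120, "Disease D": 2500}
shares = share_per_bin(sample_counts)
```

Summing the shares of the bins below 100 articles gives the kind of combined percentage quoted in the text.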

Morgellons Disease: Using Data to Understand Research Trends

Morgellons, one of the top ten rare diseases, is currently a poorly understood condition characterized by small fibers or other particles emerging from skin sores. Research on this disease has yielded conflicting results, with some studies attributing it to infection with Borrelia spirochetes and others classifying it as a delusional infestation. It is clear that more research is needed to define the disease better.

A good first step towards understanding the status of research on Morgellons disease would be to identify the top researchers working on it, as this would provide information on the current research focus and on which resources might need to be channeled into it.

A PubMed search on Morgellons yielded 90 articles. The author names from each article were extracted and analyzed for frequency of occurrence. This was done with SEDGE using its “word cloud” functionality. A word cloud is a great way to make sense of large amounts of text at a glance: the more frequently a specific word appears in a source of textual data, the bigger and bolder it appears in the word cloud. In this way it was easy to see who the most prolific authors are.
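The frequency count that drives a word cloud can be sketched in a few lines. The input format, the `author_frequencies` name, and the placeholder author names below are assumptions for illustration; this is not SEDGE's word cloud implementation or the actual Morgellons author data.

```python
from collections import Counter

def author_frequencies(articles):
    """Count how many articles each author appears on.

    `articles` is a list of author-name lists, one list per article
    (a hypothetical format for the extracted author fields).
    """
    counts = Counter()
    for authors in articles:
        # Use a set so an author listed twice on one article counts once.
        counts.update({name.strip() for name in authors})
    return counts

# Placeholder names, not real authors.
sample_articles = [
    ["Author A", "Author B"],
    ["Author A", "Author C"],
    ["Author A "],
]
freq = author_frequencies(sample_articles)
top = freq.most_common(1)
```

These counts are the weights a word cloud would scale each name by: the higher the count, the larger the name is drawn.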

Fig. 6. SEDGE produced a word cloud showing the predominant authors in the field of Morgellons research.

There are also other useful ways to display data visually and reveal trends within it; in this instance, how many articles each author has published on Morgellons and how publication volume has changed over time.

Fig. 7. SEDGE displays the top researchers on Morgellons disease shown by the number of articles published.


Fig. 8. SEDGE shows the increase in the number of publications on Morgellons over the past few decades.

The findings of these analyses reveal that the top researchers in this field are Marianne Middelveen and Raphael B. Stricker, who are colleagues, closely followed by John Koo. Plotting the data on a timeline revealed that, less than two decades ago, there was very little meaningful research, but that interest has more than doubled since 2016.
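A timeline comparison of this kind could be computed roughly as shown below. The sketch takes one possible reading of "more than doubled", namely that the total number of articles is more than twice the number published before the cutoff year; that reading, the function names, and the sample years are all assumptions, not the method or data behind the figures.

```python
from collections import Counter

def publications_per_year(years):
    """Return {year: article_count}, ordered by year."""
    return dict(sorted(Counter(years).items()))

def cumulative_growth(years, cutoff):
    """Ratio of all articles to those published before `cutoff`.

    A value above 2 means the body of literature has more than
    doubled since the cutoff year. Returns None if nothing was
    published before the cutoff.
    """
    before = sum(1 for year in years if year < cutoff)
    return len(years) / before if before else None

# Made-up publication years, not the actual Morgellons data.
sample_years = [2006, 2010, 2012, 2016, 2017, 2018, 2018, 2020]
per_year = publications_per_year(sample_years)
growth = cumulative_growth(sample_years, 2016)
```

The per-year counts are what a timeline chart would plot, while the growth ratio condenses the same data into a single figure.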

The benefit of identifying the top 10 researchers is that it allows people to focus on the work of the most prolific researchers in the field.

SEDGE Saves You Time, Frees up Your Resources, and Speeds Up Results

In the above examples, our data scientists used a relatively small dataset. Most scientists and analysts would have to sift through, sort, and analyze vast amounts of mixed data manually to get any kind of insights. This would take a significant amount of time and be prone to human error. Compound that with the time it takes to draw up reports, presentations, and graphs.

However, with a complete AI and ML platform such as SEDGE, all this could be accomplished in a matter of minutes. Additionally, scientists and researchers could work on large datasets and run complete explorations, such as statistical analysis and data mining, without the need for a massive data science team. That would mean considerable savings in valuable resources and manpower, which could then be freed up to do other more impactful and profitable tasks that aid in the treatment of rare diseases.

So, for much faster, more accurate, and more effective identification of trends in data, you need SEDGE. With SEDGE, you can see data-led probabilities for the future, allowing decision-makers to act preemptively for quicker, more accurate diagnoses and customized patient care.


The information and views expressed in this article are accurate to the best of our knowledge; even so, the article may contain errors or omit some key information. For more information, please review our Terms of Use.