Artificial intelligence (AI) overall, and natural language processing (NLP) in particular, have been shown to aid in patient chart summarization.[1] The academic literature on AI is constantly uncovering new ways to use the technology to understand medical records, and to use that data to automate tasks and assist health-care workers.[2] AI’s presence in the popular press is even more astounding. This includes articles written about AI and, more astonishingly, articles written by AI models.[3] In almost all of these contexts, AI is described as a monolithic technology. It is portrayed as a magical “black box,” where raw data flows in and useful insights flow out. The technology is referred to as “the AI,” “the model,” or “the platform,” or even anthropomorphized as “he,” “she,” Siri, Alexa,® or ChatGPT.® Although convenient, and useful for marketing, such a presentation of the technology has drawbacks. It makes it harder for AI consumers to understand what the technology is doing, identify which solutions are best for specific tasks, and determine how AI can best be integrated into existing use cases and workflows. More generally, it mystifies the technology into something exciting yet intimidating. In this paper, we will pull back the curtain on a clinical AI platform and show its component parts as an illustrative example of how the technology works.
Many commercial grade AI solutions are not one single model, but rather a series of modeling technologies working in concert. The work of “being intelligent” is broken down into a set of rudimentary precursor tasks, often performed sequentially to create what is commonly called an AI pipeline. Understanding these discrete tasks is the key to understanding how AI works.
To illustrate how commercial AI solutions can be composed of discrete functional elements, we describe below the example of KAID Health’s® artificial intelligence pipeline for summarizing unstructured medical records. Via the KAID Health AI pipeline, an unstructured medical note is ingested by a computer, clinical facts are extracted, and the insights about the patient that users request are reported. As we have previously written,[4] such a toolkit can streamline billing & coding, medical management, care management, medical necessity review, and more.
To understand the KAID pipeline, let us imagine we want to list all the diseases a patient has. Her entire medical chart is a single PDF document with just the following paragraph:
“Family history of kidney and cardiovascular disease. Pertinent negatives include HTN, HL. Declining eGFR, now 48 ml min 1.7m2, suggests at least CKD 3A, and puts her at long term risk for ESRD. Recommend renal dosing Metformin if EGFR continues to fall, or replacing with less nephrotoxic glycemic control regimen.”
To analyze this paragraph, KAID Health’s pipeline has seven steps. To aid in recollection, we have named each step with a phrase beginning with “C.” We will keep the “Seven Cs” puns to a minimum as we sail along.
- Character recognition. First, for the paragraph to be analyzed, the words in the PDF need to be turned into computer-readable text. To do this, a specific AI tool called optical character recognition (OCR) translates the visual representation of the words into characters. For example, the pictures of the letters that make up the word “declining” are translated into a “d,” then an “e,” then a “c,” and so on. There are a variety of OCR models readily available, each with its own features, limitations, and costs.
- Culling concepts. The next step is extracting those elements in the text that are clinical concepts. There are several clinical concepts in the above text, including CKD, ESRD, renal disease, cardiovascular disease, eGFR, and Metformin. To do this, a form of AI called span classification tags the words that represent the clinical facts. It is worth noting that a concept need not be a contiguous string of words. For example, in the patient’s family history, renal disease is expressed within the phrase “kidney and cardiovascular disease.” Two concepts, kidney disease and cardiovascular disease, are compressed into one phrase by the author for conciseness.
It is important to note the power of AI versus, say, a clinical dictionary in recognizing clinical concepts. By using context, AI can recognize clinical concepts it has never seen before, just like a human reviewer. In 2020, KAID Health’s AI understood that the sentence “Patient appears to have Sars-CoV-2” referred to a new disease, despite the model not having been exposed to the phrase “Sars-CoV-2” or “COVID” during training. To cull concepts from medical text via span classification, a transformer-based model such as BERT can help. Such models are reasonably affordable to operate for large bodies of text.
- Characterizing concepts. After the concepts are identified, another AI model is used to classify the concepts into categories such as diagnoses (e.g., CKD and ESRD), lab values (e.g., eGFR), or medications (e.g., Metformin). There are numerous technological approaches to this. The most advanced ones rely on both the term itself and the context. The term “Wilms tumor” almost always refers to a type of renal cancer, regardless of the rest of the sentence; however, this is not the case for every term. The phrase “patient’s high cholesterol puts her at risk” refers to a laboratory finding. In “have been treating patient’s high cholesterol,” the words “high cholesterol” refer to the condition hypercholesterolemia. As with the previous step of culling the concepts, many transformer-based AI models can perform this task well.
- Clarifying concepts. Medical text is messy, and ambiguities abound. In our example paragraph, the abbreviation “EGFR” refers to estimated glomerular filtration rate, a measure of kidney function. In a cancer patient’s chart, the same four letters can refer to a mutation in the EGFR gene of the tumor. Other sources of ambiguity besides abbreviations include nonspecific terms such as “cortical atrophy.” Adrenal cortical atrophy can be distinguished from cortical atrophy of the brain only through consideration of the context. To add clarity in these ambiguous situations, it is worthwhile to engage more sophisticated types of AI, such as proprietary solutions and/or the new large language models (LLMs), to process the sentence or paragraph of interest. Although it is not practical to have such advanced models digest an entire EMR’s contents, judiciously passing small pieces of text through these more expensive models can significantly increase accuracy for a modest investment.
- Coding concepts. For data from medical notes to be usable for analytics, it must be expressed as medical codes. In the sample paragraph above, “CKD 3A” is best represented by the ICD-10 code N18.31. The EGFR could be represented as a LOINC code, 98979-8 or 98980-6, and the Metformin as a set of RxNorm codes. Without such coding, searching the data is practically impossible. Imagine trying to find patients with heart failure. Without codes, one would need to guess all the potential phraseologies used to describe the condition (e.g., “CHF,” “HF,” “heart failure,” “cardiomyopathy,” or “aortic insufficiency”). Similarly, for adult-onset diabetes, one would need to look for “T2DM,” “non-insulin dependent diabetes,” “the sugars,” and so on. By coding concepts with an established clinical taxonomy, it becomes possible to quickly find the concepts of interest. Assigning the relevant code to an identified clinical concept requires its own AI model.
- Contextualizing concepts. As expected, many of the clinical concepts in the above medical note are related to one another. The “48 ml min 1.7m2” finding is related to the concept of “EGFR.” Such a relationship between test and value could be called a “value” relationship. The “EGFR” relates to the concept of “CKD 3A” through a “test for” relationship. Again, older transformer-based language models are useful for parsing these relationships in most sentences, with more complex sentences benefiting from more advanced solutions, including LLMs. Thus, the challenge for the AI pipeline is not just to understand the relationships between the concepts but also to quantify the complexity of the sentence. In doing so, the AI pipeline can choose to refer complex sentences to more sophisticated, and hence more expensive, models for analysis.
- Choosing what to include. Finally, the AI pipeline must answer the intended question, which, as noted above, is determining what diseases the patient has. Although “HTN” (hypertension) and “HL” (hyperlipidemia) are in the sample patient’s chart, it says she specifically does not have them. Similarly, the renal and cardiovascular diseases mentioned afflict family members, not the patient. “ESRD” (end stage renal disease), noted in the sample paragraph, is a disease the patient is at risk of developing in the future; fortunately, it is not something she has today. The only disease the sample attributes to the patient today is “CKD 3A.” This is the answer to the question. If the question were broadened to what diseases the patient “might have” today, the last sentence suggests the patient has some form of diabetes, as she is on Metformin, a common medication used for diabetes, and may need alternative therapies for controlling blood sugars. Determining the diseases the chart says the patient actually has, such as CKD 3A, and those it only suggests, such as diabetes, requires yet another set of AI models. Among these, KAID Health uses a proprietary Strength-of-Evidence™ (SOE) score, produced by a document-classifier model, to quantify the likelihood that a disease found in a medical note does in fact refer to the patient.
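The selection logic of this last step can be sketched in a few lines of Python. This is only an illustration, not KAID Health's implementation: the `Concept` record, the `subject` and `status` labels, and the hand-assigned values are all assumptions made for this example. In a real pipeline these attributes would be produced by the upstream models rather than typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    text: str      # phrase as written in the note
    category: str  # "diagnosis", "lab", or "medication"
    subject: str   # "patient" or "family" (hypothetical labels)
    status: str    # "present", "negated", or "risk" (hypothetical labels)


# Hand-labeled concepts from the sample paragraph (assumed upstream output).
SAMPLE = [
    Concept("kidney disease", "diagnosis", "family", "present"),
    Concept("cardiovascular disease", "diagnosis", "family", "present"),
    Concept("HTN", "diagnosis", "patient", "negated"),
    Concept("HL", "diagnosis", "patient", "negated"),
    Concept("CKD 3A", "diagnosis", "patient", "present"),
    Concept("ESRD", "diagnosis", "patient", "risk"),
    Concept("eGFR", "lab", "patient", "present"),
    Concept("Metformin", "medication", "patient", "present"),
]


def current_diagnoses(concepts):
    """Keep only diagnoses that the note asserts the patient has today."""
    return [
        c.text
        for c in concepts
        if c.category == "diagnosis"
        and c.subject == "patient"
        and c.status == "present"
    ]


print(current_diagnoses(SAMPLE))
```

Family history, negated findings, and future risks are all filtered out, leaving only CKD 3A. A production system would replace the hard yes/no labels with a score, which is the role the article assigns to the Strength-of-Evidence model.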
As the sample note floats through KAID Health’s Seven Cs AI pipeline, it is digested by several AI technologies. OCR turns it into text, span classifiers based on transformer models find clinical concepts of interest and link them, more sophisticated (and costly) AI tools like LLMs remove ambiguities and analyze complicated sentences, coding tools translate the words to ICD-10 and other medical taxonomies, and document classifiers score each clinical attribute in the note for relevance.
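Structurally, a pipeline like this can be pictured as a chain of functions, each handing its output to the next. The Python sketch below shows only that shape. Every stage is a hypothetical placeholder (simple lookups and substring matching standing in for real OCR, transformer, and LLM models), and all function and key names are invented for this illustration.

```python
from functools import reduce

# Each stage takes the pipeline state (a dict) and returns an updated copy.
# All stage bodies are toy stand-ins for real models.


def character_recognition(state):
    # Stand-in for OCR: the "scan" here is already plain text.
    return {**state, "text": state["scan"]}


def cull_concepts(state):
    # Stand-in for span classification: naive substring matching.
    known = ["eGFR", "CKD 3A"]
    return {**state, "concepts": [k for k in known if k in state["text"]]}


def characterize_concepts(state):
    categories = {"eGFR": "lab", "CKD 3A": "diagnosis"}
    return {**state, "categories": {c: categories[c] for c in state["concepts"]}}


def clarify_concepts(state):
    expansions = {"eGFR": "estimated glomerular filtration rate"}
    found = {c: expansions[c] for c in state["concepts"] if c in expansions}
    return {**state, "expansions": found}


def code_concepts(state):
    codes = {"CKD 3A": "N18.31"}  # ICD-10 code cited in the article
    found = {c: codes[c] for c in state["concepts"] if c in codes}
    return {**state, "codes": found}


def contextualize_concepts(state):
    relations = []
    if {"eGFR", "CKD 3A"} <= set(state["concepts"]):
        relations.append(("eGFR", "test for", "CKD 3A"))
    return {**state, "relations": relations}


def choose_what_to_include(state):
    answer = [c for c in state["concepts"] if state["categories"][c] == "diagnosis"]
    return {**state, "answer": answer}


SEVEN_CS = [
    character_recognition,
    cull_concepts,
    characterize_concepts,
    clarify_concepts,
    code_concepts,
    contextualize_concepts,
    choose_what_to_include,
]


def run_pipeline(scan, stages=SEVEN_CS):
    """Feed the note through every stage in order."""
    return reduce(lambda state, stage: stage(state), stages, {"scan": scan})


result = run_pipeline("Declining eGFR, now 48 ml min 1.7m2, suggests at least CKD 3A.")
print(result["answer"], result["codes"])
```

Because each stage only reads and extends the shared state, any one of them can be swapped for a different model without touching the others, which is the practical benefit of breaking the work into discrete steps.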
Almost all serious AI clinical solutions involve the coordination of multiple AI models and technologies to accomplish their intended task. The specific modules used to detect malignancies in mammograms, or to identify patients at significant risk of future adverse events from claims, will use a different set of underlying components than KAID Health’s pipeline for summarizing medical records. However, the general approach is the same: break the problem into parts, and then match the different forms of AI to the parts by factoring in both performance and cost.
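One way to picture this matching of models to parts is a router that sends only the harder sentences to the more expensive model. The sketch below is a toy illustration in Python: the complexity heuristic, the threshold of 3.0, and the two tier names are all assumptions chosen for this example, not a description of how any commercial pipeline scores sentences.

```python
import re

# Ambiguous terms taken from the article's examples.
AMBIGUOUS_TERMS = ("egfr", "cortical atrophy")


def complexity(sentence):
    """Crude, hypothetical score: length, clause markers, and ambiguous terms."""
    words = len(sentence.split())
    clauses = len(re.findall(r"[,;]| if | or ", sentence))
    ambiguous = sum(term in sentence.lower() for term in AMBIGUOUS_TERMS)
    return words / 10 + clauses + 2 * ambiguous


def route(sentence, threshold=3.0):
    """Pick the cheaper transformer tier unless the sentence looks hard."""
    return "llm" if complexity(sentence) >= threshold else "transformer"


simple = "Pertinent negatives include HTN, HL."
hard = (
    "Recommend renal dosing Metformin if EGFR continues to fall, "
    "or replacing with less nephrotoxic glycemic control regimen."
)

print(route(simple))
print(route(hard))
```

The short list of negatives stays on the cheaper tier, while the conditional recommendation containing the ambiguous abbreviation is escalated. In practice the score would likely come from a trained model rather than hand-written rules, but the trade-off it manages is the same one described above.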
A typical car buyer often cares little about innovations in the engine, transmission, or safety. However, those responsible for buying a fleet of cars and integrating them into their business may care very much. Purchasers of AI solutions for health-care organizations who acquaint themselves with how the technology works, and where each solution is strongest and weakest, are more likely to choose the right solution and best leverage its capabilities. The author Arthur C. Clarke famously said, “any sufficiently advanced technology is indistinguishable from magic.”[5] While clinical AI can certainly seem magical, understanding how it works, and thus its uses and limitations, is certainly possible.
By Kevin Agatstein, Founder & CEO at KAID Health, Inc.
Prior to founding KAID, Kevin founded Agate Consulting. Via Agate, and before that McKinsey & Company, Kevin advised providers, payers, healthcare IT companies, life-sciences organizations, and healthcare venture-capital and private-equity firms. Kevin led operations, corporate development, and marketing for CareKey®, a leading medical-management application, acquired by TriZetto in 2005. Kevin holds a degree in chemical engineering from MIT. At MIT, he was also a researcher at the MIT Sloan School of Management.
[1] Wu, H., Wang, M., Wu, J. et al. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. npj Digit. Med. 5, 186 (2022).
[2] Jimma, B.L. Artificial intelligence in healthcare: A bibliometric analysis. Telematics and Informatics Reports 9 (2023).
[3] AI Contentify. AI-generated news articles: accuracy & reliability. Nov 2023. Found online.
[4] Population Health Management, 2023.
[5] Clarke, A.C. Profiles of the Future. 1962.