Researchers at MUSC Hollings Cancer Center have developed an artificial intelligence (AI) tool that can “read” doctors’ notes to identify the original source of cancers that have spread to the brain, paving the way for more precise treatments.

Published in JCO Clinical Cancer Informatics, the study was led by Jihad Obeid, M.D., and Mario Fugal, Ph.D. Their team used natural language processing (NLP)—a form of AI that interprets human language—to extract key diagnostic details from clinical notes. This approach outperformed traditional medical coding systems, accurately identifying the primary cancer type in over 90% of cases.

“Medical records were never designed for research. They are often messy and imperfect,” said Obeid. “But if we can make sense of them, we can improve both research and patient care.”

Why It Matters

Most brain tumors treated with stereotactic radiosurgery (SRS) are metastases—cancers that originate elsewhere in the body and travel to the brain. Knowing where the cancer started is essential because different cancer types respond differently to radiation.

  • Lung cancers are highly sensitive to radiation and require lower doses.
  • Kidney cancers are more resistant and need prolonged treatment.

Tailoring therapy based on the primary cancer type reduces risks like radiation-induced brain damage while improving outcomes. However, the details needed for such precision are often buried in clinical notes, making them difficult to access quickly.

How the AI Works

The NLP tool was trained to scan doctors’ notes for keywords and phrases that reveal the cancer’s origin, such as “ductal” for breast cancer or “melanoma” for skin cancer.
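
The keyword idea described above can be illustrated with a minimal Python sketch. This is not the study's actual pipeline: the lexicon, the function name classify_primary_site, and the count-and-pick-the-maximum rule are all assumptions made for illustration. Only the terms "ductal" and "melanoma" come from the article.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon. Only "ductal" and "melanoma" come from the
# article; every other term is an illustrative assumption.
PRIMARY_SITE_KEYWORDS = {
    "breast": ["ductal", "breast carcinoma"],
    "skin": ["melanoma"],
    "lung": ["adenocarcinoma of the lung", "small cell lung"],
}


def classify_primary_site(notes):
    """Return the site whose keywords occur most often in the notes, or None."""
    counts = Counter()
    for note in notes:
        text = note.lower()
        for site, keywords in PRIMARY_SITE_KEYWORDS.items():
            for keyword in keywords:
                # Whole-word, case-insensitive match (text is lowercased above).
                pattern = r"\b" + re.escape(keyword) + r"\b"
                counts[site] += len(re.findall(pattern, text))
    # No notes at all, or no keyword found in any note.
    if not counts or max(counts.values()) == 0:
        return None
    return counts.most_common(1)[0][0]


patient_notes = [
    "Patient with history of invasive ductal carcinoma.",
    "MRI shows a new brain lesion; SRS planned.",
]
print(classify_primary_site(patient_notes))
```

A sketch this simple ignores negation ("no evidence of melanoma"), misspellings, and family history mentions, all of which a production tool would likely need to handle.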

In a dataset of 82,000 clinical notes from 1,400 patients treated with SRS, the AI achieved:

  • 97% accuracy for common cancers like lung, breast, and skin.
  • Successful identification of lung cancer subtypes, which standard International Classification of Diseases (ICD) codes cannot distinguish.

“The clinical note is the closest to the truth,” said Fugal. “It captures the nuance that ICD codes lack.”

A Lightweight, Scalable Solution

Unlike large generative AI models, this NLP approach is simple, efficient, and ethical:

  • It doesn’t require massive datasets or advanced computing power.
  • It can be easily adopted by hospitals with limited resources.

“This approach is lightweight and scalable,” said Obeid. “Other hospitals could use it to improve classification and care without adding to doctors’ workloads.”

What’s Next?

The team is now applying NLP to predict which patients are at risk of radiation necrosis, a rare but serious complication of SRS. They also envision expanding the tool to other cancer types and integrating imaging and lab data for even richer insights.

“Automating data extraction from clinical notes builds accurate, up-to-date datasets,” Obeid said. “That opens the door to faster discoveries and truly personalized care.”


About MUSC Hollings Cancer Center
MUSC Hollings Cancer Center is South Carolina’s only National Cancer Institute-designated cancer center, with more than 230 faculty cancer scientists and over 200 active clinical trials. Learn more at hollingscancercenter.musc.edu.

Journal: JCO Clinical Cancer Informatics
DOI: 10.1200/CCI-24-00268
Article Title: Classifying Stereotactic Radiosurgery Patients by Primary Diagnosis Using Natural Language Processing