
(HealthDay News) — Large language model-based solutions can enhance clinical trial screening performance and reduce costs by automating the screening process, according to a study published online June 17 in NEJM AI.

Ozan Unlu, MD, from Brigham and Women’s Hospital in Boston, and colleagues evaluated the utility of a Retrieval-Augmented Generation (RAG)-enabled GPT-4 system to improve the accuracy, efficiency, and reliability of screening for a trial involving patients with symptomatic heart failure. Findings were based on clinical notes from 100, 282, and 1,894 patients for the development, validation, and test datasets, respectively.

The researchers reported that answers from the RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review (RECTIFIER), a clinical note-based, question-answering system powered by RAG and GPT-4, closely aligned with the expert clinicians’ answers across the target criteria. Accuracy ranged from 97.9 to 100% (Matthews correlation coefficient [MCC], 0.837 to 1) for RECTIFIER versus 91.7 to 100% (MCC, 0.644 to 1) for the study staff. RECTIFIER outperformed study staff at determining symptomatic heart failure (accuracy, 97.9 versus 91.7%; MCC, 0.924 versus 0.721). With RECTIFIER, the sensitivity and specificity for determining patient eligibility were 92.3 and 93.9%, respectively, versus 90.1 and 83.6%, respectively, for the study staff. Determining eligibility with RECTIFIER cost an average of 11 cents per patient with the single-question approach and 2 cents per patient with the combined-question approach.
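The accuracy, MCC, sensitivity, and specificity figures above are standard binary-classification metrics. As a minimal sketch, they can be computed from a screening confusion matrix; the counts below are hypothetical illustrations, not data from the study:

```python
# Hypothetical example: computing the screening metrics reported in the study
# (accuracy, Matthews correlation coefficient, sensitivity, specificity)
# from a binary confusion matrix. The counts are illustrative only.
import math

def screening_metrics(tp, fp, tn, fn):
    """Return accuracy, MCC, sensitivity, and specificity for binary screening."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else 0.0
    sensitivity = tp / (tp + fn)   # share of truly eligible patients flagged
    specificity = tn / (tn + fp)   # share of ineligible patients excluded
    return accuracy, mcc, sensitivity, specificity

# Hypothetical counts for a 290-patient screening run
acc, mcc, sens, spec = screening_metrics(tp=120, fp=10, tn=150, fn=10)
print(f"accuracy={acc:.3f} MCC={mcc:.3f} "
      f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

MCC is often preferred over raw accuracy for screening tasks like this one because eligible patients are typically a small minority, and MCC penalizes a classifier that simply labels everyone ineligible.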

“Integrating such technologies requires careful consideration of potential hazards and should include safeguards such as final clinician review,” the authors write.

Several authors disclosed ties to the pharmaceutical industry.

Abstract/Full Text