AI Chatbots Surpass Human Teams in Medical Data Analysis
Why Predicting Preterm Birth Is So Important
Quicker and more advanced data analysis has the potential to significantly improve diagnostic tools for preterm birth—the leading cause of newborn mortality and a major factor in long-term motor and cognitive impairments. In the United States alone, nearly 1,000 babies are born prematurely each day.
Despite its serious consequences, the underlying causes of preterm birth remain poorly understood. To investigate further, Sirota’s research team gathered microbiome data from approximately 1,200 pregnant women across nine different studies, monitoring each pregnancy through delivery.
Figure 1. AI Chatbots Outperform Human Researchers in Medical Data Analysis.
“This type of research depends on open data sharing, bringing together the experiences of many women and the expertise of numerous researchers,” explained Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository, associate professor at UCSF BCHSI, and co-author of the study. Figure 1 illustrates AI chatbots outperforming human researchers in medical data analysis.
The extensive size and complexity of the dataset posed major analytical challenges. To overcome this, the team turned to a global competition called DREAM (Dialogue on Reverse Engineering Assessment and Methods). Sirota co-led one of three DREAM pregnancy challenges, concentrating on vaginal microbiome data. Over 100 research teams from around the world participated, developing machine learning models to detect patterns associated with preterm birth. While most groups completed their work within the three-month competition period, synthesizing the results and publishing the findings took nearly two years.
Testing Generative AI on Pregnancy Data
To explore whether artificial intelligence could speed up the research process, Sirota’s team collaborated with scientists led by Adi L. Tarca, PhD, co-senior author and professor at the Center for Molecular Medicine and Genetics at Wayne State University in Detroit, Michigan. Tarca had previously led the other two DREAM challenges, which aimed to improve techniques for determining pregnancy stage.
Together, the researchers tasked eight AI systems with independently developing algorithms using the same datasets from the three DREAM challenges—without any human programming assistance.
Each system was given carefully structured natural language prompts. As when instructing ChatGPT, the researchers steered the AI tools with detailed written instructions, so that each tool analyzed the medical data in a manner comparable to the original DREAM participants.
The goal mirrored that of the initial competition: assess vaginal microbiome data to identify signs of preterm birth and analyze blood or placental samples to estimate gestational age. Although pregnancy dating is typically an estimate, it is essential for clinical decision-making. Inaccurate estimates can make it difficult for healthcare providers to properly prepare for labor and delivery.
Once the AI-generated code was tested on the datasets, the results were striking. Four out of the eight AI systems produced prediction models that matched the performance of the DREAM teams’ models—and in certain cases, even surpassed them. Remarkably, the entire generative AI study, from initial idea to journal submission, was completed in just six months.
Source: NETWORK WORLD
Cite this article:
Priyadharshini S (2026), AI Chatbots Surpass Human Teams in Medical Data Analysis, AnaTechMaz, p. 182

