Evaluating the Reporting Quality of 21,041 Randomized Controlled Trial Articles with Large Language Models: A Large-Scale Transparency Analysis

Srinivasan, A., Kivelson, S., Friedrich, N. A., Berkowitz, J., & Tatonetti, N. (2025). Evaluating the Reporting Quality of 21,041 Randomized Controlled Trial Articles with Large Language Models: A Large-Scale Transparency Analysis. Lecture Notes in Computer Science, 428–437. https://doi.org/10.1007/978-3-031-95838-0_42

Overall rating
4.0 (1 review)
Authors
Apoorva Srinivasan, Sophia Kivelson, Nadine A. Friedrich, Jacob Berkowitz, Nicholas Tatonetti
Journal
Lecture Notes in Computer Science
First published
2025
Type
Book Chapter
DOI
10.1007/978-3-031-95838-0_42
ISBN
9783031958373, 9783031958380

Reviews

Informative Title: 100% (scale: Appropriate / Slightly Misleading / Exaggerated)

Methods: 100% (scale: Sound / Questionable / Inadequate)

Statistical Analysis: 100% (scale: Appropriate / Some Issues / Major Concerns)

Data Presentation: 100% (scale: Complete and Transparent / Minor Omissions / Misrepresented)

Discussion: 100% (scale: Appropriate / Slightly Misleading / Exaggerated)

Limitations: 100% (scale: Appropriately Acknowledged / Minor Omissions / Inadequate)

Data Available: 100% (scale: Completely Available / Partial Data Available / Not Open Access)

BurgundyPhMeter Jul 16, 2025

I found the study informative and included it in a review I am writing. However, the limitation stated at the end is that the assessment of CONSORT reporting items was coarse: items were checked for inclusion without assessing the quality of the reporting. If so, a casual reader skimming the paper could be greatly misled. The authors could have randomly sampled from the data and assessed reporting quality; why was this not done? I think the numbers may be inflated. These high accuracy values could also be compared against a recent paper using older models, since it seems unlikely that moving from GPT-3.5 to GPT-4 solved all problems: Woelfle, T., Hirt, J., Janiaud, P., Kappos, L., Ioannidis, J. P. A., & Hemkens, L. G. (2024). Benchmarking human–AI collaboration for common evidence appraisal tools. Journal of Clinical Epidemiology, 175, 111533. https://doi.org/10.1016/j.jclinepi.2024.111533