Evaluating the Reporting Quality of 21,041 Randomized Controlled Trial Articles with Large Language Models: A Large-Scale Transparency Analysis
Srinivasan, A., Kivelson, S., Friedrich, N. A., Berkowitz, J., & Tatonetti, N. (2025). Evaluating the Reporting Quality of 21,041 Randomized Controlled Trial Articles with Large Language Models: A Large-Scale Transparency Analysis. Lecture Notes in Computer Science, 428–437. https://doi.org/10.1007/978-3-031-95838-0_42
- Authors
- Apoorva Srinivasan, Sophia Kivelson, Nadine A. Friedrich, Jacob Berkowitz, Nicholas Tatonetti
- Journal
- Lecture Notes in Computer Science
- First published
- 2025
- Type
- Book Chapter
- DOI
- 10.1007/978-3-031-95838-0_42
- ISBN
- 9783031958373, 9783031958380
Reviews
I found the study informative and have included it in a review I am writing. However, the limitation stated at the end is significant: the assessment only checked whether CONSORT reporting items were included, without evaluating the quality of that reporting. If so, a casual reader skimming the paper could be greatly misled. The authors could have randomly sampled from the data and assessed reporting quality directly; why was this not done? I suspect the reported numbers may be inflated. These high accuracy values could also be compared against a recent paper using older models, since it seems unlikely that moving from GPT-3.5 to GPT-4 solved all the problems: Woelfle, T., Hirt, J., Janiaud, P., Kappos, L., Ioannidis, J. P. A., & Hemkens, L. G. (2024). Benchmarking human–AI collaboration for common evidence appraisal tools. Journal of Clinical Epidemiology, 175, 111533. https://doi.org/10.1016/j.jclinepi.2024.111533