Evaluating the Reporting Quality of 21,041 Randomized Controlled Trial Articles with Large Language Models: A Large-Scale Transparency Analysis


Overall rating: (4.0) 1 review
Authors: Apoorva Srinivasan, Sophia Kivelson, Nadine A. Friedrich, Jacob Berkowitz, Nicholas Tatonetti
Journal: Lecture Notes in Computer Science
First published: 2025
Type: Book Chapter
DOI: 10.1007/978-3-031-95838-0_42
ISBN: 9783031958373, 9783031958380
Licence: https://www.springernature.com/gp/researchers/text-and-data-mining

Reviews


Informative Title: 100% (Appropriate / Slightly Misleading / Exaggerated)
Methods: 100% (Sound / Questionable / Inadequate)
Statistical Analysis: 100% (Appropriate / Some Issues / Major concerns)
Data Presentation: 100% (Complete and Transparent / Minor Omissions / Misrepresented)
Discussion: 100% (Appropriate / Slightly Misleading / Exaggerated)
Limitations: 100% (Appropriately acknowledged / Minor Omissions / Inadequate)
Data Available: 100% (Completely Available / Partial data available / Not Open Access)

BurgundyPhMeter Jul 16, 2025

I found the study informative and included it in a review I am writing. However, the limitation stated at the end is that the assessment was coarse: it checked only whether CONSORT reporting items were included, not the quality of that reporting. If this is the case, a casual reader skimming the paper could be greatly misled. The authors could have randomly sampled from the data and assessed reporting quality. Why was this not done? I suspect the reported numbers may be inflated. These high accuracy values could also be compared with a recent paper that used older models; it seems unlikely that moving from GPT-3.5 to GPT-4 solved all of the problems it identified:

Woelfle, T., Hirt, J., Janiaud, P., Kappos, L., Ioannidis, J. P. A., & Hemkens, L. G. (2024). Benchmarking Human–AI collaboration for common evidence appraisal tools. Journal of Clinical Epidemiology, 175, 111533. doi: 10.1016/j.jclinepi.2024.111533