Assessing citation integrity in biomedical publications: corpus annotation and NLP models

Sarol MJ, Ming S, Radhakrishna S, Schneider J, Kilicoglu H (2024). Assessing citation integrity in biomedical publications: corpus annotation and NLP models. Bioinformatics, 40(7), btae420. https://doi.org/10.1093/bioinformatics/btae420

Overall rating
4.0 (1 review)
Authors
Maria Janina Sarol, Shufan Ming, Shruthan Radhakrishna, Jodi Schneider, Halil Kilicoglu
Journal
Bioinformatics
First published
2024
Number of citations
1
Type
Journal Article
DOI
10.1093/bioinformatics/btae420

Abstract

Motivation
Citations play a fundamental role in scholarly communication and assessment. Citation accuracy and transparency are crucial for the integrity of scientific evidence. In this work, we focus on quotation errors: errors in citation content that can distort the scientific evidence and that are hard for humans to detect. We construct a corpus and propose natural language processing (NLP) methods to identify such errors in biomedical publications.


Results
We manually annotated 100 highly cited biomedical publications (reference articles) and the citations to them. The annotation involved labeling the citation context in the citing article, the relevant evidence sentences in the reference article, and the accuracy of the citation. A total of 3063 citation instances were annotated (39.18% with accuracy errors). For NLP, we combined a sentence retriever with a fine-tuned claim verification model to label citations as ACCURATE, NOT_ACCURATE, or IRRELEVANT. We also explored few-shot in-context learning with generative large language models. The best-performing model, which uses citation sentences as the citation context, BM25 with a MonoT5 reranker to retrieve the top 20 sentences, and a fine-tuned MultiVerS model for accuracy classification, yielded a micro-F1 of 0.59 and a macro-F1 of 0.52. GPT-4 in-context learning performed better at identifying accurate citations but lagged on erroneous ones (0.65 micro-F1, 0.45 macro-F1). Citation quotation errors are often subtle, and identifying erroneous citations remains challenging for NLP models. With further improvements, these models could help improve citation quality and accuracy.
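
To make the pipeline concrete, below is a minimal sketch of the retrieve-then-verify architecture the abstract describes. It is illustrative only: rank_bm25 and pygaggle are assumed stand-ins for the authors' BM25 retriever and MonoT5 reranker, and a zero-shot NLI classifier substitutes for the fine-tuned MultiVerS model, whose actual checkpoint lives in the authors' repository (linked under Availability).

```python
# Sketch of the retrieve-then-verify citation-accuracy pipeline.
# Stand-ins: rank_bm25 (BM25), pygaggle (MonoT5 reranking), and a
# zero-shot NLI pipeline in place of the fine-tuned MultiVerS model.
from rank_bm25 import BM25Okapi
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import MonoT5
from transformers import pipeline

LABELS = ["ACCURATE", "NOT_ACCURATE", "IRRELEVANT"]

reranker = MonoT5()  # default MonoT5 MS MARCO checkpoint
verifier = pipeline("zero-shot-classification",
                    model="facebook/bart-large-mnli")  # MultiVerS placeholder

def classify_citation(citation_sentence, reference_sentences):
    # Stage 1: BM25 retrieves candidate evidence sentences from the
    # reference article, with the citation sentence as the query.
    tokenized = [s.lower().split() for s in reference_sentences]
    bm25 = BM25Okapi(tokenized)
    candidates = bm25.get_top_n(citation_sentence.lower().split(),
                                reference_sentences,
                                n=min(100, len(reference_sentences)))

    # Stage 2: MonoT5 reranks the candidates; keep the top 20 sentences,
    # matching the configuration reported in the abstract.
    ranked = reranker.rerank(Query(citation_sentence),
                             [Text(s) for s in candidates])
    ranked.sort(key=lambda t: t.score, reverse=True)
    evidence = " ".join(t.text for t in ranked[:20])

    # Stage 3: classify the citation against the retrieved evidence.
    result = verifier(f"Evidence: {evidence} Claim: {citation_sentence}",
                      candidate_labels=LABELS)
    return result["labels"][0]  # highest-scoring label
```

In practice, the placeholder verifier would be replaced with the fine-tuned MultiVerS checkpoint the authors release.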


Availability and implementation
We make the corpus and the best-performing NLP model publicly available at https://github.com/ScienceNLP-Lab/Citation-Integrity/.

Reviews

Informative Title: Appropriate (100%)
Methods: Sound (100%)
Statistical Analysis: Appropriate (100%)
Data Presentation: Complete and Transparent (100%)
Discussion: Appropriate (100%)
Limitations: Appropriately acknowledged (100%)
Data Available: Completely Available (100%)


GreyPhMeter Jul 23, 2025

Overall, this was a bit of a unique paper in that PLMs and LLMs competed against humans on a research evaluation task that is rarely investigated. I found the authors' choice of problem and methods sound (though I'm a bit of a newbie in this area). I appreciated the reporting of multiple metrics: recall, precision, macro F1, and micro F1. At times, results were added haphazardly in the Discussion, and the authors could have stated the claims arising from their results more clearly; I had to conjecture what the central claims were myself. It may also be nice to test the Leung et al. 2017 article and related cases to see whether these automated systems can catch older errors. Figures showing how each model works would have been helpful as well, especially for MultiVerS.
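
The reviewer's nod to both micro- and macro-F1 is worth unpacking: with three labels and roughly 39% of citations erroneous, a model that favors the majority ACCURATE class can post a decent micro-F1 while its macro-F1 drops, the pattern visible in the GPT-4 results. A minimal sketch with invented labels (not data from the paper):

```python
# Toy demonstration of the micro- vs macro-F1 gap on imbalanced labels.
# The label counts are invented for illustration, not taken from the corpus.
from sklearn.metrics import f1_score

y_true = ["ACCURATE"] * 6 + ["NOT_ACCURATE"] * 3 + ["IRRELEVANT"]
y_pred = ["ACCURATE"] * 9 + ["IRRELEVANT"]  # every NOT_ACCURATE is missed

print(f1_score(y_true, y_pred, average="micro"))                   # 0.70
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # 0.60
```

Micro-F1 weights every instance equally, so the majority class dominates; macro-F1 averages the per-class F1 scores, so missing the minority NOT_ACCURATE class pulls it down.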