A Dataset Of Peer Reviews (PeerRead): Collection, Insights And NLP Applications

May 16, 2018, 5:10 p.m. By: Kirti Bakshi


Peer reviewing is a central component of the scientific publishing process. The first public dataset of scientific peer reviews is made available for research purposes (PeerRead v1), providing an opportunity to study this important artefact.

The authors describe the data collection process and report interesting phenomena observed in the peer reviews. They propose two novel NLP tasks based on this dataset and provide simple baseline models. In the first task, simple models predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline. In the second task, they predict the numerical scores of review aspects and show that simple models can outperform the mean baseline for aspects with high variance such as ‘originality’ and ‘impact’.
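To make the "error reduction compared to the majority baseline" figure concrete, here is a minimal sketch of how such a number is typically computed. The function names and the toy labels are assumptions made for illustration; they are not taken from the paper or its code.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def relative_error_reduction(baseline_accuracy, model_accuracy):
    """Fraction of the baseline's errors that the model removes."""
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - model_accuracy
    return (baseline_error - model_error) / baseline_error

# Toy labels invented for illustration (not PeerRead data).
labels = ["reject"] * 6 + ["accept"] * 4
baseline = majority_baseline_accuracy(labels)

# Hypothetical model accuracy, chosen only to show the arithmetic.
reduction = relative_error_reduction(baseline, model_accuracy=0.7)
print(f"baseline accuracy: {baseline:.2f}, error reduction: {reduction:.0%}")
```

Under this definition, a relative error reduction is measured against the errors the baseline makes, not against 100%, so the same absolute gain looks larger when the baseline is already strong.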


Prestigious scientific venues use peer reviewing to decide which papers to include in their journals or proceedings. While this process seems essential to scientific publication, it is often a subject of debate. Recognizing the important consequences of peer reviewing, several researchers studied various aspects of the process, including consistency, bias, author response and general review quality.

The goal of this paper is to lower the barrier to studying peer reviews for the scientific community by introducing the first public dataset of peer reviews for research purposes: PeerRead.

There are three strategies used here to construct the dataset:

  • Collaboration with conference chairs and conference management systems to allow authors and reviewers to opt-in their paper drafts and peer reviews, respectively.

  • Crawl publicly available peer reviews and annotate textual reviews with numerical scores for aspects such as ‘clarity’ and ‘impact’.

  • Crawl arXiv submissions which coincide with important conference submission dates and check whether a similar paper appears in proceedings of these conferences at a later date.
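The third strategy depends on deciding whether an arXiv submission and a later proceedings paper are "similar". One simple way to approximate this is fuzzy title matching, sketched below. The normalization, the `find_match` helper, the 0.9 threshold, and the example titles are all assumptions for illustration; the article does not specify how the authors performed the matching.

```python
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase the title and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())

def find_match(arxiv_title, accepted_titles, threshold=0.9):
    """Return the most similar accepted title, or None if none is close enough."""
    best_title, best_score = None, 0.0
    query = normalize(arxiv_title)
    for candidate in accepted_titles:
        score = SequenceMatcher(None, query, normalize(candidate)).ratio()
        if score > best_score:
            best_title, best_score = candidate, score
    return best_title if best_score >= threshold else None

# Made-up titles, used only to exercise the helper.
accepted = ["A Toy Parser for Toy Grammars", "Learning to Label Made-Up Data"]
match = find_match("A toy parser for toy grammars.", accepted)
no_match = find_match("An Unrelated Study of Something Else", accepted)
print(match, no_match)
```

A submission with no match would then be treated as not accepted at that venue, which is only a proxy: the paper may have been accepted elsewhere or never submitted.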

In total, the dataset consists of 14.7K paper drafts and the corresponding accept/reject decisions, including a subset of 3K papers for which 10.7K textual reviews written by experts are available. Periodic releases of PeerRead are planned every year, adding more sections for new venues. More details on data collection are provided in §2 of the paper.

The PeerRead dataset can be used in a variety of ways. A quantitative analysis of the peer reviews can provide insights to help better understand (and potentially improve) various nuances of the review process.

Organization and Preprocessing:

The authors organize v1.0 of the PeerRead dataset in five sections:

  • CoNLL 2016

  • ACL 2017

  • ICLR 2017

  • NIPS 2013–2017

  • arXiv 2007–2017

Since the data collection varies across sections, different sections may have different license agreements. The papers in each section are further split into standard training, development and test sets with 0.9:0.05:0.05 ratios. In addition to the PDF file of each paper, the authors also extract its textual content using the Science Parse library. Each of the splits is represented as a JSON-encoded text file with a list of paper objects, each of which consists of paper details, an accept/reject/probably-reject decision, and a list of reviews.
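The split and file layout described above can be sketched as follows. This is an illustration under stated assumptions: the field names (`id`, `title`, `accepted`, `reviews`), the random shuffling, and the seed are hypothetical and do not reflect the actual PeerRead schema or the authors' splitting procedure.

```python
import json
import random

def split_papers(papers, ratios=(0.9, 0.05, 0.05), seed=0):
    """Shuffle papers and cut them into train/dev/test by the given ratios."""
    shuffled = list(papers)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_dev = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],  # remainder goes to test
    }

# Hypothetical paper objects; the real schema may differ.
papers = [
    {"id": i, "title": f"Paper {i}", "accepted": i % 3 == 0, "reviews": []}
    for i in range(100)
]
splits = split_papers(papers)

# Each split becomes one JSON-encoded string holding a list of paper objects.
encoded = {name: json.dumps(items) for name, items in splits.items()}
print({name: len(items) for name, items in splits.items()})
```

Writing each value of `encoded` to its own text file would give one file per split, matching the layout the article describes.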


The paper introduces PeerRead, the first publicly available peer review dataset for research purposes, containing 14.7K papers and 10.7K reviews. The authors analyze the dataset, showing interesting trends such as a high correlation between overall recommendation and recommending an oral presentation. They also define two novel tasks based on PeerRead:

  • predicting the acceptance of a paper based on textual features

  • predicting the score of each aspect in a review based on the paper and review contents.
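For the second task, the reference point is a mean baseline: predict the average training score for every review. The sketch below shows that baseline with root mean squared error as the metric. The choice of RMSE, the helper names, and the toy scores are assumptions for illustration, not details confirmed by the article.

```python
import math

def mean_baseline(train_scores):
    """Constant prediction: the average score seen in training."""
    return sum(train_scores) / len(train_scores)

def rmse(predictions, targets):
    """Root mean squared error between predictions and gold scores."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

# Toy aspect scores invented for illustration (not PeerRead data).
train_scores = [3, 4, 4, 5, 2]
test_scores = [4, 2, 5]

constant = mean_baseline(train_scores)
baseline_rmse = rmse([constant] * len(test_scores), test_scores)
print(f"mean prediction: {constant:.2f}, baseline RMSE: {baseline_rmse:.3f}")
```

Because a constant prediction cannot track any spread in the scores, its error grows with the variance of the aspect, which is consistent with high-variance aspects leaving more room for a model to improve on it.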

The experiments show that certain properties of a paper, such as having an appendix, are correlated with a higher acceptance rate. The primary goal is to motivate other researchers to explore these tasks and develop better models that outperform the ones used in this work. More importantly, the authors hope that other researchers will identify novel, not yet explored ways to analyze the peer reviews in this dataset. As a concrete example, it would be interesting to study whether the accept/reject decisions reflect author demographic biases.

For More Information: GitHub
