This page summarizes several useful links and references for ROC and precision-recall plots, with brief descriptions. Note that some pages of this site also have their own reference sections.
Our paper
The main concept and all the contents of this site are based on our paper (Saito2015).
[Saito2015] Takaya Saito and Marc Rehmsmeier (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 10(3):e0118432
Links for the paper (external sites)
- PLOS ONE web site (DOI: 10.1371/journal.pone.0118432)
- PubMed (PMID: 25738806)
- PubMed Central (PMCID: PMC4349800)
Download the paper (Google Drive)
Download the citation of the paper (PLOS ONE site)
Our tool
We have developed an R package called Precrec, available on CRAN, that provides fast and accurate calculations of precision-recall and ROC curves (Saito2016).
[Saito2016] Takaya Saito and Marc Rehmsmeier (2016) Precrec: fast and accurate precision-recall and ROC curve calculations in R. Bioinformatics.
Links for the paper (external sites)
Links for the library (external sites)
Other relevant papers
We have selected the following six papers from among those related to classifier evaluation with imbalanced datasets.
This milestone article by Davis and Goadrich (Davis2006) established a deep connection between ROC and precision-recall curves.
[Davis2006] Jesse Davis and Mark Goadrich (2006) The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning. pp.233–240
He and Garcia summarized and discussed important aspects and potential issues in the field of machine learning with imbalanced datasets (He2009).
[He2009] Haibo He and Edwardo Garcia (2009) Learning from Imbalanced Data. IEEE Trans Knowl Data Eng. 21(9):1263–1284
The article by Berrar and Flach (Berrar2012) showed pitfalls of the ROC plot, especially of AUC (area under the curve) scores, when they are used in microarray analysis.
[Berrar2012] Daniel Berrar and Peter Flach (2012) Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). Brief Bioinform. 13(1):83–97
Although our analysis strongly indicates that precision-recall plots are more informative than the other plots when used with imbalanced datasets, several ROC alternatives, such as Cost Curves (CC) (Drummond2000) and Concentrated ROC (CROC) (Swamidass2010), can also be used in some cases.
[Drummond2000] Chris Drummond and Robert Holte (2000) Explicitly Representing Expected Cost: An Alternative to ROC Representation. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp.198–207
[Swamidass2010] S. Joshua Swamidass, Chloe-Agathe Azencott, Kenny Daily and Pierre Baldi (2010) A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Bioinformatics. 26(10):1348–1356
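The claim above about imbalanced datasets can be illustrated with a small sketch. It is not taken from any of the cited papers or tools. It assumes a hypothetical classifier with a fixed true positive rate of 0.8 and a fixed false positive rate of 0.1, and the helper names are made up for this example. The point in ROC space stays the same when negatives are added, while precision changes.

```python
def confusion_counts(n_pos, n_neg, tpr=0.8, fpr=0.1):
    """Build (tp, fn, fp, tn) for a hypothetical classifier.

    The default tpr and fpr are assumptions chosen for illustration only.
    """
    tp = round(n_pos * tpr)
    fp = round(n_neg * fpr)
    return tp, n_pos - tp, fp, n_neg - fp


def roc_and_pr_point(tp, fn, fp, tn):
    """Return (fpr, tpr, precision) for a single confusion matrix."""
    tpr = tp / (tp + fn)        # recall (sensitivity): y-axis of ROC, x-axis of PR
    fpr = fp / (fp + tn)        # 1 - specificity: x-axis of ROC
    precision = tp / (tp + fp)  # y-axis of the precision-recall plot
    return fpr, tpr, precision


# Same classifier, two class ratios: 1:1 and 1:100 (positives:negatives).
balanced = roc_and_pr_point(*confusion_counts(1000, 1000))
imbalanced = roc_and_pr_point(*confusion_counts(1000, 100000))

for name, (fpr, tpr, precision) in [("balanced", balanced),
                                    ("imbalanced", imbalanced)]:
    print(f"{name}: FPR={fpr:.3f} TPR={tpr:.3f} precision={precision:.3f}")
```

With these assumed rates, the ROC coordinates are identical for both datasets, but precision falls from about 0.89 to about 0.07 once negatives outnumber positives 100 to 1.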
pROC, a powerful tool developed by Robin et al. (Robin2011), is a good solution for some complicated analyses involving ROC curves.
[Robin2011] Xavier Robin, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez and Markus Müller (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 12:77
Moreover, our Tools page summarizes several important and useful tools for calculating ROC, precision-recall, and ROC alternative curves.