**Abstract**
This paper shows that two commonly used evaluation metrics for generative models, the Fréchet Inception Distance (FID) and the Inception Score (IS), are biased: the expected value of the score computed for a finite sample set is not the true value of the score. Worse, the paper shows that the bias term depends on the particular model being evaluated, so model A may get a better score than model B simply because model A's bias term is smaller. This effect cannot be fixed by evaluating at a fixed number of samples, which means all comparisons using FID or IS as currently computed are unreliable. We then show how to extrapolate the score to obtain an effectively bias-free estimate of the score computed with an infinite number of samples, which we term $\overline{\text{FID}}_\infty$ and $\overline{\text{IS}}_\infty$. In turn, this effectively bias-free estimate requires good estimates of scores with a finite number of samples. We show that using Quasi-Monte Carlo integration notably improves estimates of FID and IS for finite sample sets. Our extrapolated scores are simple, drop-in replacements for the finite-sample scores. Additionally, we show that using low-discrepancy sequences in GAN training offers small improvements in the resulting generator. The code for calculating $\overline{\text{FID}}_\infty$ and $\overline{\text{IS}}_\infty$ is available at https://github.com/mchong6/FID_IS_infinity.
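The extrapolation idea in the abstract can be sketched briefly: if the bias of FID decays roughly linearly in $1/N$ for sample size $N$, then fitting measured scores against $1/N$ and reading off the intercept gives the $N \to \infty$ estimate. The following is a minimal illustration of that fit, not the authors' implementation; the synthetic bias term `b / N` and the helper name `extrapolate_fid` are assumptions for demonstration only.

```python
import numpy as np

def extrapolate_fid(sample_sizes, fid_values):
    """Estimate the infinite-sample score by fitting FID(N) as linear
    in 1/N and returning the intercept (the 1/N -> 0 limit)."""
    x = 1.0 / np.asarray(sample_sizes, dtype=float)
    slope, intercept = np.polyfit(x, np.asarray(fid_values, dtype=float), 1)
    return intercept

# Synthetic illustration: a "true" score of 10.0 plus a bias term b/N.
# Real usage would measure FID at several sample sizes instead.
Ns = np.array([5000, 10000, 20000, 50000])
fids = 10.0 + 5000.0 / Ns
print(round(extrapolate_fid(Ns, fids), 4))  # recovers ~10.0
```

In practice the per-N FID values are themselves noisy, which is where the paper's Quasi-Monte Carlo sampling helps: lower-variance finite-sample estimates make the linear fit, and hence the intercept, more stable.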
Year | DOI | Venue
---|---|---
2020 | 10.1109/CVPR42600.2020.00611 | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

DocType | ISSN | Citations
---|---|---
Conference | 1063-6919 | 0

PageRank | References | Authors
---|---|---
0.34 | 18 | 2
Name | Order | Citations | PageRank |
---|---|---|---
Min Jin Chong | 1 | 2 | 1.03 |
D. A. Forsyth | 2 | 9227 | 1138.80 |