The reliability of a test score is usually underestimated, and the deflation may be profound: 0.40–0.60 units of reliability, or 46–71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within the selected measurement model, (3) inefficiency in forming the score variable (X) as the manifestation of the latent trait θ, (4) non-optimal characteristics of the items (g_i) in relation to the estimator, (5) an inefficient weight factor, that is, the coefficient of correlation (w_i) that links θ with the observed values of the test item (x_i), (6) a small sample size, (7) extreme test difficulty, and (8) a narrow scale in the score. To maximize the probability that the estimate of reliability is as close as possible to the true population value, these sources should be avoided, or their effect should be corrected by using deflation-corrected estimators of reliability.