DOI

https://doi.org/10.7275/qf69-7k43

Abstract

There has been much debate in the literature regarding what to do with extreme or influential data points. The goal of this paper is to summarize the various potential causes of extreme scores in a data set (e.g., data recording or entry errors, motivated mis-reporting, sampling errors, and legitimate sampling), how to detect them, and whether they should be removed. A second goal is to explore how strongly a small proportion of outliers can affect even simple analyses. The examples show a strong beneficial effect of removing extreme scores: accuracy tended to increase significantly and substantially, and errors of inference tended to drop significantly and substantially, once extreme scores were removed.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
