The detection of outliers in data analysis dates back to the 18th century. Bernoulli (1777) remarked on the practice of deleting outliers more than 200 years ago. Deletion is not a proper way to handle outliers, but it remained common practice in the past. The first statistical technique for addressing outliers in data was developed in 1850 (Beckman and Cook, 1983).
Some researchers argued that extreme observations should be kept as part of the data because they provide very useful information about it. For example, Bessel and Baeyer (1838) claimed that one should not delete extreme observations merely because of their gap from the remaining data (cited in Barnett, 1978). Legendre (1805) likewise recommended not rubbing out extreme observations “adjusted too large to be admissible”. Other researchers favored cleaning the data of extreme observations because they distort the estimates. Boscovich, an astronomer of the 19th century, set aside Legendre's recommendations and deleted such observations (an ad hoc adjustment), perhaps in line with Peirce (1852), Chauvenet (1863) or Wright (1884). Cousineau and Chartier (2010) said that outliers are always the result of some spurious activity and should be deleted. Whether to delete or keep outliers is as hotly debated today as it was 200 years ago.
Bendre and Kale (1987), Davies and Gather (1993), Iglewicz and Hoaglin (1994) and Barnett and Lewis (1994) have conducted a number of studies on the handling of outliers. A popular approach to finding unusual examples in a dataset, known as distance-based outlier detection, defines outliers by their distance to neighboring examples. Saad and Hewahi (2009) introduced the Class Outlier Distance Based (CODB) outlier detection procedure and showed that it performs better than the distance-based outlier detection method. Verma (1997) emphasized detecting outliers in univariate data rather than accommodating them, because detection provides better estimates of the mean and other statistical parameters of an international geochemical reference material (RM).
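To make the distance-based idea concrete, here is a minimal Python sketch. It is not taken from any of the studies cited above: it assumes one common formulation, in which each point is scored by the distance to its k-th nearest neighbor and flagged when that score exceeds a threshold. The function names, the sample points, and the values of `k` and `threshold` are all hypothetical and chosen only for illustration.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Assumption: a larger score means the point is more isolated.
    """
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k=2, threshold=3.0):
    """Flag points whose k-th nearest-neighbor distance exceeds the threshold."""
    return [score > threshold for score in knn_distance_scores(points, k)]


# Hypothetical data: four points close together and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
flags = flag_outliers(points, k=2, threshold=3.0)
print(flags)
```

In this toy data only the last point is flagged, since its nearest neighbors are all more than nine units away. In practice the choice of `k` and of the threshold depends on the data, and this brute-force version compares every pair of points, so it suits small datasets only.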