Saturday, June 2, 2018

What is Outlier?

  Manoj       Saturday, June 2, 2018

🔎 Introduction

In statistics, an outlier is a data point that lies far away from the rest of the observations. Outliers can distort averages, inflate variances, and sometimes reveal interesting hidden patterns.


📘 Definition of Outlier

An outlier is an observation that deviates significantly from the other values in a dataset. It may be caused by measurement error, unusual experimental conditions, or genuine variability.


⚠️ Why Are Outliers Important?

  • They can influence mean and standard deviation, giving misleading results.
  • They may indicate errors in data collection.
  • Sometimes they represent rare but important events (e.g., fraud detection, anomalies).

🛠 Methods to Detect Outliers

1. Z-Score Method

A data point is considered an outlier if its Z-score is very high (commonly greater than 3 in absolute value):

$$ Z = \frac{x - \bar{x}}{s} $$

where \(\bar{x}\) is the mean and \(s\) is the standard deviation.

2. Interquartile Range (IQR) Rule

Compute quartiles \(Q_1\) and \(Q_3\), then the interquartile range:

$$ IQR = Q_3 - Q_1 $$

An observation is an outlier if it lies below \(Q_1 - 1.5 \times IQR\) or above \(Q_3 + 1.5 \times IQR\).

3. Boxplot Visualization

A boxplot is a simple way to visualize outliers. Points outside the whiskers are potential outliers.

                 
Figure 1: Boxplot with outliers shown as individual points beyond the whiskers.



📊 Example

Suppose we have exam scores: 45, 48, 50, 52, 55, 57, 95.

  • The score 95 lies much higher than the rest.
  • Z-score method: \(Z \approx 3.2\), so it is an outlier.
  • IQR method: 95 > \(Q_3 + 1.5 \times IQR\), so it is also flagged as an outlier.

🛠 Handling Outliers

  • Investigate whether it is a data entry or measurement error.
  • If genuine, decide whether to keep or remove it, depending on study goals.
  • Sometimes use robust statistics (median, IQR) instead of mean & variance.

📝 Key Takeaways

  • Outliers are observations that deviate significantly from the rest of the data.
  • They affect mean, variance, and regression results.
  • Detection methods include Z-scores, IQR rule, and boxplots.
  • Always investigate outliers before deciding to remove them.

👉 Related Posts


Thank you for reading!
Like, Share, and Subscribe for more Statistics Lectures.
Drop your questions in the comments 👇


logoblog

Thanks for reading What is Outlier?

Previous
« Prev Post

No comments:

Post a Comment

Statistics becomes simple when we share and discuss. Drop your questions or suggestions below — I’ll be happy to respond!