What are outliers in statistics?

Outliers are data points that differ markedly from the other observations in a dataset. They can indicate novel data, measurement error, or genuine variability in what is being measured. Outliers cause problems in statistical analyses because they can lead tests to miss significant findings or can distort real results. However, outliers can also carry valuable information about the process under investigation or about how the data were gathered. There is no rigid mathematical definition of what constitutes an outlier, so deciding whether an observation is an outlier is ultimately a judgment call.

There are various methods of outlier detection: some are graphical, such as normal probability plots, and others are model-based. Common ways to find outliers in a dataset include:

  • Box plots: A box plot is a graphical representation of the distribution of data based on the five-number summary: the minimum, first quartile, median, third quartile, and maximum. Box plots help identify outliers by showing the spread of the data and making extreme values easy to spot; in many plotting conventions, points beyond the whiskers are drawn individually.

  • Interquartile range (IQR): The IQR is the difference between the third quartile (Q3) and the first quartile (Q1), that is, IQR = Q3 - Q1. By a common convention, any data point below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is flagged as an outlier.

  • Z-scores: A z-score measures how many standard deviations a data point lies from the mean: z = (x - mean) / standard deviation. A data point with a z-score greater than 3 or less than -3 is often considered an outlier.

  • Subject-area knowledge: Outlier detection depends on subject-area knowledge and an understanding of the data collection process. For example, if you were measuring children's nose lengths, a child with an unusually long nose might be an outlier.
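
The IQR and z-score rules above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function names, the sample values, and the choice of the "inclusive" quartile method are all assumptions made for the example, and other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartile conventions vary; "inclusive" is one common choice (assumption).
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

def zscore_outliers(data, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    if sd == 0:
        return []  # all values identical, so nothing can be flagged
    return [x for x in data if abs((x - mean) / sd) > threshold]

# Made-up measurements with one suspicious value (95).
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

print(iqr_outliers(sample))
print(zscore_outliers(sample))
```

Both rules flag only the value 95 in this sample. The two rules do not always agree, though: an extreme value inflates the mean and standard deviation it is being compared against, and in a sample of n points no z-score (computed with the sample standard deviation) can exceed (n - 1) / √n, so the z-score rule can miss outliers in small datasets.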

It's important to investigate outliers carefully rather than discard them automatically, as they can contain valuable information about the data.
