Finding outliers with Python boxplot code: boxplots in matplotlib, markers and outliers

Posted by an anonymous user, 2021-5-28 15:22


I have some questions about boxplots in matplotlib:

Question A. What do the markers that I highlighted below as Q1, Q2, and Q3 represent? I believe Q1 is the maximum and Q3 marks the outliers, but what is Q2?

FBqdp.png

Question B. How does matplotlib identify outliers? (i.e. how does it know that they are not the true max and min values?)

Solution

Here's a graphic that illustrates the components of the box from a stats.stackexchange answer. Note that k=1.5 if you don't supply the whis keyword in Pandas.

ty5wN.png

The boxplot function in Pandas is a wrapper for matplotlib.pyplot.boxplot. The matplotlib docs explain the components of the boxes in detail:

Question A:

The box extends from the lower to upper quartile values of the data, with a line at the median.

i.e. a quarter of the input data values is below the box, a quarter of the data lies in each part of the box, and the remaining quarter lies above the box.
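To check the "a quarter in each part" reading numerically, here is a minimal sketch. It assumes the box edges are the 25th and 75th percentiles as computed by np.percentile; the sample data and the seed are made up for illustration.

```python
import numpy as np

# Hypothetical sample data; the seed is arbitrary.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Box edges (Q1, Q3) and the line inside the box (median).
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Fraction of the data in each of the four regions.
below_box = np.mean(data < q1)
lower_part_of_box = np.mean((data >= q1) & (data < median))
upper_part_of_box = np.mean((data >= median) & (data <= q3))
above_box = np.mean(data > q3)

print(below_box, lower_part_of_box, upper_part_of_box, above_box)
```

Each fraction should come out at roughly 0.25, which is what the box and the median line encode.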

Question B:

whis : float, sequence, or string (default = 1.5)

As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3-Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points.
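The rule quoted above can be reproduced by hand. The following is a sketch, not matplotlib's actual source: the data values are invented, and the comparison uses inclusive bounds (<= and >=), which I believe matches what matplotlib.cbook.boxplot_stats does, even though the docs say "less than". The last lines compare the hand-computed values with boxplot_stats.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical data with one obviously extreme value.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
whis = 1.5  # the default

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Whiskers end at the most extreme data points still inside the fences.
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual point.
outliers = data[(data < lower_fence) | (data > upper_fence)]

# Compare with the statistics matplotlib computes for the same data.
stats = cbook.boxplot_stats(data, whis=whis)[0]
print(whisker_low, whisker_high, outliers)
print(stats["whislo"], stats["whishi"], stats["fliers"])
```

So the whiskers are not the true min and max: they stop at the last data point inside Q1 - 1.5*IQR and Q3 + 1.5*IQR, and here the value 30 falls outside and is plotted as an outlier.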

Matplotlib (and Pandas) also give you a lot of options to change this default definition of the whiskers:

Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentile (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
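Here is a small sketch of how those whis settings change the whiskers and the outliers, again using matplotlib.cbook.boxplot_stats on invented data. One caveat: as far as I know, recent matplotlib releases no longer accept the string 'range', and whis=(0, 100) is the replacement, so that is what the example uses.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical data with one extreme value.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])

settings = {
    "default (1.5)": 1.5,
    "very large float": 1e6,         # whiskers reach the min and max
    "percentiles (5, 95)": (5, 95),  # whiskers at specific percentiles
    "full range (0, 100)": (0, 100), # assumed replacement for 'range'
}

results = {}
for label, whis in settings.items():
    stats = cbook.boxplot_stats(data, whis=whis)[0]
    results[label] = (stats["whislo"], stats["whishi"], list(stats["fliers"]))
    print(label, results[label])
```

The same whis values can be passed to plt.boxplot(data, whis=...) or to the Pandas boxplot wrapper to get the corresponding plot.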
