Blue Flower

Many standard statistical procedures assume normally distributed data. One way to assess whether your data is normally distributed is a quantile-quantile (q-q) plot. In this approach, the quantiles of the tested distribution are plotted against the quantiles of a known distribution as a scatter plot. If the two distributions are similar, the points fall close to a straight line. We will plot our data against a normal distribution to check whether it is normally distributed.
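To make the "close to a straight line" idea concrete, here is a minimal sketch that builds q-q pairs for two synthetic samples and measures how linear they are. It is not the script used in this post: the helper name `normal_qq_pairs`, the plotting positions (i - 0.5)/n, the sample sizes and the seed are all my own assumptions, and the theoretical quantiles come from `statistics.NormalDist` in the standard library.

```python
import numpy as np
from statistics import NormalDist


def normal_qq_pairs(sample):
    """Return (theoretical, observed) quantile pairs for a normal q-q plot.

    Hypothetical helper for illustration only.
    """
    observed = np.sort(np.asarray(sample, dtype=float).ravel())
    n = observed.size
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1),
    # so the inverse CDF is always defined.
    probs = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    theoretical = np.array([std_normal.inv_cdf(float(p)) for p in probs])
    return theoretical, observed


# Synthetic samples (assumed values, not real data).
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=10.0, scale=3.0, size=500)
skewed_sample = rng.exponential(scale=3.0, size=500)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    x, y = normal_qq_pairs(sample)
    # A correlation close to 1 means the q-q points lie near a straight line.
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: correlation of q-q points = {r:.3f}")
```

The normal sample should give a correlation very close to 1, while the skewed sample should score visibly lower because its q-q points bend away from the line.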

If you have a scientific Python distribution installed, fire up a Jupyter notebook and paste the code below into a cell.

import numpy as np
import numpy.random as random
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

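# read the copied spreadsheet cells from the clipboard into a flat array
data=pd.read_clipboard(header=None).values.flatten()
data.sort()
# draw as many normally distributed random values (mean 0, sd 2) as there are data points
norm=random.normal(0,2,len(data))
norm.sort()
plt.figure(figsize=(12,8),facecolor='1.0')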

plt.plot(norm,data,"o")

#generate a trend line as in http://widu.tumblr.com/post/43624347354/matplotlib-trendline
z = np.polyfit(norm,data, 1)
p = np.poly1d(z)
plt.plot(norm,p(norm),"k--", linewidth=2)
plt.title("Normal Q-Q plot", size=28)
plt.xlabel("Theoretical quantiles", size=24)
plt.ylabel("Experimental quantiles", size=24)
plt.tick_params(labelsize=16)
plt.show()

I assume you have your data in Excel or another spreadsheet application. Copy it to the clipboard, then run the Python script in a Jupyter notebook. The script will:

  • import the data from the clipboard as described in Read clipboard data with Pandas
  • flatten the array
  • sort it
  • generate an array of normally distributed random values of the same length
  • sort it too
  • use the two arrays as X and Y coordinates for a scatter plot, and
  • fit a straight trend line through the points.
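If you want to try the steps above without a spreadsheet or a plot window, the sketch below runs them on an in-memory table. The function name `qq_arrays`, the `seed` parameter and the small table of numbers are made up for illustration; the clipboard import is replaced by a plain nested list, and a seeded generator is used so the result is repeatable.

```python
import numpy as np


def qq_arrays(values, seed=None):
    """Run the script's steps on an in-memory table (no clipboard, no plot).

    Hypothetical helper for illustration only.
    """
    data = np.asarray(values, dtype=float).flatten()  # flatten the array
    data.sort()                                        # sort it
    rng = np.random.default_rng(seed)
    norm = rng.normal(0, 2, len(data))                 # normally distributed random values
    norm.sort()                                        # sort them too
    slope, intercept = np.polyfit(norm, data, 1)       # straight trend line
    return norm, data, slope, intercept


# Made-up values standing in for cells copied from a spreadsheet.
table = [[5.1, 4.8, 6.0],
         [5.5, 4.2, 5.9],
         [4.9, 5.3, 5.0]]

x, y, slope, intercept = qq_arrays(table, seed=42)
print(len(x), len(y))
print("trend line: y = %.3f * x + %.3f" % (slope, intercept))
```

The returned `x` and `y` arrays can be passed to `plt.plot(x, y, "o")` exactly as in the script. Because the X values are random draws, the picture changes a little from run to run unless a seed is fixed.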

Here is a plot for my not-too-normally distributed data:

Quantile - quantile (q-q) plot with Python, numpy and matplotlib
