Many standard statistical procedures assume normally distributed data. One way to assess whether your data is normally distributed is a quantile-quantile plot, or Q-Q plot. In this approach, quantiles of the tested distribution are plotted against quantiles of a known distribution as a scatter plot. If the two distributions are similar, the points will lie close to a straight line. We will plot our data against a normal distribution to check whether it is normally distributed.
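To make the idea concrete, here is a minimal sketch that pairs the sorted values of a sample with exact quantiles of a standard normal distribution and measures how close the pairs are to a straight line. The helper name `qq_points`, the plotting positions `(i - 0.5) / n`, and the two synthetic samples are assumptions made for illustration only.

```python
from statistics import NormalDist

import numpy as np


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    observed = np.sort(np.asarray(sample, dtype=float).ravel())
    n = observed.size
    # Plotting positions (i - 0.5) / n avoid the infinite quantiles at 0 and 1.
    probs = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return theoretical, observed


rng = np.random.default_rng(0)  # fixed seed so the example is repeatable
normal_sample = rng.normal(loc=10.0, scale=3.0, size=500)
skewed_sample = rng.exponential(scale=3.0, size=500)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    x, y = qq_points(sample)
    # A correlation close to 1 means the points lie near a straight line.
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: r = {r:.3f}")
```

The normal sample should give a correlation very close to 1, while the skewed sample bends away from the line and scores lower. The mean and scale of the data only shift and stretch the line; they do not curve it.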
If you have a Scientific Python distribution installed, fire up a Jupyter notebook and paste the code below into a cell.

    import numpy as np
    import numpy.random as random
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

    # read the copied spreadsheet cells, flatten them to 1-D and sort
    data = pd.read_clipboard(header=None).values.flatten()
    data.sort()

    # sorted sample from a normal distribution to compare against
    norm = random.normal(0, 2, len(data))
    norm.sort()

    plt.figure(figsize=(12, 8), facecolor='1.0')
    plt.plot(norm, data, "o")

    # generate a trend line as in http://widu.tumblr.com/post/43624347354/matplotlib-trendline
    z = np.polyfit(norm, data, 1)
    p = np.poly1d(z)
    plt.plot(norm, p(norm), "k--", linewidth=2)

    plt.title("Normal Q-Q plot", size=28)
    plt.xlabel("Theoretical quantiles", size=24)
    plt.ylabel("Experimental quantiles", size=24)
    plt.tick_params(labelsize=16)
    plt.show()

I assume you have your data in Excel or another spreadsheet application. Copy it to the clipboard, then run the Python script in a Jupyter notebook. The script will
- import the data from the clipboard as described in Read clipboard data with Pandas
- flatten the array
- sort it
- generate an array of normally distributed random values
- sort it too, and
- use the two arrays as X and Y coordinates for a scatter plot, with a fitted trend line drawn through it.
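The steps above can also be sketched as a small function that works on any array instead of the clipboard, which makes them easier to try without a spreadsheet. The name `qq_arrays`, the `seed` parameter, and the synthetic `table` are hypothetical and exist only for this illustration; the plotting calls are left out.

```python
import numpy as np


def qq_arrays(values, seed=None):
    """Mirror the listed steps for any array-like input (hypothetical helper)."""
    data = np.asarray(values, dtype=float).flatten()  # flatten the array (copy)
    data.sort()                                       # sort it
    rng = np.random.default_rng(seed)
    norm = rng.normal(0, 2, len(data))                # random normal reference
    norm.sort()                                       # sort it too
    slope, intercept = np.polyfit(norm, data, 1)      # trend line coefficients
    return norm, data, slope, intercept


# Stand-in for spreadsheet data: a 2-D block of values, as a clipboard paste would give.
table = np.random.default_rng(1).normal(50, 5, size=(40, 5))
x, y, slope, intercept = qq_arrays(table, seed=2)
print(f"trend line: y = {slope:.2f} * x + {intercept:.2f}")
```

Because the reference values are drawn at random, the plot changes slightly on every run unless a seed is fixed. The returned `x` and `y` can be passed to `plt.plot(x, y, "o")` in the same way as `norm` and `data` in the script.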
Here is the plot for my data, which is not too normally distributed.