Many standard statistical procedures assume normally distributed data. One way to assess whether your data is normally distributed is a quantile-quantile plot, or Q-Q plot. In this approach, the quantiles of the tested distribution are plotted against the quantiles of a known distribution as a scatter plot. If the two distributions are similar, the points will lie close to a straight line. We will plot our data against a normal distribution to test whether it is normally distributed.
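The pairing of quantiles can be sketched in plain Python. This is a minimal sketch, assuming the standard library's `statistics.NormalDist` for the theoretical quantiles and the plotting positions (i - 0.5) / n, which is one of several conventions in use. The helper names `qq_points` and `plot_qq` are hypothetical, and only `plot_qq` needs Matplotlib.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Pair each sorted sample value with the matching standard normal quantile.

    Plotting positions use (i - 0.5) / n, one of several common conventions.
    """
    sample = sorted(data)
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


def plot_qq(data):
    """Draw the Q-Q scatter plot with a reference line (requires matplotlib)."""
    import matplotlib.pyplot as plt

    theoretical, sample = zip(*qq_points(data))
    mu, sigma = mean(data), stdev(data)
    fig, ax = plt.subplots()
    ax.scatter(theoretical, sample, s=10)
    # For normal data, sample quantiles fall near the line mu + sigma * z.
    ends = [min(theoretical), max(theoretical)]
    ax.plot(ends, [mu + sigma * z for z in ends], color="red")
    ax.set_xlabel("Standard normal quantiles")
    ax.set_ylabel("Sample quantiles")
    plt.show()
```

Calling `plot_qq(my_data)` on a list of at least two numbers shows the plot; strong curvature or S-shapes away from the red line suggest the data is not normal.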
The IPython notebook is great for prototyping, making plots, and interactive work, but loading data from a file can feel tedious when all you want is to quickly test an idea or generate a plot. It would be nice to copy the data to the clipboard from Excel or another spreadsheet application and access it from a Python script. Sure, you can paste it right into the notebook as a multi-line string in triple quotes and then split it and convert it into numbers, but that clutters your code. Clipboard access is usually provided by GUI toolkits you may not want to use, and it can be system dependent, which makes your code less portable.
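For comparison, here is a minimal sketch of the paste-as-a-string workaround, which shows how much boilerplate it adds. It assumes the copied cells are all numeric with no empty cells, separated by tabs within a row and by newlines between rows, as spreadsheet copies typically are. The helper name `parse_pasted_table` and the sample values are hypothetical.

```python
def parse_pasted_table(text):
    """Convert tab-separated text copied from a spreadsheet into rows of floats.

    Assumes every cell is numeric and no cell is empty.
    """
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines left over from pasting
        rows.append([float(cell) for cell in line.split("\t")])
    return rows


# Cells pasted into a triple-quoted string; \t stands for the tab between columns.
pasted = """1.5\t2.0
3.25\t4.0
5\t6"""
data = parse_pasted_table(pasted)
```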
Databases are not just for storing data. If you know SQL, you start to miss its features when working in Excel. Sure, you can sort your data in Excel, but if you need to sort by an expression over several fields, you have to add a calculated column. Combining data from several sheets is even more problematic. You can run SQL queries against an Excel file, but setting that up is rather complicated.
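As a sketch of what SQL offers here, the example below loads two small tables, standing in for two sheets, into an in-memory SQLite database using Python's built-in `sqlite3` module. It joins them and sorts by an expression on two fields without adding a calculated column. The table names, columns, helper name `top_orders`, and data are all hypothetical.

```python
import sqlite3


def top_orders(orders, customers):
    """Join two 'sheets' and sort by an expression over several columns."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
        "quantity INTEGER, unit_price REAL)"
    )
    con.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)
    con.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    # Sort by quantity * unit_price directly; no helper column is stored.
    query = """
        SELECT c.name, o.quantity * o.unit_price AS total
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.quantity * o.unit_price DESC
    """
    rows = con.execute(query).fetchall()
    con.close()
    return rows


orders = [(1, 1, 3, 2.5), (2, 2, 1, 10.0), (3, 1, 10, 2.5)]
customers = [(1, "Alice"), (2, "Bob")]
result = top_orders(orders, customers)
```

Here `result` holds one `(name, total)` tuple per order, largest total first.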
Programs with an intuitive graphical user interface, such as Microsoft Excel, have let millions of people use computers without learning to program, but over time they accumulate so many features that the interface becomes too complex to be intuitive. Users never use some features simply because they cannot find them. Programming languages, on the other hand, have evolved to be simple and powerful. They are easy to learn, don't change with every software release, and, unlike a graphical user interface, can express arbitrarily complex ideas. It is about time we switched from expensive proprietary software to the free scientific Python stack and one of its gems: the Matplotlib charting library. In this post, I will get you up to speed with one of the most popular plot types: the line plot with error bars.
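A minimal sketch of such a plot, assuming each x value has several replicate measurements and that the error bars show the standard error of the mean (one common choice; standard deviation or confidence intervals are also used). The helper names `mean_and_sem` and `plot_with_error_bars` and the output filename are hypothetical. Matplotlib's `Axes.errorbar`, with its `yerr`, `fmt`, and `capsize` arguments, does the actual drawing.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(replicates):
    """Return the mean and standard error of the mean for one group of measurements."""
    return mean(replicates), stdev(replicates) / sqrt(len(replicates))


def plot_with_error_bars(x, groups, filename="errorbars.png"):
    """Plot group means against x with SEM error bars and save to a file."""
    import matplotlib.pyplot as plt

    stats = [mean_and_sem(g) for g in groups]
    y = [m for m, _ in stats]
    yerr = [s for _, s in stats]
    fig, ax = plt.subplots()
    # fmt="o-" draws markers joined by a line; capsize adds caps to the bars.
    ax.errorbar(x, y, yerr=yerr, fmt="o-", capsize=4)
    ax.set_xlabel("x")
    ax.set_ylabel("mean ± SEM")
    fig.savefig(filename)
    plt.close(fig)
```

For example, `plot_with_error_bars([1, 2, 3], [[1.0, 1.2, 0.9], [2.1, 1.8, 2.3], [2.9, 3.2, 3.0]])` writes the chart to `errorbars.png`.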