This has taken a lot longer than I expected, but I now have a rough, early draft of a cheat sheet on the Python pandas DataFrame object. With such a rich environment, deciding what to include and what to leave out of a four-page set of notes is a challenge. As always, comments and suggestions for improvement are welcome.
I have been using the R function stl() to undertake seasonal decompositions of time series data. However, it smooths away much more of the short-run variation (signal and noise) than the approach used by the Australian Bureau of Statistics (ABS). Because it produces a trend series that is much less noisy than the ABS trend, it is slower to identify turning points in the data, as can be seen in the next three charts.
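The trade-off between smoothness and timeliness can be illustrated without either stl() or the ABS method. The sketch below is a generic stand-in, not either of those procedures: it uses simple trailing moving averages (which only see data up to the current period) on a hypothetical series that peaks at t = 10. The heavier 13-term smoother reports the downturn five periods later than the light 3-term one. The function names and the data are illustrative assumptions.

```python
def trailing_mean(series, window):
    """Mean of the last `window` observations at each point (None until enough data)."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[t + 1 - window:t + 1]) / window)
    return out


def first_downturn(smoothed):
    """Index of the first period in which the smoothed series falls."""
    for t in range(1, len(smoothed)):
        prev, cur = smoothed[t - 1], smoothed[t]
        if prev is not None and cur is not None and cur < prev:
            return t
    return None


# Hypothetical series: rises to a peak at t = 10, then falls.
series = [t if t <= 10 else 20 - t for t in range(21)]

light = first_downturn(trailing_mean(series, 3))   # light smoothing
heavy = first_downturn(trailing_mean(series, 13))  # heavy smoothing
print(light, heavy)  # 12 17
```

The point is only directional: the more variation a filter removes, the longer the data must run past a turning point before the smoothed series turns too.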
I have been playing with some of my utility functions lately. The following function is one I am testing with data from the Australian Bureau of Statistics. I use it to load an ABS spreadsheet from my local file system into an R data frame. Almost all ABS spreadsheets lay out the metadata and the data in the same row/column format. When the ABS releases new data, my usual practice is to download the spreadsheet to my local hard drive.
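The author's R function is not reproduced here. As a rough Python/pandas analogue, the sketch below assumes a common ABS time-series layout: a sheet (assumed here to be named "Data1") whose first column holds metadata labels down to a row labelled "Series ID", with dated observations below that and one series per column. Both helper names, `split_abs_sheet` and `load_abs_sheet`, are hypothetical, and the layout should be checked against the actual spreadsheet.

```python
from __future__ import annotations

import pandas as pd


def split_abs_sheet(raw: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a raw ABS sheet into (metadata, data) frames keyed by Series ID.

    Assumes (hypothetically) that the row whose first cell is "Series ID" is
    the last metadata row and that every row below it is a dated observation.
    """
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    positions = (first_col == "Series ID").to_numpy().nonzero()[0]
    if len(positions) == 0:
        raise ValueError("no 'Series ID' row found in the first column")
    id_row = int(positions[0])
    series_ids = raw.iloc[id_row, 1:].tolist()

    # Metadata block: one row per label (Unit, Series Type, ...), one column per series.
    meta = raw.iloc[: id_row + 1, 1:].copy()
    meta.index = raw.iloc[: id_row + 1, 0].fillna("Description").tolist()
    meta.columns = series_ids

    # Data block: indexed by date, values coerced to numbers (blanks become NaN).
    data = raw.iloc[id_row + 1 :, 1:].copy()
    data.index = pd.to_datetime(raw.iloc[id_row + 1 :, 0].to_numpy())
    data.columns = series_ids
    data = data.apply(pd.to_numeric, errors="coerce")
    return meta, data


def load_abs_sheet(path: str, sheet: str = "Data1"):
    """Read a downloaded ABS workbook from disk without treating any row as a header."""
    raw = pd.read_excel(path, sheet_name=sheet, header=None)
    return split_abs_sheet(raw)
```

Searching for the "Series ID" label, rather than hard-coding a row count, is a small hedge against sheets whose metadata block differs in length. Reading .xlsx files with `pd.read_excel` also requires an Excel engine such as openpyxl to be installed.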
On the weekend I began exploring a non-linear model for aggregating polling. My test case produced nice-looking graphs, but the results were in large part an artifact of the priors I had chosen for the model (not good). In the comments to that post, I suggested that part of the problem may have been how I had defined the model.
I was a little surprised when I saw Simon Jackman suggest that Kevin Rudd had moved the two-party preferred voting intention by seven percentage points in Labor's favour. That was not consistent with my own analysis, and only one pollster (Morgan) has data that supports a seven-point movement.