Sunday 2011-10-23

I started collecting the mean, variance, skew, and kurtosis of collectd data in CollectdPlusR, and I wanted the collection to run fast and light, which means writing it in something closer to the metal.

The GNU Scientific Library (GSL) covers basic statistics such as the moments of an empirical distribution. Each call takes the data array, a stride, and the number of samples:

	printf("%f,%f,%f,%f\n",
		gsl_stats_mean(data, 1, DATA_WINDOW),
		gsl_stats_variance(data, 1, DATA_WINDOW),
		gsl_stats_skew(data, 1, DATA_WINDOW),
		gsl_stats_kurtosis(data, 1, DATA_WINDOW) );

The bulk of the code in moments.c deals with parsing the CSV. It seems odd that there doesn't appear to be a GNU library for CSV, so I'm probably overlooking something in gdbm, librec, or elsewhere.
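
To give a sense of the CSV handling involved, here is a minimal sketch that pulls one numeric field out of a line using only the C standard library. It is not the code from moments.c. The function name `csv_field_as_double` is hypothetical, and it assumes simple rows such as `1319328000,0.25` with no quoting or embedded commas.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from moments.c: parse the numeric field at
 * zero-based index `column` of one CSV line. Assumes plain comma-separated
 * values with no quoting or embedded commas.
 * Returns 0 on success, -1 if the field is missing or not a number. */
int csv_field_as_double(const char *line, int column, double *out)
{
	const char *p = line;
	char *end;
	double value;

	assert(line != NULL && out != NULL);

	/* Skip `column` commas to reach the start of the wanted field. */
	for (int i = 0; i < column; i++) {
		p = strchr(p, ',');
		if (p == NULL)
			return -1;
		p++;
	}

	value = strtod(p, &end);
	if (end == p)
		return -1;	/* nothing was converted */

	/* The number must be followed by a delimiter or the end of the line. */
	if (*end != ',' && *end != '\0' && *end != '\n' && *end != '\r')
		return -1;

	*out = value;
	return 0;
}
```

A caller would read lines with `fgets`, skip any line where this returns -1 (a header row such as `epoch,value`, for example), and store each parsed value in the `data` array.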

Now to start compiling a history of moments for further statistical abuse ;)