Wednesday 2013-12-18

LinkedIn's Jay Kreps wrote a walkthrough of their log-centric infrastructure in which he remarks:

Event data records things that happen rather than things that are. In web systems, this means user activity logging, but also the machine-level events and statistics required to reliably operate and monitor a data center's worth of machines. People tend to call this "log data" since it is often written to application logs, but that confuses form with function.

Let's just say that the log data space and the event data space are both large. Converting log data into event data goes beyond simple event correlation, because we are always using log data to form hypotheses about system state.

In CommandInWar, van Creveld points out that realizing this, that reports exist to form and test hypotheses about the state of things, completely changed how battles and wars were managed:

Ideally, the regular reporting system should tell the commander which questions to ask, and the directed telescope should enable him to answer those questions. It was the two systems together, cutting across each other and wielded by Napoleon's masterful hand, which made the revolution in command possible.

By logging and testing, we can build, test by test, an increasingly better model of the underlying process. Even when that process is stochastic or otherwise unstable, we can usually find patterns in our log data that tell us something useful.
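As a rough illustration of building a model test by test, here is a minimal Python sketch that folds log lines into a running estimate of a stochastic process's failure rate. The log format (`status=ok` / `status=fail`), the `BetaEstimate` class, and the uniform prior are all assumptions made up for this example, not anything from Kreps or van Creveld.

```python
from dataclasses import dataclass


@dataclass
class BetaEstimate:
    """Beta(alpha, beta) belief about a process's failure probability."""
    alpha: float = 1.0  # pseudo-count of failures (assumed uniform prior)
    beta: float = 1.0   # pseudo-count of successes (assumed uniform prior)

    def update(self, failed: bool) -> None:
        # Each observed test outcome nudges the belief by one count.
        if failed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean of the failure probability.
        return self.alpha / (self.alpha + self.beta)


def estimate_failure_rate(log_lines):
    """Fold hypothetical 'status=ok' / 'status=fail' lines into a belief."""
    belief = BetaEstimate()
    for line in log_lines:
        if "status=fail" in line:
            belief.update(True)
        elif "status=ok" in line:
            belief.update(False)
        # Lines without a status field carry no evidence and are skipped.
    return belief


# Hypothetical log excerpt: three passing probes, one failure, one unrelated line.
logs = [
    "12:00:01 probe status=ok",
    "12:00:02 probe status=fail",
    "12:00:03 heartbeat",
    "12:00:04 probe status=ok",
    "12:00:05 probe status=ok",
]
belief = estimate_failure_rate(logs)
print(round(belief.mean, 3))
```

Each new line only adjusts the counts, so the estimate can keep improving for as long as logs keep arriving; how many observations count as "good enough" is a separate judgment.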

That said, we need enough capital / time to get the model to satisfice.