Back in March 2009 I blogged about a phenomenon I called the Anna Karenina Yield Anomaly. In short, I postulated that in crop production a national ‘good year’ pretty much means everyone had a good yield, while a national ‘bad year’ means some had an OK year and some had a terrible one. And thus I made myself seem more literate than I am by linking that phenomenon back to Tolstoy and his line “Happy families are all alike; every unhappy family is unhappy in its own way.”
So a couple weeks ago a big ass storm conspired against me and resulted in me listening to the audio version of Guns, Germs, and Steel by Jared Diamond while stuck in the Minneapolis airport. I was struck that Diamond also used the Anna Karenina quote when discussing the domestication of animals. He presented the idea that all domesticated animals share a number of key characteristics, while each of the non-domesticated critters failed to be put under the yoke of humans for its own special reason.
Ok, so Diamond and I both like to make connections between Tolstoy and modern agriculture. And all us white guys like to look smarter than we probably are (actually he really is that smart, I’m just gonna fake it till I make it). But besides those obvious things, what’s behind the Tolstoy quote? How applicable is this idea to other situations?
I spent a long time in the Minneapolis Airport pondering what conditions result in what I began to call the Tolstoy Dichotomy: one outcome and all are similar vs. other outcome with different causes. OK, just for transparency, I was also trying not to tap my toe or do anything else obviously gay while in the pooper. That’s why I listen to audio books and not music when stuck in MSP.
While it’s not exactly profound, it seems to me that any time the outcome of a process depends on many necessary conditions and few or no sufficient ones, the actors who go through that process end up in two groups: one where every necessary condition was met, and another where at least one was not. The actors in the first group all look the same. The actors in the second group are lumped together, but each landed there for its own set of reasons.
In crop yields this makes a lot of sense. The growing season is 8+ months for corn in the Midwest US. During that period a LOT of things can go wrong. There can be a wet spring that delays planting, like this year. There can be a mid-season drought, like 1987. A big cloud can park itself over the Midwest, limiting light and temperature and increasing flooding (1993). But every good year means there was moisture at the right times, not too much heat during pollination, etc. So with crops there are many necessary conditions and no sufficient conditions for a good crop.
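For the code-inclined, here is a rough sketch of the "many necessary, no sufficient conditions" idea. Everything in it is an assumption made up for illustration: the condition names are loosely borrowed from the weather examples above, and the 80% chance of meeting each condition and the 1,000 simulated fields are numbers I picked, not real crop data.

```python
import random
from collections import Counter

# Hypothetical necessary conditions for a good crop (illustrative names only).
CONDITIONS = [
    "planted on time",
    "enough rain",
    "mild heat at pollination",
    "enough sunlight",
    "no flooding",
]

def failed_conditions(rng, p_met=0.8):
    """Return the set of conditions one field failed to meet this season."""
    return frozenset(c for c in CONDITIONS if rng.random() > p_met)

rng = random.Random(0)  # fixed seed so the run is repeatable
outcomes = [failed_conditions(rng) for _ in range(1000)]

# A field succeeds only if it failed nothing, so success has one profile.
succeeded = [o for o in outcomes if not o]
failed = [o for o in outcomes if o]

success_profiles = Counter(succeeded)
failure_profiles = Counter(failed)

print(len(success_profiles), "way to succeed")
print(len(failure_profiles), "distinct ways to fail")
```

The happy fields all share one profile (nothing went wrong), while the unhappy ones scatter across many different combinations of failures. With five conditions there are up to 31 ways to be miserable.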
One of the interesting analytical consequences of this dichotomy is that the two groups have very different correlations. If you are measuring the group that met all the conditions, they are a homogeneous bunch (i.e. high correlation). But the group that fell off the wagon for one reason or another… well, they are all pretty heterogeneous (i.e. low correlation). Depending on your model it may be very important to consider the difference between the correlations in these groups and vary your covariance assumptions accordingly. Or you can just assume that correlations are stable over time and find yourself kicked in the nuts by reality. Whichever. It’s your call.
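To see how the correlation gap could arise, here is a toy simulation. It is a sketch under loud assumptions, not a model of real yields: every farm shares one national factor each year, half the years are "good" (no farm fails any condition), and in "bad" years each farm independently fails each of six conditions with probability 0.3, losing 15% of its yield per failure. All of those numbers and function names are invented for the example.

```python
import random
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n_years=400, n_farms=12, n_conditions=6, seed=1):
    """Return (good_years, bad_years); each year is a list of farm yields."""
    rng = random.Random(seed)
    good, bad = [], []
    for _ in range(n_years):
        common = rng.gauss(100.0, 10.0)  # shared national factor (assumed)
        is_good = rng.random() < 0.5     # assumed: half the years are good
        row = []
        for _ in range(n_farms):
            # Good year: nothing fails. Bad year: each condition fails w.p. 0.3.
            failures = 0 if is_good else sum(
                rng.random() < 0.3 for _ in range(n_conditions))
            # Each failure cuts yield by 15%; small farm-level noise on top.
            row.append(common * 0.85 ** failures + rng.gauss(0.0, 2.0))
        (good if is_good else bad).append(row)
    return good, bad

def mean_pairwise_corr(years):
    """Average correlation over all pairs of farms, across the given years."""
    farms = list(zip(*years))  # transpose: one yield series per farm
    return mean(pearson(a, b) for a, b in combinations(farms, 2))

good, bad = simulate()
corr_good = mean_pairwise_corr(good)
corr_bad = mean_pairwise_corr(bad)
print(f"good years: {corr_good:.2f}, bad years: {corr_bad:.2f}")
```

Under these assumptions the farms move together in good years, because the shared factor is the only thing driving them, and drift apart in bad years, because each farm's own pile of failures swamps it. A single covariance estimate pooled over both kinds of years would misstate both.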
So, as I said earlier, not exactly profound, but every time you see a system with many necessary conditions and no sufficient conditions, be like me and pull a Tolstoy Anna Karenina quote from your butt. You’ll be glad you did.