I’m a huge O’Reilly Media fanboy. I can’t hide it. I hear Tim O’Reilly speak at conferences and I think to myself, “Screw being president, I want to be Tim O’Reilly.” I’ve been a subscriber to their online book service, Safari Books Online, for years. Every month I see the bill for $43 come through and I think to myself, “Self, that’s the best $43 you spent all month.
A few months ago I switched my laptop from Windows to Ubuntu Linux. I had been connecting to my corporate SQL Server database using RODBC on Windows, so I attempted to get ODBC connectivity up and running on Ubuntu. ODBC on Ubuntu turned into an exercise in futility. I spent many hours over many days and was never able to connect from R on Ubuntu to my corp SQL Server.
Over at stats.stackexchange.com recently, a really interesting question was raised about principal component analysis (PCA). The gist was “Thanks to my college class I can do the math, but what does it MEAN?” I’ve felt like this a number of times in my life. Many of my classes were so focused on the technical implementations that they kinda missed the section titled “Why I give a shit.” A perfect example was my Mathematics Principles of Economics class, which taught me how to manually calculate a bordered Hessian but, for the life of me, I have no idea why I would ever want to calculate such a monster.
[Image: André-Louis Cholesky is my homeboy]
When I did a brief post three days ago I had no plans to write two more posts on correlated random number generation. But I’ve gotten a couple of emails, a few comments, and some Twitter feedback. In response to my first post, Gappy calls me out and says, “the way mensches do multivariate (log)normal variates is via Cholesky. It’s simple, instructive, and fast.
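For the curious, here is a minimal sketch of what the Cholesky route can look like in base R. The 0.7 correlation, the sample size, and the seed are made-up values for illustration; none of them come from Gappy's comment.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical target correlation matrix for two variables
sigma <- matrix(c(1.0, 0.7,
                  0.7, 1.0), nrow = 2, byrow = TRUE)

# chol() returns the upper-triangular factor u with t(u) %*% u == sigma
u <- chol(sigma)

# Independent standard normal draws, one column per variable
n <- 10000
z <- matrix(rnorm(n * 2), ncol = 2)

# Post-multiplying by u gives rows whose covariance is t(u) %*% u = sigma
x <- z %*% u

# Exponentiate if lognormal variates are what you are after
lx <- exp(x)

cor(x)  # sample correlation should land close to 0.7
```

Note that the 0.7 applies to the normal draws in `x`. The correlation between the lognormal columns of `lx` will generally differ a bit, because exponentiating is not a linear transformation.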
So after yesterday’s post on Simple Simulation using Copulas I got a very nice email that basically begged the question, “Dude, why are you making this so hard?” The author pointed out that if what I really want is a Gaussian correlation structure for Gaussian distributions then I could simply use the mvrnorm() function from the MASS package. Well, I did a quick test and, I’ll be damned, he’s right! The advantage of using a copula is the ability to simulate correlation structures where the correlation is different for different levels of values.
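A rough sketch of the mvrnorm() shortcut is below. The means, the 0.7 correlation, and the number of draws are placeholder values I picked for illustration, not numbers from the email.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical means and covariance matrix (unit variances, correlation 0.7)
mu <- c(0, 0)
sigma <- matrix(c(1.0, 0.7,
                  0.7, 1.0), nrow = 2, byrow = TRUE)

# One call draws correlated Gaussian variates, one row per draw
draws <- mvrnorm(n = 10000, mu = mu, Sigma = sigma)

cor(draws)  # sample correlation should be near the 0.7 we asked for
```

This only covers the plain Gaussian-on-Gaussian case. Anything with non-normal marginals or level-dependent correlation still needs the copula machinery.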
A friend of mine gave me a call last week and was wondering if I had a little R code that could illustrate how to do a Cholesky decomposition. He ultimately wanted to build a Monte Carlo model with correlated variables. I pointed him to a number of packages that do Cholesky decomp but then I recommended he consider just using a Gaussian Copula and R for the whole simulation.
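To give a flavor of the Gaussian copula idea, here is a hedged sketch that uses MASS rather than a dedicated copula package. The gamma and exponential marginals, their parameters, the column names, and the 0.6 correlation are all assumptions made up for this example.

```r
library(MASS)  # provides mvrnorm()

set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical correlation matrix for the underlying Gaussian layer
rho <- matrix(c(1.0, 0.6,
                0.6, 1.0), nrow = 2, byrow = TRUE)

# Step 1: correlated standard normal draws
z <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = rho)

# Step 2: pnorm() maps each column to uniform(0, 1) while keeping the dependence
u <- pnorm(z)

# Step 3: push the uniforms through the quantile functions of the marginals
# you actually want (illustrative choices: a gamma and an exponential)
sim <- cbind(loss = qgamma(u[, 1], shape = 2, rate = 1),
             wait = qexp(u[, 2], rate = 0.5))

# Rank correlation survives the monotone transforms in steps 2 and 3
cor(sim, method = "spearman")
```

The point of the design is that the dependence structure lives in the Gaussian layer and the marginals get bolted on afterward, so you can swap in whatever distributions the Monte Carlo model calls for.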
[Image: Radiant Heat System. Not in my house… yet!]
My wife and I bought a foreclosed house a few months ago. This house had been part of mortgage fraud and we bought it at auction. Interesting life experience, to say the least. The finished basement was built with radiant heat tubing poured into the concrete. These pipes are designed to be hooked to a hot water heater so the warm water can provide radiant heat through the floors in the basement.
I do some work from home, some work from an office in Chicago and some work on the road. It’s not uncommon for me to want to route all my web traffic through a VPN tunnel. In one of my previous blog posts I alluded to using Amazon EC2 as a way to get around your corporate IT mind control voyeurs, er, service providers. This tunneling method is one of the 5 or so ways I have used EC2 to set up a tunnel.
I’ve been continuing to muck around with using R inside of Amazon Elastic MapReduce jobs. I’ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the Stack Overflow [r] community. I have no idea how crappy coders like me got anything at all done before the Interwebs. One of the immediate hurdles faced when trying to use AMZN EMR in anger is that the default version of R on EMR is 2.
I’m kinda blown away by the number of folks who have joined the Chicago R User Group (RUG) in the last few weeks. As of this morning we have 65 people signed up for the group and 25 who have said that they are planning on attending the meetup this Thursday (yes, only 3 days away!). I’m very pleased that this many people in Chicago find the R language interesting and/or valuable.