howto

Fitting Distribution X to Data From Distribution Y

I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process, and then we changed the parameters of the gamma distribution to see how they impacted the fit. An exercise like this is what I call building a “toy model,” and I think it is an invaluable method for building intuition and a visceral understanding of the data.
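A minimal sketch of the kind of toy model I mean, assuming a maximum-likelihood fit via MASS::fitdistr and a crude rescale of the gamma draws into (0, 1) so a beta density can apply (the shape, rate, and starting values here are illustrative, not the post’s actual code):

```r
# Toy model sketch: simulate gamma data, rescale to (0,1), fit a beta, compare visually
library(MASS)

set.seed(1)
y <- rgamma(10000, shape = 2, rate = 1)   # "true" data-generating distribution
u <- y / (max(y) * 1.001)                 # crude rescale into (0,1) so a beta can be fit

fit <- fitdistr(u, "beta", start = list(shape1 = 2, shape2 = 2))  # ML fit of the beta

hist(u, breaks = 50, freq = FALSE, main = "Beta fit to rescaled gamma draws")
curve(dbeta(x, fit$estimate["shape1"], fit$estimate["shape2"]),
      add = TRUE, col = "red", lwd = 2)
```

Re-running a sketch like this with different gamma shape and rate parameters makes it easy to see where the beta approximation holds up and where it falls apart.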

Your Analogy is bad... and you should feel bad!

A bad analogy can frame an entire conversation improperly. This is one of those “anecdote from a middle-aged man” posts, so take it with a grain of salt. A number of years ago I worked on the risk management team of an insurance company that sold long term care (LTC) insurance. LTC insurance is a private product that covers home health care and nursing home care if the policyholder is unable to take care of themselves on their own.

Details of two-way sync between two Ubuntu machines

In a previous post I discussed my frustrations with trying to get Dropbox or Spideroak to perform BOTH encrypted remote backup AND fast two-way file syncing. This is the detail of how I set up two machines, both running Ubuntu 10.10, to perform two-way sync, where a file change on either machine is replicated on the other. I initially tried running Unison on BOTH my laptop and the server, with the server’s Unison set to sync with my laptop back through an SSH reverse proxy.

Connecting to SQL Server from R using RJDBC

A few months ago I switched my laptop from Windows to Ubuntu Linux. I had been connecting to my corporate SQL Server database using RODBC on Windows, so I attempted to get ODBC connectivity up and running on Ubuntu. ODBC on Ubuntu turned into an exercise in futility. I spent many hours over many days and was never able to connect from R on Ubuntu to my corporate SQL Server.
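For flavor, here is a hedged sketch of what the RJDBC route looks like, assuming the jTDS JDBC driver (the jar path, server name, database, credentials, and table below are placeholders, not my actual setup):

```r
# Sketch: connect to SQL Server from R via RJDBC and a JDBC driver (jTDS assumed here)
library(RJDBC)

drv <- JDBC("net.sourceforge.jtds.jdbc.Driver",    # driver class from the jTDS project
            "/opt/jtds/jtds-1.2.5.jar")            # placeholder path to the driver jar

conn <- dbConnect(drv,
                  "jdbc:jtds:sqlserver://myserver:1433/mydatabase",  # placeholder URL
                  user = "myuser", password = "mypassword")

df <- dbGetQuery(conn, "SELECT TOP 10 * FROM some_table")  # placeholder query
dbDisconnect(conn)
```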

Principal Component Analysis (PCA) vs Ordinary Least Squares (OLS): A Visual Explanation

Over at stats.stackexchange.com recently, a really interesting question was raised about principal component analysis (PCA). The gist was “Thanks to my college class I can do the math, but what does it MEAN?” I have felt like this a number of times in my life. Many of my classes were so focused on the technical implementation that they kinda missed the section titled “Why I give a shit.” A perfect example was my Mathematics Principles of Economics class, which taught me how to manually calculate a bordered Hessian but, for the life of me, I have no idea why I would ever want to calculate such a monster.
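A small illustrative sketch of the visual contrast, on toy data of my own rather than the figures from the post: OLS minimizes vertical distances to the line, while the first principal component minimizes perpendicular distances.

```r
# Toy comparison of the OLS fit and the first principal component direction
set.seed(42)
x <- rnorm(200)
y <- 0.8 * x + rnorm(200, sd = 0.6)

fit <- lm(y ~ x)              # OLS: minimizes vertical (y-direction) distances
pc  <- prcomp(cbind(x, y))    # PCA: first PC minimizes perpendicular distances
slope_pc <- pc$rotation[2, 1] / pc$rotation[1, 1]

plot(x, y, pch = 16, col = "grey50")
abline(fit, col = "blue", lwd = 2)                       # OLS line
abline(a = mean(y) - slope_pc * mean(x), b = slope_pc,   # first PC through the centroid
       col = "red", lwd = 2)
legend("topleft", legend = c("OLS", "First PC"),
       col = c("blue", "red"), lwd = 2)
```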

Third, and Hopefully Final, Post on Correlated Random Normal Generation (Cholesky Edition)

[Image: André-Louis Cholesky is my homeboy] When I wrote a brief post three days ago I had no plans to write two more posts on correlated random number generation. But I’ve gotten a couple of emails, a few comments, and some Twitter feedback. In response to my first post, Gappy called me out and said, “the way mensches do multivariate (log)normal variates is via Cholesky. It’s simple, instructive, and fast.
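For reference, here is a minimal sketch of the Cholesky approach Gappy is talking about, with an illustrative 2-by-2 correlation matrix of my own choosing:

```r
# Correlated normal draws via Cholesky: factor the target correlation matrix,
# then mix independent standard normals through the factor
sigma <- matrix(c(1.0, 0.7,
                  0.7, 1.0), nrow = 2)     # illustrative target correlation matrix
U <- chol(sigma)                           # upper-triangular factor: t(U) %*% U == sigma

z <- matrix(rnorm(10000 * 2), ncol = 2)    # independent standard normal draws
x <- z %*% U                               # correlated normal draws
cor(x)                                     # off-diagonal should land near 0.7

x_lognormal <- exp(x)                      # exponentiate for correlated lognormals
```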

Even Simpler Multivariate Correlated Simulations

So after yesterday’s post on Simple Simulation using Copulas I got a very nice email that basically boiled down to the question, “Dude, why are you making this so hard?” The author pointed out that if what I really want is a Gaussian correlation structure for Gaussian distributions then I could simply use the mvrnorm() function from the MASS package. Well, I did a quick check and, I’ll be damned, he’s right! The advantage of using a copula is the ability to simulate correlation structures where the strength of correlation differs across the range of values.
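The shortcut he suggested, as a quick sketch (the parameters are mine, purely for illustration):

```r
# Draw correlated normals directly with MASS::mvrnorm -- no hand-rolled copula required
library(MASS)

sigma <- matrix(c(1.0, 0.7,
                  0.7, 1.0), nrow = 2)     # desired covariance/correlation matrix
x <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = sigma)
cor(x)                                     # off-diagonal should be close to 0.7
```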

Stochastic Simulation With Copulas in R

A friend of mine gave me a call last week and was wondering if I had a little R code that could illustrate how to do a Cholesky decomposition. He ultimately wanted to build a Monte Carlo model with correlated variables. I pointed him to a number of packages that do Cholesky decomp, but then I recommended he consider just using a Gaussian copula and R for the whole simulation.
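The gist of that recommendation, sketched by hand with MASS::mvrnorm rather than whatever copula package the full post may use (the correlation and marginal distributions below are illustrative placeholders):

```r
# Gaussian copula by hand: correlated normals -> uniform margins -> arbitrary marginals
library(MASS)

sigma <- matrix(c(1.0, 0.6,
                  0.6, 1.0), nrow = 2)            # correlation for the Gaussian copula
z <- mvrnorm(10000, mu = c(0, 0), Sigma = sigma)  # correlated standard normals
u <- pnorm(z)                                     # push to uniforms; dependence is preserved

x1 <- qgamma(u[, 1], shape = 2, rate = 1)         # illustrative marginal: gamma
x2 <- qlnorm(u[, 2], meanlog = 0, sdlog = 0.5)    # illustrative marginal: lognormal
cor(x1, x2, method = "spearman")                  # rank correlation carries over
```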

You can Hadoop it! It's elastic! Boogie woogie woog-ie!

[Image: This blog’s name in Chinese!] I just came back from the future and let me be the first to tell you this: Learn some Chinese. And more than just cào nǐ niáng (肏你娘), which your friend in grad school told you means “Live happy with many blessings”. Trust me, I’ve been hanging with Madam Wu and she told me it doesn’t mean that. So how did I travel to the future to visit with Madam Wu, you ask?

Using the R multicore package in Linux with wild and passionate abandon

One of my primary uses for R is to build stochastic simulations of insurance portfolios and reinsurance treaties. It’s not uncommon for each of my simulations to take 20 seconds or more to complete (if you’re doing the math, that’s about 55 hours for 10K sims, or approximately 453 games of solitaire). Initially I ran my sims in R inside an Oracle VirtualBox VM (Oracle now owns VirtualBox! Gasp.) running Ubuntu.
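The basic pattern for spreading those independent sims across cores with the multicore package looks roughly like this (run_one_sim is a hypothetical stand-in for one of my 20-second portfolio simulations, and 4 cores is just an example):

```r
# Farm out independent simulation runs with multicore::mclapply
library(multicore)   # provides mclapply() for forking work across cores

run_one_sim <- function(i) {
  set.seed(i)        # one seed per run so results are reproducible
  sum(rnorm(1e6))    # placeholder workload standing in for a real portfolio simulation
}

results <- mclapply(1:10000, run_one_sim, mc.cores = 4)  # runs spread across 4 cores
```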

Not Just Normal... Gaussian

[Image: Pretty Normal] Dave, over at The Revolutions Blog, posted about the big ol’ list of graphs created with R that are over at Wikimedia Commons. As I was scrolling through the list I recognized the standard normal distribution from the Wikipedia article on the same topic. Below is the fairly simple source code with lots of comments. Here’s the source. Run it at home… for fun and profit.
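A much smaller sketch in the same spirit (not the post’s full commented source): just the standard normal density curve, without the shading and annotations of the Wikipedia figure.

```r
# Minimal sketch: draw the standard normal density curve
x <- seq(-4, 4, length.out = 401)                 # grid of x values
plot(x, dnorm(x), type = "l", lwd = 2, col = "darkblue",
     xlab = "x", ylab = "density",
     main = "Standard Normal Distribution")
abline(v = -3:3, col = "grey", lty = "dotted")    # reference lines at whole standard deviations
```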

Box plot vs. Violin plot in R

So Andrew Gelman hates box plots. Not that you should give a buck what Gelman thinks; I’m just setting this blog post up, OK? So stick with me. Gelman also thought this XKCD cartoon was NOT funny: there’s some correlation as well as causation. I could be wrong, but I suspect the reason Gelman does not like the XKCD cartoon is that he’s very literal, as geeks can be.
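For a quick side-by-side of the two plot types, here is an illustrative sketch, assuming the vioplot package and some made-up bimodal data (the kind of structure a box plot hides and a violin plot shows):

```r
# Box plot vs. violin plot on deliberately bimodal data
library(vioplot)

set.seed(7)
x <- c(rnorm(500, mean = -2), rnorm(500, mean = 2))  # bimodal: two well-separated humps

par(mfrow = c(1, 2))
boxplot(x, main = "Box plot")      # the two humps are invisible here
vioplot(x, names = "x")            # the violin's width reveals the bimodality
title("Violin plot")
```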