Given that I have a small version of myself running around my house, I think about how she’ll use the Internet when she gets older. Just the other day she “asked Google something,” which made me realize that, although she’s just barely literate, my kid is going to “be online” for the rest of her life. I’m not really sure, though, what “be online” will mean for her over the years.
I just spent nearly two full days in a bare-knuckle brawl with my MacBook Pro trying to get it to talk to a corporate MS SQL Server. I had abandoned MSSQL more than a year ago in favor of PostgreSQL because of how much easier it is to work with PostgreSQL from a non-Microsoft stack. At that point I was running R on Linux and, soon after, R on OS X.
There’s a charming little brain teaser that’s going around the Interwebs. It’s got various forms, but they all look something like this: 8809=6 7111=0 2172=0 6666=4 1111=0 3213=0 7662=2 9313=1 0000=4 2222=0 3333=0 5555=0 8193=3 8096=5 7777=0 9999=4 7756=1 6855=3 9881=5 5531=0 2581=? SPOILER ALERT… The answer has to do with how many circles are in each number. The digit 8 has two circles in its shape, so it counts as two.
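The circle-counting rule can be sketched in a few lines of Python. This is only an illustration, not code from the original post: the `LOOPS` table and the `count_loops` name are made up here, and the digit 4 (which never appears in the puzzle) is assumed to count as zero because it is usually drawn with an open top.

```python
# Closed loops ("circles") in each digit's usual printed shape.
# Assumption: 4 is treated as having no closed loop; the puzzle never uses it.
LOOPS = {"0": 1, "1": 0, "2": 0, "3": 0, "4": 0,
         "5": 0, "6": 1, "7": 0, "8": 2, "9": 1}


def count_loops(number: str) -> int:
    """Sum the closed loops over every digit of the number string."""
    return sum(LOOPS[digit] for digit in number)


# A few rows from the puzzle as a sanity check.
for clue, expected in [("8809", 6), ("0000", 4), ("7662", 2), ("9881", 5)]:
    assert count_loops(clue) == expected

print(count_loops("2581"))  # the puzzle's question
```

Passing the numbers as strings keeps leading zeros intact, which matters for a row like 0000.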
I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process, then we changed the parameters of the gamma distribution to see how it impacted the fit. An exercise like this is what I call building a “toy model,” and I think it is invaluable as a method for building intuition and a visceral understanding of data.
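For a flavor of what such a toy model can look like, here is a rough sketch in Python rather than the R used in the post. Everything in it is an assumption made for illustration: the gamma draws are rescaled into (0, 1) by dividing by a bit more than the sample maximum, the beta is fitted by method of moments, and the fit is scored with a two-sample Kolmogorov-Smirnov distance against draws from the fitted beta. The function names are hypothetical and the original post may have done all of this differently.

```python
import random
import statistics


def fit_beta_moments(xs):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1)."""
    m = statistics.fmean(xs)
    v = statistics.variance(xs)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def ks_two_sample(a, b):
    """Largest gap between the two empirical CDFs (0 = identical)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            i += 1
        else:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def simulate(shape, scale, n=5000, seed=1):
    """Draw gamma data, squeeze it into (0, 1), fit a beta, score the fit."""
    rng = random.Random(seed)
    raw = [rng.gammavariate(shape, scale) for _ in range(n)]
    top = max(raw) * 1.001  # assumption: rescale by just over the sample max
    scaled = [x / top for x in raw]
    alpha, beta = fit_beta_moments(scaled)
    fitted = [rng.betavariate(alpha, beta) for _ in range(n)]
    return alpha, beta, ks_two_sample(scaled, fitted)


# Change the gamma shape and watch how the fit score moves.
for shape in (0.5, 2.0, 9.0):
    alpha, beta, gap = simulate(shape, scale=1.0)
    print(f"shape={shape}: alpha={alpha:.2f} beta={beta:.2f} KS gap={gap:.3f}")
```

A smaller KS gap means the fitted beta tracks the rescaled gamma data more closely. The rescaling step drives a lot of the result, so it is worth playing with too.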
A bad analogy can frame an entire conversation improperly. This is one of those “anecdotes from a middle-aged man” posts, so take it with a grain of salt. A number of years ago I worked in the risk management team for an insurance company that sold long-term care (LTC) insurance. LTC insurance is a private product that covers home health care and nursing home care if the policyholder is unable to take care of themselves.
In 2005 I was interviewing for a job as Risk Manager with Genworth Financial. I was working a gig up in Armonk, NY so I hopped a car to the GNW office and met with Mark Griffin, at that point the Chief Risk Officer (CRO) for GNW. After some small talk, Mark asked me the single most interesting interview question I’ve ever been asked. I don’t recall the exact wording, but the gist was:
In a previous post I discussed my frustrations with trying to get Dropbox or Spideroak to perform BOTH encrypted remote backup AND fast two-way file syncing. This is the detail of how I set up two machines, both running Ubuntu 10.10, to perform two-way sync, where a file change on either machine results in that change being replicated on the other machine. I initially tried running Unison on BOTH my laptop and the server and had the server Unison set to sync with my laptop back through an SSH reverse proxy.
I love the portability of a laptop. I have a 45 min train ride twice a day and I fly a little too, so having my work with me on my laptop is very important. But I hate doing long running analytics on my laptop when I’m in the office because it bogs down my laptop and all those videos on The Superficial get all jerky and stuff. I get around this conundrum by running much of my analytics on either my work server or on an EC2 machine (I’m going to call these collectively “my servers” for the rest of this post).
It’s been pointed out to me that I haven’t had any blog posts in a while. It’s true. I’m fairly slack. But in the last few months I’ve changed jobs (same firm, new role), written an R abstraction on top of Hadoop, been to China, and managed to stay married. While that sounds pretty awesome, I’m nothing compared to Hideaki Akaiwa. And you may have heard that the R Cookbook by Chicago’s own Paul Teetor has been printed!
I’ve been messing around with Amazon Web Services for a while. I’ve had some projects where I wanted to upload files to S3 or fire off EMR jobs. I’ve been controlling AWS services using a hodgepodge of command line tools and the R system() function to call the tools from the command line. This has some real disadvantages, however. Using the command line tools means each tool has to be configured individually, which is painful on a new machine.