Richard Careaga

Boomer (there's that). Lawyer (ouch!). Polymath (i.e., attention disorder). Big Thinker™️. Data science dilettante. Post-patriarchal.

Greater Seattle, Washington, US

Hidden missing data

No, you can’t always get what you want / You can’t always get what you want / You can’t always get what you want / But if you try sometime / You’ll find / You get what you need – Mick Jagger and Keith Richards

There’s nothing like missing values to

The compiler will tell you what the user cannot

The compiler will always tell you about source code errors that prevent compiling. It can’t advise you if your code solves the problem that it was supposed to solve, even if you are confident in what that problem is. But have a thought for the user who posed the

Less coding and more analysis

I urge you to install.packages("ExPanDaR") and then try the following:

    library(ExPanDaR)
    wb <- read.csv("https://joachim-gassen.github.io/data/wb_condensed.csv")
    ExPanD(wb, cs_id = "country", ts_id = "year")

from Interactive panel EDA with 3 lines

Color Map Atlas for Continuously Scaled Maps

## Please note: Alaska and Hawaii are being shifted and are not to scale.

Color, Design and Communication

Graphic design standards, especially for color, are an official Good Thing. Branding, visual cueing, consistency. Map colors are hard, especially when dealing with continuous data.

An example of continuous data

The U.S.

R and Haskell, meant for each other?

R has a notoriously steep learning curve. I once moaned that help() needed its own help(help), which it does, but did nothing to help me understand the paradigm of function signatures. And why were control loops like for so frowned upon? All sorts of great functionality, but how

n?

Got n? “There was a 100% increase in 1-hour parking meter costs this year!” (Imagined conversation in a small town.) It appears to be hardwired into our wetware to forget to look for the underlying integers. You might ask, for example, “from what to what?” and learn that it went
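
A minimal sketch of why the underlying numbers matter. The prices below are hypothetical, chosen only for illustration (the excerpt does not give the actual meter rates), and `pct_change` is a helper name introduced here.

```r
# Percent change from an old value to a new value.
pct_change <- function(from, to) (to - from) / from * 100

# Two hypothetical meter prices, both a "100% increase":
pct_change(0.25, 0.50)  # 100
pct_change(1.00, 2.00)  # 100

# The absolute changes, in dollars, are very different.
abs_change <- c(small_town = 0.50 - 0.25, big_city = 2.00 - 1.00)
abs_change
```

The same percentage describes a 25-cent change and a one-dollar change, which is why asking for the "from" and the "to" is worth the trouble.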

Month arithmetic in R, a quick guide

Here are questions you might want to ask about months:

1. How many more months until September 1? months_to
2. How many more calendar months until September 1? cal_months_to
3. How many months between this month and September 1? months_between
4. How many more calendar months
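
As a rough illustration of one way to count calendar months in base R, here is a sketch. The helper `month_boundaries` is hypothetical and is not the post's `months_to`, `cal_months_to`, or `months_between`, whose definitions are not shown in this excerpt; it simply counts how many month boundaries lie between two dates, ignoring the day of the month.

```r
# Hypothetical helper (an assumption, not the post's function):
# number of calendar-month boundaries crossed going from `from` to `to`.
month_boundaries <- function(from, to) {
  f <- as.POSIXlt(as.Date(from))
  t <- as.POSIXlt(as.Date(to))
  # $year is years since 1900 and $mon is 0-based; only differences matter here.
  (t$year - f$year) * 12 + (t$mon - f$mon)
}

month_boundaries("2024-06-15", "2024-09-01")  # June to September: 3
month_boundaries("2023-11-30", "2024-02-01")  # November to February: 3
```

Whether a partial month should count as a month is exactly the kind of distinction the questions above draw, so a different definition may suit a different question.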

Is this the original revised data or the revised revised data

Keeping track of the provenance of data can be a challenge, especially when drawing on published sources. Keeping a record of the origin, the date accessed, the transformations applied (e.g., converting from .xls to csv and converting character strings such as “$1,250,321.21” to floats or date
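
A small sketch of the string-to-float conversion mentioned above, in base R. The helper name `parse_dollars` is an assumption introduced here, and the approach (stripping dollar signs and thousands separators with `gsub`) is one simple option, not necessarily the one the post uses.

```r
# Hypothetical helper: strip "$" and "," and convert to numeric.
parse_dollars <- function(x) as.numeric(gsub("[$,]", "", x))

amount <- parse_dollars("$1,250,321.21")

# Print with enough digits to see the cents.
format(amount, nsmall = 2)
```

Recording a transformation like this as code, next to the source and access date, is one way to keep the provenance trail reproducible.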

Data Science in the Board Room and C Suite

Arthur C. Clarke, best remembered for his screenwriting role in 2001: A Space Odyssey, is also the source of a famous quotation: “Any sufficiently advanced technology is indistinguishable from magic.” We can think of data science as an advanced technology that draws heavily on statistics that have developed in the

Cognitive Bias and the Data Scientist

As data scientists, we are not immune from cognitive bias. We do a lot to minimize it, we may even try to quantify it. But it is a part of our protoplasm and, especially, the protoplasm of our untrained clients. A good place to start is the Monty Hall Problem.
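
For readers who distrust the Monty Hall answer, a short simulation can help. This is a sketch under the standard assumptions (three doors, one car, the host always opens a goat door and always offers the switch); the variable names are introduced here for illustration.

```r
set.seed(42)    # fixed seed so the run is repeatable
n <- 10000      # number of simulated games

car  <- sample(1:3, n, replace = TRUE)  # door hiding the car
pick <- sample(1:3, n, replace = TRUE)  # contestant's first choice

# Because the host always reveals a goat, switching wins exactly
# when the first pick was wrong; staying wins when it was right.
stay_wins   <- mean(pick == car)
switch_wins <- mean(pick != car)

c(stay = stay_wins, switch = switch_wins)
```

The switching strategy should win in roughly two thirds of the games, which is the result intuition tends to resist.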

Ordinary Least Squares Linear Regression Walkthrough for Beginners

Ordinary linear least squares regression is one of the first statistical methods that anyone learns who is trying to understand how two sets of numbers relate. It has an undeserved reputation of being able to foretell the future, but it is mathematically tractable, simple to
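
To make "how two sets of numbers relate" concrete, here is a minimal sketch in base R. The five data points are made up for illustration and do not come from the walkthrough; the closed-form slope and intercept are computed by hand and then compared with `lm()`.

```r
# Made-up illustrative data (an assumption, not from the post).
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form simple regression: slope = cov(x, y) / var(x),
# intercept = mean(y) - slope * mean(x).
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same fit using R's built-in linear model function.
fit <- lm(y ~ x)
coef(fit)
```

The two routes should agree, which is a useful check that `lm()` is doing nothing more mysterious than the textbook formula.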

Deboning linear regression output in Excel

A while back (https://goo.gl/1W11Zu), I outlined interpretation of the output of a multiple linear regression of data on Seattle area housing prices (https://www.kaggle.com/harlfoxem/housesalesprediction?login=true), which provides a convenient way to illustrate the usual output of a multiple linear regression model output

Cognitive styles of data science

Yihui Xie of RMarkdown fame has a long, thoughtful post (https://goo.gl/WjA7t8) on a debate for and against working in various types of interactive notebooks, such as Jupyter for the Python platform, IDEs like RStudio in its so-called “notebook mode” and the old school heavy code commenting, among

WaPo has some effective visualizations

One of the annoyances of online newspapers such as the NYT and WaPo is that they recycle content, usually off-the-news-cycle. But this sometimes gets offset by finding something you missed the first time. In a recent article on six common occupations that have changed from above-average to below-average pay,
