A Population Health Ranking System

While working as a population health data scientist, I built a population health ranking system with a roughly 50% higher R-Squared (i.e., explained variance) than commercial risk assessment systems. The best commercial health risk models top out at around a 27% R-Squared (See). The health ranking model I built produced a 39% R-Squared.

Two factors drove this performance. First, the model included many hundreds of health predictors of various types. On their own, these predictors produced an R-Squared similar to that of traditional commercial systems (~26%). Second, the 50% improvement in explained variance (from ~26% to 39%) came from a simple mathematical transformation that changed the dependent variable from “annual patient spend” to something called a “probit.”
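
One common way to turn a heavily skewed outcome such as annual spend into a probit is a rank-based transformation: rank the patients by spend, convert each rank to a percentile, and pass that percentile through the inverse of the standard normal cumulative distribution function. The sketch below assumes that reading. The exact transformation used in the model is not spelled out here, and the function name `probit_scores`, the midpoint percentile rule, and the sample spend values are all hypothetical choices made for illustration.

```python
from statistics import NormalDist


def probit_scores(annual_spend):
    """Map each patient's annual spend to a rank-based probit (a z-score).

    Assumption: 'probit' here means the inverse standard normal CDF
    applied to the patient's spend percentile.
    """
    n = len(annual_spend)
    # Patient indices ordered by spend, lowest first.
    # Ties are broken by input order; a fuller version might average tied ranks.
    order = sorted(range(n), key=lambda i: annual_spend[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Midpoint percentile stays strictly between 0 and 1,
        # so the inverse CDF is always finite.
        percentile = (rank - 0.5) / n
        scores[i] = std_normal.inv_cdf(percentile)
    return scores


# Hypothetical, heavily skewed annual spend values in dollars (illustration only).
spend = [120.0, 0.0, 950.0, 48000.0, 3100.0]
probits = probit_scores(spend)
```

Under this assumption, the regression target depends only on where a patient ranks, not on the dollar amount itself, so a single very expensive patient no longer dominates the fit. The patient with the median spend maps to a probit of about 0, and the lowest and highest spenders map to symmetric negative and positive values.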

While the transformation is simple, the logic behind it does not seem to be immediately apparent to most people. I would like to see this model widely adopted, so I am providing a description of the logic and mechanics of building it for others to follow. If you are curious, please feel free to contact me.