1. Introduction
There are two cultures in the statistical modeling of data. The traditional Data Modeling Culture assumes that the data are generated by a given stochastic data model. The Algorithmic Modeling Culture treats the data-generating mechanism as unknown and seeks an algorithm that operates on the inputs x to predict the responses y.
2. The Data Modeling Culture
The Data Modeling Culture is characterized by:
- Assumption of a stochastic mechanism that generates data
- Maximum likelihood estimation
- Hypothesis testing and p-values
- Focus on parameters and their significance
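As a minimal sketch of how these ingredients fit together, assume the simplest stochastic model, independent Bernoulli trials with success probability p. The parameter is estimated by maximum likelihood, and the hypothesis p = p0 is tested with a likelihood-ratio test whose p-value uses the large-sample chi-square approximation. The Bernoulli model, the function names, and the choice of test are illustrative assumptions, not taken from the text.

```python
import math


def bernoulli_loglik(p, successes, n):
    """Log-likelihood of success probability p given `successes` ones in n i.i.d. Bernoulli trials."""
    failures = n - successes
    ll = 0.0
    # Skip zero-count terms (0 * log 0 = 0) so boundary estimates p = 0 or p = 1 are allowed.
    if successes:
        ll += successes * math.log(p)
    if failures:
        ll += failures * math.log(1.0 - p)
    return ll


def bernoulli_mle(successes, n):
    """Closed-form maximizer of the Bernoulli log-likelihood: the sample proportion."""
    return successes / n


def likelihood_ratio_test(successes, n, p0):
    """Test H0: p = p0 against p != p0 via the asymptotic chi-square(1) law of the LR statistic."""
    p_hat = bernoulli_mle(successes, n)
    stat = 2.0 * (bernoulli_loglik(p_hat, successes, n) - bernoulli_loglik(p0, successes, n))
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of chi-square with 1 df: P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_hat, stat, p_value


p_hat, stat, p_value = likelihood_ratio_test(successes=62, n=100, p0=0.5)
print(f"p_hat = {p_hat:.2f}, LR statistic = {stat:.2f}, p-value = {p_value:.3f}")
```

With 62 successes in 100 trials the MLE is 0.62 and the approximate p-value for H0: p = 0.5 is about 0.016, so an analyst in this culture would report p as significantly different from 0.5 at the 5% level. The conclusion is only as trustworthy as the assumed Bernoulli model.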
2.1 Example: Linear Regression
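\begin{equation} y = X\beta + \epsilon \end{equation}
where $y$ is the vector of responses, $X$ is the design matrix of predictors, and $\epsilon \sim N(0, \sigma^2 I)$. The focus is on estimating $\beta$ and establishing which of its components are statistically significant.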
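A minimal sketch of this workflow in the simplest case, assuming one predictor plus an intercept (so $X$ has a column of ones and one column $x$). Under the Gaussian error assumption the least-squares estimates coincide with the maximum likelihood estimates of $\beta$. The function names, the simulated data (hypothetical true $\beta = (1, 2)$, $\sigma = 0.5$), and the normal approximation used for the p-value are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x + eps, i.e. y = X beta + eps with X = [1, x].

    Returns the estimates (b0, b1) and their standard errors (se_b0, se_b1).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Unbiased estimate of sigma^2: residual sum of squares over n - 2 degrees of freedom.
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return (b0, b1), (se_b0, se_b1)


def two_sided_p(t_stat):
    """Large-sample (normal) approximation to the two-sided p-value of a t statistic.

    The exact test under the Gaussian model uses Student's t with n - 2 degrees of freedom.
    """
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Simulate from the assumed model with hypothetical true values beta = (1, 2), sigma = 0.5.
rng = random.Random(0)
x_sim = [rng.uniform(0.0, 10.0) for _ in range(200)]
y_sim = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x_sim]

est, se = fit_simple_ols(x_sim, y_sim)
for name, b, s in zip(("intercept", "slope"), est, se):
    t = b / s
    print(f"{name}: estimate {b:.3f}, SE {s:.3f}, t = {t:.1f}, approx. p = {two_sided_p(t):.2g}")
```

The output is a table of estimates, standard errors, and significance levels: exactly the quantities this culture reports, all of which are meaningful only if the linear Gaussian model is correct.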
3. The Algorithmic Modeling Culture
The Algorithmic Modeling Culture is characterized by:
- Focus on predictive accuracy
- Complex models like neural networks and random forests
- Validation on test data
- Black-box methodology
The Algorithmic approach makes no assumptions about the data-generating mechanism; instead, it seeks algorithms that predict well and judges them by their accuracy on data that were not used to fit them.
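A minimal sketch of this evaluation loop, using k-nearest-neighbour regression as a stand-in algorithm (it is not named in the text; neural networks or random forests would play the same role). No parametric form is fitted for the mechanism, and the method is judged solely by its mean squared error on a held-out test set, compared with a baseline that always predicts the training mean. The simulated sine-plus-noise data, the 30% test fraction, k = 5, and all function names are hypothetical choices.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=5):
    """Average the responses of the k training points closest to `query` (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def train_test_split(x, y, test_fraction=0.3, seed=0):
    """Randomly hold out a fraction of the observations as a test set."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in test], [y[i] for i in test])


def holdout_mse(predict, test_x, test_y):
    """Mean squared prediction error on data that played no part in fitting."""
    errors = [(predict(xi) - yi) ** 2 for xi, yi in zip(test_x, test_y)]
    return sum(errors) / len(errors)


# Hypothetical data: the algorithm is never told that the mechanism is sin(x) plus noise.
rng = random.Random(1)
x = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(300)]
y = [math.sin(xi) + rng.gauss(0.0, 0.1) for xi in x]

tr_x, tr_y, te_x, te_y = train_test_split(x, y)
knn_mse = holdout_mse(lambda q: knn_predict(tr_x, tr_y, q, k=5), te_x, te_y)

# Baseline: always predict the training mean.
mean_y = sum(tr_y) / len(tr_y)
baseline_mse = holdout_mse(lambda q: mean_y, te_x, te_y)

print(f"test MSE  kNN (k=5): {knn_mse:.4f}   training-mean baseline: {baseline_mse:.4f}")
```

Nothing in the evaluation refers to the form of the mechanism: swapping in a different algorithm changes only the `predict` function passed to `holdout_mse`.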