The Two Cultures of Stats

1. Introduction

There are two cultures in the statistical modeling of data. The traditional Data Modeling Culture assumes that the data are generated by a given stochastic model. The Algorithmic Modeling Culture treats the data-generating mechanism as unknown and seeks a function, an algorithm operating on the inputs, that predicts the responses.

2. The Data Modeling Culture

The Data Modeling Culture is characterized by:

Assumption of a stochastic mechanism that generates data
Maximum likelihood estimation
Hypothesis testing and p-values
Focus on parameters and their significance

2.1 Example: Linear Regression

\begin{equation} y = X\beta + \epsilon \end{equation}

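Here $y$ is the response vector, $X$ the design matrix, $\beta$ the coefficient vector, and $\epsilon \sim N(0, \sigma^2 I)$ the error term. The focus is on estimating $\beta$ and testing whether its components are statistically significant.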
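As an illustration of the data modeling workflow, the regression model above can be fit by least squares, which coincides with maximum likelihood when the errors are Gaussian. The sketch below is a minimal, standard-library-only example for the one-predictor case. The function name `fit_simple_ols`, the synthetic data, and the chosen true values (intercept 1.0, slope 2.0, noise standard deviation 0.5) are assumptions made for illustration and do not come from the original text.

```python
import math
import random


def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns the estimates, the slope's standard error, and its t-statistic.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Unbiased estimate of sigma^2: two parameters were estimated,
    # so the residual sum of squares is divided by n - 2.
    sigma2_hat = sum(r * r for r in residuals) / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / sxx)

    return {
        "intercept": b0,
        "slope": b1,
        "slope_se": se_b1,
        "slope_t": b1 / se_b1,
    }


# Synthetic data (hypothetical): true intercept 1.0, true slope 2.0, sigma 0.5.
rng = random.Random(0)
x = [i / 10 for i in range(50)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

result = fit_simple_ols(x, y)
print(result)
```

A large absolute t-statistic for the slope is the kind of evidence this culture uses to call a coefficient significant. Note that the inference is only as trustworthy as the assumed model: the standard error formula relies on the error assumptions stated above.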

3. The Algorithmic Modeling Culture

The Algorithmic Modeling Culture is characterized by:

Focus on predictive accuracy
Complex models like neural networks and random forests
Validation on test data
Black-box methodology

The algorithmic approach makes no assumptions about the form of the data-generating mechanism. Instead, it judges a method by how accurately it predicts data that were held out from fitting.
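The idea of validating on test data can be sketched as follows. This is a minimal example using only the standard library, and it swaps in a k-nearest-neighbor regressor as a stand-in for the more complex models named above (neural networks, random forests), since it shows the same workflow in a few lines. The function names, the synthetic sine-plus-noise data, the 200/100 split, and the choice k = 5 are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=5):
    """Predict by averaging the responses of the k nearest training points."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: abs(train_x[i] - query)
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (hypothetical): a nonlinear signal plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 6.0) for _ in range(300)]
ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]

# Hold out the last 100 points; they are never used to make the fit.
train_x, train_y = xs[:200], ys[:200]
test_x, test_y = xs[200:], ys[200:]

knn_preds = [knn_predict(train_x, train_y, x, k=5) for x in test_x]
knn_mse = mean_squared_error(test_y, knn_preds)

# Baseline for comparison: always predict the training mean.
train_mean = sum(train_y) / len(train_y)
baseline_mse = mean_squared_error(test_y, [train_mean] * len(test_y))

print(f"k-NN test MSE:     {knn_mse:.3f}")
print(f"baseline test MSE: {baseline_mse:.3f}")
```

Nothing in this sketch posits a model for how the responses arise. The only evidence offered for the method is its error on the held-out points, compared against a trivial baseline.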