Causal Inference in Machine Learning Models

Moving Beyond Correlation: Estimating Cause-and-Effect with Data

Authored by: Loveleen Narang

Date: May 23, 2024

Introduction: Correlation is Not Causation

Machine learning models excel at identifying patterns and making predictions based on correlations in data (\( P(Y|X) \)). A classic mantra in statistics and data science, however, reminds us that "correlation does not imply causation." Just because two variables move together does not mean one causes the other. For example, ice cream sales and drowning incidents are correlated (both increase in summer), but ice cream doesn't cause drowning. Understanding true cause-and-effect relationships requires moving beyond standard predictive modeling to the realm of causal inference.

Causal inference aims to determine the effect of changing one variable (\( X \), the "treatment" or "cause") on another variable (\( Y \), the "outcome" or "effect"). This involves asking counterfactual questions: "What would have happened to Y if X had been different?" Answering such questions is crucial for effective decision-making in fields like medicine (Does this drug work?), policy (Does this program achieve its goal?), and business (Does this ad campaign drive sales?). We want to understand the effect of an intervention, represented notationally as \( P(Y|do(X=x)) \), which is distinct from the observational probability \( P(Y|X=x) \) (Formula 1). While Randomized Controlled Trials (RCTs) are the ideal way to establish causality, they are often impractical. This article explores frameworks and methods, including those enhanced by machine learning, for inferring causal effects primarily from observational data.
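The ice cream example can be made concrete with a short simulation. In the sketch below (all variable names and coefficients invented for illustration), a temperature variable drives both series, so they correlate strongly even though neither appears in the other's data-generating equation.

```python
# A minimal simulation of the ice cream / drowning example: a common cause
# (temperature) drives both variables, producing a clear correlation with no
# causal link between them.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

temperature = rng.normal(25, 8, n)                      # common cause (summer heat)
ice_cream_sales = 50 + 3.0 * temperature + rng.normal(0, 5, n)
drownings = 2 + 0.1 * temperature + rng.normal(0, 1, n)

# Strong observational association...
print(np.corrcoef(ice_cream_sales, drownings)[0, 1])    # ~0.6

# ...yet intervening on sales (do(sales = s)) would leave drownings untouched,
# because sales never appears in the equation that generates drownings.
```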

Correlation vs. Causation

[Figure: two panels. Left, spurious correlation: hot weather drives both ice cream sales and drowning incidents. Right, causation: applying fertilizer causes increased crop yield.]

Fig 1: Correlation can arise from a common cause (confounder), while causation implies a direct influence.

Framework 1: Potential Outcomes (Rubin Causal Model)

The Potential Outcomes framework formalizes causal reasoning using counterfactuals. For each unit \( i \) (e.g., a patient) and a binary treatment \( T \in \{0, 1\} \), we imagine two potential states of the world:

- \( Y_i(1) \): the outcome unit \( i \) would experience if treated (Formula 2).
- \( Y_i(0) \): the outcome unit \( i \) would experience if untreated (Formula 3).

The Individual Treatment Effect (ITE) is the difference: \( \tau_i = Y_i(1) - Y_i(0) \) (Formula 4). However, we face the Fundamental Problem of Causal Inference: we only ever observe one potential outcome for each unit. The observed outcome is \( Y_i^{obs} = T_i Y_i(1) + (1-T_i) Y_i(0) \) (Formula 5). The other outcome remains unseen (counterfactual).

Therefore, we usually estimate average effects:

- Average Treatment Effect: \( ATE = E[Y(1) - Y(0)] \) (Formula 6).
- Average Treatment effect on the Treated: \( ATT = E[Y(1) - Y(0) \mid T=1] \) (Formula 7).

This framework relies on the Stable Unit Treatment Value Assumption (SUTVA), meaning no interference between units and only one version of the treatment.
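A simulation makes the bookkeeping concrete. The sketch below (with an invented true effect of 2.0) generates both potential outcomes for every unit, then applies Formula 5 to show that the analyst's data contain only one of them.

```python
# Sketch of the potential-outcomes bookkeeping (Formulas 4-5): in simulation
# every unit has both Y(0) and Y(1), but the analyst only ever sees one.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

y0 = rng.normal(10, 2, n)             # potential outcome under control
y1 = y0 + 2.0 + rng.normal(0, 1, n)   # potential outcome under treatment
ite = y1 - y0                         # individual effects (Formula 4)

t = rng.integers(0, 2, n)             # treatment assignment
y_obs = t * y1 + (1 - t) * y0         # observed outcome (Formula 5)

print(ite.mean())                     # true ATE ~2.0, knowable only in simulation
# In real data, y0[t == 1] and y1[t == 0] are counterfactual and unobservable.
```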

Potential Outcomes Framework

[Figure: unit i has two potential worlds, Yi(1) if treated and Yi(0) if control; if unit i actually received treatment (Ti = 1), Yi(1) is observed and Yi(0) is not: the Fundamental Problem.]

Fig 2: Only one potential outcome (Y(0) or Y(1)) is observed for any individual unit.

Framework 2: Structural Causal Models (SCMs) & DAGs

Structural Causal Models use Directed Acyclic Graphs (DAGs) to visually represent assumptions about causal relationships between variables (nodes connected by directed edges). Key structures include chains, forks (confounding), and colliders. The do-operator, \( do(X=x) \), represents setting variable \( X \) to value \( x \) via intervention. DAGs help identify if a causal effect is identifiable from observational data using criteria like the back-door criterion.
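The do-operator has a direct computational reading: delete the structural equation for the intervened variable and pin it to a constant. The sketch below (all coefficients invented for illustration) contrasts the observational contrast \( E[Y|T=1] - E[Y|T=0] \) with the interventional contrast \( E[Y|do(T=1)] - E[Y|do(T=0)] \) in a confounded SCM.

```python
# A toy structural causal model. Intervention is implemented exactly as the
# do-operator defines it: replace T's structural equation with a constant.
import numpy as np

rng = np.random.default_rng(2)

def simulate(n, do_t=None):
    z = rng.normal(0, 1, n)                              # confounder
    if do_t is None:
        t = (z + rng.normal(0, 1, n) > 0).astype(float)  # T depends on Z
    else:
        t = np.full(n, float(do_t))                      # intervention: do(T = do_t)
    y = 1.5 * t + 2.0 * z + rng.normal(0, 1, n)          # Y depends on T and Z
    return t, y

t, y = simulate(200_000)
naive = y[t == 1].mean() - y[t == 0].mean()              # E[Y|T=1] - E[Y|T=0]
_, y1 = simulate(200_000, do_t=1)
_, y0 = simulate(200_000, do_t=0)
causal = y1.mean() - y0.mean()                           # E[Y|do(T=1)] - E[Y|do(T=0)]
print(naive, causal)  # naive is inflated by Z (~3.8); causal is ~1.5
```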

DAG Example: Confounding and Adjustment

[Figure: DAG with edges Z → T, Z → Y, and T → Y; adjusting for Z blocks the back-door path T ← Z → Y.]

Fig 3: A DAG showing Treatment (T), Outcome (Y), and Confounder (Z). Adjusting for Z blocks the non-causal path.
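As a numeric illustration of blocking the back-door path in Fig 3, the sketch below applies the adjustment formula \( P(Y|do(T=t)) = \sum_z P(Y|T=t, Z=z)\,P(Z=z) \) by stratifying on a binary confounder. All probabilities are invented.

```python
# Back-door adjustment over a discrete confounder Z: average the
# stratum-specific contrasts, weighted by P(Z=z).
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

z = rng.integers(0, 2, n)                        # binary confounder
t = rng.binomial(1, np.where(z == 1, 0.8, 0.2))  # Z pushes units into treatment
y = rng.binomial(1, 0.1 + 0.2 * t + 0.5 * z)     # true effect of T is +0.2

naive = y[t == 1].mean() - y[t == 0].mean()      # confounded contrast

adjusted = sum(
    (y[(t == 1) & (z == v)].mean() - y[(t == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)
print(naive, adjusted)  # naive ~0.5 (biased upward); adjusted ~0.2
```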

Identifying Causal Effects: Strategies

Randomized Controlled Trials (RCTs)

By randomly assigning treatment \( T \), RCTs ensure \( T \) is independent of pre-treatment factors (\( T \perp (Y(1), Y(0), Z, U) \)). This allows direct estimation of ATE: \( ATE = E[Y|T=1] - E[Y|T=0] \) (Formula 8).
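To see why randomization suffices, the sketch below reuses the confounded outcome model from the example after Fig 3, but assigns treatment by coin flip; the plain difference in means then recovers the true effect without any adjustment.

```python
# Under randomization, T is independent of (Y(1), Y(0), Z), so the plain
# difference in means (Formula 8) is unbiased for the ATE.
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

z = rng.integers(0, 2, n)                     # covariate no longer drives assignment
t = rng.integers(0, 2, n)                     # randomized treatment
y = rng.binomial(1, 0.1 + 0.2 * t + 0.5 * z)  # same outcome model as before

print(y[t == 1].mean() - y[t == 0].mean())    # ~0.2, the true effect
```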

Observational Studies: Handling Confounding

Without randomization, we must account for confounders \( Z \). Common strategies assume all confounders are measured:

- Back-door adjustment: \( P(Y|do(T=t)) = \sum_z P(Y|T=t, Z=z)\,P(Z=z) \) (Formula 9).
- Conditional ignorability: \( (Y(1), Y(0)) \perp T \mid Z \) (Formula 10), together with positivity, \( 0 < P(T=1|Z=z) < 1 \).
- Propensity score methods: with \( e(z) = P(T=1 \mid Z=z) \) (Formula 11), match, stratify, or weight on \( e(z) \).
- Inverse Probability of Treatment Weighting (IPTW): \( \hat{\tau}_{IPTW} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{T_i Y_i}{e(Z_i)} - \frac{(1-T_i) Y_i}{1 - e(Z_i)} \right] \) (Formula 12); a sketch follows this list.
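A minimal IPTW sketch, assuming the invented data-generating process below and using scikit-learn's LogisticRegression as the propensity model (any probabilistic classifier could stand in):

```python
# IPTW (Formula 12): fit a propensity model e(z) = P(T=1|Z=z), then reweight
# observed outcomes to mimic a randomized experiment.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 100_000

z = rng.normal(0, 1, (n, 1))                       # observed confounder
p_t = 1 / (1 + np.exp(-1.5 * z[:, 0]))             # true propensity
t = rng.binomial(1, p_t)
y = 2.0 * t + 3.0 * z[:, 0] + rng.normal(0, 1, n)  # true ATE = 2

e_hat = LogisticRegression().fit(z, t).predict_proba(z)[:, 1]
ate_iptw = np.mean(t * y / e_hat - (1 - t) * y / (1 - e_hat))
print(ate_iptw)  # ~2.0; the naive contrast would be badly confounded
```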

Observational Studies: Handling Unobserved Confounding

Quasi-experimental designs for when confounders are unmeasured:

- Instrumental Variables (IV): find an instrument \( Z \) satisfying relevance, \( Cov(Z, T) \neq 0 \) (Formula 13); the exclusion restriction, \( Z \) affects \( Y \) only through \( T \) (Formula 14); and independence of \( Z \) from unobserved confounders \( U \) (Formula 15). The effect can then be estimated by the Wald ratio \( \hat{\tau}_{IV} = Cov(Z, Y) / Cov(Z, T) \); a sketch follows this list.
- Regression Discontinuity Design (RDD): when treatment switches at a cutoff \( c \) of a running variable \( X \), \( \tau_{RDD} = \lim_{x \downarrow c} E[Y|X=x] - \lim_{x \uparrow c} E[Y|X=x] \) (Formula 16).
- Difference-in-Differences (DiD): \( \hat{\tau}_{DiD} = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre}) \) (Formula 17); a sketch follows Fig 5.
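Of these, IV is the easiest to demonstrate in a few lines. The sketch below (all coefficients invented) shows the Wald ratio recovering the true effect while naive regression is biased by the unobserved confounder \( U \):

```python
# IV sketch (Formulas 13-15): with a valid instrument Z, the Wald estimator
# Cov(Z, Y) / Cov(Z, T) recovers the effect of T on Y despite unobserved U.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

u = rng.normal(0, 1, n)                      # unobserved confounder
z = rng.normal(0, 1, n)                      # instrument, independent of U
t = 0.8 * z + 1.0 * u + rng.normal(0, 1, n)  # relevance: Z moves T
y = 1.5 * t + 2.0 * u + rng.normal(0, 1, n)  # exclusion: Z enters Y only via T

naive_ols = np.cov(t, y)[0, 1] / np.var(t)   # biased by U (~2.3)
wald = np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]
print(naive_ols, wald)                       # wald is ~1.5, the true effect
```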

Instrumental Variable (IV) Setup

[Figure: DAG with edges Z → T, T → Y, U → T, U → Y; no direct Z → Y edge, and Z is independent of U.]

Fig 4: IV setup assumes Z affects T, T affects Y, U affects T and Y, but Z is independent of U and only affects Y through T.

Difference-in-Differences (DiD) Plot

[Figure: outcome Y versus time for treatment and control groups, with the intervention point, the treated group's counterfactual trend, and the DiD effect marked.]

Fig 5: DiD estimates the treatment effect by comparing outcome changes, assuming parallel pre-treatment trends.
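A DiD sketch with fabricated panel data, constructed so that the parallel-trends assumption holds by design:

```python
# DiD (Formula 17): the treated group's pre-to-post change minus the control
# group's pre-to-post change isolates the treatment effect.
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

group = rng.integers(0, 2, n)                # 1 = treated group
y_pre = 5.0 + 1.0 * group + rng.normal(0, 1, n)
trend = 0.5                                  # common (parallel) time trend
effect = 2.0                                 # true treatment effect
y_post = y_pre + trend + effect * group + rng.normal(0, 1, n)

did = (
    (y_post[group == 1].mean() - y_pre[group == 1].mean())
    - (y_post[group == 0].mean() - y_pre[group == 0].mean())
)
print(did)  # ~2.0; the level difference between groups cancels out
```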

Causal Inference Meets Machine Learning

ML enhances causal inference, especially for:

- Heterogeneous effects: estimating the Conditional Average Treatment Effect, \( \tau(x) = E[Y(1) - Y(0) \mid X=x] \) (Formula 18).
- Meta-learners: the T-Learner fits separate outcome models on treated and control units and sets \( \hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x) \) (Formula 19); the S-Learner fits one model with \( T \) as a feature and sets \( \hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0) \) (Formula 20). A T-Learner sketch follows this list.
- Nuisance estimation: flexible models for propensity scores and outcome regressions inside the identification strategies above.
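A minimal T-Learner sketch, using scikit-learn's GradientBoostingRegressor as an arbitrary base learner on simulated data with a known heterogeneous effect:

```python
# T-Learner (Formula 19): fit separate outcome models on the treated and
# control arms, then difference their predictions to estimate tau(x).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(8)
n = 20_000

x = rng.uniform(-2, 2, (n, 1))
t = rng.integers(0, 2, n)                        # randomized for simplicity
tau_true = 1.0 + x[:, 0]                         # effect varies with x
y = x[:, 0] ** 2 + tau_true * t + rng.normal(0, 0.5, n)

mu1 = GradientBoostingRegressor().fit(x[t == 1], y[t == 1])
mu0 = GradientBoostingRegressor().fit(x[t == 0], y[t == 0])
tau_hat = mu1.predict(x) - mu0.predict(x)        # CATE estimates (Formula 18)

print(np.corrcoef(tau_hat, tau_true)[0, 1])      # close to 1: heterogeneity recovered
```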

Throughout, standard notation is used: linear regression \( y = \beta X + \epsilon \) (Formula 21), expectation \( E[\cdot] \) (Formula 22), conditional probability \( P(\cdot \mid \cdot) \) (Formula 23), covariance \( \mathrm{Cov}(\cdot, \cdot) \) (Formula 24), a generic loss \( J(\theta) \) (Formula 25), and model parameters \( \theta \) (Formula 26).

Challenges

Causal conclusions drawn from observational data rest on assumptions the data alone cannot verify: unobserved confounding, violations of positivity or of parallel trends, weak or invalid instruments, and the fundamental unobservability of counterfactuals all threaten validity, as the table below summarizes by method.

Comparison of Common Causal Inference Strategies
Method | Data Type | Key Assumption(s) | Handles Unobserved Confounding?
RCT | Experimental | Successful randomization | Yes
Adjustment / PS Methods | Observational | Conditional ignorability (all confounders measured), positivity | No
Instrumental Variable (IV) | Observational | Relevance, exclusion restriction, independence | Yes (if assumptions hold)
Regression Discontinuity (RDD) | Observational | Continuity of potential outcomes near cutoff | Yes (locally around cutoff)
Difference-in-Differences (DiD) | Observational (panel) | Parallel trends | Yes (for time-invariant confounders)

Conclusion: Towards Causal Understanding

Causal inference provides the essential framework for moving beyond correlation to understand cause-and-effect, crucial for informed decision-making. By leveraging frameworks like Potential Outcomes and Structural Causal Models, and applying identification strategies such as adjustment based on the back-door criterion, instrumental variables, RDD, or DiD (often enhanced by ML techniques for estimating nuisance components or heterogeneous effects), we can rigorously estimate causal impacts even from observational data. However, this requires careful consideration of underlying assumptions and potential biases, particularly unobserved confounding. Integrating the predictive power of ML with the inferential rigor of causal methods is key to moving from simply observing patterns to understanding the mechanisms that drive them.


About the Author, Architect & Developer

Loveleen Narang is a distinguished leader and visionary in the fields of Data Science, Machine Learning, and Artificial Intelligence. With over two decades of experience in designing and architecting cutting-edge AI solutions, he excels at leveraging advanced technologies to tackle complex challenges across diverse industries. His strategic mindset not only resolves critical issues but also enhances operational efficiency, reinforces regulatory compliance, and delivers tangible value—especially within government and public sector initiatives.

Widely recognized for his commitment to excellence, Loveleen focuses on building robust, scalable, and secure systems that align with global standards and ethical principles. His approach seamlessly integrates cross-functional collaboration with innovative methodologies, ensuring every solution is both forward-looking and aligned with organizational goals. A driving force behind industry best practices, Loveleen continues to shape the future of technology-led transformation, earning a reputation as a catalyst for impactful and sustainable innovation.