Making Regression More Reflective of (Social) Context. Interpretable Configurational Regression

Recording from AI Lund lunch seminar 3 May 2023

Topic: Making Regression More Reflective of (Social) Context. Interpretable Configurational Regression

When: 3 May 2023 at 12.00-13.15

Speaker: Chris Swader, Senior lecturer, Sociology, Lund University

Moderator: Anamaria Dutceac Segesten, Senior Lecturer, Strategic Communication, Lund University

Where: Online

Spoken language: English

Abstract

There has been a recognition that social phenomena cannot be meaningfully modeled with traditional regression’s focus on ‘average net effects’, which often hides subgroup differences (Ragin 2006). Instead, over the past decades, approaches such as Qualitative Comparative Analysis (QCA) — based on Boolean algebra and set theory — have highlighted that there may be multiple paths toward an outcome (‘equifinality’) and that particular combinations of variables/conditions, rather than variables in isolation, may be sufficient for outcomes (conjunctural causation: Schneider and Wagemann 2012).
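
The point about ‘average net effects’ can be made concrete with a small sketch. The data below are invented for illustration and are not from the talk: two subgroups have exactly opposite slopes, so a single pooled regression reports a slope of roughly zero even though the predictor matters strongly within each subgroup. The helper `ols_slope` and all variable names are assumptions of this example.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope for a single predictor (covariance / variance)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Invented data: the same predictor values in two hypothetical subgroups.
xs = [i / 10 for i in range(-10, 11)]
ys_group_a = [2.0 * x for x in xs]   # positive effect in subgroup A
ys_group_b = [-2.0 * x for x in xs]  # negative effect in subgroup B

# Pooling the subgroups yields one 'average net effect' for everyone.
pooled_slope = ols_slope(xs + xs, ys_group_a + ys_group_b)
slope_a = ols_slope(xs, ys_group_a)
slope_b = ols_slope(xs, ys_group_b)

print(f"pooled: {pooled_slope:.3f}, group A: {slope_a:.3f}, group B: {slope_b:.3f}")
```

In this toy case the subgroups are known in advance; the difficulty the abstract addresses is that in real data they are not, and they may be defined by combinations of many conditions.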

The traditional interaction-effect approach cannot be used to efficiently find meaningful and complex subgroups in data, and, while QCA can find distinct pathways and their corresponding cases, it is limited to a small number of predictors. Further, typical machine learning approaches can achieve high accuracy by making use of many predictors and identifying nuanced latent patterns, but these ‘black box’ models tend to be difficult to interpret, especially in the explanatory mode that social scientists need. What is needed is an approach that can simultaneously (i) identify complex subgroups, (ii) do so based on a larger number of predictors, and (iii) remain interpretable.

Interpretable Configurational Regression (ICR) uses a ‘genetic’ optimisation algorithm to build a cooperating group of solution formulas. The fitness function favours using as few separate sets of regression coefficients as possible to explain as many unique cases as possible. We then condense those many formulas into a smaller set. This set’s case memberships are then streamlined so that each case belongs to an individual model, and ‘true regressions’ are run on these separate models. For interpretability purposes, these separate subgroups are then described with the help of a novel agglomerative population-based tree. This constructed tree is then transformed into a case-based format, and the models are re-assigned according to these new case-based rules. The outcome is a concise and clean case-based decision tree leading to the separate multiple outcome models in unique leaves. The closest existing techniques to our own include mixture regression models (Lamont et al. 2015) and the MAIHDA model (Merlo 2018).
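
The trade-off described for the fitness function (few coefficient sets, many cases explained) can be sketched as follows. This is not the speaker's fitness function, which the abstract does not specify. It is a hypothetical stand-in that treats a case as ‘explained’ when some formula predicts it within a tolerance, and subtracts a penalty per formula. The names `fitness`, `covered_cases`, `tol` and `penalty`, and the toy data, are all assumptions of this example; the genetic search itself is not shown.

```python
def predict(coefs, x):
    """Linear prediction from an (intercept, slope) pair for one predictor."""
    intercept, slope = coefs
    return intercept + slope * x


def covered_cases(formulas, cases, tol):
    """Indices of cases that at least one formula predicts within tol."""
    covered = set()
    for i, (x, y) in enumerate(cases):
        if any(abs(predict(f, x) - y) <= tol for f in formulas):
            covered.add(i)
    return covered


def fitness(formulas, cases, tol=0.1, penalty=0.05):
    """Hypothetical score: share of cases explained minus a cost per formula."""
    coverage = len(covered_cases(formulas, cases, tol)) / len(cases)
    return coverage - penalty * len(formulas)


# Invented cases (x, y) drawn from two subgroups with opposite slopes.
cases = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
cases += [(x, -2.0 * x) for x in (1.0, 2.0, 3.0)]

one_formula = [(0.0, 2.0)]                       # explains half the cases
two_formulas = [(0.0, 2.0), (0.0, -2.0)]         # explains every case
three_formulas = two_formulas + [(0.0, 0.0)]     # redundant third formula

for name, candidate in [("one", one_formula), ("two", two_formulas), ("three", three_formulas)]:
    print(name, round(fitness(candidate, cases), 3))
```

Under these assumptions the two-formula candidate scores highest: adding the second formula raises coverage by more than the penalty costs, while the third adds cost without explaining any new case.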