Causal Modeling in R: Whole Game2020-07-29 (updated: 2020-07-28)1 / 33

Broad strokesSpecify causal question
Draw assumptions (causal diagram)
Model assumptions (propensity score)
Analyze propensities (diagnostics)
Estimate causal effects (IPW) 
2 / 33

Do people who quit smoking gain weight?3 / 33

library(cidata)
nhefs_complete_uc <- nhefs_complete %>% 
  filter(censored == 0)
nhefs_complete_uc

## # A tibble: 1,566 x 67
##     seqn  qsmk death yrdth modth dadth   sbp   dbp sex  
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
##  1   233     0     0    NA    NA    NA   175    96 0    
##  2   235     0     0    NA    NA    NA   123    80 0    
##  3   244     0     0    NA    NA    NA   115    75 1    
##  4   245     0     1    85     2    14   148    78 0    
##  5   252     0     0    NA    NA    NA   118    77 0    
##  6   257     0     0    NA    NA    NA   141    83 1    
##  7   262     0     0    NA    NA    NA   132    69 1    
##  8   266     0     0    NA    NA    NA   100    53 1    
##  9   419     0     1    84    10    13   163    79 0    
## 10   420     0     1    86    10    17   184   106 0    
## # … with 1,556 more rows, and 58 more variables: age <dbl>,
## #   race <fct>, income <dbl>, marital <dbl>, school <dbl>,
## #   education <fct>, …

4 / 33

Did those who quit smoking gain weight?

5 / 33

Did those who quit smoking gain weight?

# ~2.5 lbs gained for quit vs. not quit
nhefs_complete_uc %>% 
  group_by(qsmk) %>% 
  summarize(
    mean_weight_change = mean(wt82_71), 
    sd = sd(wt82_71),
    .groups = "drop"
  )

## # A tibble: 2 x 3
##    qsmk mean_weight_change    sd
##   <dbl>              <dbl> <dbl>
## 1     0               1.98  7.45
## 2     1               4.53  8.75

6 / 33

draw your assumptions7 / 33

8 / 33

9 / 33

What do I need to control for?10 / 33

11 / 33

Multivariable regression: what's the association?

lm(
  wt82_71~ qsmk + sex +
    race + age + I(age^2) + education +
    smokeintensity + I(smokeintensity^2) +
    smokeyrs + I(smokeyrs^2) + exercise + active +
    wt71 + I(wt71^2),
  data = nhefs_complete_uc
) %>%
  tidy(conf.int = TRUE) %>% 
  filter(term == "qsmk")

12 / 33

Multivariable regression: what's the association?

lm( 
  wt82_71~ qsmk + sex +  
    race + age + I(age^2) + education + 
    smokeintensity + I(smokeintensity^2) + 
    smokeyrs + I(smokeyrs^2) + exercise + active + 
    wt71 + I(wt71^2), 
  data = nhefs_complete_uc 
) %>% 
  tidy(conf.int = TRUE) %>% 
  filter(term == "qsmk")

## # A tibble: 1 x 7
##   term  estimate std.error statistic  p.value conf.low
##   <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>
## 1 qsmk      3.46     0.438      7.90 5.36e-15     2.60
## # … with 1 more variable: conf.high <dbl>

13 / 33

model your assumptions14 / 33

counterfactual: what if everyone quit smoking vs. what if no one quit smoking15 / 33

Fit propensity score model

propensity_model <- glm(
  qsmk ~ sex +
    race + age + I(age^2) + education + 
    smokeintensity + I(smokeintensity^2) + 
    smokeyrs + I(smokeyrs^2) + exercise + active + 
    wt71 + I(wt71^2), 
  family = binomial(), 
  data = nhefs_complete_uc
)

16 / 33

Calculate inverse probability weights

nhefs_complete_uc <- propensity_model %>% 
  # predict whether quit smoking
  augment(type.predict = "response", data = nhefs_complete_uc) %>%
  # calculate inverse probability
  mutate(wts = 1 / ifelse(qsmk == 0, 1 - .fitted, .fitted))

17 / 33

diagnose your model assumptions18 / 33

What's the distribution of weights?

19 / 33

20 / 33

21 / 33

estimate the causal effects22 / 33

Estimate causal effect with IPW

ipw_model <- lm(
  wt82_71 ~ qsmk,
  data = nhefs_complete_uc, 
  weights = wts
) 
ipw_estimate <- ipw_model %>% 
  tidy(conf.int = TRUE) %>% 
  filter(term == "qsmk")

23 / 33

Estimate causal effect with IPW

ipw_estimate

## # A tibble: 1 x 7
##   term  estimate std.error statistic  p.value conf.low
##   <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>
## 1 qsmk      3.44     0.408      8.43 7.47e-17     2.64
## # … with 1 more variable: conf.high <dbl>

24 / 33

Let's fix our confidence intervals with the bootstrap!25 / 33

Let's fix our confidence intervals with the bootstrap!

# fit ipw model for a single bootstrap sample
fit_ipw_not_quite_rightly <- function(split, ...) { 
  # get bootstrapped data sample with `rsample::analysis()`
  .df <- analysis(split)
  # fit ipw model
  lm(wt82_71 ~ qsmk, data = .df, weights = wts) %>% 
    tidy()
}

25 / 33

fit_ipw <- function(split, ...) {
  .df <- analysis(split)
  # fit propensity score model
  propensity_model <- glm(
    qsmk ~ sex + 
      race + age + I(age^2) + education + 
      smokeintensity + I(smokeintensity^2) + 
      smokeyrs + I(smokeyrs^2) + exercise + active + 
      wt71 + I(wt71^2), 
    family = binomial(), 
    data = .df
  )
  # calculate inverse probability weights
  .df <- propensity_model %>% 
    augment(type.predict = "response", data = .df) %>% 
    mutate(wts = 1 / ifelse(qsmk == 0, 1 - .fitted, .fitted))
  # fit correctly bootsrapped ipw model
  lm(wt82_71 ~ qsmk, data = .df, weights = wts) %>% 
    tidy()
}

26 / 33

Using {rsample} to bootstrap our causal effect27 / 33

Using {rsample} to bootstrap our causal effect

# fit ipw model to bootstrapped samples
ipw_results <- bootstraps(nhefs_complete, 1000, apparent = TRUE) %>%
  mutate(results = map(splits, fit_ipw))

27 / 33

Using {rsample} to bootstrap our causal effect

# get t-statistic-based CIs
boot_estimate <- int_t(ipw_results, results) %>%
  filter(term == "qsmk")
boot_estimate

28 / 33

Using {rsample} to bootstrap our causal effect

# get t-statistic-based CIs
boot_estimate <- int_t(ipw_results, results) %>% 
  filter(term == "qsmk")
boot_estimate

## # A tibble: 1 x 6
##   term  .lower .estimate .upper .alpha .method  
##   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>    
## 1 qsmk    2.49      3.45   4.37   0.05 student-t

29 / 33

30 / 33

Our causal effect estimate: 3.5 lbs (95% CI 2.4 lbs, 4.4 lbs)31 / 33

Review the R Markdown file... later!32 / 33

Resources

Causal Inference: Comprehensive text on causal inference. Free online.

Causal Inference Notebook: R code to go along with Causal Inference

Bootstrap confidence intervals with {rsample}

33 / 33

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Restart the presentation timer

?, h

Toggle this help