#### Taking blogdown for a test drive

##### Trying out RStudio's new blogging framework

`R`

`data science`

`tools`

`productivity`

Recently, I came across the blogdown R
package, a variant of RStudio’s popular
bookdown R package, made by Yihui Xie
and Amber Thomas. Blogdown allows the user to write blog posts with code chunks,
in any of the large variety of languages supported by RMarkdown, allowing for
computationally reproducible writing and programming. It also plays well with
the new static site engine Hugo. Here, I’m mostly just
going to take `blogdown`

for a spin.

First, let’s generate some data and try doing some very simple summary statistics:

```
# we're going to be simulating...set seed and constants
set.seed(6142)
n <- 1000
tx_mean <- 25
# generate variables randomly and by structural equation
W <- replicate(3, rnorm(n))
A <- rnorm(n, mean = tx_mean, sd = 2*sqrt(tx_mean))
Y <- as.numeric(A > tx_mean & W[, 1] > 0)
O <- as.data.frame(cbind(Y, A, W))
colnames(O) <- c("Y", "A", paste0("W", seq_len(3)))
head(O)
```

```
## Y A W1 W2 W3
## 1 1 28.40771 1.6361789 0.3361618 2.2859933
## 2 1 43.26174 2.0236876 0.3815559 0.1296723
## 3 1 30.45214 0.1319931 0.3985546 0.7973811
## 4 0 23.52230 -0.4502451 0.7176603 -1.0905016
## 5 1 29.88353 1.7895223 -0.3369940 0.7153515
## 6 0 21.33805 1.7653295 -0.8362112 -0.7107431
```

`skim(O)`

```
## Skim summary statistics
## n obs: 1000
## n variables: 5
##
## ── Variable type:numeric ──────────────────────────────────────────────────
## variable missing complete n mean sd p0 p25 p50 p75 p100
## A 0 1000 1000 25.18 9.77 -4.68 18.12 25.37 32.17 56.11
## W1 0 1000 1000 0.045 1 -3.48 -0.59 0.036 0.7 3.41
## W2 0 1000 1000 -0.025 0.99 -3.68 -0.65 -0.038 0.65 3.51
## W3 0 1000 1000 -0.023 1.01 -3.37 -0.66 -0.034 0.61 2.99
## Y 0 1000 1000 0.27 0.44 0 0 0 1 1
## hist
## ▁▂▅▇▇▅▁▁
## ▁▁▃▇▇▅▁▁
## ▁▁▃▇▇▅▁▁
## ▁▁▃▇▇▅▂▁
## ▇▁▁▁▁▁▁▃
```

Look at that! In the above, we generate background covariates (\(W\)) based on the standard Normal distribution, a treatment (\(A\)) based on a Normal distribution with specified mean (\(\mu = 25\)) and standard deviation (\(\sigma = 2 \cdot \sqrt{\mu} = 10\)), and an outcome (\(Y\)) based on a specified structural equation of the form:

\[Y = I(A > 25) \cdot I(W_1 > 0),\] for \(n = 100\).

Having just recently returned from ACIC ’17, I have causal inference on my mind. You’ll notice that in specifying the data generating processes above, I provided a structural equation for \(Y\) – now, let’s play with marginal structural models (MSMs) just a little bit:

```
msm <- glm(Y ~ ., family = "binomial", data = O)
summary(msm)
```

```
##
## Call:
## glm(formula = Y ~ ., family = "binomial", data = O)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.67224 -0.38138 -0.10080 0.07985 2.31857
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.523376 0.687346 -13.855 <2e-16
## A 0.272499 0.021257 12.820 <2e-16
## W1 2.589087 0.200157 12.935 <2e-16
## W2 0.053711 0.119842 0.448 0.654
## W3 -0.006588 0.108947 -0.060 0.952
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1168.50 on 999 degrees of freedom
## Residual deviance: 516.94 on 995 degrees of freedom
## AIC: 526.94
##
## Number of Fisher Scoring iterations: 7
```

Above, we projected the observed data (\(O = (W, A, Y)\)) onto a working MSM for
the outcome, using a logistic regression with terms for all of the baseline
covariates and the treatment. From the `summary`

of the resultant `glm`

object,
we notice that the estimated coefficients for the terms corresponding to the
treatment (\(A\)) and first baseline covariate (\(W_1\)) are statistically
significant – recall that these were the variables we used in specifying the
structural equation for \(Y\) above.

I think I’ll wrap up this experiment by trying some simple plotting:

```
p <- ggplot(O, aes(A, Y)) + geom_point()
p <- p + stat_smooth(method = "glm", method.args = list(family = "binomial"),
formula = y ~ x)
p <- p + xlab("A (treatment)") + ylab("Y (outcome)") + theme_nima()
print(p)
```

We can plot the relationship between the treatment and outcome using a logistic
regression fit. Pretty awesome that `blogdown`

supports such nice graphics so
easily. In my old blog, I had to come up with a custom R script to use `knitr`

to process R Markdown to standard markdown,
allowing them to be rendered on my old Jekyll blog.

#### Related

#### Back to Linux - Adventures with the X1 Carbon

##### A review of getting a new 6th-generation X1 Carbon set up with Ubuntu Linux

`computing`

`productivity`

`open source`

#### 2018 - a midyear review

##### A midway review of my professional life in 2018

`statistics`

`graduate school`

`productivity`