2a. In total, 48 observations fall outside the range of the other group's propensity scores. Of these, 40 observations in the treatment group have propensity scores larger than the largest score in the non-treatment group, and 8 observations in the non-treatment group have scores smaller than the smallest score in the treatment group.
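The counting rule described in 2a can be sketched in a few lines. This is only an illustration in Python, not the code used for the analysis; the function name and the scores below are made up for the example.

```python
def count_outside_common_support(treated_scores, control_scores):
    """Count units whose score lies beyond the other group's range.

    Returns (treated units above the control maximum,
             control units below the treated minimum).
    """
    max_control = max(control_scores)
    min_treated = min(treated_scores)
    treated_above = sum(1 for s in treated_scores if s > max_control)
    control_below = sum(1 for s in control_scores if s < min_treated)
    return treated_above, control_below


# Hypothetical scores, for illustration only (not the study data).
treated = [0.35, 0.50, 0.72, 0.91, 0.95]
control = [0.05, 0.10, 0.40, 0.55, 0.80]

above, below = count_outside_common_support(treated, control)
print(above, below, above + below)
```

The total reported in 2a is the sum of the two counts.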
2b. The covariates do not appear to be much more balanced than they were previously. At the 0.1 absolute value threshold, the following covariates are still out of range: i_sex, i_educ_4, i_educ_5, i_educ_6, com_t, pcs_sd, and mcs_sd. Compared with the earlier results, the only covariate that is no longer out of range is race.
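A balance check of this kind can be sketched as follows. The write-up does not say which balance statistic was used, so this example assumes a standardized mean difference with a pooled standard deviation, which is one common convention; matching software may scale the difference differently. The function name and the data are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation.

    Assumption: pooled SD = sqrt((var_treated + var_control) / 2).
    """
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return mean_diff / pooled_sd


# Hypothetical 0/1 covariate values, for illustration only.
treated = [1, 0, 1, 1, 0, 1]
control = [0, 0, 1, 0, 1, 0]

smd = standardized_mean_difference(treated, control)
is_imbalanced = abs(smd) > 0.1  # the 0.1 threshold used in 2b
print(round(smd, 3), is_imbalanced)
```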
2c. The causal effect Q can be estimated by taking the average satisfaction status of the patients in the treatment group and subtracting the average satisfaction status of the patients in the non-treatment group. Satisfaction status is stored as a categorical variable in my dataset, so I temporarily created a numeric copy of the column and used it to calculate the causal effect and confidence interval.
I found that the estimated causal effect of treatment in this study is -0.19, with a 95% confidence interval from -0.28 to -0.11. In context, this means that the share of patients indicating that they were satisfied was 19 percentage points lower in the treatment group than in the non-treatment group. We are 95% confident that the true causal effect is a reduction of between 11 and 28 percentage points in the probability of indicating satisfaction. Because the interval excludes zero, being in the treatment group has a statistically significant negative effect on satisfaction rating.
## [1] 0.04632011
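The difference-in-means estimate and its interval can be sketched as below. This is an illustration in Python rather than the original analysis code. It assumes an unpaired normal-approximation (Wald) interval, which may not be the method used above, and the counts are invented for the example.

```python
import math
from statistics import NormalDist


def risk_difference_ci(sat_treated, n_treated, sat_control, n_control, level=0.95):
    """Treated-minus-control difference in proportions with a Wald interval."""
    p_t = sat_treated / n_treated
    p_c = sat_control / n_control
    diff = p_t - p_c
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts, for illustration only (not the study data).
diff, lower, upper = risk_difference_ci(40, 100, 60, 100)
print(f"{diff:.2f} [{lower:.2f}, {upper:.2f}]")
```

An interval that lies entirely below zero corresponds to a statistically significant negative effect at the chosen level.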
2d. Using a logistic regression with satisfaction as the dependent variable, I fit a model with the main effects of all pretreatment variables on the matched data. In the model output, the causal odds ratio for the treatment group is 0.32. The effect is statistically significant: the 95% confidence interval I constructed runs from an odds ratio of 0.09 to 0.94 and therefore excludes 1. In the context of this problem, being in the treatment group lowers the log odds of being satisfied by 1.15, which is equivalent to multiplying the odds of being satisfied by about 0.32.
|  | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -9.24 | 6.094 | -1.516 | 0.1295 |
| pg1 | -1.151 | 0.5808 | -1.982 | 0.04744 |
| i_age | 0.03152 | 0.03264 | 0.9657 | 0.3342 |
| i_sex1 | 0.07931 | 0.4161 | 0.1906 | 0.8488 |
| i_race1 | 2.263 | 1.742 | 1.299 | 0.1939 |
| i_race2 | -17.25 | 2219 | -0.007776 | 0.9938 |
| i_race3 | 20.41 | 1899 | 0.01075 | 0.9914 |
| i_race4 | 1.977 | 1.269 | 1.557 | 0.1194 |
| i_educ2 | 16.34 | 1874 | 0.008717 | 0.993 |
| i_educ3 | -0.4017 | 1.462 | -0.2747 | 0.7835 |
| i_educ4 | 0.2417 | 1.123 | 0.2153 | 0.8295 |
| i_educ6 | -0.668 | 0.4533 | -1.474 | 0.1406 |
| i_insu2 | 0.2248 | 0.7374 | 0.3049 | 0.7605 |
| i_insu5 | 0.9669 | 1.177 | 0.8215 | 0.4114 |
| i_drug | 3.73 | 2.203 | 1.693 | 0.09044 |
| i_seve1 | 0.895 | 0.7114 | 1.258 | 0.2084 |
| i_seve2 | 0.1123 | 0.4581 | 0.2452 | 0.8063 |
| i_seve4 | -1.03 | 0.8461 | -1.218 | 0.2233 |
| com_t | 0.7779 | 0.7245 | 1.074 | 0.2829 |
| pcs_sd | -0.1019 | 0.07398 | -1.378 | 0.1682 |
| mcs_sd | -0.01214 | 0.03211 | -0.3782 | 0.7053 |
| distance | 6.317 | 6.429 | 0.9826 | 0.3258 |
(Dispersion parameter for binomial family taken to be 1 )
| Null deviance: | 241.6 on 193 degrees of freedom |
| Residual deviance: | 196.3 on 172 degrees of freedom |
95% profile-likelihood confidence intervals for the coefficients (log-odds scale):
|  | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | -21.61 | 2.47 |
| pg1 | -2.359 | -0.06347 |
| i_age | -0.03192 | 0.09718 |
| i_sex1 | -0.7449 | 0.8949 |
| i_race1 | -1.113 | 5.783 |
| i_race2 | NA | 265.2 |
| i_race3 | -55.96 | 672.1 |
| i_race4 | -0.4793 | 4.545 |
| i_educ2 | -225.6 | NA |
| i_educ3 | -3.305 | 2.51 |
| i_educ4 | -1.968 | 2.466 |
| i_educ6 | -1.575 | 0.2112 |
| i_insu2 | -1.219 | 1.694 |
| i_insu5 | -1.371 | 3.361 |
| i_drug | -0.3705 | 8.446 |
| i_seve1 | -0.4664 | 2.351 |
| i_seve2 | -0.7797 | 1.028 |
| i_seve4 | -2.72 | 0.6217 |
| com_t | -0.6453 | 2.221 |
| pcs_sd | -0.2508 | 0.04152 |
| mcs_sd | -0.07647 | 0.05035 |
| distance | -6.249 | 19.17 |
| weights | NA | NA |
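The tables above are on the log-odds scale, so the odds ratio and its interval come from exponentiating the pg1 row. A short Python check of that arithmetic, using the pg1 values copied from the tables (the variable names are only for illustration):

```python
import math

# pg1 row of the coefficient and confidence-interval tables (log-odds scale).
pg1_estimate = -1.151
pg1_ci_low = -2.359
pg1_ci_high = -0.06347

odds_ratio = math.exp(pg1_estimate)
or_ci_low = math.exp(pg1_ci_low)
or_ci_high = math.exp(pg1_ci_high)

print(f"OR = {odds_ratio:.2f}, 95% CI [{or_ci_low:.2f}, {or_ci_high:.2f}]")
```

This gives an odds ratio of about 0.32 with interval endpoints of roughly 0.09 and 0.94. An interval that stays below 1 on the odds-ratio scale is the same as an interval that stays below 0 on the log-odds scale.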
2e. Rerunning this analysis using the one-to-many matching approach with the five nearest neighbors changed the results slightly. It resulted in 91 matched datapoints with 14 dropped. It improved the covariate balance: sex and the standard mental comorbidity scale (mcs_sd) are no longer imbalanced, though the other variables discussed in 2b still have scores with an absolute value greater than 0.1. These variables are i_educ_4, i_educ_5, i_educ_6, com_t, and pcs_sd.
The average causal effect in this analysis is -0.15, with a 95% confidence interval of -0.23 to -0.06. Because the interval excludes zero, the result is statistically significant, and it agrees with our previous finding that being in the treatment group has a negative effect on reported satisfaction. This analysis indicates that members of the treatment group are 15 percentage points less likely to report that they are satisfied.
Fitting a logistic regression model with all other variables to the one-to-many matched data supports our previous findings and the causal effect examined above. The causal odds ratio in this regression is 0.46, which again indicates that patients in the treatment group are less likely to be satisfied than members of the non-treatment group. The 95% confidence interval spans from 0.21 to 0.98 and excludes 1, so this analysis also finds the negative relationship between the treatment group and satisfaction rating to be statistically significant.
|  | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -7.12 | 5.326 | -1.337 | 0.1812 |
| pg1 | -0.767 | 0.3884 | -1.975 | 0.04829 |
| i_age | 0.02434 | 0.02931 | 0.8305 | 0.4062 |
| i_sex1 | 0.2093 | 0.3709 | 0.5642 | 0.5726 |
| i_race1 | 1.79 | 1.548 | 1.156 | 0.2477 |
| i_race2 | -17.6 | 2239 | -0.007857 | 0.9937 |
| i_race3 | 19.24 | 1677 | 0.01147 | 0.9908 |
| i_race4 | 1.486 | 1.132 | 1.313 | 0.189 |
| i_educ2 | 16.63 | 2004 | 0.0083 | 0.9934 |
| i_educ3 | -0.4016 | 1.242 | -0.3234 | 0.7464 |
| i_educ4 | 0.001405 | 0.9984 | 0.001408 | 0.9989 |
| i_educ6 | -0.7877 | 0.4264 | -1.847 | 0.06469 |
| i_insu2 | -0.07064 | 0.6738 | -0.1048 | 0.9165 |
| i_insu5 | 0.4861 | 1.083 | 0.4491 | 0.6534 |
| i_drug | 3.138 | 2.045 | 1.535 | 0.1249 |
| i_seve1 | 0.8067 | 0.6443 | 1.252 | 0.2105 |
| i_seve2 | 0.1986 | 0.4097 | 0.4847 | 0.6279 |
| i_seve4 | -0.8435 | 0.7239 | -1.165 | 0.244 |
| com_t | 0.6132 | 0.6389 | 0.9598 | 0.3372 |
| pcs_sd | -0.08831 | 0.06569 | -1.344 | 0.1788 |
| mcs_sd | -0.01449 | 0.0289 | -0.5014 | 0.6161 |
| distance | 4.746 | 5.452 | 0.8705 | 0.384 |
(Dispersion parameter for binomial family taken to be 1 )
| Null deviance: | 281.3 on 223 degrees of freedom |
| Residual deviance: | 238.7 on 202 degrees of freedom |
95% profile-likelihood confidence intervals for the coefficients (log-odds scale):
|  | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | -17.87 | 3.127 |
| pg1 | -1.549 | -0.0192 |
| i_age | -0.03293 | 0.08281 |
| i_sex1 | -0.5236 | 0.9368 |
| i_race1 | -1.237 | 4.888 |
| i_race2 | NA | 269.1 |
| i_race3 | -43.65 | 619.6 |
| i_race4 | -0.7389 | 3.744 |
| i_educ2 | -242 | NA |
| i_educ3 | -2.847 | 2.063 |
| i_educ4 | -1.955 | 1.979 |
| i_educ6 | -1.641 | 0.03784 |
| i_insu2 | -1.4 | 1.256 |
| i_insu5 | -1.67 | 2.699 |
| i_drug | -0.6711 | 7.52 |
| i_seve1 | -0.4288 | 2.118 |
| i_seve2 | -0.5985 | 1.016 |
| i_seve4 | -2.283 | 0.5715 |
| com_t | -0.6276 | 1.892 |
| pcs_sd | -0.22 | 0.03906 |
| mcs_sd | -0.07221 | 0.0418 |
| distance | -5.882 | 15.61 |
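The standard errors in the two models are similar for most covariates. The most notable difference is the treatment indicator pg1, whose standard error falls from 0.58 in the one-to-one model to 0.39 in the one-to-many model.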
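A quick way to check how closely the standard errors agree is to take the ratio of the second model's standard error to the first model's for each coefficient. The Python sketch below does this for a subset of rows copied from the two coefficient tables; the dictionary names are only for illustration.

```python
# Standard errors copied from the two coefficient tables above.
se_one_to_one = {
    "pg1": 0.5808, "i_age": 0.03264, "i_sex1": 0.4161, "i_educ6": 0.4533,
    "com_t": 0.7245, "pcs_sd": 0.07398, "mcs_sd": 0.03211, "distance": 6.429,
}
se_one_to_many = {
    "pg1": 0.3884, "i_age": 0.02931, "i_sex1": 0.3709, "i_educ6": 0.4264,
    "com_t": 0.6389, "pcs_sd": 0.06569, "mcs_sd": 0.0289, "distance": 5.452,
}

# A ratio near 1 means the two models give nearly the same standard error.
ratios = {
    name: se_one_to_many[name] / se_one_to_one[name] for name in se_one_to_one
}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:10s} {ratio:.2f}")
```

For these rows the ratios fall between roughly 0.67 and 0.94, with pg1 showing the largest reduction.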