1. I assessed the balance between the two groups by running an ASD analysis. Generally, if a covariate has an ASD score with an absolute value greater than 0.1, it is said to be unbalanced. A number of covariates in this dataset fall outside this range: i_sex, i_race_1, i_race_2, i_educ_4, i_educ_5, com_t, and pcs_sd.
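
A rough sketch of this check in R is below; the data frame name dat, the treatment indicator pg (1 = treatment, 0 = non-treatment), and the dummy-coded covariate columns are assumptions based on the variable names used later in this write-up.

# covariates to check (assumed to already be numeric/dummy-coded columns of dat)
covs <- c("i_sex", "i_race_1", "i_race_2", "i_educ_4", "i_educ_5",
          "i_educ_6", "com_t", "pcs_sd", "mcs_sd")

asd <- sapply(covs, function(v) {
  x_t <- dat[dat$pg == 1, v]  # treatment group
  x_c <- dat[dat$pg == 0, v]  # non-treatment group
  # standardized difference: mean difference over the pooled standard deviation
  (mean(x_t) - mean(x_c)) / sqrt((var(x_t) + var(x_c)) / 2)
})

asd[abs(asd) > 0.1]  # covariates flagged as unbalanced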

2a. There are 48 observations in total whose propensity scores fall outside the range of the other group: 40 observations in the treatment group have propensity scores larger than the largest score in the non-treatment group, and 8 observations in the non-treatment group have scores smaller than the smallest score in the treatment group.
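
Roughly, this overlap check can be done as follows; the propensity score model here is only a sketch that reuses the assumed dat and pg names from above.

# estimate propensity scores with a logistic regression of treatment on the covariates
ps_model <- glm(pg ~ i_age + i_sex + i_race + i_educ + i_insu + i_drug +
                  i_seve + com_t + pcs_sd + mcs_sd,
                family = binomial, data = dat)
ps <- fitted(ps_model)

ps_t <- ps[dat$pg == 1]  # treatment-group propensity scores
ps_c <- ps[dat$pg == 0]  # non-treatment-group propensity scores

sum(ps_t > max(ps_c))  # treated units above the largest non-treatment score
sum(ps_c < min(ps_t))  # non-treatment units below the smallest treated score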

2b. After the one-to-one nearest-neighbor matching, the covariates do not appear to be much more balanced than they were previously. Using the same 0.1 absolute-value threshold, all of the following still fall outside the range: i_sex, i_educ_4, i_educ_5, i_educ_6, com_t, pcs_sd, and mcs_sd. The only covariates that are no longer out of range, compared with part 1, are the race variables (i_race_1 and i_race_2).
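
A sketch of the one-to-one nearest-neighbor match and the post-match balance summary, using the MatchIt package; the formula and variable names are assumptions.

library(MatchIt)

m1 <- matchit(pg ~ i_age + i_sex + i_race + i_educ + i_insu + i_drug +
                i_seve + com_t + pcs_sd + mcs_sd,
              data = dat, method = "nearest", ratio = 1)

summary(m1, standardize = TRUE)  # standardized mean differences before and after matching

matched1 <- match.data(m1)  # matched data, with distance and weights columns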

2c. The causal effect Q can be estimated by taking the average satisfaction status of the patients in the treatment group and subtracting from it the average satisfaction status of the patients in the non-treatment group. The satisfaction variable is categorical in my dataset, so I temporarily created a numeric copy of the column and used it to calculate the causal effect and its confidence interval.
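
A sketch of this calculation on the one-to-one matched data; the column name satisfaction and its "Satisfied" level are placeholders for however the outcome is actually coded.

# temporary numeric copy of the (hypothetical) categorical satisfaction column
matched1$satisfied <- as.numeric(matched1$satisfaction == "Satisfied")

# Q: mean satisfaction in the treatment group minus the non-treatment group,
# with a 95% confidence interval from a two-sample t interval
tt <- t.test(matched1$satisfied[matched1$pg == 1],
             matched1$satisfied[matched1$pg == 0])
unname(tt$estimate[1] - tt$estimate[2])  # point estimate of Q
tt$conf.int                              # 95% confidence interval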

I found that the estimated causal effect of the treatment in this study is -0.19, and the 95% confidence interval ranges from -0.28 to -0.11. In context, this means that patients in the treatment group were 19 percentage points less likely to indicate that they were satisfied than members of the non-treatment group. We are 95% confident that the true causal effect of the treatment is a reduction of between 11 and 28 percentage points in the probability of reporting satisfaction. This means that being in the treatment group has a statistically significant negative effect on the satisfaction rating.


2d. Using a logistic regression with the satisfaction indicator as the dependent variable, I constructed a model on the matched data using the treatment indicator and the main effects of all pretreatment variables. Looking at the output of my model, the causal odds ratio for the treatment group is 0.32. The effect is statistically significant judging from the 95% confidence interval I constructed, which on the odds-ratio scale ranges from about 0.10 to 0.94. In the context of this problem, being in the treatment group decreases the log odds of being satisfied by 1.15, which corresponds to multiplying the odds of reporting satisfaction by about 0.32.
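
A sketch of how such a model can be fit and how the odds ratio and its interval can be read off; the outcome name and the exact formula are assumptions, and distance is the propensity score column kept by match.data().

fit1 <- glm(satisfied ~ pg + i_age + i_sex + i_race + i_educ + i_insu +
              i_drug + i_seve + com_t + pcs_sd + mcs_sd + distance,
            family = binomial, data = matched1)

summary(fit1)

exp(coef(fit1)["pg1"])     # causal odds ratio for the treatment group
exp(confint(fit1, "pg1"))  # 95% profile-likelihood CI on the odds-ratio scale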

  Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.24 6.094 -1.516 0.1295
pg1 -1.151 0.5808 -1.982 0.04744
i_age 0.03152 0.03264 0.9657 0.3342
i_sex1 0.07931 0.4161 0.1906 0.8488
i_race1 2.263 1.742 1.299 0.1939
i_race2 -17.25 2219 -0.007776 0.9938
i_race3 20.41 1899 0.01075 0.9914
i_race4 1.977 1.269 1.557 0.1194
i_educ2 16.34 1874 0.008717 0.993
i_educ3 -0.4017 1.462 -0.2747 0.7835
i_educ4 0.2417 1.123 0.2153 0.8295
i_educ6 -0.668 0.4533 -1.474 0.1406
i_insu2 0.2248 0.7374 0.3049 0.7605
i_insu5 0.9669 1.177 0.8215 0.4114
i_drug 3.73 2.203 1.693 0.09044
i_seve1 0.895 0.7114 1.258 0.2084
i_seve2 0.1123 0.4581 0.2452 0.8063
i_seve4 -1.03 0.8461 -1.218 0.2233
com_t 0.7779 0.7245 1.074 0.2829
pcs_sd -0.1019 0.07398 -1.378 0.1682
mcs_sd -0.01214 0.03211 -0.3782 0.7053
distance 6.317 6.429 0.9826 0.3258

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 241.6 on 193 degrees of freedom
Residual deviance: 196.3 on 172 degrees of freedom


  2.5 % 97.5 %
(Intercept) -21.61 2.47
pg1 -2.359 -0.06347
i_age -0.03192 0.09718
i_sex1 -0.7449 0.8949
i_race1 -1.113 5.783
i_race2 NA 265.2
i_race3 -55.96 672.1
i_race4 -0.4793 4.545
i_educ2 -225.6 NA
i_educ3 -3.305 2.51
i_educ4 -1.968 2.466
i_educ6 -1.575 0.2112
i_insu2 -1.219 1.694
i_insu5 -1.371 3.361
i_drug -0.3705 8.446
i_seve1 -0.4664 2.351
i_seve2 -0.7797 1.028
i_seve4 -2.72 0.6217
com_t -0.6453 2.221
pcs_sd -0.2508 0.04152
mcs_sd -0.07647 0.05035
distance -6.249 19.17
weights NA NA

2e. Rerunning this analysis using the one-to-many matching approach with the five nearest neighbors caused slight changes in the results. It resulted in 91 matched data points, with 14 dropped. It improved the covariate balance, as sex and the standard mental comorbidity scale (mcs_sd) are no longer imbalanced, though all the other variables discussed in 2b still have scores with an absolute value greater than 0.1. These variables are i_educ_4, i_educ_5, i_educ_6, com_t, and pcs_sd.
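
The one-to-many match is a sketch of the same matchit() call with ratio = 5, again using the assumed names from above.

m5 <- matchit(pg ~ i_age + i_sex + i_race + i_educ + i_insu + i_drug +
                i_seve + com_t + pcs_sd + mcs_sd,
              data = dat, method = "nearest", ratio = 5)

summary(m5, standardize = TRUE)  # post-match standardized differences
matched5 <- match.data(m5)       # matched data with matching weights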

The average causal effect from this analysis is -0.15, with a 95% confidence interval of -0.23 to -0.06. This means the result is statistically significant, and it aligns with our previous finding of a negative effect of being in the treatment group on reported satisfaction. This analysis indicates that members of the treatment group are 15 percentage points less likely to report that they are satisfied.
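
A sketch of the effect estimate on the one-to-many matched data; because treated units can end up with different numbers of matched controls, the matching weights returned by match.data() are used (the outcome coding is again a placeholder).

matched5$satisfied <- as.numeric(matched5$satisfaction == "Satisfied")

# weighted difference in mean satisfaction, treatment minus non-treatment
with(matched5,
     weighted.mean(satisfied[pg == 1], weights[pg == 1]) -
     weighted.mean(satisfied[pg == 0], weights[pg == 0]))

# one simple way to get a confidence interval: a weighted regression of the
# outcome on the treatment indicator (coefficient is named pg1 when pg is a
# factor coded 0/1)
fit_q5 <- lm(satisfied ~ pg, data = matched5, weights = weights)
confint(fit_q5, "pg1")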

Putting the one-to-many matched data into a logistic regression model with all of the other variables supports our previous findings and the causal effect examined above. The causal odds ratio for the treatment group in this regression is 0.46, again indicating that patients in the treatment group are less likely to be satisfied than members of the non-treatment group. The 95% confidence interval spans from 0.22 to 0.98, so this analysis also finds the negative relationship between treatment-group membership and the satisfaction rating to be statistically significant.
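
A sketch of the weighted outcome model and its odds-ratio summaries, with the same assumed names; glm() may warn about non-integer weights, which is expected when the matching weights are scaled.

fit5 <- glm(satisfied ~ pg + i_age + i_sex + i_race + i_educ + i_insu +
              i_drug + i_seve + com_t + pcs_sd + mcs_sd + distance,
            family = binomial, data = matched5, weights = weights)

exp(coef(fit5)["pg1"])     # causal odds ratio for the treatment group
exp(confint(fit5, "pg1"))  # 95% confidence interval on the odds-ratio scale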

  Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.12 5.326 -1.337 0.1812
pg1 -0.767 0.3884 -1.975 0.04829
i_age 0.02434 0.02931 0.8305 0.4062
i_sex1 0.2093 0.3709 0.5642 0.5726
i_race1 1.79 1.548 1.156 0.2477
i_race2 -17.6 2239 -0.007857 0.9937
i_race3 19.24 1677 0.01147 0.9908
i_race4 1.486 1.132 1.313 0.189
i_educ2 16.63 2004 0.0083 0.9934
i_educ3 -0.4016 1.242 -0.3234 0.7464
i_educ4 0.001405 0.9984 0.001408 0.9989
i_educ6 -0.7877 0.4264 -1.847 0.06469
i_insu2 -0.07064 0.6738 -0.1048 0.9165
i_insu5 0.4861 1.083 0.4491 0.6534
i_drug 3.138 2.045 1.535 0.1249
i_seve1 0.8067 0.6443 1.252 0.2105
i_seve2 0.1986 0.4097 0.4847 0.6279
i_seve4 -0.8435 0.7239 -1.165 0.244
com_t 0.6132 0.6389 0.9598 0.3372
pcs_sd -0.08831 0.06569 -1.344 0.1788
mcs_sd -0.01449 0.0289 -0.5014 0.6161
distance 4.746 5.452 0.8705 0.384

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 281.3 on 223 degrees of freedom
Residual deviance: 238.7 on 202 degrees of freedom


  2.5 % 97.5 %
(Intercept) -17.87 3.127
pg1 -1.549 -0.0192
i_age -0.03293 0.08281
i_sex1 -0.5236 0.9368
i_race1 -1.237 4.888
i_race2 NA 269.1
i_race3 -43.65 619.6
i_race4 -0.7389 3.744
i_educ2 -242 NA
i_educ3 -2.847 2.063
i_educ4 -1.955 1.979
i_educ6 -1.641 0.03784
i_insu2 -1.4 1.256
i_insu5 -1.67 2.699
i_drug -0.6711 7.52
i_seve1 -0.4288 2.118
i_seve2 -0.5985 1.016
i_seve4 -2.283 0.5715
com_t -0.6276 1.892
pcs_sd -0.22 0.03906
mcs_sd -0.07221 0.0418
distance -5.882 15.61

1. I feel more comfortable with the results of the one-to-many neighbor matching in this assignment than with the one-to-one nearest-neighbor matching. Using the one-to-many method, there were fewer imbalanced covariates (five) than with one-to-one nearest-neighbor matching (seven). Having more balanced covariates increases my confidence in the result of the analysis. Perhaps because this dataset is relatively small, allowing multiple neighbor matches works better; hypothetically, a larger dataset with more data points would make it easier to match each point to a close neighbor and avoid such large gaps in values.

The standard errors for the two models are virtually identical.