###################################################################################################
#### Week 13 Lab: ANCOVA, long to wide, more repeated measures ####################################
###################################################################################################

# We'll start by taking a different approach to one of last week's data sets, dealing with startle.
# Then we'll look at within-subject conditions with more than 2 levels, converting them from long
# to wide as well.

########################################################
#### ANCOVA ############################################
########################################################

rm(list=ls()) # clear all objects from workspace

# Read in the data (same as last week):
d = dfReadDat('Lab13_DemoData1.dat')

# What's going on here again?
varDescribe(d)
# Participants are in one of three conditions: alcohol, placebo, or control. They rate the anxiety
# they feel after shocks that are / are not predictable.

# Let's imagine the researchers consider the "predictable" shocks to be a good proxy for baseline
# anxiety in response to electric shocks. There may also be only a moderate correlation between
# anxiety in the predictable and unpredictable conditions.

# What is a different approach we could use to analyze the data?
# ANCOVA!

# ANCOVA is an acronym for analysis of covariance. This is a term that was coined in the ANOVA
# literature before the GLM became more popular in psychology. ANCOVA is just a fancy term for
# a model in which we're also controlling for a variable (i.e., where we're not allowing a control
# variable - a covariate - to contribute to the effect of the focal predictor[s]).

# What is the main benefit of using ANCOVA over a difference score?
# Increased power.

# How does it achieve this benefit?
# It doesn't assume a correlation of 1 between pre-test and post-test (without intervention).

# What is the correlation between predictable and unpredictable scores?
cor(d$Predictable, d$Unpredictable)
# .87. That's pretty high. Maybe we should check the correlation in the control group.
control = d[d$BG == 'CON',]
cor(control$Predictable, control$Unpredictable)
# .76. OK, so actually a little lower.

# Let's prep our BG contrasts (copied from last week)
d$BG = factor(d$BG, levels = c('CON', 'PLA', 'ALC'))
contrasts(d$BG) = varContrasts(d$BG, Type='POC', POCList = list(c(-1,-1,2), c(-1,1,0)),
                               Labels = c('A_CP', 'P_C'))
# ...but let's use varRegressors to get effect sizes
d = varRegressors(d, 'BG', c('A_CP','P_C'))

# Let's do an ANCOVA
mCV = lm(Unpredictable ~ A_CP + P_C + Predictable, data=d)
modelSummary(mCV)
modelEffectSizes(mCV)

# What's going on here?
# Unsurprisingly, predictable shock anxiety is a very strong predictor of unpredictable shock anxiety.
# When controlling for predictable shock, our first contrast continues to have a significant effect
# on unpredictable scores, indicating alcohol participants have lower anxiety in unpredictable shocks
# compared to the other groups, beyond what we'd expect from their predictable shock anxiety alone.

# This is a complex sentence, but it's important all the elements are there. When we see a significant
# effect in an ANCOVA model, we must indicate this effect has an influence over and above the pre-test,
# whatever that may be.

# Say I measure how much sugar someone eats in one day, then randomly assign them to either receive an
# intervention discussing the health risks of sugar or not, then measure their sugar intake the
# following day. If the intervention has an effect, the effect occurs over and above the predictive
# power of the previous day's sugar intake.
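# To make the ANCOVA-vs-difference-score comparison concrete, here is a minimal sketch using the
# objects created above. The difference-score model implicitly fixes the slope on the pre-test
# (Predictable) at 1, whereas the ANCOVA model estimates that slope from the data - that's where
# the extra power comes from.
mDiff = lm(Unpredictable - Predictable ~ A_CP + P_C, data=d)  # difference score: pre-test slope fixed at 1
modelSummary(mDiff)
modelSummary(mCV)   # ANCOVA from above: pre-test slope estimated freely
# Compare the standard errors on A_CP and P_C across the two models.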
# Why is ANCOVA still probably NOT the correct way to analyze the data from this experiment?
# We didn't assign the participants to beverage group after collecting the predictable trials. To
# do ANCOVA, pre-test scores should be collected before experimental assignment, and group should
# not be correlated with the pre-test.
varDescribeBy(d$Predictable, d$BG)
# ...but it appears as though there is some kind of relationship.

##########################################################################
#### LONG TO WIDE ########################################################
##########################################################################

rm(list=ls()) # clear all objects from workspace

# We're going to return to an example we've done before, with a prejudice reduction intervention and
# IAT scores. Participants are assigned to one of two conditions, and their "implicit bias" is
# measured at three timepoints: right after the intervention, 4 weeks later, and 8 weeks later.
dL <- dfReadDat("Lab13_DemoData2.dat", SubID=NULL)
varDescribe(dL)

# Are these data in long or wide format?
# Long!
# How do you know?
# Every participant has 3 lines in the data frame, one for each observation.

# In order to do the analysis we'll learn today, they must be in wide format.
?reshape
# We need to give this function a dataframe and some information about what kinds of variables
# we have in the data.
dW = reshape(dL, timevar = 'time', idvar = c('SubID','condition'), direction='wide')
# timevar: the IV/predictor that varies within participants
# idvar: the variables that identify a participant (anything constant within a participant, like
#        SubID and condition)
# direction: are we going long to wide or wide to long?
varDescribe(dW)

# What do the iat columns represent now?
# The IAT score for a given participant at one of the three time points.

# The only thing that's still unclear is which column is which. R names the columns based on the
# levels of the timevar we give it, which is why it's just numbers here. We could rename the values
# before transforming, or just rename them now.
dW = varRename(dW, c('iat.1','iat.2','iat.3'), c('W0iat','W4iat','W8iat'))
varDescribe(dW)
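# For reference, the same function can take us back from wide to long. A minimal sketch, assuming
# the renamed dW from above (the times of 0, 4, and 8 are just labels we're choosing here):
dL2 = reshape(dW,
              varying = c('W0iat','W4iat','W8iat'),  # the columns holding the repeated measure
              v.names = 'iat',                        # name for the stacked outcome column
              timevar = 'time',                       # name for the re-created time variable
              times = c(0, 4, 8),                     # value of time for each column, in order
              idvar = 'SubID',
              direction = 'long')
varDescribe(dL2)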
######################################################################################
#### 2 (between) x 3+ (within) designs ###############################################
######################################################################################

# Now we can work with these data to assess the effects of the prejudice reduction intervention over
# time. To justify this analysis, there would have to be a prediction that the effect of the
# intervention is somehow time-varying, maybe growing over time.

# How do we test the main effect of condition?
# Average IAT scores across time points and regress them on condition.
dW$IATM = rowMeans(dW[,c('W0iat','W4iat','W8iat')])
m2 = lm(IATM ~ condition, data=dW)
modelSummary(m2)
modelEffectSizes(m2)
# remember, condition is coded 0 (control) and 1 (intervention)

# Conclusion?
# Averaging across time points, there is no effect of the intervention.

# Now we want to see whether the effect of the intervention varies based on time.
# The problem is we can't use "contrasts" with the outcome, since our outcome technically involves
# 3 different levels, so instead we'll use math!

# Let's say we first want to see whether the impact is larger in week 8 than it is during the earlier
# time points overall. How do we do this with math?
dW$W8_W40 = dW$W8iat - (dW$W0iat + dW$W4iat)/2
m8_40 = lm(W8_W40 ~ condition, data=dW)
modelSummary(m8_40)

# Let's break this down a little:
# b0: how much week 8 scores differ from the pooled week 0 and 4 scores, on average, for the control
#     group (condition = 0)
# b1: how much this effect of time on IAT scores differs by condition. It is non-significant.

# *NOTE*: technically, our outcome "contrast" isn't unit-weight. So why is it we can interpret the
# parameter as representing specifically the difference between the groups? Essentially, unit
# weighting matters on the predictor side but not on the outcome side. We can show this using math
# if you're really curious (see the quick check after the simple effects below).

# In this example, it makes more sense to do two pairwise comparisons instead: week 8 to week 4 and
# week 4 to week 0. Let's do both (and we'll practice doing so without creating new variables).
m8_4 = lm(W8iat - W4iat ~ condition, data=dW)
modelSummary(m8_4)
# Conclusion?
# The effect of the time interval between 4 and 8 weeks on IAT scores doesn't differ by condition.

m4_0 = lm(W4iat - W0iat ~ condition, data=dW)
modelSummary(m4_0)
# Conclusion?
# The effect of the time interval between 0 and 4 weeks on IAT scores differs by condition: the IAT
# scores of those in the intervention condition change more.

m8_0 = lm(W8iat - W0iat ~ condition, data=dW)
modelSummary(m8_0)
# Conclusion?
# There is some marginal indication that the change from week 0 to week 8 differs by condition.

# How do we get the simple effects of condition at each time point?
# Easy: just regress the IAT score at that time point on condition!
mW0 = lm(W0iat ~ condition, data=dW)
modelSummary(mW0)
mW4 = lm(W4iat ~ condition, data=dW)
modelSummary(mW4)
mW8 = lm(W8iat ~ condition, data=dW)
modelSummary(mW8)
# Conclusion?
# The only time the intervention condition differs from control is four weeks after the intervention,
# though at 8 weeks the direction of the effect is maintained.
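# As promised in the note above, a quick numerical check: multiplying the outcome difference by a
# constant scales b and its SE by that same constant, so t and p are unchanged. Here the unit-weight
# version 2*W8 - W4 - W0 is exactly twice W8_W40.
m8_40uw = lm(2*W8iat - W4iat - W0iat ~ condition, data=dW)
modelSummary(m8_40uw)  # estimates are exactly double those of m8_40; t and p are identical
modelSummary(m8_40)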
#### CORRECTING FOR MULTIPLE COMPARISONS ####

# We're going to use Holm-Bonferroni. Even though we have a small enough number of comparisons to use
# Fisher's LSD, the mechanics of doing it in this situation are quite complex and, because of how the
# F statistic is calculated in this specific situation, it is possibly more conservative than
# Holm-Bonferroni.
# Let's do it for our pairwise comparisons, and bear in mind we care about the interaction term.

# Make a vector to hold our p values
ps = c(0,0,0)
ps[1] = modelSummary(m8_4)$coefficients[2,4]
ps[2] = modelSummary(m4_0)$coefficients[2,4]
ps[3] = modelSummary(m8_0)$coefficients[2,4]
ps = sort(ps)
ps

p.adjust(ps, "holm")
# More viewable:
round(p.adjust(ps, "holm"), digits = 5)
# Compare to the old values:
round(ps, digits = 5)

####################################################################
#### GRAPHING ######################################################
####################################################################

dW$conditionS = as.factor(varRecode(dW$condition, c(0,1), c('Control','Intervention')))

mW0 = lm(W0iat ~ conditionS, data=dW)
mW4 = lm(W4iat ~ conditionS, data=dW)
mW8 = lm(W8iat ~ conditionS, data=dW)

pX0 = expand.grid(conditionS = c('Control','Intervention'), time = 1)
pX4 = expand.grid(conditionS = c('Control','Intervention'), time = 2)
pX8 = expand.grid(conditionS = c('Control','Intervention'), time = 3)

pY0 = modelPredictions(mW0, pX0)
pY4 = modelPredictions(mW4, pX4)
pY8 = modelPredictions(mW8, pX8)

pYs = rbind(pY0, pY4)
pYs = rbind(pYs, pY8)

library(ggplot2)
library(cowplot)

bp1 = ggplot(data=pYs, aes(x=as.factor(time), y = Predicted, fill=conditionS)) +
  geom_bar(stat='identity', width=.5, position = position_dodge()) +
  geom_errorbar(aes(ymin = CILo, ymax = CIHi), width=.25, stat='identity',
                position = position_dodge(.5)) +
  labs(x = 'Observation', y = 'IAT score', fill = 'Condition') +
  coord_cartesian(ylim = c(.028, .6), expand = TRUE)
bp1
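# Since the x-axis is really time, a line plot can also work well. This is just an optional sketch
# using the same pYs predictions from above.
lp1 = ggplot(data=pYs, aes(x=time, y=Predicted, color=conditionS, group=conditionS)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = CILo, ymax = CIHi), width = .1) +
  scale_x_continuous(breaks = c(1, 2, 3), labels = c('Week 0', 'Week 4', 'Week 8')) +
  labs(x = 'Observation', y = 'IAT score', color = 'Condition')
lp1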