########################################################
#### Lab 12: Categorical 2 X 3 / 3 X 2 designs      ####
#### and Intro to Repeated Measures                 ####
#### 1 Dec 17                                       ####
########################################################

# OVERVIEW OF TODAY
#
# 1. Contrast and Dummy coding in a 2 (Between-subjects) X 3 (Between-subjects) design
# 2. Repeated measures intro with a 2 (WITHIN-subjects) X 3 (Between-subjects) design

library(lmSupport)
library(ggplot2)
library(car)

#### 1. Contrast and Dummy coding in a 2 (Between-subjects) X 3 (Between-subjects) design ####

# For our first example:
# A group of educational psychologists devises an intervention that they believe will help
# boost the performance of students in a large University class. The educational psychologists
# also wonder if this intervention would be more effective for minority students in the class,
# in effect reducing the achievement gap.

# The codebook for this dataset is given below.
d <- dfReadDat("Lab12_DemoData1.dat")

# Codebook for "Lab12_DemoData1.dat"
# Columns   Variable    Description                   Values
# 1         id          Student ID                    1 - 120
# 2         condition   Experimental condition        "control", "intervention"
# 3         race        Student race                  "black", "hispanic", "white"
# 4         perf        Student's class performance   0 - 100 (DV)

# Get a sense of what's going on
str(d)
head(d)

# For sake of time, we will assume the data is clean and that model assumptions are all met

# Let's see if we can get a rough sense of how things are behaving based on the means
varDescribe(d)
varDescribe(d$perf)
varDescribeBy(d$perf, d$condition)
# intervention scored higher, descriptively

# So let's test some hypotheses
# Hypothesis 1: There will be an overall effect of Condition such that performance is better in the intervention
# group than in the control group, averaged across all races
# Hypothesis 2: There is a Race x Condition interaction such that the intervention has a different effect among White students
# than among non-White (minority) students (i.e., Race moderates the Condition effect)

# We want to translate our hypotheses into contrasts for both condition and race.
# We can actually create our contrasts in a way that will allow us to test both hypotheses!

# First see what data type condition is
class(d$condition)
# It is already a factor
# Sometimes your data will already read in as factors and sometimes it will not. It is generally good practice
# to set your factors yourself. If for no other reason, this will allow you to order the levels in the way you want
d$condition = factor(d$condition, levels = c('control', 'intervention'))

# Up until now, you have mostly been manually centering your variables.
# However, we know from last lab that we can use varContrasts to set contrasts for this factor
contrasts(d$condition) <- varContrasts(d$condition, Type = "POC", POCList = list(c(-.5, .5)), Labels = "Int_ConC")
# We can always see these labels by using contrasts
contrasts(d$condition)
# Note: we did not change the name of the factor but it is now centered. If you think you might forget whether a variable is
# centered or not, you can always rename it or make a new variable with the regressors coded.

## Create Race Contrasts ##
d$race = factor(d$race, levels = c('black', 'hispanic', 'white'))
levels(d$race)
contrasts(d$race) <- varContrasts(d$race,                                         # The factor
                                  Type = "POC",                                   # Type of contrast
                                  POCList = list(c(-1, -1, 2), c(1, -1, 0)),      # Provide the contrasts (in order!)
                                  Labels = c("White_Minority", "Black_Hispanic")) # Name the contrasts
# We can now see these contrasts
contrasts(d$race)

# Run the interactive model.
m1 <- lm(perf ~ race*condition, data=d)
modelSummary(m1)
modelEffectSizes(m1)

# But what if we want to get the effect sizes for the individual contrasts?
d <- varRegressors(d, "race", c("WvM", "BvH"))
# Then we can use this code:
m2 <- lm(perf ~ (WvM + BvH)*condition, data=d)
modelSummary(m2)
modelEffectSizes(m2)
# Now we get effect sizes separated by contrast.

#### Interpret the results ####
# Averaging across races, performance in the Intervention condition was 17.7 points
# higher than in the Control condition, t(114) = 7.32, p < .01, pn^2 = .24. This result is consistent
# with our hypothesis that there is, on average, an effect of the intervention.

# Our data support the hypothesis that the intervention has a different effect among White students
# than among non-White students. The effect of the intervention was 21.75 points smaller among White students
# than among non-White students, t(114) = -4.1, p < .05, pn^2 = .13.

# Now say you submit this study for peer review and Reviewer 1 asks if the effect of the intervention
# differs for Black vs. Hispanic individuals. Then Reviewer 2 asks if the effect of the intervention differs
# for Black vs. White individuals. THEN Reviewer 3 asks if the effect of the intervention differs for Hispanic
# vs. White individuals. You realize that you or others could reasonably have had planned hypotheses that line up
# with these tests. How can you test all these contrasts?
# Dummy codes!

# However, if you were going to plan these tests you would need to do what?
# Correct for multiple comparisons!

# How could you do this correction?
# You can use Fisher's LSD!

# We need to see if the omnibus race by condition interaction is significant
# As we learned last week, we can use the Anova function
m3 <- lm(perf ~ race * condition, data = d)
Anova(m3, type=3)

# Do we satisfy the condition of Fisher's LSD?
# Yep!
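
# A minimal base-R sketch of the omnibus check behind Fisher's LSD, assuming simulated data
# (sim_lsd and its cell means are made up for illustration; they are not the lab data).
# The 2-df Race x Condition test can be written as a comparison of the additive model against
# the interactive model, which should match the interaction row reported by Anova().
set.seed(12)
sim_lsd <- expand.grid(id = 1:20,
                       race = c("black", "hispanic", "white"),
                       condition = c("control", "intervention"))
# Hypothetical cell means: the intervention helps minority students more than White students
sim_cell_means <- c(black.control = 60, hispanic.control = 58, white.control = 75,
                    black.intervention = 80, hispanic.intervention = 82, white.intervention = 78)
sim_lsd$perf <- unname(sim_cell_means[paste(sim_lsd$race, sim_lsd$condition, sep = ".")]) +
  rnorm(nrow(sim_lsd), sd = 10)

sim_additive    <- lm(perf ~ race + condition, data = sim_lsd)   # no interaction terms
sim_interactive <- lm(perf ~ race * condition, data = sim_lsd)   # adds the 2 interaction terms
sim_omnibus <- anova(sim_additive, sim_interactive)              # F test on those 2 terms
sim_omnibus

# Fisher's LSD logic: run the uncorrected pairwise interaction tests only if this is TRUE
sim_omnibus_p <- sim_omnibus[2, "Pr(>F)"]
sim_omnibus_p < .05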

### Interaction of condition and white v black and white v hispanic ###
levels(d$race)
contrasts(d$race) <- varContrasts(d$race, Type = "Dummy", RefLevel = 3, Labels = c('B_W', 'H_W'))

# Fit a model that tests the reviewers' questions; White students
# are the reference group
# We'll want the individual contrast variables in our data frame to get
# an effect size for each of them, so code them out
d <- varRegressors(d, "race", c("bvw", "hvw"))
some(d)

m4 <- lm(perf ~ (bvw + hvw) * condition, data = d)
modelSummary(m4)
modelEffectSizes(m4)

# Remember this gives you the same estimates as
m5 <- lm(perf ~ race * condition, data = d)
modelSummary(m5)
modelEffectSizes(m5)
# But this one does not allow you to get the individual effect sizes

# Interpret the relevant coefficients
# bvw:condition: The effect of the intervention was 17.70 units greater for
# Black students than for White students
# hvw:condition: The effect of the intervention was 24.65 units greater for
# Hispanic students than for White students

# To get the third test, we have to make a different racial group the reference group
contrasts(d$race) <- varContrasts(d$race, Type = "Dummy", RefLevel = 2, Labels = c('B_H', 'W_H'))
d <- varRegressors(d, "race", c("bvh", "wvh"))

# Obtain the third test of whether the effect of the intervention varies across the different
# pairwise comparisons of racial groups
m6 <- lm(perf ~ (bvh + wvh) * condition, data = d)
modelSummary(m6)
modelEffectSizes(m6)
# bvh:condition: The effect of the intervention was 6.95 units more negative for Black students
# than for Hispanic students, but this difference was not significantly different from 0
# wvh:condition: the same test as hvw:condition in m4, with the sign reversed
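
# A quick check of the interaction interpretations above, sketched in base R with simulated
# data (sim_dd is hypothetical, not the lab data). With dummy-coded race (White as reference)
# and a -.5/.5 condition contrast, each race:condition coefficient is a difference of
# differences in cell means.
set.seed(4)
sim_dd <- expand.grid(id = 1:20,
                      race = c("black", "hispanic", "white"),
                      condition = c("control", "intervention"))
sim_dd$perf <- rnorm(nrow(sim_dd), mean = 70, sd = 10)
sim_dd$race <- relevel(sim_dd$race, ref = "white")   # base-R dummy codes, White = reference
contrasts(sim_dd$condition) <- c(-.5, .5)            # centered condition; regressor is named condition1

sim_dd_fit <- lm(perf ~ race * condition, data = sim_dd)

# Cell means, then the intervention effect (intervention minus control) within each race
sim_dd_cells  <- with(sim_dd, tapply(perf, list(race, condition), mean))
sim_dd_effect <- sim_dd_cells[, "intervention"] - sim_dd_cells[, "control"]

# These two numbers should agree: the Black-vs-White interaction coefficient and
# the Black intervention effect minus the White intervention effect
coef(sim_dd_fit)["raceblack:condition1"]
sim_dd_effect["black"] - sim_dd_effect["white"]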

# Note: Alternatively, you could have skipped the omnibus interaction test and used the Holm-Bonferroni approach
# by correcting the p values of the tests (see last week's demo)

#### Bar plot of performance by race and condition ####
# windows()
mod <- lm(perf ~ race*condition, data = d)
predictorX <- expand.grid(race = levels(d$race), condition = levels(d$condition))
predictorX
predictedY <- modelPredictions(mod, predictorX)
predictedY

library(cowplot)
barplot <- ggplot(data = d, aes(x = race, y = perf, fill = condition)) +
  geom_bar(mapping = aes(y = Predicted), data = predictedY, stat = "identity",
           width = 0.5, position = position_dodge()) +
  # geom_point(colour = 'darkgrey', position = position_jitter(w = .1, h = 0)) +
  geom_errorbar(data = predictedY, width = .25,
                aes(y = Predicted, ymin = CILo, ymax = CIHi),
                stat = "identity", position = position_dodge(.5)) +
  labs(y = 'Performance', x = 'Race', fill = 'Condition') +
  coord_cartesian(ylim = c(15, 105), expand = TRUE)
barplot

#### 2. Repeated measures intro with a 2 (WITHIN-subjects) X 3 (Between-subjects) design ####

# Note: The repeated measures data we will work with today will already be in wide format:
# a participant's repeated responses are in a single row, and each response is in a separate column.
# Some data, especially repeated measures data, will often start in long format:
# each row is one time point per participant. So each participant will have data in multiple rows.
# Any variables that don't change across time will have the same value in all the rows.
# Next week we will show you how to go from wide format to long format in R.

rm(list=ls())   # clear all objects from workspace
d <- dfReadDat("Lab12_DemoData2.dat")

# For your intro to repeated measures, we will use real (modified) data from a study in John and Daniel's
# lab. Evidence suggests that the pharmacological properties of alcohol reduce anxiety about
# unpredictably bad events more so than anxiety about predictably bad events. In this experiment,
# participants were divided into three groups and given an alcoholic beverage, a placebo beverage
# (deceptively told alcohol but only got an alcohol flavored juice drink), or a control beverage (truthfully
# told no alcohol and actually got a regularly flavored juice drink). All participants were hooked up to
# electrodes and given mild electric shocks whenever they saw cues come up on a computer screen.
# Some cues signaled predictable shock (shock would be of a specified intensity that was previewed to
# the participants before the study started) and some cues signaled unpredictable shock (shock intensity
# would be of some unknown level). Cue types were counterbalanced. At the end of the experiment,
# participants rated how anxious they were (scale of 0-5) after seeing each type of cue.

# Hypotheses:
# 1. Alcohol will significantly reduce anxiety about unpredictable shock.
# 2. Furthermore, the effect of alcohol on self-reported anxiety to unpredictable shock is pharmacological,
# so there should not be an expectancy effect from the participants thinking they drank alcohol (placebo group).

# Codebook for "Lab12_DemoData2.dat"
# Columns   Variable        Description                                    Values
# 1         Unpredictable   Self-reported anxiety to Unpredictable shock   0 - 4.75
# 2         Predictable     Self-reported anxiety to Predictable shock     0 - 4.25
# 3         BG              Participant's beverage group assignment        Control (no alcohol), Alcohol, Placebo

# Get a sense of what's going on
str(d)
head(d)
table(d$BG)
varDescribe(d)

# Again, for sake of time, we will assume the data is clean and that model assumptions are all met

# Hypothesis: Alcohol but not placebo drinks will reduce self-reported anxiety to unpredictable shock.

# First let's make sure BG is a factor
class(d$BG)
# yep

# What are its levels?
levels(d$BG)
d$BG = factor(d$BG, levels=c('CON', 'PLA', 'ALC'))   # but we can reorder the levels how we like
levels(d$BG)

# What set of orthogonal contrasts can we make to test this hypothesis?
contrasts(d$BG) = varContrasts(d$BG, Type='POC', POCList = list(c(-1,-1,2), c(-1,1,0)), Labels = c('A_CP', 'P_C'))

# Our questions are about the effect of alcohol on unpredictable shock vs predictable shock, but
# there may be some designs where you first would want to test for a general "main effect" of your focal variable
# (in this case alcohol) on your dependent variable (in this case self-reported anxiety).
# To do that, we can first create a new variable that averages the two repeated measures together.
d$UnPreAverage = (d$Unpredictable + d$Predictable)/2
m7 <- lm(UnPreAverage ~ BG, data=d)
modelSummary(m7)
modelEffectSizes(m7)

# Interpret the intercept
# Averaging across shock types and beverage groups, self-reported anxiety was 2.4 and this is significantly different from 0.
# Not important for our hypotheses

# Interpret b1
# Averaging across shock types, self-reported anxiety was .4 lower for alcohol versus the mean of the control and
# placebo beverage groups and this was a significant difference.
# Not important for our hypotheses (which have to do with predictability)

# Interpret b2
# Averaging across shock types, self-reported anxiety was .1 higher for the placebo versus the control
# group but this was not a significant difference.
# Also not important for our hypotheses

# To test our hypotheses, we will need to make a difference score to remove the "dependence problem"
# of the self-reported anxiety data. In other words, if our hypothesis is about an effect of alcohol
# on self-reported anxiety and a moderating effect of type of shock (i.e., a greater effect of alcohol
# on self-reported anxiety to unpredictable vs predictable shock), we need to create a new variable that
# is a difference score for unpredictable vs predictable shock
varDescribe(d)
# It may make interpretation easier to subtract the smaller score from the larger score
d$UnPreDifference = d$Unpredictable - d$Predictable
m8 <- lm(UnPreDifference ~ BG, data=d)
modelSummary(m8)
modelEffectSizes(m8)

m9 <- lm(Unpredictable - Predictable ~ BG, data=d)
# Note: you could also just do the math within the lm function
# and that would give you the same result

# What if we wanted to get effect size estimates for each contrast?
# We need to code regressors
d <- varRegressors(d, "BG", c("AvCP", "PvC"))
m10 <- lm(UnPreDifference ~ (AvCP+PvC), data=d)
modelSummary(m10)
modelEffectSizes(m10)

# Interpret the intercept
# Averaging across beverage groups, self-reported anxiety was .38 higher
# for unpredictable vs predictable shock
# STILL not important for our hypotheses

# Interpret b1
# The unpredictable-minus-predictable difference in self-reported anxiety is .16 smaller
# in the alcohol beverage group than in the average of the placebo and control groups

# Interpret b2
# The unpredictable-minus-predictable difference in self-reported anxiety is .10 smaller
# in the placebo beverage group than in the control group, but this is not a significant difference

# So how do we test the simple effects of the alcohol vs control/placebo contrast for each type of shock?
# We run separate models for each shock type!
m11 <- lm(Unpredictable ~ (AvCP+PvC), data=d)
modelSummary(m11)
modelEffectSizes(m11)

# Interpret b1
# Self-reported anxiety to Unpredictable shock is .5 lower in the alcohol beverage group than in the average of the
# placebo and control groups

# Interpret b2
# Self-reported anxiety to Unpredictable shock is .11 higher in the placebo beverage group than in the control group,
# but this is not a significant difference

m12 <- lm(Predictable ~ (AvCP+PvC), data=d)
modelSummary(m12)
modelEffectSizes(m12)

# Interpret b1
# Self-reported anxiety to Predictable shock is .4 lower in the alcohol beverage group than in the average of the
# placebo and control groups

# Interpret b2
# Self-reported anxiety to Predictable shock is .2 higher in the placebo beverage group than in the control group,
# but this is not a significant difference

# We will give you ggplot code to graph repeated measures data next week!

# Final question to think about as you go on to your homework. What if our focal variable was unpredictable vs predictable shock
# and we only wanted to test for significant differences in unpredictable vs predictable shock anxiety within each of the
# three beverage groups? What coding scheme would we use?
# Dummy coding. We could set each beverage group in turn as the reference group and test the intercept of each
# difference-score model
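
# A sketch of the dummy-coding answer above, in base R with simulated wide-format data
# (sim_rm, its group sizes, and its means are hypothetical, not the lab data). With dummy codes,
# the intercept of the difference-score model is the mean unpredictable-minus-predictable
# difference in the reference group, so cycling the reference group tests it in each group.
set.seed(9)
sim_rm <- data.frame(BG = factor(rep(c("CON", "PLA", "ALC"), each = 15),
                                 levels = c("CON", "PLA", "ALC")))
sim_rm$Predictable   <- rnorm(nrow(sim_rm), mean = 2, sd = .8)
sim_rm$Unpredictable <- sim_rm$Predictable + rnorm(nrow(sim_rm), mean = .4, sd = .6)
sim_rm$Diff <- sim_rm$Unpredictable - sim_rm$Predictable

# Refit the model once per reference group and keep the intercept and its p value
sim_rm_intercepts <- sapply(levels(sim_rm$BG), function(ref) {
  sim_rm$BGref <- relevel(sim_rm$BG, ref = ref)   # R's default treatment (dummy) contrasts
  fit <- lm(Diff ~ BGref, data = sim_rm)
  summary(fit)$coefficients["(Intercept)", c("Estimate", "Pr(>|t|)")]
})
sim_rm_intercepts

# Each intercept estimate should equal the raw mean difference in that group
tapply(sim_rm$Diff, sim_rm$BG, mean)

# Note: these intercept tests use the error term pooled across all three groups,
# so their p values can differ from separate paired t-tests run within each group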