#####################################################
####  Week 9 Lab: Introduction to Interactions   ####
####        Friday, November 3rd, 2017           ####
#####################################################

# Open and read Lab9_Description.docx (CollaborationStudy)

#### Preliminaries: Examine the data ####
library(lmSupport)

# Read in the data.
d = dfReadDat("Lab9_DemoData.dat")
str(d)

# Get descriptives
varDescribe(d)

# Univariate plots
varPlot(d$Age)
varPlot(d$Condition)
varPlot(d$Inferences)
varPlot(d$Creativity) # For now, we're not going to pay attention to the creativity variable.

# Bivariate correlations
cor(d[, c("Age", "Condition", "Inferences")])

# Overarching question:
# Test the hypothesis that older participants benefit more from collaboration than
# younger participants. In other words, test the hypothesis that the effect of the
# Collaboration vs. Observation condition on causal learning depends on age.
# * You'll note that this is a dichotomous-by-continuous interaction, which you haven't yet
#   covered in lecture. We're starting with this example because it's easier to illustrate
#   what's going on.

# Center the IVs
d$AgeC = d$Age - mean(d$Age)
d$ConditionC = varRecode(d$Condition, Old=c(0,1), New=c(-0.5, 0.5))
# Why do we need to center the variables, especially if we intend to estimate an interactive model?
# Remember this week's lecture!

###########################################
#### Review: The three-parameter model ####
###########################################

# First, let's estimate an additive model predicting the percentage of correct inferences
# from Condition and Age.
m1 = lm(Inferences ~ ConditionC + AgeC, data=d)
modelSummary(m1)
# Is there a significant effect of Condition?
# Yes, children in the Collaboration condition do better.
# How about Age?
# Yes, older children also do better.

library(effects)
plot(effect('AgeC', m1))       # Plot the Age effect.
plot(effect('ConditionC', m1)) # Plot the Condition effect.

#######################################################
#### Preliminaries cont.: Do we have any outliers? ####
#######################################################

# Let's just look at the residuals and the influence plot so we can move through this part quickly.
res = modelCaseAnalysis(m1, "RESIDUALS")
d[res$Rownames,] # No clear outliers, really.
inf = modelCaseAnalysis(m1, "INFLUENCEPLOT")
d[inf$Rownames,]
varDescribe(d)
# The flagged cases are just the oldest and youngest children. We probably don't need to be too
# worried about them.

################################################################
#### Preliminaries cont.: Are we meeting model assumptions? ####
################################################################

modelAssumptions(m1, "NORMAL")   # This looks quite nice.
modelAssumptions(m1, "CONSTANT") # This isn't great, but it appears as though one point is making
                                 # it look worse than it actually is.
modelAssumptions(m1, "LINEAR")   # Looks basically OK.

####################################
#### NEW! The Interactive Model ####
####################################

# Estimate the interactive model and interpret the regression estimates.

# Verbose method:
# Create a new variable that is the product of ConditionC and AgeC.
d$InterC = d$ConditionC*d$AgeC
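# Aside (a minimal sketch, not part of the original lab): if you want to see for yourself why
# centering matters, compare the interactive model fit with raw vs. centered predictors. This uses
# only base R; coef() just extracts the estimates, and the model names below are ours.
mRawDemo  = lm(Inferences ~ Condition * Age,   data=d) # raw 0/1 condition, raw age in months
mCentDemo = lm(Inferences ~ ConditionC * AgeC, data=d) # centered versions of the same predictors
coef(mRawDemo)  # here the Condition and Age estimates are simple effects at 0, outside the data
coef(mCentDemo) # here they are effects at the means; the interaction estimate is identical either way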
# Then fit the model including ConditionC, AgeC, and the interaction term.
mInt = lm(Inferences ~ ConditionC + AgeC + InterC, data=d)
modelSummary(mInt)

# An alternative method (same result)
mIntAlt1 = lm(Inferences ~ ConditionC + AgeC + ConditionC:AgeC, data=d)
modelSummary(mIntAlt1)

# Another, even shorter alternative (and the most commonly used)
mIntAlt2 = lm(Inferences ~ ConditionC * AgeC, data=d)
modelSummary(mIntAlt2)

# Why is the following model not equivalent, and NOT the correct model to test our question?
mWRONG = lm(Inferences ~ ConditionC:AgeC, data=d)
modelSummary(mWRONG)
# It doesn't include the lower-order (simple) effects of Condition and Age.

#### Interpret each of the coefficients in the model ####

## Intercept b0 = 38.54: The estimated number of inferences made by a child of average age who is
#  neutral with respect to condition is 38.54.

## ConditionC b1 = 11.15: For a participant of average age (in this study, about 51 months),
#  there is a significant effect of condition on inferences, such that participants (of average age)
#  in the Collaboration condition made an average of 11.15 more correct causal inferences than
#  participants (of average age) in the Observation condition. OR: Our model predicts that
#  51-month-olds would make 11.15 more correct causal inferences in the Collaboration condition than
#  they would in the Observation condition.
#  (This is the effect of Condition on Inferences at the average age.)

## AgeC b2 = 1.41: Averaging across conditions, there is a significant effect of age on inferences,
#  such that a one-month increase in age is associated with making 1.41 more causal inferences at
#  posttest. (In other words, there is a positive relationship between age and inferences, such that
#  older children are better at learning about the blicket detector, regardless of condition.)
#  OR: Our model predicts that among children who are neutral with respect to condition, a one-month
#  increase in age is associated with a 1.41 increase in the number of causal inferences they will
#  make at posttest.

## InterC b3 = 2.43: The effect of age on inferences differs by condition. The slope of the
#  age-inferences relationship is 2.43 units greater in the Collaboration condition than in the
#  Observation condition. At this point, this is a little unclear. It will become clearer when we
#  consider the simple effects, below.

# Compare to the original model. How did the SEs change?
modelSummary(m1)
# The SEs are lower when we add the interaction. Releasing the constraint that b3 equals zero
# (i.e., allowing our predictors to interact) improves our model fit.

# How much more variance are we explaining?
# We went from explaining 20.6% of the variance in inferences to 32.4%. This is a substantial increase.

## Determine the regression line for the Observation condition (ConditionC = -0.5)
# Inferences = 38.54 + 11.15*(-0.5) + 1.41*AgeC + 2.43*(-0.5)*AgeC
38.54 + (11.15*-.5) # New intercept
1.41 + (2.43*-.5)   # New Age coefficient
# Inferences (in the Observation condition) = 32.97 + 0.20*AgeC
# In the Observation condition, a one-month increase in age is associated with a 0.20 point increase
# in the number of causal inferences made.
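# Aside (a minimal sketch, not in the original lab): before repeating this arithmetic by hand for
# the Collaboration condition, note that you can pull the same values straight from the coefficient
# vector of the verbose model (the names below are the terms of mInt).
b = coef(mInt)
b["(Intercept)"] + b["ConditionC"] * -0.5 # Observation intercept
b["AgeC"] + b["InterC"] * -0.5            # Observation age slope
b["(Intercept)"] + b["ConditionC"] * 0.5  # Collaboration intercept
b["AgeC"] + b["InterC"] * 0.5             # Collaboration age slope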
## Determine the regression line for the Collaboration condition (ConditionC = 0.5)
# Inferences = 38.54 + 11.15*(0.5) + 1.41*AgeC + 2.43*(0.5)*AgeC
38.54 + (11.15*.5) # New intercept
1.41 + (2.43*.5)   # New Age coefficient
# Inferences (in the Collaboration condition) = 44.12 + 2.63*AgeC
# In the Collaboration condition, a one-month increase in age is associated with a 2.63 point
# increase in the number of causal inferences made.

# The simple slopes show that the effect of age on inferences is bigger (steeper) in the
# Collaboration condition than in the Observation condition. This is an easier way to interpret the
# interaction. Notice that the Age slopes in these two equations differ by exactly b3: 2.43!

# Testing the simple effect of age for the Observation condition: with Condition coded 0/1
# (Observation = 0), the AgeC estimate is the age slope in the Observation condition.
modObs = lm(Inferences ~ AgeC*Condition, data=d)
modelSummary(modObs)

# Testing the simple effect of condition for 40-month-olds: with age centered at 40 months,
# the ConditionC estimate is the condition effect at age 40.
d$Age40C = d$Age - 40
mod40 = lm(Inferences ~ Age40C*ConditionC, data=d)
modelSummary(mod40)

#########################################################################################################
#### Scatter plot for interaction of 2 predictors (1 continuous, 1 dichotomous) - 2 regression lines ####
#########################################################################################################

# Create a new variable, ConditionStr, that codes Condition as a character string.
d$ConditionStr = varRecode(d$Condition, c(0,1), c("Observation","Collaboration"))

# Refit the model using the string version of Condition and raw Age (the plot will make more sense).
mPlot = lm(Inferences ~ Age*ConditionStr, data=d)

# Predict data from the model: make a new data frame with the predictor set to a range of values
# from its min to its max, with length >= that of the actual data.
X = expand.grid(Age = seq(min(d$Age), max(d$Age), length=80),
                ConditionStr = c("Observation","Collaboration"))
Y = modelPredictions(mPlot, X)

# Create the plot
library(ggplot2)
plot1 = ggplot(data=d, aes(x = Age, y = Inferences, color = ConditionStr)) +
  geom_point(aes(shape=ConditionStr)) + # add raw data points; set color AND shape by group
  scale_colour_brewer(palette="Set1")   # change the color palette; see the help page (or Google) for options
plot1
plot1 = plot1 +
  geom_smooth(aes(ymin = CILo, ymax = CIHi, y = Predicted), # the predicted regression lines for each condition
              data = Y, stat = "identity") +
  theme_bw(base_size = 14)
plot1
plot1 = plot1 + coord_cartesian(xlim = c(36,64), ylim = c(0,100)) + # specify the range of the axes
  labs(x = 'Age (months)') +                                        # clarify the axis name
  theme(legend.position = c(.2,.93),       # position the legend (play around; (0,0) is bottom left, (1,1) is top right)
        legend.background = element_blank(), # remove the legend background
        legend.title = element_blank())      # remove the legend title
plot1

# There's one thing that's slightly suboptimal about these lines. What is it?
# The lines are based on the overall observed min and max of Age, not the min and max within each
# condition, which means we are extrapolating slightly. One fix is sketched below.
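# Aside (a sketch of one possible fix, not in the original lab): build the prediction grid
# separately for each condition, using that condition's own observed age range, so that neither
# line extends beyond its own data.
XByCond = rbind(
  expand.grid(Age = seq(min(d$Age[d$ConditionStr == "Observation"]),
                        max(d$Age[d$ConditionStr == "Observation"]), length=80),
              ConditionStr = "Observation", stringsAsFactors = FALSE),
  expand.grid(Age = seq(min(d$Age[d$ConditionStr == "Collaboration"]),
                        max(d$Age[d$ConditionStr == "Collaboration"]), length=80),
              ConditionStr = "Collaboration", stringsAsFactors = FALSE))
YByCond = modelPredictions(mPlot, XByCond)
# Passing data = YByCond (instead of Y) to geom_smooth() above keeps each line within its condition.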
##############################
####  Interpret the Graph ####
##############################

# Blue points: raw observations in the Observation condition
# Red points:  raw observations in the Collaboration condition
# Blue line:   estimated relationship between age and inferences in the Observation condition
# Red line:    estimated relationship between age and inferences in the Collaboration condition
# Grey bands:  ±1 SE bands

# Looking at this graph, what helps us infer that the interaction is significant?
# The slopes are quite different between the two conditions.

###############################################
#### Continuous by continuous interactions ####
###############################################

# Now we'll take another look at that creativity variable.
# Do kids who are more creative make more inferences?
# Does the relationship between creativity and inferences vary by age? That is, is creativity
# more "helpful" for children who are older?

# Center creativity
d$CreativityC = d$Creativity - mean(d$Creativity)

# Run an additive model
m2 = lm(Inferences ~ CreativityC + AgeC, data=d)
modelSummary(m2)
# Result? There are significant positive effects of both creativity and age.

# Run an interactive model
m3 = lm(Inferences ~ CreativityC * AgeC, data=d)
modelSummary(m3)
# Result? There are significant simple effects as well as a significant interaction between age and
# creativity. However, this is difficult to interpret without breaking it down a bit (even more so
# than earlier, since both predictors are continuous).

# You can use an effects plot to get a rough idea of what's going on.
plot(effect('CreativityC:AgeC', m3))
# It appears as though the effect of creativity becomes more positive as age increases.

# You can see this yourself by calculating a couple of simple effects. Consider our equation and
# the range of our variables.
varDescribe(d)
modelSummary(m3)

# Regression line for an age 10 months below the mean?
# Inferences = b0 + (-10)*1.6 + 3.23*CreativityC + 0.97*(-10)*CreativityC
-10*1.6        # change in the intercept
3.23 + .97*-10 # new creativity slope
# Ignoring the overall intercept: -16 - 6.47*CreativityC
# The effect of creativity for a child this age is negative.

# And for an age 8 months above the mean?
# Inferences = b0 + 8*1.6 + 3.23*CreativityC + 0.97*8*CreativityC
8*1.6          # change in the intercept
3.23 + .97*8   # new creativity slope
# Ignoring the overall intercept: 12.8 + 10.99*CreativityC
# The effect of creativity for a child this age is positive.

###########################################
#### PLOTTING CONTINUOUS BY CONTINUOUS ####
###########################################

# It's ultimately up to you to decide how many regression lines to include on your plot, but the
# general recommendation is 2 (or maybe 3). You want to choose values that are meaningful or that
# display something useful about the data as a whole. You also want to make a smart choice about
# which variable to put on the x-axis and which to represent with the different lines.

# For our plot, we'll put age on the x-axis, as it has a larger range and is more useful to think
# about continuously. For creativity, we'll plot lines at ±1 SD around its mean. You might consider
# these lines to represent generally creative and generally uncreative individuals.

# Uncentered version for plotting:
mPlot2 = lm(Inferences ~ Creativity*Age, data=d)
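# Aside (a minimal sketch, not in the original lab): as with Condition earlier, the simple-slope
# arithmetic above can be pulled straight from the coefficient vector of m3, for any age offset.
b3v = coef(m3) # names: "(Intercept)", "CreativityC", "AgeC", "CreativityC:AgeC"
b3v["CreativityC"] + b3v["CreativityC:AgeC"] * -10 # creativity slope 10 months below the mean age
b3v["CreativityC"] + b3v["CreativityC:AgeC"] * 8   # creativity slope 8 months above the mean age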
# We make our model predictions based on these values, and we make them separately!
# Otherwise our CIs get messed up.
XLow = expand.grid(Age = seq(min(d$Age), max(d$Age), length=80),
                   Creativity = mean(d$Creativity) - sd(d$Creativity))
XHi  = expand.grid(Age = seq(min(d$Age), max(d$Age), length=80),
                   Creativity = mean(d$Creativity) + sd(d$Creativity))
YLow = modelPredictions(mPlot2, XLow)
YHi  = modelPredictions(mPlot2, XHi)

plot2 = ggplot(data=d, aes(x = Age, y = Inferences)) + geom_point()
plot2
# Unfortunately, we lose information about each individual's creativity score here. You can sort of
# fix this by mapping color to creativity, but the result can be difficult to read meaningfully.
plot2 = ggplot(data=d, aes(x = Age, y = Inferences, color = Creativity)) + geom_point()
plot2

# Add both regression lines
plot2 = plot2 +
  geom_smooth(aes(y = Predicted, ymin = CILo, ymax = CIHi), data=YLow, stat='identity') +
  geom_smooth(aes(y = Predicted, ymin = CILo, ymax = CIHi), data=YHi, stat='identity') +
  theme_bw()
plot2

# Clean up
plot2 = plot2 + coord_cartesian(xlim = c(36,64), ylim = c(0,100)) + # specify the range of the axes
  labs(x = 'Age (months)') +                                        # clarify the axis name
  theme(legend.position = c(.2,.85))
plot2
# The nice thing about setting color this way is that the line colors coordinate with the colors of
# the raw data points. However, the two lines are somewhat hard to distinguish; you can play around
# with colour_brewer to make things clearer and more aesthetically pleasing.

# * Note: Creativity was an entirely random variable (I literally gave everyone a random number
#   drawn from a normal distribution with a mean of 4), but we still found significant effects.
#   Type I errors are real!!
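# Aside (a minimal sketch, not in the original lab): you can demonstrate the Type I error point by
# simulation. Regenerate a purely random "creativity" score many times and count how often its
# interaction with age comes out significant at alpha = .05. The seed and the 1000 replications
# below are arbitrary choices.
set.seed(1)
pvals = numeric(1000)
for (i in 1:1000) {
  dSim = d                                   # work on a copy so d itself is untouched
  dSim$FakeC = rnorm(nrow(dSim), mean=4)     # random scores, like the ones used in this lab
  dSim$FakeC = dSim$FakeC - mean(dSim$FakeC) # center the random predictor
  mSim = lm(Inferences ~ FakeC * AgeC, data=dSim)
  pvals[i] = summary(mSim)$coefficients["FakeC:AgeC", "Pr(>|t|)"]
}
mean(pvals < .05) # long-run rate of "significant" interactions; it should hover near .05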