#####################################################
#### Week 9 Lab: Introduction to Interactions    ####
#### Friday, November 3rd, 2017                  ####
#####################################################

# Open and read Lab9_Description.docx (CollaborationStudy)

#### Preliminaries: Examine the data ####
library(lmSupport)

# Read in the data.

# Get descriptives

# Univariate plots

# For now, we're not going to pay attention to the creativity variable.

# Bivariate correlations
cor(d[, c()])

# Overarching question:
# Test the hypothesis that older participants benefit more from collaboration than
# younger participants. In other words, test the hypothesis that the effect of the
# Collaboration vs. Observation condition on causal learning depends on age.

# * You'll note that this is a dichotomous by continuous interaction, which you haven't yet
#   covered in lecture. We're starting with this example because it's easier to illustrate
#   what's going on.

# Center the IVs

# Why do we need to center the variables, especially if we intend to estimate an interactive model?

###########################################
#### Review: The three-parameter model ####
###########################################

# First, let's estimate an additive model predicting the percentage of correct
# inferences drawn from Condition and Age.

# Is there a significant effect of Condition?
# How about Age?

library(effects)

########################################################
#### Preliminaries cont.: Do we have any outliers? ####
########################################################

# Let's just look at the residuals and the influence plot so we can move through this part quickly.

################################################################
#### Preliminaries cont.: Are we meeting model assumptions? ####
################################################################

#####################################
#### NEW!
The Interactive Model ####
#####################################

# Estimate the interactive model and interpret the regression estimates

# Verbose method:
# Create a new variable that is the product of ConditionC and AgeC.
# Then fit the model including ConditionC, AgeC, and the interaction term.

# An alternative method (same result)

# Another, even shorter alternative (and the most commonly used)

# Why is the following model not equivalent, and NOT the correct model to test our question?
mWRONG = lm(Inferences ~ ConditionC : AgeC, data=d)
modelSummary(mWRONG)

#### Interpret each of the coefficients in the model ####
## Intercept  b0 =
## ConditionC b1 =
## AgeC       b2 =
## IntC       b3 = 2.4
# At this point, this is a little unclear. It will become clearer when we consider simple effects, below.

# Compare to the original model.
# How did the SEs change?
modelSummary(m1)

# How much more variance are we explaining?

## Determine the regression line for the Observation condition (ConditionC = -0.5)
# Inferences =
# Inferences (in the Observation condition) =
# In the Observation condition...

## Determine the regression line for the Collaboration condition (ConditionC = 0.5)
# Inferences =
# Inferences (in the Collaboration condition) =
# In the Collaboration condition...

# The simple slopes show that the effect of age on inferences is bigger (steeper) in the
# Collaboration condition than in the Observation condition. This is an easier way to
# interpret the interaction.
# Notice that the slope for age in these two equations differs by b3!
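# A hedged sketch of the equivalent fitting approaches described above. The column
# names (Inferences, Condition, Age) follow this lab, but the data here are simulated
# (dSim) purely for illustration; the age range and condition labels are assumptions.
set.seed(1)
dSim = data.frame(
  Condition  = rep(c("Observation", "Collaboration"), each = 25),
  Age        = round(runif(50, 48, 86)),   # age in months (range is an assumption)
  Inferences = round(runif(50, 0, 100))    # percent correct inferences
)
dSim$AgeC       = dSim$Age - mean(dSim$Age)                        # mean-center age
dSim$ConditionC = ifelse(dSim$Condition == "Collaboration", 0.5, -0.5)

# Verbose method: create the product term by hand
dSim$IntC = dSim$ConditionC * dSim$AgeC
m2a = lm(Inferences ~ ConditionC + AgeC + IntC, data = dSim)

# Shorthand: a * b expands to a + b + a:b, so this fits the identical model
m2b = lm(Inferences ~ ConditionC * AgeC, data = dSim)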
#########################################################################################################
#### Scatter plot for interaction of 2 predictors (1 continuous, 1 dichotomous) - 2 regression lines ####
#########################################################################################################

# Create a new variable, ConditionStr, that codes Condition as a character string

# Refit the model using the string version of Condition and raw Age (the plot will make more sense)

# Predict data from the model
# Make a new data frame with the predictor set to a range of values from the min to the max,
# with length >= that of the actual data

# Create the plot
library(ggplot2)
plot1 = ggplot() +
  geom_point() + # add raw data points; set color AND shape by group
  scale_colour_brewer(palette="Set1") # use this function to change the color palette; see help for options (or google it)
plot1

plot1 = plot1 +
  geom_smooth() +
  theme_bw()
plot1

plot1 = plot1 +
  coord_cartesian() + # specify the range of the axes
  labs() +            # clarify axis names
  theme()
plot1

# There's one thing that's slightly suboptimal about these lines. What is it?

##############################
#### Interpret the Graph ####
##############################

# Blue points:
# Red points:
# Blue line:
# Red line:
# Grey bands:

# Looking at this graph, what helps us infer that the interaction is significant?

###############################################
#### Continuous by continuous interactions ####
###############################################

# Now we'll take another look at that creativity variable.
# Do kids who are more creative make more inferences?
# Does the relationship of creativity to inferences vary by age? That is, is creativity
# more "helpful" for children who are older?

# Center creativity

# Run an additive model
# Result?

# Run an interactive model
# Result?

# You can use an effects plot to give you a general idea of what's going on.

# You can see this yourself by calculating a couple of simple effects.
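# A hedged sketch of computing such simple effects by hand. The data are simulated
# and the model/object names (dSim, mInt, CreativityC) are assumptions for
# illustration; only the general variable roles follow this lab.
set.seed(2)
dSim = data.frame(Age = round(runif(60, 48, 86)), creativity = rnorm(60, mean = 4))
dSim$Inferences = 40 + 0.5 * (dSim$Age - mean(dSim$Age)) +
                  5 * dSim$creativity + rnorm(60, sd = 10)
dSim$AgeC        = dSim$Age - mean(dSim$Age)
dSim$CreativityC = dSim$creativity - mean(dSim$creativity)

mInt = lm(Inferences ~ CreativityC * AgeC, data = dSim)
b = coef(mInt)

# Simple slope of creativity at a chosen (centered) age: b2 + b3 * AgeC
slopeYoung = b["CreativityC"] + b["CreativityC:AgeC"] * (-10) # 10 months below the mean age
slopeOld   = b["CreativityC"] + b["CreativityC:AgeC"] * 8     # 8 months above the mean age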
# Consider our equation
# and the range of our variables.

# Regression line for age 10 months below the mean?
# 8 months above the mean?

###########################################
#### PLOTTING CONTINUOUS BY CONTINUOUS ####
###########################################

# It's ultimately up to you to decide how many regression lines to include on your plot, but the
# general recommendation is 2 (or maybe 3). You want to choose values that are meaningful or that
# display something useful about the data as a whole. You also want to make a smart choice about
# which variable to put on the x axis and which to represent with the different lines.

# For our plot, we'll put age on the x axis, as it has a larger range and is more useful to think
# about continuously. For creativity, we'll plot lines at ±1 SD around the mean of creativity. You
# might consider these lines to represent generally creative and uncreative individuals.

# Uncentered version for plotting:

# We make our model predictions based on these values, and we make them separately!
# Otherwise our CIs get messed up.
plot2 = ggplot() +
  geom_point()
plot2

# Unfortunately, we lose information about an individual's creativity score here. You can sort of
# fix this, but it can be difficult to see meaningfully.
plot2 = ggplot() +
  geom_point()
plot2

# Add both regression lines
plot2 = plot2 +
  geom_smooth() +
  geom_smooth() +
  theme_bw()
plot2

# Clean up
plot2 = plot2 +
  coord_cartesian() + # specify the range of the axes
  labs() +            # clarify axis names
  theme()
plot2

# The nice thing about setting color this way is that the line colors coordinate with the colors
# of the raw data points. However, they can be somewhat difficult to distinguish. You can play
# around with colour_brewer to make things clear and aesthetically pleasing.

# * Note: creativity was an entirely random variable (I literally gave everyone a random number
#   drawn from a normal distribution with a mean of 4), but we still found significant effects.
#   Type I errors are real!!
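
# The ±1 SD plotting approach described above can be sketched as follows. The data
# are simulated and all object names (dSim, mRaw, dLo, dHi, pSketch) are assumptions
# for illustration; note the predictions for the two lines are made separately, as
# the lab recommends, so the confidence bands are computed correctly.
library(ggplot2)
set.seed(3)
dSim = data.frame(Age = round(runif(60, 48, 86)), creativity = rnorm(60, mean = 4))
dSim$Inferences = 20 + 0.6 * dSim$Age + 3 * dSim$creativity + rnorm(60, sd = 10)
mRaw = lm(Inferences ~ creativity * Age, data = dSim)

AgeX = seq(min(dSim$Age), max(dSim$Age), length.out = 200)
dLo  = data.frame(Age = AgeX, creativity = mean(dSim$creativity) - sd(dSim$creativity))
dHi  = data.frame(Age = AgeX, creativity = mean(dSim$creativity) + sd(dSim$creativity))
dLo  = cbind(dLo, predict(mRaw, newdata = dLo, interval = "confidence"))
dHi  = cbind(dHi, predict(mRaw, newdata = dHi, interval = "confidence"))

pSketch = ggplot(dSim, aes(x = Age, y = Inferences)) +
  geom_point() +
  geom_smooth(data = dLo, aes(y = fit, ymin = lwr, ymax = upr), stat = "identity") +
  geom_smooth(data = dHi, aes(y = fit, ymin = lwr, ymax = upr), stat = "identity") +
  theme_bw()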