#####################################################
#### Week 9 Lab: Introduction to Interactions    ####
#### Friday, November 3rd, 2017                  ####
#####################################################

# Open and read Lab9_Description.docx (CollaborationStudy)

#### Preliminaries: Examine the data ####
library(lmSupport)

# Read in the data.

# Get descriptives

# Univariate plots

# For now, we're not going to pay attention to the creativity variable.

# Bivariate correlations
cor(d[, c()])

# Overarching question:
# Test the hypothesis that older participants benefit more from collaboration than
# younger participants. In other words, test the hypothesis that the effect of the
# Collaboration vs. Observation condition on causal learning depends on age.

# * You'll note that this is a dichotomous by continuous interaction, which you haven't yet
#   covered in lecture. We're starting with this example because it's easier to illustrate
#   what's going on.

# Center the IVs

# Why do we need to center the variables, especially if we intend to estimate an interactive model?

###########################################
#### Review: The three-parameter model ####
###########################################

# First, let's estimate an additive model predicting the percentage of correct
# inferences drawn from Condition and Age.

# Is there a significant effect of Condition?
# How about Age?

library(effects)

########################################################
#### Preliminaries cont.: Do we have any outliers? ####
########################################################

# Let's just look at the residuals and the influence plot so we can move through this part quickly.

################################################################
#### Preliminaries cont.: Are we meeting model assumptions? ####
################################################################

#####################################
#### NEW!
The Interactive Model ####
#####################################

# Estimate the interactive model and interpret the regression estimates

# Verbose method:
# Create a new variable that is the product of ConditionC and AgeC.
# Then fit the model including ConditionC, AgeC, and the interaction term.

# An alternative method (same result)

# Another, even shorter alternative (and the most commonly used)

# Why is the following model not equivalent, and NOT the correct model to test our question?
mWRONG = lm(Inferences ~ ConditionC : AgeC, data=d)
modelSummary(mWRONG)

#### Interpret each of the coefficients in the model ####
## Intercept  b0 =
## ConditionC b1 =
## AgeC       b2 =
## IntC       b3 = 2.4
# At this point, this is a little unclear. It will become clearer when we consider simple effects, below.

# Compare to the original model.
# How did the SEs change?
modelSummary(m1)

# How much more variance are we explaining?

## Determine the regression line for the Observation condition (ConditionC = -0.5)
# Inferences =
# Inferences (in the Observation condition) =
# In the Observation condition...

## Determine the regression line for the Collaboration condition (ConditionC = 0.5)
# Inferences =
# Inferences (in the Collaboration condition) =
# In the Collaboration condition...

# The simple slopes show that the effect of age on inferences is bigger (steeper) in the
# Collaboration condition than in the Observation condition. This is an easier way to
# interpret the interaction.
# Notice that the slope for age in these two equations differs by b3!
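# A hedged sketch of the equivalent fitting approaches described above. The column
# names (Inferences, Condition, Age) follow this lab, but the data here are simulated
# (dSim) purely for illustration; the age range and condition labels are assumptions.
set.seed(1)
dSim = data.frame(
  Condition  = rep(c("Observation", "Collaboration"), each = 25),
  Age        = round(runif(50, 48, 86)),   # age in months (range is an assumption)
  Inferences = round(runif(50, 0, 100))    # percent correct inferences
)
dSim$AgeC       = dSim$Age - mean(dSim$Age)                        # mean-center age
dSim$ConditionC = ifelse(dSim$Condition == "Collaboration", 0.5, -0.5)

# Verbose method: create the product term by hand
dSim$IntC = dSim$ConditionC * dSim$AgeC
m2a = lm(Inferences ~ ConditionC + AgeC + IntC, data = dSim)

# Shorthand: a * b expands to a + b + a:b, so this fits the identical model
m2b = lm(Inferences ~ ConditionC * AgeC, data = dSim)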
#########################################################################################################
#### Scatter plot for interaction of 2 predictors (1 continuous, 1 dichotomous) - 2 regression lines ####
#########################################################################################################

# Create a new variable, ConditionStr, that codes Condition as a character string

# Refit the model using the string version of Condition and raw Age (the plot will make more sense)

# Predict data from the model
# Make a new data frame with the predictor set to a range of values from the min to the max,
# with length >= that of the actual data

# Create the plot
library(ggplot2)
plot1 = ggplot() +
  geom_point() + # add raw data points; set color AND shape by group
  scale_colour_brewer(palette="Set1") # use this function to change the color palette; see help for options (or google it)
plot1

plot1 = plot1 +
  geom_smooth() +
  theme_bw()
plot1

plot1 = plot1 +
  coord_cartesian() + # specify the range of the axes
  labs() +            # clarify axis names
  theme()
plot1

# There's one thing that's slightly suboptimal about these lines. What is it?

##############################
#### Interpret the Graph ####
##############################

# Blue points:
# Red points:
# Blue line:
# Red line:
# Grey bands:

# Looking at this graph, what helps us infer that the interaction is significant?

###############################################
#### Continuous by continuous interactions ####
###############################################

# Now we'll take another look at that creativity variable.
# Do kids who are more creative make more inferences?
# Does the relationship of creativity to inferences vary by age? That is, is creativity
# more "helpful" for children who are older?

# Center creativity

# Run an additive model
# Result?

# Run an interactive model
# Result?

# You can use an effects plot to give you a general idea of what's going on.

# You can see this yourself by calculating a couple of simple effects.
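# A hedged sketch of computing such simple effects by hand. The data are simulated
# and the model/object names (dSim, mInt, CreativityC) are assumptions for
# illustration; only the general variable roles follow this lab.
set.seed(2)
dSim = data.frame(Age = round(runif(60, 48, 86)), creativity = rnorm(60, mean = 4))
dSim$Inferences = 40 + 0.5 * (dSim$Age - mean(dSim$Age)) +
                  5 * dSim$creativity + rnorm(60, sd = 10)
dSim$AgeC        = dSim$Age - mean(dSim$Age)
dSim$CreativityC = dSim$creativity - mean(dSim$creativity)

mInt = lm(Inferences ~ CreativityC * AgeC, data = dSim)
b = coef(mInt)

# Simple slope of creativity at a chosen (centered) age: b2 + b3 * AgeC
slopeYoung = b["CreativityC"] + b["CreativityC:AgeC"] * (-10) # 10 months below the mean age
slopeOld   = b["CreativityC"] + b["CreativityC:AgeC"] * 8     # 8 months above the mean age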
# Consider our equation
# and the range of our variables.

# Regression line for age 10 months below the mean?
# 8 months above the mean?

###########################################
#### PLOTTING CONTINUOUS BY CONTINUOUS ####
###########################################

# It's ultimately up to you to decide how many regression lines to include on your plot, but the
# general recommendation is 2 (or maybe 3). You want to choose values that are meaningful or that
# display something useful about the data as a whole. You also want to make a smart choice about
# which variable to put on the x axis and which to represent with the different lines.

# For our plot, we'll put age on the x axis, as it has a larger range and is more useful to think
# about continuously. For creativity, we'll plot lines at ±1 SD around the mean of creativity. You
# might consider these lines to represent generally creative and uncreative individuals.

# Uncentered version for plotting:

# We make our model predictions based on these values, and we make them separately!
# Otherwise our CIs get messed up.
plot2 = ggplot() +
  geom_point()
plot2

# Unfortunately, we lose information about an individual's creativity score here. You can sort of
# fix this, but it can be difficult to see meaningfully.
plot2 = ggplot() +
  geom_point()
plot2

# Add both regression lines
plot2 = plot2 +
  geom_smooth() +
  geom_smooth() +
  theme_bw()
plot2

# Clean up
plot2 = plot2 +
  coord_cartesian() + # specify the range of the axes
  labs() +            # clarify axis names
  theme()
plot2

# The nice thing about setting color this way is that the line colors coordinate with the colors
# of the raw data points. However, they can be somewhat difficult to distinguish. You can play
# around with colour_brewer to make things clear and aesthetically pleasing.

# * Note: creativity was an entirely random variable (I literally gave everyone a random number
#   drawn from a normal distribution with a mean of 4), but we still found significant effects.
#   Type I errors are real!!
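
# The ±1 SD plotting approach described above can be sketched as follows. The data
# are simulated and all object names (dSim, mRaw, dLo, dHi, pSketch) are assumptions
# for illustration; note the predictions for the two lines are made separately, as
# the lab recommends, so the confidence bands are computed correctly.
library(ggplot2)
set.seed(3)
dSim = data.frame(Age = round(runif(60, 48, 86)), creativity = rnorm(60, mean = 4))
dSim$Inferences = 20 + 0.6 * dSim$Age + 3 * dSim$creativity + rnorm(60, sd = 10)
mRaw = lm(Inferences ~ creativity * Age, data = dSim)

AgeX = seq(min(dSim$Age), max(dSim$Age), length.out = 200)
dLo  = data.frame(Age = AgeX, creativity = mean(dSim$creativity) - sd(dSim$creativity))
dHi  = data.frame(Age = AgeX, creativity = mean(dSim$creativity) + sd(dSim$creativity))
dLo  = cbind(dLo, predict(mRaw, newdata = dLo, interval = "confidence"))
dHi  = cbind(dHi, predict(mRaw, newdata = dHi, interval = "confidence"))

pSketch = ggplot(dSim, aes(x = Age, y = Inferences)) +
  geom_point() +
  geom_smooth(data = dLo, aes(y = fit, ymin = lwr, ymax = upr), stat = "identity") +
  geom_smooth(data = dHi, aes(y = fit, ymin = lwr, ymax = upr), stat = "identity") +
  theme_bw()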