########################################################
####  Lab 12: Categorical 2 X 3 / 3 X 2 designs     ####
####       and Intro to Repeated Measures:          ####
####                   1 Dec 17                     ####
########################################################


# OVERVIEW OF TODAY #

# 1. Contrast and Dummy coding in a 2 (Between-subjects) X 3 (Between-subjects) design
# 2. Repeated measures intro with a 2 (WITHIN-subjects) X 3 (Between-subjects) design

library(lmSupport)
library(ggplot2)
library(car)     # Anova(), some()


# 1. Contrast and Dummy coding in a 2 (Between-subjects) X 3 (Between-subjects) design

# For our first example: 

# A group of educational psychologists devises an intervention that they believe will help 
# boost the performance of students in a large University class. The educational psychologists 
# also wonder if this intervention would be more effective for minority students in the class,
# in effect reducing the achievement gap.
# The codebook for this dataset is given below.

d <- dfReadDat("Lab12_DemoData1.dat")

# Codebook for "Lab12_DemoData1.dat" 

# Column  Variable    Description                    Values

# 1       id          Student ID                     1 - 120
# 2       condition   Experimental condition         "control", "intervention"
# 3       race        Student race                   "black", "hispanic", "white"
# 4       perf        Student's class performance    0 - 100 (DV)

# Get a sense of what's going on
str(d)
head(d)

# For the sake of time, we will assume the data are clean and that all model assumptions are met 

# Let's see if we can get a rough sense of how things are behaving based on the means
varDescribe(d)
varDescribe(d$perf)
varDescribeBy(d$perf, d$condition) # intervention scored higher, descriptively


# So let's test some hypotheses

# Hypothesis 1: There will be an overall effect of Condition such that performance is better in the intervention
# group than in the control group, averaging across races

# Hypothesis 2: There will be a Race x Condition interaction such that the intervention has a different effect among White students
# than among non-White (minority) students (i.e., Race moderates the Condition effect)


# We want to translate our hypotheses into contrasts for both condition and race. 
# We can actually create our contrasts in a way that will allow us to test both hypotheses! 

# first see what data type condition is
class(d$condition)
# It is already a factor 

# Sometimes variables will already be read in as factors and sometimes they will not. It is generally good practice 
# to set your factors yourself, if for no other reason than that it lets you order the levels the way you want

d$condition = factor(d$condition, levels = c('control', 'intervention'))

# Up until now, you have mostly been manually centering your variables. 
# However, we know from last lab that we can use varContrasts to set contrasts for this factor
contrasts(d$condition) <- varContrasts(d$condition, 
                                       Type = "POC",
                                       POCList = list(c(-.5, .5)),
                                       Labels = "Int_ConC")
# We can always check the contrast codes (and their labels) with contrasts()
contrasts(d$condition)
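
# A base-R sketch of what the centered condition contrast does (not lmSupport's own
# code). It uses simulated data with hypothetical values, not the lab data: with codes
# of -.5 and .5, the condition coefficient equals the intervention mean minus the
# control mean.
set.seed(1)
sim <- data.frame(condition = factor(rep(c("control", "intervention"), each = 20),
                                     levels = c("control", "intervention")))
sim$perf <- 60 + 10 * (sim$condition == "intervention") + rnorm(40, sd = 5)
contrasts(sim$condition) <- matrix(c(-.5, .5), ncol = 1,
                                   dimnames = list(NULL, "Int_ConC"))
m_sim <- lm(perf ~ condition, data = sim)
grp_means <- tapply(sim$perf, sim$condition, mean)
coef(m_sim)["conditionInt_ConC"]                    # b for condition
grp_means["intervention"] - grp_means["control"]    # the same value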

# Note: we did not change the name of the factor, but it is now centered. If you think you might forget whether a variable is 
# centered, you can always rename it or make a new variable with the coded regressors. 


## Create Race Contrasts ##
d$race = factor(d$race, levels = c('black', 'hispanic', 'white'))

levels(d$race)
contrasts(d$race) <- varContrasts(d$race, #The factor
                                  Type = "POC", #type of contrast
                                  POCList = list(c(-1, -1, 2), c(1, -1, 0)), # provide the contrasts (in order!)
                                  Labels = c("White_Minority", "Black_Hispanic")) # Name the contrasts

# We can now see these contrasts 
contrasts(d$race)
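
# A quick base-R check (a sketch, independent of lmSupport) that the two race contrasts
# are orthogonal: each set of weights sums to zero, and the sum of their cross-products
# is zero.
race_pocs <- cbind(White_Minority = c(-1, -1, 2),
                   Black_Hispanic = c(1, -1, 0))
rownames(race_pocs) <- c("black", "hispanic", "white")
colSums(race_pocs)                                                    # 0 and 0
sum(race_pocs[, "White_Minority"] * race_pocs[, "Black_Hispanic"])    # 0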

# Fit the interaction model.
m1 <- lm(perf ~ race*condition, data=d)
modelSummary(m1)
modelEffectSizes(m1)

# But what if we want the effect sizes for the individual contrasts? 
# Code the contrasts out as separate regressors in the data frame:
d <- varRegressors(d, "race", c("WvM", "BvH"))

# Then we can fit the model with those regressors:
m2 <- lm(perf ~ (WvM + BvH)*condition, data=d)
modelSummary(m2)
modelEffectSizes(m2) # now we get effect sizes separated by contrast.
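
# A base-R sketch of the idea behind "coding out" regressors (an assumption about the
# idea, not varRegressors' exact implementation): each regressor is the contrast code for
# that row's level, so a model with the numeric columns reproduces the factor-based model.
# The raw POC weights are used here; varContrasts may rescale them, which changes the
# scale of the coefficients but not the equivalence. Simulated, hypothetical data.
set.seed(2)
sim <- expand.grid(race = c("black", "hispanic", "white"),
                   condition = c("control", "intervention"),
                   rep = 1:10)
sim$race <- factor(sim$race, levels = c("black", "hispanic", "white"))
sim$condition <- factor(sim$condition, levels = c("control", "intervention"))
sim$perf <- 60 + 5 * (sim$condition == "intervention") + rnorm(nrow(sim), sd = 8)

race_pocs <- cbind(WvM = c(-1, -1, 2), BvH = c(1, -1, 0))
contrasts(sim$race) <- race_pocs
contrasts(sim$condition) <- matrix(c(-.5, .5), ncol = 1)

# Look up each row's contrast codes to create the numeric regressors
sim$WvM <- race_pocs[as.integer(sim$race), "WvM"]
sim$BvH <- race_pocs[as.integer(sim$race), "BvH"]

m_factor <- lm(perf ~ race * condition, data = sim)
m_coded  <- lm(perf ~ (WvM + BvH) * condition, data = sim)
all.equal(unname(coef(m_factor)), unname(coef(m_coded)))   # TRUE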


#### Interpret the results ####
# Averaging across races, performance in the Intervention condition was 17.7 points 
# higher than in the Control condition, t(114) = 7.32, p < .01, partial eta^2 = .24. This result is consistent 
# with our hypothesis that there is, on average, an effect of the intervention.

# Our data support the hypothesis that the intervention has a different effect among White students 
# than among non-White students. The effect of the intervention was 21.75 points smaller among White students 
# than among non-White students, t(114) = -4.1, p < .05, partial eta^2 = .13.
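
# A worked sketch of how a partial eta^2 relates to its t test: for a 1-df effect, F = t^2, so
#   partial eta^2 = SS_effect / (SS_effect + SS_error) = t^2 / (t^2 + df_error).
# Plugging in the Race x Condition result above, t(114) = -4.1:
partial_eta_sq <- function(t, df_error) t^2 / (t^2 + df_error)
partial_eta_sq(-4.1, 114)   # about .13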


# Now say you submit this study for peer review and Reviewer 1 asks whether the effect of the intervention
# differs for Black vs. Hispanic students. Then Reviewer 2 asks whether it differs 
# for Black vs. White students. THEN Reviewer 3 asks whether it differs for Hispanic 
# vs. White students. You realize that you or others could reasonably have had planned hypotheses that line up 
# with these tests. How can you test all of these contrasts? 

# Dummy codes! 

# However, these tests were not all planned. If you are going to run all of them, what do you need to do? 

# Correct for multiple comparisons! 

# How could you do this correction? 

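# You can use Fisher's LSD! With three groups, it protects the familywise error rate 
# as long as the omnibus test is significant first.

# So we first need to check whether the omnibus Race x Condition interaction is significant.
# As we learned last week, we can use the Anova function 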
m3 <- lm(perf ~ race * condition, data = d)
Anova(m3, type=3) 
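
# A base-R sketch of the omnibus Race x Condition test: compare the additive model to the
# interaction model. For the highest-order term, this nested comparison gives the same
# F test as the Type 3 test from car::Anova(). Simulated, hypothetical data.
set.seed(3)
sim <- expand.grid(race = factor(c("black", "hispanic", "white")),
                   condition = factor(c("control", "intervention")),
                   rep = 1:20)
sim$perf <- 60 + 10 * (sim$condition == "intervention") * (sim$race != "white") +
  rnorm(nrow(sim), sd = 8)

m_add <- lm(perf ~ race + condition, data = sim)
m_int <- lm(perf ~ race * condition, data = sim)
omnibus <- anova(m_add, m_int)
omnibus   # F test on (3 - 1) * (2 - 1) = 2 df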

# Do we satisfy the condition for Fisher's LSD (a significant omnibus interaction)? 
# Yep! 

### Interaction of condition and white v black and white v hispanic ###
levels(d$race)
contrasts(d$race) <- varContrasts(d$race, 
                                  Type = "Dummy",
                                  RefLevel = 3, Labels= c('B_W', 'H_W'))

# Fit a model that tests the reviewers' questions; White students
# are the reference group

# We'll want the individual contrasts variables in our data frame to get
# an effect size for each of them, so code them out
d <- varRegressors(d, "race", c("bvw", "hvw"))
some(d)


m4 <- lm(perf ~ (bvw + hvw) * condition, data = d)
modelSummary(m4)
modelEffectSizes(m4)

# Remember this gives you the same estimates as 

m5 <- lm(perf ~ race * condition, data = d)
modelSummary(m5)
modelEffectSizes(m5)

# But this one does not allow you to get the individual effect sizes 

# Interpret the relevant coefficients

# bvw:condition: The effect of the intervention was 17.70 units greater for
# Black students than for White students

# hvw:condition: The effect of the intervention was 24.65 units greater for
# Hispanic students than for White students
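
# A base-R sketch of why the dummy-coded interaction terms read this way. With White as the
# reference level (contr.treatment with base = 3) and condition coded -.5/.5, the
# bvw:condition coefficient equals the Black intervention effect minus the White
# intervention effect, computed from the cell means. Simulated, hypothetical data.
set.seed(4)
sim <- expand.grid(race = factor(c("black", "hispanic", "white")),
                   condition = factor(c("control", "intervention")),
                   rep = 1:15)
sim$perf <- 60 + 8 * (sim$condition == "intervention") + rnorm(nrow(sim), sd = 8)
contrasts(sim$race) <- contr.treatment(3, base = 3)        # white = reference
contrasts(sim$condition) <- matrix(c(-.5, .5), ncol = 1)

m_dummy <- lm(perf ~ race * condition, data = sim)
cell <- tapply(sim$perf, list(sim$race, sim$condition), mean)
effect <- cell[, "intervention"] - cell[, "control"]       # intervention effect per race

coef(m_dummy)["race1:condition1"]     # bvw:condition
effect["black"] - effect["white"]     # the same value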

# To get the third test, we have to make a different racial group the reference group
contrasts(d$race) <- varContrasts(d$race,
                                  Type = "Dummy",
                                  RefLevel = 2, Labels=c('B_H', 'W_H'))
d <- varRegressors(d, "race", c("bvh", "wvh"))

# Obtain the third test of whether the effect of the intervention varies across the different
# pairwise comparisons of racial groups
m6 <- lm(perf ~ (bvh + wvh) * condition, data = d)
modelSummary(m6)
modelEffectSizes(m6)

# bvh:condition: The effect of the intervention was 6.95 units smaller for Black students
# than for Hispanic students, but this difference was not significantly different from 0

# wvh:condition: This is the same test as hvw:condition in m4, with the sign reversed
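
# A base-R sketch showing that the third pairwise estimate follows from the first two: with
# Hispanic as the reference (base = 2), the bvh:condition coefficient equals bvw:condition
# minus hvw:condition from the White-reference model (17.70 - 24.65 = -6.95 in the output
# above). Simulated, hypothetical data.
set.seed(5)
sim <- expand.grid(race = factor(c("black", "hispanic", "white")),
                   condition = factor(c("control", "intervention")),
                   rep = 1:15)
sim$perf <- 60 + 8 * (sim$condition == "intervention") + rnorm(nrow(sim), sd = 8)
contrasts(sim$condition) <- matrix(c(-.5, .5), ncol = 1)

contrasts(sim$race) <- contr.treatment(3, base = 3)   # white reference: race1 = black, race2 = hispanic
b_wref <- coef(lm(perf ~ race * condition, data = sim))
contrasts(sim$race) <- contr.treatment(3, base = 2)   # hispanic reference: race1 = black, race3 = white
b_href <- coef(lm(perf ~ race * condition, data = sim))

b_wref["race1:condition1"] - b_wref["race2:condition1"]   # bvw - hvw
b_href["race1:condition1"]                                 # bvh: the same value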

# Note: Alternatively, you could have skipped the omnibus interaction test and used the Holm-Bonferroni approach
# to correct the p values of the pairwise tests (see last week's demo)
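
# A sketch of the Holm-Bonferroni alternative using base R's p.adjust(). The three p values
# are hypothetical placeholders, not results from this lab; in practice you would take them
# from the interaction rows of the pairwise models.
p_raw  <- c(bvw = 0.01, bvh = 0.04, hvw = 0.03)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm   # bvw = 0.03, bvh = 0.06, hvw = 0.06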


#### Bar plot of predicted performance by race and condition ####

#windows()

mod <- lm(perf ~ race*condition, data = d)
 
predictorX <- expand.grid(race = levels(d$race), condition = levels(d$condition))
predictorX

predictedY <- modelPredictions(mod, predictorX)
predictedY
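
# A base-R sketch of where Predicted, CILo and CIHi come from: predict() with
# interval = "confidence" returns fitted values and 95% confidence limits. (Assumption:
# modelPredictions() computes the same quantities; the column names here are chosen to
# match it.) For a full race x condition model, the predictions are the cell means.
# Simulated, hypothetical data.
set.seed(6)
sim <- expand.grid(race = factor(c("black", "hispanic", "white")),
                   condition = factor(c("control", "intervention")),
                   rep = 1:10)
sim$perf <- 60 + 8 * (sim$condition == "intervention") + rnorm(nrow(sim), sd = 8)
mod_sim <- lm(perf ~ race * condition, data = sim)

grid <- expand.grid(race = levels(sim$race), condition = levels(sim$condition))
ci <- predict(mod_sim, newdata = grid, interval = "confidence", level = 0.95)
pred_sim <- cbind(grid, Predicted = ci[, "fit"], CILo = ci[, "lwr"], CIHi = ci[, "upr"])
pred_sim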

library(cowplot)
barplot <- ggplot(data = d, aes(x = race, y = perf, fill = condition)) +
   geom_bar(mapping = aes(y = Predicted), data = predictedY, stat = "identity", width = 0.5,
            position = position_dodge()) +
   #geom_point(colour='darkgrey', position = position_jitter(w = .1, h=0)) + 
   geom_errorbar(data = predictedY, width = .25, aes(y = Predicted, ymin = CILo, ymax = CIHi), stat = "identity",
                 position = position_dodge(.5)) + 
   labs(y = 'Performance', x = 'Race', fill = 'Condition') +
   coord_cartesian(ylim = c(15, 105), expand = TRUE)
barplot


#### 2. Repeated measures intro with a 2 (WITHIN-subjects) X 3 (Between-subjects) design ####

# Note: The repeated measures data we will work with today will already be in wide format: 
# a participant's repeated responses are in a single row, and each response is in a separate column.

# Some data, especially repeated measures data, will often start in long format: 
# each row is one time point per participant. So each participant will have data in multiple rows. 
# Any variables that don't change across time will have the same value in all the rows.

#  Next week we will show you how to go from wide format to long format in R. 
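
# A base-R sketch of going from wide to long format with stats::reshape(). The tiny data
# frame and its id column are hypothetical; the codebook for Lab12_DemoData2.dat lists no
# id column, so one would have to be added first.
wide <- data.frame(id = 1:3,
                   BG = c("CON", "PLA", "ALC"),
                   Unpredictable = c(2.5, 3.0, 1.5),
                   Predictable = c(2.0, 2.5, 1.5))
long <- reshape(wide,
                direction = "long",
                varying   = c("Unpredictable", "Predictable"),
                v.names   = "anxiety",
                timevar   = "shock",
                times     = c("Unpredictable", "Predictable"),
                idvar     = "id")
long <- long[order(long$id), ]
long   # 6 rows: one per participant per shock type; BG repeats within id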

rm(list=ls())   #clear all objects from workspace

d <- dfReadDat("Lab12_DemoData2.dat")

# For your intro to repeated measures, we will use real (modified) data from a study in John and Daniel's 
# lab. Evidence suggests that the pharmacological properties of alcohol reduce anxiety about 
# unpredictably bad events more than anxiety about predictably bad events. In this experiment, 
# participants were divided into three groups and given an alcoholic beverage, a placebo beverage 
# (deceptively told it contained alcohol but actually given an alcohol-flavored juice drink), or a control beverage 
# (truthfully told it contained no alcohol and actually given a regularly flavored juice drink). All participants were 
# hooked up to electrodes and given mild electric shocks whenever cues appeared on a computer screen. 
# Some cues signaled predictable shock (shock of a specified intensity that was previewed for 
# the participants before the study started) and some cues signaled unpredictable shock (shock of 
# an unknown intensity). Cue types were counterbalanced. At the end of the experiment, 
# participants rated how anxious they were (on a 0-5 scale) after seeing each type of cue. 


# Hypotheses: 
# 1. Alcohol will significantly reduce anxiety about unpredictable shock. 
# 2. Furthermore, the effect of alcohol on self-reported anxiety to unpredictable shock is pharmacological,
# so there should not be an expectancy effect from the participants thinking they drank alcohol (placebo group).  


# Codebook for "Lab12_DemoData2.dat" 

# Column  Variable        Description                                    Values

# 1       Unpredictable   Self-reported anxiety to unpredictable shock   0 - 4.75
# 2       Predictable     Self-reported anxiety to predictable shock     0 - 4.25
# 3       BG              Participant's beverage group assignment        "CON" (control, no alcohol), "PLA" (placebo), "ALC" (alcohol)


# Get a sense of what's going on
str(d)
head(d)
table(d$BG)
varDescribe(d)


# Again, for the sake of time, we will assume the data are clean and that all model assumptions are met 

# Hypothesis: Alcohol but not placebo drinks will reduce self-reported anxiety to unpredictable shock. 

# First let's make sure BG is a factor
class(d$BG)

# yep

# What are its levels?

levels(d$BG)
d$BG = factor(d$BG, levels=c('CON', 'PLA', 'ALC')) #but we can reorder the levels how we like 
levels(d$BG)

# What set of orthogonal contrasts can we make to test this hypothesis? 

contrasts(d$BG) = varContrasts(d$BG, Type='POC', POCList = list(c(-1,-1,2),c(-1,1,0)),Labels = c('A_CP', 'P_C')) 

# Our questions are about the effect of alcohol on unpredictable shock vs predictable shock, but 
# there may be some designs where you first would want to test for a general, "main effect" of your focal variable 
# (in this case alcohol) on your dependent variable (in this case self reported anxiety). 
# To do that, we can first create a new variable that averages the two repeated measures together. 

d$UnPreAverage = (d$Unpredictable + d$Predictable)/2

m7 <- lm(UnPreAverage ~ BG, data=d)
modelSummary(m7)
modelEffectSizes(m7)

# Interpret the intercept 

# Averaging across shock types and beverage groups, self-reported anxiety was 2.4, and this was significantly different from 0. 
# Not important for our hypotheses 

# Interpret b1 

# Averaging across shock types, self-reported anxiety was .4 lower for the alcohol group than for the mean of the control and 
# placebo beverage groups, and this was a significant difference. 
# Not important for our hypotheses (which have to do with predictability) 

# Interpret b2 

# Averaging across shock types, self-reported anxiety was .1 higher for the placebo group than for the control 
# group, but this was not a significant difference. 
# Also not important for our hypotheses 

# To test our hypotheses, we need a difference score to remove the "dependence problem" 
# in the self-reported anxiety data (each participant contributes two scores). In other words, if our hypothesis is about an effect of alcohol 
# on self-reported anxiety and a moderating effect of shock type (i.e., a greater effect of alcohol 
# on self-reported anxiety to unpredictable vs. predictable shock), we need to create a new variable that 
# is the difference between the unpredictable and predictable shock scores 

varDescribe(d)
# It may make interpretation easier to subtract the smaller score from the larger score 

d$UnPreDifference = d$Unpredictable - d$Predictable

m8 <- lm(UnPreDifference ~ BG, data=d) 
modelSummary(m8)
modelEffectSizes(m8)

m9 <- lm(Unpredictable - Predictable ~ BG, data=d) #Note: you could also just do the math within the lm function
# and that would give you the same result 
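
# A base-R sketch confirming the note above: computing the difference score first, or
# writing the subtraction on the left side of the formula, gives identical estimates.
# Simulated, hypothetical data that reuse the lab's column names.
set.seed(7)
sim <- data.frame(BG = factor(rep(c("CON", "PLA", "ALC"), each = 20),
                              levels = c("CON", "PLA", "ALC")))
sim$Predictable   <- runif(60, 0, 4)
sim$Unpredictable <- sim$Predictable + rnorm(60, 0.4, 0.5)
contrasts(sim$BG) <- cbind(A_CP = c(-1, -1, 2), P_C = c(-1, 1, 0))

sim$UnPreDifference <- sim$Unpredictable - sim$Predictable
m_pre    <- lm(UnPreDifference ~ BG, data = sim)
m_inline <- lm(Unpredictable - Predictable ~ BG, data = sim)
all.equal(coef(m_pre), coef(m_inline))   # TRUE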

# What if we wanted to get effect size estimates for each contrast?

# We need to code regressors 

d <- varRegressors(d, "BG", c("AvCP", "PvC"))

m10 <- lm(UnPreDifference ~ (AvCP+PvC), data=d) 
modelSummary(m10)
modelEffectSizes(m10)

# Interpret the intercept 

# Averaging across beverage groups, self-reported anxiety was .38 higher
# for unpredictable vs predictable shock
# STILL not important for our hypotheses 

# Interpret b1 

# The difference in self-reported anxiety between unpredictable and predictable shock was .16 smaller 
# in the alcohol beverage group than in the average of the placebo and control groups 

# Interpret b2 

# The difference in self-reported anxiety between unpredictable and predictable shock was .10 smaller 
# in the placebo beverage group than in the control group, but this was not a significant difference 


# So how do we test the simple effects of the alcohol vs control/placebo contrasts for each type of shock? 

# We run separate models for each shock type! 
m11 <- lm(Unpredictable ~ (AvCP+PvC), data=d) 
modelSummary(m11)
modelEffectSizes(m11)

# Interpret b1 

# Self-reported anxiety to unpredictable shock was .5 lower in the alcohol beverage group than in the average of the 
# placebo and control groups 

# Interpret b2 

# Self-reported anxiety to unpredictable shock was .11 higher in the placebo beverage group than in the control group, 
# but this was not a significant difference 

m12 <- lm(Predictable ~ (AvCP+PvC), data=d) 
modelSummary(m12)
modelEffectSizes(m12)
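
# Because OLS estimates are linear in the outcome, the difference-score model and the two
# separate models are tied together: each coefficient from Unpredictable - Predictable equals
# the Unpredictable coefficient minus the Predictable coefficient, as long as all three models
# use the same predictors and the same rows. A base-R sketch on simulated, hypothetical data.
set.seed(8)
sim <- data.frame(BG = factor(rep(c("CON", "PLA", "ALC"), each = 20),
                              levels = c("CON", "PLA", "ALC")))
sim$Predictable   <- runif(60, 0, 4)
sim$Unpredictable <- sim$Predictable + rnorm(60, 0.4, 0.5)
contrasts(sim$BG) <- cbind(A_CP = c(-1, -1, 2), P_C = c(-1, 1, 0))

b_diff  <- coef(lm(Unpredictable - Predictable ~ BG, data = sim))
b_unpre <- coef(lm(Unpredictable ~ BG, data = sim))
b_pre   <- coef(lm(Predictable ~ BG, data = sim))
all.equal(b_diff, b_unpre - b_pre)   # TRUE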

# Interpret b1 

# Self-reported anxiety to predictable shock was .4 lower in the alcohol beverage group than in the average of the 
# placebo and control groups 

# Interpret b2 

# Self-reported anxiety to predictable shock was .2 higher in the placebo beverage group than in the control group, 
# but this was not a significant difference 

# We will give you ggplot code to graph repeated measures data next week! 

# Final question to think about as you go on to your homework: what if our focal variable was unpredictable vs. predictable shock 
# and we only wanted to test for significant differences between unpredictable and predictable shock anxiety within each of the
# three beverage groups? What coding scheme would we use?

# Dummy coding. We could make each beverage group the reference group in turn and test the intercept of each difference-score model
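
# A base-R sketch of that dummy-coding approach: with treatment (dummy) coding, the intercept
# of the difference-score model is the mean Unpredictable - Predictable difference in the
# reference group, so its t test is the within-group test. relevel() switches the reference
# group. Simulated, hypothetical data.
set.seed(9)
sim <- data.frame(BG = factor(rep(c("CON", "PLA", "ALC"), each = 20),
                              levels = c("CON", "PLA", "ALC")))
sim$Predictable     <- runif(60, 0, 4)
sim$Unpredictable   <- sim$Predictable + rnorm(60, 0.4, 0.5)
sim$UnPreDifference <- sim$Unpredictable - sim$Predictable

intercepts <- sapply(levels(sim$BG), function(ref) {
  BG_ref <- relevel(sim$BG, ref = ref)          # ref becomes the reference group
  coef(lm(sim$UnPreDifference ~ BG_ref))[["(Intercept)"]]
})
group_means <- tapply(sim$UnPreDifference, sim$BG, mean)
cbind(intercepts, group_means)   # the two columns match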