###################################################################################################
#### Week 13 Lab: ANCOVA, long to wide, more repeated measures ####################################
###################################################################################################

# We'll start by taking a different approach to one of last week's data sets, dealing with startle.
# Then we'll look at within-subject conditions with more than 2 levels, converting them from long
# to wide as well.

########################################################
#### ANCOVA ############################################
########################################################

rm(list=ls()) # clear all objects from workspace

# Read in the data (same as last week):
d = dfReadDat('Lab13_DemoData1.dat')

# What's going on here again?
varDescribe(d)
# Participants are in one of three conditions: alcohol, placebo, or control. They rate the anxiety
# they feel after shocks that are / are not predictable.

# Let's imagine the researchers consider the "predictable" shocks to be a good proxy for baseline
# anxiety in response to electric shocks. There may also be only a moderate correlation between
# anxiety in the predictable and unpredictable conditions.

# What is a different approach we could use to analyze the data?
# ANCOVA!

# ANCOVA is an acronym for analysis of covariance. This is a term that was coined in the ANOVA
# literature before the GLM became more popular in psychology. ANCOVA is just a fancy term for
# a model in which we're also controlling for a variable (i.e., where we're not allowing a control
# variable - a covariate - to contribute to the effect of the focal predictor[s]).

# What is the main benefit of using ANCOVA over a difference score?
# Increased power.

# How does it achieve this benefit?
# It doesn't assume a correlation of 1 between pre-test and post-test (without intervention).

# What is the correlation between predictable and unpredictable scores?
cor(d$Predictable, d$Unpredictable)
# .87. That's pretty high. Maybe we should check the correlation in the control group.
control = d[d$BG == 'CON',]
cor(control$Predictable, control$Unpredictable)
# .76. OK, so actually a little lower.

# Let's prep our BG contrasts (copied from last week)
d$BG = factor(d$BG, levels = c('CON', 'PLA', 'ALC'))
contrasts(d$BG) = varContrasts(d$BG, Type='POC', POCList = list(c(-1,-1,2), c(-1,1,0)),
                               Labels = c('A_CP', 'P_C'))
# ...but let's use varRegressors to get effect sizes
d = varRegressors(d, 'BG', c('A_CP','P_C'))

# Let's do an ANCOVA
mCV = lm(Unpredictable ~ A_CP + P_C + Predictable, data=d)
modelSummary(mCV)
modelEffectSizes(mCV)

# What's going on here?
# Unsurprisingly, predictable shock anxiety is a very strong predictor of unpredictable shock anxiety.
# When controlling for predictable shock, our first contrast continues to have a significant effect
# on unpredictable scores, indicating alcohol participants have lower anxiety in unpredictable shocks
# compared to the other groups, beyond what we'd expect from their predictable shock anxiety alone.

# This is a complex sentence, but it's important all the elements are there. When we see a significant
# effect in an ANCOVA model, we must indicate this effect has an influence over and above the pre-test,
# whatever that may be.

# Say I measure how much sugar someone eats in one day, then randomly assign them to either receive an
# intervention discussing the health risks of sugar or not, then measure their sugar intake the
# following day. If the intervention has an effect, the effect occurs over and above the predictive
# power of the previous day's sugar intake.
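# To make the ANCOVA-vs-difference-score comparison concrete, here is a minimal sketch using the
# objects created above. The difference-score model implicitly fixes the slope on the pre-test
# (Predictable) at 1, whereas the ANCOVA model estimates that slope from the data - that's where
# the extra power comes from.
mDiff = lm(Unpredictable - Predictable ~ A_CP + P_C, data=d)  # difference score: pre-test slope fixed at 1
modelSummary(mDiff)
modelSummary(mCV)   # ANCOVA from above: pre-test slope estimated freely
# Compare the standard errors on A_CP and P_C across the two models.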
# Why is ANCOVA still probably NOT the correct way to analyze the data from this experiment?
# We didn't assign the participants to beverage group after collecting the predictable trials. To
# do ANCOVA, pre-test scores should be collected before experimental assignment, and group should
# not be correlated with the pre-test.
varDescribeBy(d$Predictable, d$BG)
# ...but it appears as though there is some kind of relationship.

##########################################################################
#### LONG TO WIDE ########################################################
##########################################################################

rm(list=ls()) # clear all objects from workspace

# We're going to return to an example we've done before, with a prejudice reduction intervention and
# IAT scores. Participants are assigned to one of two conditions, and their "implicit bias" is
# measured at three timepoints: right after the intervention, 4 weeks later, and 8 weeks later.
dL <- dfReadDat("Lab13_DemoData2.dat", SubID=NULL)
varDescribe(dL)

# Are these data in long or wide format?
# Long!
# How do you know?
# Every participant has 3 lines in the data frame, one for each observation.

# In order to do the analysis we'll learn today, they must be in wide format.
?reshape
# We need to give this function a dataframe and some information about what kinds of variables
# we have in the data.
dW = reshape(dL, timevar = 'time', idvar = c('SubID','condition'), direction='wide')
# timevar: the IV/predictor that varies within participants
# idvar: the variables that identify a participant (anything constant within a participant, like
#        SubID and condition)
# direction: are we going long to wide or wide to long?
varDescribe(dW)

# What do the iat columns represent now?
# The IAT score for a given participant at one of the three time points.

# The only thing that's still unclear is which column is which. R names the columns based on the
# levels of the timevar we give it, which is why it's just numbers here. We could rename the values
# before transforming, or just rename them now.
dW = varRename(dW, c('iat.1','iat.2','iat.3'), c('W0iat','W4iat','W8iat'))
varDescribe(dW)
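# For reference, the same function can take us back from wide to long. A minimal sketch, assuming
# the renamed dW from above (the times of 0, 4, and 8 are just labels we're choosing here):
dL2 = reshape(dW,
              varying = c('W0iat','W4iat','W8iat'),  # the columns holding the repeated measure
              v.names = 'iat',                        # name for the stacked outcome column
              timevar = 'time',                       # name for the re-created time variable
              times = c(0, 4, 8),                     # value of time for each column, in order
              idvar = 'SubID',
              direction = 'long')
varDescribe(dL2)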
######################################################################################
#### 2 (between) x 3+ (within) designs ###############################################
######################################################################################

# Now we can work with these data to assess the effects of the prejudice reduction intervention over
# time. To justify this analysis, there would have to be a prediction that the effect of the
# intervention is somehow time-varying, maybe growing over time.

# How do we test the main effect of condition?
# Average IAT scores across time points and regress them on condition.
dW$IATM = rowMeans(dW[,c('W0iat','W4iat','W8iat')])
m2 = lm(IATM ~ condition, data=dW)
modelSummary(m2)
modelEffectSizes(m2)
# remember, condition is coded 0 (control) and 1 (intervention)

# Conclusion?
# Averaging across time points, there is no effect of the intervention.

# Now we want to see whether the effect of the intervention varies based on time.
# The problem is we can't use "contrasts" with the outcome, since our outcome technically involves
# 3 different levels, so instead we'll use math!

# Let's say we first want to see whether the impact is larger in week 8 than it is during the earlier
# time points overall. How do we do this with math?
dW$W8_W40 = dW$W8iat - (dW$W0iat + dW$W4iat)/2
m8_40 = lm(W8_W40 ~ condition, data=dW)
modelSummary(m8_40)

# Let's break this down a little:
# b0: how much week 8 scores differ from the pooled week 0 and 4 scores, on average, for the control
#     group (condition = 0)
# b1: how much this effect of time on IAT scores differs by condition. It is non-significant.

# *NOTE*: technically, our outcome "contrast" isn't unit-weight. So why is it we can interpret the
# parameter as representing specifically the difference between the groups? Essentially, unit
# weighting matters on the predictor side but not on the outcome side. We can show this using math
# if you're really curious (see the quick check after the simple effects below).

# In this example, it makes more sense to do two pairwise comparisons instead: week 8 to week 4 and
# week 4 to week 0. Let's do both (and we'll practice doing so without creating new variables).
m8_4 = lm(W8iat - W4iat ~ condition, data=dW)
modelSummary(m8_4)
# Conclusion?
# The effect of the time interval between 4 and 8 weeks on IAT scores doesn't differ by condition.

m4_0 = lm(W4iat - W0iat ~ condition, data=dW)
modelSummary(m4_0)
# Conclusion?
# The effect of the time interval between 0 and 4 weeks on IAT scores differs by condition: the IAT
# scores of those in the intervention condition change more.

m8_0 = lm(W8iat - W0iat ~ condition, data=dW)
modelSummary(m8_0)
# Conclusion?
# There is some marginal indication that the change from week 0 to week 8 differs by condition.

# How do we get the simple effects of condition at each time point?
# Easy: just regress the IAT score at that time point on condition!
mW0 = lm(W0iat ~ condition, data=dW)
modelSummary(mW0)
mW4 = lm(W4iat ~ condition, data=dW)
modelSummary(mW4)
mW8 = lm(W8iat ~ condition, data=dW)
modelSummary(mW8)
# Conclusion?
# The only time the intervention condition differs from control is four weeks after the intervention,
# though at 8 weeks the direction of the effect is maintained.
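# As promised in the note above, a quick numerical check: multiplying the outcome difference by a
# constant scales b and its SE by that same constant, so t and p are unchanged. Here the unit-weight
# version 2*W8 - W4 - W0 is exactly twice W8_W40.
m8_40uw = lm(2*W8iat - W4iat - W0iat ~ condition, data=dW)
modelSummary(m8_40uw)  # estimates are exactly double those of m8_40; t and p are identical
modelSummary(m8_40)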
#### CORRECTING FOR MULTIPLE COMPARISONS ####

# We're going to use Holm-Bonferroni. Even though we have a small enough number of comparisons to use
# Fisher's LSD, the mechanics of doing it in this situation are quite complex and, because of how the
# F statistic is calculated in this specific situation, it is possibly more conservative than
# Holm-Bonferroni.
# Let's do it for our pairwise comparisons, and bear in mind we care about the interaction term.

# Make a vector to hold our p values
ps = c(0,0,0)
ps[1] = modelSummary(m8_4)$coefficients[2,4]
ps[2] = modelSummary(m4_0)$coefficients[2,4]
ps[3] = modelSummary(m8_0)$coefficients[2,4]
ps = sort(ps)
ps

p.adjust(ps, "holm")
# More viewable:
round(p.adjust(ps, "holm"), digits = 5)
# Compare to the old values:
round(ps, digits = 5)

####################################################################
#### GRAPHING ######################################################
####################################################################

dW$conditionS = as.factor(varRecode(dW$condition, c(0,1), c('Control','Intervention')))

mW0 = lm(W0iat ~ conditionS, data=dW)
mW4 = lm(W4iat ~ conditionS, data=dW)
mW8 = lm(W8iat ~ conditionS, data=dW)

pX0 = expand.grid(conditionS = c('Control','Intervention'), time = 1)
pX4 = expand.grid(conditionS = c('Control','Intervention'), time = 2)
pX8 = expand.grid(conditionS = c('Control','Intervention'), time = 3)

pY0 = modelPredictions(mW0, pX0)
pY4 = modelPredictions(mW4, pX4)
pY8 = modelPredictions(mW8, pX8)

pYs = rbind(pY0, pY4)
pYs = rbind(pYs, pY8)

library(ggplot2)
library(cowplot)

bp1 = ggplot(data=pYs, aes(x=as.factor(time), y = Predicted, fill=conditionS)) +
  geom_bar(stat='identity', width=.5, position = position_dodge()) +
  geom_errorbar(aes(ymin = CILo, ymax = CIHi), width=.25, stat='identity',
                position = position_dodge(.5)) +
  labs(x = 'Observation', y = 'IAT score', fill = 'Condition') +
  coord_cartesian(ylim = c(.028, .6), expand = TRUE)
bp1
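# Since the x-axis is really time, a line plot can also work well. This is just an optional sketch
# using the same pYs predictions from above.
lp1 = ggplot(data=pYs, aes(x=time, y=Predicted, color=conditionS, group=conditionS)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = CILo, ymax = CIHi), width = .1) +
  scale_x_continuous(breaks = c(1, 2, 3), labels = c('Week 0', 'Week 4', 'Week 8')) +
  labs(x = 'Observation', y = 'IAT score', color = 'Condition')
lp1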