##########################################
#### Week 3 Lab: Two Parameter Models ####
####   Friday, September 22nd, 2016   ####
##########################################

########################################################
#### Download lab files into your working directory ####
########################################################

library(lmSupport)

# Set your working directory

# We're using the bias dataset from last week

# Check out the dataset
# How could you achieve the same thing as head() using bracket notation?

# some() lives in a package we don't really want to load
# (why load an entire package for just one function?)
# We can also use bracket notation to do the same thing as some(). Let's use Google.

###########################################
#### Read page 1 of Lab3_Exercise.docx ####
###########################################

# What's the experiment about?
# What is the independent variable?
# What is the dependent variable?

# Prediction:
# In other words: Does concern score explain variance in week 4 IAT scores?

################################################################
#### What models should we compare to test this hypothesis? ####
################################################################

# DATA = MODEL + ERROR

# Model C:
# Model A:

###################################
#### Prepare data for analysis ####
###################################

# We have four concern items; we need to create an average score for each participant

# Which of the items are reverse-coded?
# It appears the reverse-scored item from last week (item 4) has already been adjusted.

# Why varScore() and not rowMeans()?

################################################
#### The Compact Model: One-Parameter Model ####
################################################

# Fit a one-parameter model
# What is our one parameter?

# We can ask for the values of y that are predicted by our model
# What is the number we're predicting for everyone?

# We can ask for the residuals

# See that the predicted values (our model) plus the error equal
# the data themselves!

# And we can look at the coefficients, or parameter estimates, themselves

# If we want to ask questions about model fit, we need to calculate SSE:
# Does this value alone tell us whether the model fits the data well?

##################################################
#### The Augmented Model: Two-Parameter Model ####
##################################################

# Fit a two-parameter model
# What is our second parameter?

# We can ask for the values of y that are predicted by our model

# We can also ask for the errors

# And we can ask for just the coefficients, or parameter estimates, themselves
# Wk4IAT =

# If we want to ask questions about model fit, we need to calculate SSE:

# Model Comparison
# What does the p-value tell us? (generally, not this specific p)
# p-Value:

modelSummary(mA, t=FALSE)
# How is this related to what we just did?
# Remember: R is giving us the results of two different model comparisons.
# Each line in the summary is associated with a different comparison.
# What is the interpretation of the second coefficient, b1? What two models are being compared?
# What is the interpretation of the first coefficient, b0? What two models are being compared?

# Coefficients from Model C
# Why is b0 different between Model A and Model C?

# Confidence Intervals

# Effect Sizes

# Calculate PRE / partial eta-squared for b1
# Or just use modelEffectSizes() to get partial eta-squared
# What does partial eta-squared represent?
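
# ---------------------------------------------------------------------------
# A minimal sketch of the code the section above walks through. The data file
# name ("Lab3Data.dat"), the item names (Concern1-Concern4), and the 1-5 item
# range are assumptions, not the lab's actual names; substitute the names from
# your own lab files.
# ---------------------------------------------------------------------------

library(lmSupport)

d = dfReadDat("Lab3Data.dat") # hypothetical file name for last week's bias dataset

head(d)                    # first six rows
d[1:6, ]                   # the same thing with bracket notation
car::some(d)               # call some() without loading all of car
d[sample(nrow(d), 10), ]   # bracket-notation version of some(): 10 random rows

# Average the four concern items (assumed already reverse-scored where needed).
# varScore() beats rowMeans() because it range-checks the items and prorates
# missing responses. It returns a sum, so divide by 4 to get the mean.
d$ConcernM = varScore(d, Forward = c("Concern1", "Concern2", "Concern3", "Concern4"),
                      Range = c(1, 5)) / 4

# Compact model: one parameter (b0, the grand mean of Wk4IAT)
mC = lm(Wk4IAT ~ 1, data = d)
predict(mC)                        # the same value (the mean) predicted for everyone
residuals(mC)                      # error = data - model
head(predict(mC) + residuals(mC))  # model + error...
head(d$Wk4IAT)                     # ...reproduces the data
coef(mC)                           # b0
sseC = sum(residuals(mC)^2)        # SSE alone says little without a comparison model

# Augmented model: two parameters (b0 = intercept, b1 = slope for ConcernM)
mA = lm(Wk4IAT ~ ConcernM, data = d)
coef(mA)                           # Wk4IAT-hat = b0 + b1*ConcernM
sseA = sum(residuals(mA)^2)

# Model comparison: did adding b1 reduce error enough to justify the extra parameter?
modelCompare(mC, mA)
modelSummary(mA, t = FALSE)        # each coefficient row is its own model comparison
coef(mC)                           # compare b0 across Model C and Model A

confint(mA)                        # confidence intervals for b0 and b1

# PRE (partial eta-squared) by hand, then via lmSupport
(sseC - sseA) / sseC
modelEffectSizes(mA)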
#####################################
#### Graphing our Data and Model ####
#####################################

# You will encounter us using the terms "quick and dirty" plot and "publication
# quality" plot. Generally speaking, the latter means ggplot. Quick and dirty
# plots are for your own understanding and ability to look at the data visually.

# Based on last week:

# Generating the regression line with CI bands using the effects package
library(effects) # used for a quick and dirty view of models. VERY IMPORTANT.

# Why are the CI bands not linear?
# But are these the error bands we want?

# Publication quality graphing
# (a hedged sketch of this full pipeline appears at the end of this file)

# Load ggplot2
library(ggplot2)

# From Help: ggplot() is typically used to construct a plot incrementally,
# using the + operator to add layers to the existing ggplot object. This is
# advantageous in that the code is explicit about which layers are added and
# the order in which they are added. For complex graphics with multiple layers,
# initialization with ggplot is recommended.

# Generating predicted data (necessary for confidence interval bands)

# Creating a data frame for predictor values; the first two numbers are the
# range of the predictor. The result is a data frame containing just one
# variable, ConcernM, representing many of the possible values of ConcernM.

# Use modelPredictions() to get the standard errors of the Y-hats
# What did this add?

# Graph a scatterplot of the data with ConcernM on the x-axis and Wk4IAT on the y-axis
# Now we add layers to this plot as we go:

# Now let's add a layer to "plot" that will graph the regression line.
# Can we just use the default parameters?

# Finally, let's do just a couple of things to make it look nice:

# Everything at once (combining the previous code into a single plot)

# If you were prepping this graph for publication, what else would you want to do?
# We will learn how to do all these things later!

######################
#### Lab exercise ####
######################

# Do numbers 1-7 in small groups. Then stop, and we'll reconvene to do graphing.

#####################
#### Extra Stuff ####
#####################

# Calculate the SD of the errors
SSEA = sum(residuals(mA)^2) # SSE from the augmented model (defined so the next line runs)
dfD = df.residual(mA)       # denominator (error) degrees of freedom: n - 2
SE = sqrt(SSEA/dfD)
SE # Residual standard error (or standard error of the estimate)

# Correlation: when you want the relationship between two quantitative variables
cor(d$Wk4IAT, d$ConcernM)      # r - an effect size indicator (= the square root of partial eta-squared)
cor.test(d$Wk4IAT, d$ConcernM) # r has a p-value
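
# ---------------------------------------------------------------------------
# A minimal sketch of the graphing pipeline from the graphing section above,
# assuming the d and mA objects defined earlier and a 1-5 range for ConcernM.
# The Predicted/CILo/CIHi column names are what lmSupport's modelPredictions()
# typically returns; check names(pY) if your version differs.
# ---------------------------------------------------------------------------

library(effects)
library(ggplot2)
library(lmSupport)

# Quick and dirty: the effects package draws the model with CI bands in one line
plot(effect("ConcernM", mA))

# Publication quality: generate predicted values across the range of the predictor
pX = data.frame(ConcernM = seq(1, 5, length = 100)) # many possible ConcernM values
pY = modelPredictions(mA, pX)                       # adds Predicted, CILo, and CIHi

# Build the figure in layers: scatterplot first, then the regression line with
# its CI band, then a couple of things to make it look nice
plot = ggplot(d, aes(x = ConcernM, y = Wk4IAT)) +
  geom_point() +
  geom_smooth(data = pY, aes(y = Predicted, ymin = CILo, ymax = CIHi),
              stat = "identity") +
  labs(x = "Concern (mean of 4 items)", y = "Week 4 IAT score") +
  theme_bw()
plot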