##############################
####   Psychology 610     ####
####       Week 2         ####
#### Friday, 15 September ####
##############################

##################################
#### This week's 'fun with R' ####
##################################

install.packages('RXKCD')
library(RXKCD)  # this command loads the 'RXKCD' package

# What's this package for? For example...
?getXKCD

# Getting XKCD comic strips (any XKCD comic!), like this one!
getXKCD(552)  # hit zoom for a better view of the comic
# Can also be found by googling 'XKCD 552'

# Explain the statistical issue that this comic raises.

# Name four causal pathways that could explain the relationship between taking
# a statistics class and understanding that correlation does NOT imply causation.
# Read up on this in Hoyle et al. (2002): p. 37

##########################
#### MEAN-ONLY MODELS ####
##########################

# Set your working directory, read in the data.
# Learn about the variables present in the dataframe.
# What do you think these variables represent?

# We have a lot of information here, but we're only going to use a little bit.
# Mean-only models are models in which we...
# This is one step above the null model, where we...

# In this experiment, we're curious about the change in people's IAT scores
# from baseline to week 8.
# Create a variable called "IATChange" that represents this change over time.
# Now learn a little about this variable.

##########################
####  THE NULL MODEL  ####
##########################

# Specification of the null model
# How many parameters?

# Step 1: make the best predictions given the information you have
# Step 2: compute the total prediction error
# Why do we square the error terms?

# Sum of squared errors (SSE):
# We could also write these multiple lines in one line of code:
# Conclusion?

#########################
#### THE BASIC MODEL ####
#########################

# Specification of the basic (mean-only) model
# How many parameters?
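The null-model and mean-only steps can be sketched as follows. This is a hypothetical walk-through on simulated data, not the real lab dataset: the `IATChange` vector here is a stand-in, and the object names (`pred.null`, `SSE.null`, etc.) are illustrative.

```r
# Simulate a stand-in for the real IATChange variable (n = 78, as in lab)
set.seed(610)
IATChange <- rnorm(78, mean = -0.1, sd = 0.3)

## Null model (0 parameters): predict 0 for everyone
pred.null <- rep(0, length(IATChange))   # Step 1: best predictions
err.null  <- IATChange - pred.null       # Step 2: prediction errors
SSE.null  <- sum(err.null^2)             # squaring keeps +/- errors from canceling

## Mean-only model (1 parameter): predict the sample mean for everyone
pred.mean <- rep(mean(IATChange), length(IATChange))
SSE.mean  <- sum((IATChange - pred.mean)^2)  # the one-line version

c(null = SSE.null, mean.only = SSE.mean)  # SSE.mean can never exceed SSE.null
```

The mean minimizes the sum of squared errors, so the mean-only model's SSE is guaranteed to be at least as small as the null model's.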
# Step 1: make the best predictions, given the information you have
# Step 2: compute the total prediction error and SSE
# Or, in one line...
# Is this model better than the null model?

##########################
#### MODEL COMPARISON ####
##########################

# Is Model A (augmented) significantly better than Model C (compact)?

# This question is answered using the F statistic. At its core, F estimates the
# amount of error reduction resulting from adding a parameter to our model. It
# compares that error reduction against the average prediction error of the
# augmented model (total prediction error / N). When this ratio is greater than
# approximately 4 (i.e., the error reduction is at least 4 times larger than the
# average prediction error per participant), we conclude it's worth adding the
# parameter.

# Underlying logic: every parameter you add will reduce some error in the data
# simply as a mathematical consequence. F calculates whether adding a given
# parameter is "worth it" in terms of the additional error you reduce. For F to
# be significant, the parameter you've added needs to reduce substantially more
# error than adding a random parameter would.

# First we are going to calculate this by hand.
n <- 78

#### FORMULA ####
# F = ((SSE0 - SSE1)/(p1 - p0)) / (SSE1/(n - p1))

# The pf() function gives you the p-value from the F-table that you're familiar
# with. lower.tail = FALSE means you want the probability of getting an F-value
# LARGER than your Fstat if in fact the population mean is zero.

#############################
#### MODELS THE EASY WAY ####
#############################

# Let's make a model where we predict scores on our variable of interest
# (IATChange) from the mean of that variable (this is the mean-only / basic
# model). Everyone's score is predicted by a constant when using the mean-only
# model. This is something we need to keep in mind when we specify our model.

# The p-value should be the same as what we calculated above.
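Both routes can be sketched together, again on simulated stand-in data (the names `y`, `SSE0`, `SSE1` are illustrative, not from the lab dataset). The by-hand F follows the formula above; the "easy way" is an intercept-only `lm()` formula.

```r
set.seed(610)
y <- rnorm(78, mean = -0.1, sd = 0.3)  # stand-in for IATChange

# Compact model C (null: predict 0) vs. augmented model A (mean-only)
SSE0 <- sum((y - 0)^2)        # p0 = 0 parameters
SSE1 <- sum((y - mean(y))^2)  # p1 = 1 parameter
n  <- length(y)
p0 <- 0
p1 <- 1

# F = ((SSE0 - SSE1)/(p1 - p0)) / (SSE1/(n - p1))
Fstat <- ((SSE0 - SSE1) / (p1 - p0)) / (SSE1 / (n - p1))
dfN <- p1 - p0   # numerator degrees of freedom
dfD <- n - p1    # denominator degrees of freedom
pf(Fstat, dfN, dfD, lower.tail = FALSE)  # p-value for the model comparison

# The easy way: an intercept-only formula fits the mean-only model
m <- lm(y ~ 1)
coef(m)  # the single parameter estimate b0 equals mean(y)
# summary(m) reports a t test of the intercept against 0; for a one-df
# comparison like this, that t squared equals Fstat
```

Note that `summary()` of an intercept-only model shows no separate F line (there are no predictors beyond the constant), which is why the intercept's t test is where the same comparison surfaces.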
pf(Fstat, dfN, dfD, lower.tail = FALSE)
# Hooray, we rock!

#####################################
#### PARAMETER ESTIMATE APPROACH ####
#####################################

# We just taught you the model comparison approach. When using this approach,
# we focus our attention on the reduction of overall error that results from
# adding a given parameter (or parameters) to the model.

# A slightly different, but statistically equivalent, approach is the parameter
# estimate approach. When using this approach, we focus instead on testing a
# given parameter against 0. That is, how confident are we that a given
# parameter actually has an influence on the data we observe?

# In thinking this through, you'll notice that the basic function of what we're
# doing is the same regardless of which approach we use, but the explanations
# we give and the way we talk about parameters are slightly different. Being
# able to think using both approaches will provide some useful insights into
# your results.

# Let's look at our model summary again, this time inspecting the t statistic
# (as discussed in class) rather than the F statistic.

# Find b0. What does it represent?
# Note also the SE of b0.

# The t statistic is calculated by taking the difference between the parameter
# estimate and the null value for that parameter (usually 0) and dividing by
# the standard error (this is a bit more straightforward than the F formula,
# but note that F = t^2 when the numerator df is 1).
# t = (b0 - null b0) / SE(b0)

# Remember that t is a measure of ___. This is a corollary of ___, with the
# similar logic that a t more extreme than ~ +/-1.96 will occur with less than
# 5% likelihood (i.e., reach conventional levels of statistical significance).

######################################
####      BRACKET NOTATION and    ####
#### CALCULATING COMPOSITE SCORES ####
######################################
# Optional in class. If not covered, students should go through this section
# in the TA script on their own time.
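Before diving into bracket notation, the by-hand t computation from the parameter estimate section can be sketched like this (simulated stand-in data again; the names `y`, `b0`, `tstat` are illustrative):

```r
set.seed(610)
y <- rnorm(78, mean = -0.1, sd = 0.3)  # stand-in for IATChange

m  <- lm(y ~ 1)
b0 <- coef(summary(m))[1, "Estimate"]    # parameter estimate (here, mean(y))
se <- coef(summary(m))[1, "Std. Error"]  # SE(b0) = sd(y)/sqrt(n)

null.b0 <- 0                   # the null value we test the parameter against
tstat <- (b0 - null.b0) / se   # t = (b0 - null b0) / SE(b0)

df <- length(y) - 1            # n minus the 1 estimated parameter
2 * pt(abs(tstat), df, lower.tail = FALSE)  # two-tailed p-value
```

The hand-computed `tstat` matches the "t value" column of `summary(m)`, and its square matches the F from the model comparison above, which is the statistical equivalence the section describes.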
# You'll notice that we have variables named Concern1-4 in this dataset. This
# is a scale that measures concern about discrimination toward Black people.
# We're going to teach you a few essential functions using these items.

# Using bracket notation to subset:
# Make a vector containing the item names
# Print just these columns of data using bracket notation

# We want to average these items to create a mean concern score. But first we
# need to check that the scale is reliable. What does this mean? (Dig back to
# psych stats in your brain.) How do we measure it?

library(psych)
?alpha

# alpha requires a matrix of the data you want to check the reliability of,
# which in this case means only those variables comprising the concern scale.
# You also tell alpha which items to reverse (among other options). We don't
# know which items are reversed (if any), so we should include the option
# 'check.keys = T' so alpha reverse-codes for us.
# Note: ONLY use this if you do not know which items are reverse-coded. This
# would be sloppy practice if you're using your own scale, in which case you
# should know these kinds of details about the items.

# Try it:

# Remember from psych stats the number considered the conventional cutoff here?

# Pay attention to the "raw_alpha" column in the "Reliability if an item is
# dropped:" section of the output. This shows our reliability would improve a
# bit if we removed the reverse-coded item, but seeing as we already pass the
# arbitrary .70 criterion and the difference would be minimal, keep it.

# Calculating the composite score:
# Why not just add the values and divide by 4?
# Use varScore instead
?varScore
# Needs to know which items are forward-coded and which are reverse-coded
# Needs to know the high and low anchor points of the scale range
# Prorate: prorating fills in each missing value with the sum of the available
# values divided by the number of available values (i.e., the mean of the
# available values).
# MaxMiss: the maximum acceptable percentage of missing data before the total
# score will be set to missing.

# Aside: One way people sleuth for bad science is by looking at mean scores.
# If you have a scale of a certain range with a certain number of items, there
# is a finite number of possible composite scores. Here, for example, the
# decimal part of each person's composite score must be a multiple of .25,
# since there are 4 items and the response scale is whole numbers (if
# Prorate = F, .33 and .67 would also be possible). So, when looking at data,
# sleuths will make an exhaustive list of possible decimal approximations of
# mean scores and check the provided data against them. If some values of the
# composite score shouldn't be mathematically possible, they'll sound the
# alarm! Pretty cool, huh?

###########################
#### IN-LAB ASSIGNMENT ####
###########################

# Note to future TAs: the 2016 version has a basic introduction to
# single-predictor models. ~MC
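As an appendix to the bracket-notation section, here is a base-R sketch of the arithmetic underneath these steps, on simulated Concern items so it runs without the lab data. The alpha computed here uses the standard Cronbach's formula and is only a stand-in for `psych::alpha`'s fuller output; the item-generation scheme and all object names are invented for illustration.

```r
set.seed(610)
n <- 78
# Simulated stand-ins for Concern1-4: whole-number responses on a 1-7 scale,
# built around a shared "true" concern level so the items intercorrelate
base <- sample(2:6, n, replace = TRUE)
concern <- as.data.frame(lapply(1:4, function(i)
  pmin(7, pmax(1, base + sample(-1:1, n, replace = TRUE)))))
names(concern) <- paste0("Concern", 1:4)

# Bracket notation: a vector of item names, then just those columns
items <- paste0("Concern", 1:4)
head(concern[items])

# Cronbach's alpha by hand: k/(k-1) * (1 - sum of item variances / variance of
# the total score). psych::alpha reports this as "raw_alpha".
k <- ncol(concern)
alpha.raw <- (k / (k - 1)) *
  (1 - sum(apply(concern, 2, var)) / var(rowSums(concern)))

# Prorated composite: averaging whatever items a person answered, which is
# what replacing each missing value with the mean of the available values and
# then averaging works out to
concern$Concern2[1] <- NA            # make one value missing to show proration
composite <- rowMeans(concern, na.rm = TRUE)
```

With the real items you would instead run `psych::alpha(concern, check.keys = T)` and a `varScore()` call giving the forward/reverse item lists and the scale range (see `?varScore`); the sketch above just shows the arithmetic those functions perform.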