Monday, December 17, 2012

Gender equality in science

An interesting post over at BishopBlog takes on the lack of men in Psychology. One of the reasons BishopBlog is a favorite of mine is that you get real data along with interpretations and opinions. Two points in the post strongly resonated with my experience. 

One is the decline of women in science by career stage. A few years ago, the NSF did a major study of women and minorities in science and identified attrition as a key reason for the under-representation of women in science. Following Dr. Bishop's example, a little complementary data (from 2010; these and lots more data are available here): in Neuroscience, the graduate student population was slightly biased toward women (52.7% were female), but the postdoctoral fellow population was biased toward men (only 45.7% were female). Given the fairly large sample sizes (2798 graduate students and 818 postdocs), this difference was highly reliable (chi-square test of independence, p < 0.001).
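
For readers who want to check, the reported test is easy to reproduce in R. The raw counts below are back-calculated from the rounded percentages and sample sizes above, so they are approximate:

```r
# Approximate counts back-calculated from the reported percentages:
# 52.7% of 2798 graduate students and 45.7% of 818 postdocs were female
counts <- matrix(c(1475, 1323,    # grad students: female, male
                   374,  444),    # postdocs:      female, male
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("grad", "postdoc"), c("female", "male")))
chisq.test(counts)  # p < 0.001, matching the reported result
```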

The second is the effect of sub-field. I am particularly sensitive to this because I seem to work in two of the most gender-biased sub-fields: computational modeling seems strongly male-dominated, but cognitive neuropsychology seems strongly female-dominated. I couldn't find data for those fields exactly, but the APA membership data that Dr. Bishop mentioned show a huge disparity: women make up only about 25% of the members in Experimental Psychology and Behavioral Neuroscience, close to half in Clinical Neuropsychology, and about 70% in Developmental Psychology.

This issue is certainly complex and there is no simple solution. That said, there are some strategies that we know would help and can be implemented relatively easily. For example, we know that there is bias in the review process (e.g., Peters & Ceci, 1982), so why not make it double-blind? This is already the standard in some fields, but remains generally optional or unavailable in cognitive science and cognitive neuroscience. It is true that reviewers may be able to guess the identity of the author(s) some of the time, but isn't guessing correctly some of the time better than knowing all of the time? This would (partially) level the playing field between genders as well as between junior and senior scientists and should lead to a fairer system.

Peters, D. P., & Ceci, S. J. (1982). Peer-review practices of psychological journals: The fate of published articles, submitted again. Behavioral and Brain Sciences, 5, 187-255.

Friday, December 14, 2012

Lateralization of word and face processing

A few weeks ago I was at the annual meeting of the Psychonomic Society where, among other interesting talks, I heard a great one by Marlene Behrmann about her recent work showing that lateralization of visual word recognition drives lateralization of face recognition. Lateralization of word and face processing are among the most classic findings in cognitive neuroscience: in adults, regions in the inferior temporal lobe in the left hemisphere appear to be specialized for recognizing visual (i.e., printed) words and the same regions in the right hemisphere appear to be specialized for recognizing faces. Marlene and her collaborators (David Plaut, Eva Dundas, Adrian Nestor, and others) have shown that these specializations are linked and that the left hemisphere specialization for words seems to drive the right hemisphere specialization for faces. It's a nice combination of: 
  1. Behavioral experiments showing that lateralization for words develops before lateralization for faces, and that reading ability predicts degree of lateralization for faces (Dundas, Plaut, & Behrmann, 2012).
  2. ERP evidence also showing earlier development of lateralization for words than for faces.
  3. Computational modeling showing how this specialization could emerge without pre-defined modules (Plaut & Behrmann, 2011).
  4. Functional imaging evidence that the lateralization is relative: the right fusiform gyrus is more involved in face processing, but the left is involved also (Nestor, Plaut, & Behrmann, 2011).
It's a beautiful example of how different methods can come together to provide a more complete picture of cognitive and neural function.

Less than one week after I posted this, a new paper by Behrmann and Plaut (in press, Cerebral Cortex, doi:10.1093/cercor/bhs390) appeared, reporting further evidence, this time from cognitive neuropsychology, that lateralization of face and word processing is relative. They tested a group of individuals with left hemisphere damage and deficits in word recognition ("pure alexia") and a group of individuals with right hemisphere damage and deficits in face recognition ("prosopagnosia"). The individuals with pure alexia exhibited mild but reliable face recognition deficits and the individuals with prosopagnosia exhibited mild but reliable word recognition deficits.

Dundas EM, Plaut DC, & Behrmann M (2012). The Joint Development of Hemispheric Lateralization for Words and Faces. Journal of Experimental Psychology: General. PMID: 22866684. DOI: 10.1037/a0029503.

Nestor A, Plaut DC, & Behrmann M (2011). Unraveling the distributed neural code of facial identity through spatiotemporal pattern analysis. Proceedings of the National Academy of Sciences, 108(24), 9998-10003. PMID: 21628569

Plaut DC, & Behrmann M (2011). Complementary neural representations for faces and words: A computational exploration. Cognitive Neuropsychology, 28(3-4), 251-275. PMID: 22185237

Monday, November 12, 2012

Complementary taxonomic and thematic semantic systems

I am happy to report that my paper with Kristen Graziano (a Research Assistant in my lab) showing cross-task individual differences in strength of taxonomic vs. thematic semantic relations is in this month's issue of the Journal of Experimental Psychology: General (Mirman & Graziano, 2012a). This paper is part of a cluster of four articles developing the idea that there is a functional and neural dissociation between taxonomic and thematic semantic systems in the human brain.  

First, some definitions: by "taxonomic" relations I mean concepts whose similarity is based on shared features, which is strongly related to shared category membership (for example, dogs and bears share many features, in particular, the cluster of features that categorize them as mammals). By "thematic" relations I mean concepts whose similarity is based on frequent co-occurrence in situations or events (for example, dogs and leashes do not share features and are not members of the same category, but both are frequently involved in the taking-the-dog-for-a-walk event or situation).

Regarding the functional dissociation, I described in an earlier post our finding (Kalenine et al., 2012) that thematic relations are activated faster than taxonomic relations (at least for manipulable artifacts). In this most recent paper we show that the relative degree of activation of taxonomic vs. thematic relations during spoken word comprehension predicts, at the individual participant level, whether that participant will tend to pick taxonomic or thematic relations in an explicit similarity judgment task. In other words, for some people taxonomic relations are more salient and for other people thematic relations are more salient, and this difference is consistent across two very different task contexts.

Regarding the neural dissociation, in a voxel-based lesion-symptom mapping study of semantic picture naming errors (i.e., picture naming errors that were semantically related to the target), we found that lesions in the anterior temporal lobe were associated with increased taxonomically-related errors relative to thematically-related errors and lesions in the posterior superior temporal lobe and inferior parietal lobe (a region we refer to as "temporo-parietal cortex" or TPC) were associated with the reverse pattern: increased thematically-related errors relative to taxonomically-related errors (Schwartz et al., 2011). In a follow-up study, we found that individuals with TPC damage showed reduced implicit activation of thematic relations, but not taxonomic relations, during spoken word comprehension (Mirman & Graziano, 2012b).

I think these findings add some important pieces to the puzzle of semantic cognition and we're now working on a theoretical and computational framework for explaining these complementary semantic systems.

Kalénine S., Mirman D., Middleton E.L., & Buxbaum L.J. (2012). Temporal dynamics of activation of thematic and functional knowledge during conceptual processing of manipulable artifacts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(5), 1274-1295. PMID: 22449134
Mirman D., & Graziano K.M. (2012a). Individual differences in the strength of taxonomic versus thematic relations. Journal of Experimental Psychology: General, 141(4), 601-609. PMID: 22201413
Mirman D., & Graziano K.M. (2012b). Damage to temporo-parietal cortex decreases incidental activation of thematic relations during spoken word comprehension. Neuropsychologia, 50(8), 1990-1997. PMID: 22571932
Schwartz M.F., Kimberg D.Y., Walker G.M., Brecher A., Faseyitan O.K., Dell G.S., Mirman D., & Coslett H.B. (2011). Neuroanatomical dissociation for taxonomic and thematic knowledge in the human brain. Proceedings of the National Academy of Sciences of the United States of America, 108(20), 8520-8524. PMID: 21540329

Monday, October 29, 2012

Embodied cognition: Theoretical claims and theoretical predictions

I'm at the Annual Meeting of the Academy of Aphasia (50th Anniversary!) in San Francisco. I like the Academy meeting because it is smaller than the other meetings that I attend and it brings together an interesting interdisciplinary group of people that are very passionate about the neural basis of language and acquired language disorders. One of the big topics of discussion on the first day of the meeting was embodied cognition, particularly its claim that semantic knowledge is grounded in sensory and motor representations as opposed to amodal representations. Lawrence Barsalou (e.g., Barsalou, 2008) and Friedemann Pulvermuller (e.g., Carota, Moseley, & Pulvermuller, 2012) are among the most active advocates of this view and many, many others have provided interesting and compelling data to support it. Nevertheless, the view remains controversial. Alfonso Caramazza and Bradford Mahon, in particular, have been vocal critics of the embodied view (e.g., Mahon & Caramazza, 2008). 

Embodied cognition is an important concept and many researchers are very actively studying it, from both positive and negative perspectives, so it would be completely hopeless for me to try to summarize all of the evidence in a simple blog post. Instead I want to focus on one very specific issue that I have seen raised on several occasions (including here at the Academy meeting). Many experiments that are taken to support embodied cognition use materials for which the semantics have very clear sensory-motor content. For example, in a study of verb comprehension, the materials might be words such as "kick", "scratch", and "lick" that strongly involve different motor effectors (foot, hand, and mouth) and the prediction is that there should be clearly different patterns of activation in primarily motor control areas of the brain corresponding to those effectors. Setting aside specific controversies regarding those studies, critics of embodied cognition sometimes say something along the lines of "But what about verbs that don't have obvious motor components, such as 'melt' and 'remember'? Those couldn't be embodied in the motor strip!"

I think this question conflates the general theoretical claim of embodied cognition -- that semantic knowledge is grounded in sensory and motor representations -- with the specific contexts where that general claim makes testable predictions. Because the motor strip is well-characterized and quite consistent across individuals, it is fairly straightforward to predict that verbs which have clear and very different motoric meanings should have very different neural correlates in the motor strip. This does not mean that other verbs are not embodied, only that those other verbs don't make easily testable predictions. If the neural representation of temperature were well-characterized, we might be able to make clear predictions about verbs like "melt" and "freeze" and "boil". The same goes for abstract nouns, which are often considered to be a challenge for embodied cognition theories because they don't have simple sensory-motor bases. My take is that the meaning representations of abstract nouns are just as embodied as those of concrete nouns, but they are more variable and diffuse, so they are harder to study. For example, the representation of "freedom" might involve visual representations of the Statue of Liberty for some people and open fields for others, so it is harder to measure this visual grounding because it differs from person to person. In contrast, the semantic representation of a concrete concept like "telephone" is going to be much more consistent across people because we all have more or less the same sensory and motor experiences with telephones.

The bottom line is that it is important to distinguish between the broad theoretical claim of embodied cognition, which is meant to apply to all semantic representations, and the subset of cases where this claim makes clear, testable predictions. Extending embodied cognition to the more difficult cases is certainly an important line of work (Barsalou, for example, is actively working on the representation of emotion and emotion words), but the fact that this extension is not yet complete is not, in itself, evidence that the theory is fundamentally flawed.

Barsalou, L. (2008). Grounded cognition. Annual Review of Psychology, 59(1), 617-645. DOI: 10.1146/annurev.psych.59.103006.093639
Carota F, Moseley R, & Pulvermüller F (2012). Body-part-specific representations of semantic noun categories. Journal of Cognitive Neuroscience, 24(6), 1492-1509. PMID: 22390464
Mahon, B., & Caramazza, A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology-Paris, 102(1-3), 59-70. DOI: 10.1016/j.jphysparis.2008.03.004

Monday, October 8, 2012

Two ways that correlation and stepwise regression can give different results

In general, a correlation test is used to test the association between two variables (y and z). However, if there is a third variable (x) that might be related to z or y, it makes sense to use stepwise regression (or partial correlation). There are two quite different situations where correlation and stepwise regression will produce different results. Here are some examples using made-up data.
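
The two situations can be sketched with simulated data (all variable names below are illustrative). In the first, a confound x drives both y and z, so y and z are correlated even though z contributes nothing once x is in the model. In the second, a suppressor relationship makes y and z nearly uncorrelated even though z is a strong predictor of y once x is controlled:

```r
set.seed(1)
n <- 200

# Situation 1: confounding -- x drives both y and z, so y and z are
# correlated, but z adds little once x is in the model
x <- rnorm(n)
z <- x + rnorm(n)
y <- x + rnorm(n)
cor.test(y, z)          # strong, significant correlation
summary(lm(y ~ x + z))  # z's unique contribution is near zero

# Situation 2: suppression -- y and z are nearly uncorrelated,
# but z is a strong predictor of y after controlling for x
x2 <- rnorm(n)
z2 <- x2 + rnorm(n)
y2 <- z2 - 2 * x2 + rnorm(n, sd = 0.5)
cor.test(y2, z2)           # correlation near zero
summary(lm(y2 ~ x2 + z2))  # z2 is a strong, significant predictor
```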

Saturday, September 22, 2012

The power to see the future is exciting and terrifying

In a recent comment in Nature, Daniel Acuna, Stefano Allesina, and Konrad Kording describe a statistical model for predicting h-index. In case you are not familiar with it, h-index is a citation-based measure of scientific impact. An h-index of n means that you have n publications with at least n citations. I only learned about h-index relatively recently and I think it is a quite elegant measure -- simple to compute, not too biased by a single highly-cited paper or by many low-impact (uncited) papers. Acuna, Allesina, and Kording took publicly available data and developed a model for predicting future h-index based on number of articles, current h-index, years since first publication, number of distinct journals published in, and number of articles in the very top journals in the field (Nature, Science, PNAS, and Neuron). Their model accounted for about 66% of the variance in future h-index among neuroscientists, which I think is pretty impressive. Perhaps the coolest thing about this project is the accompanying website that allows users to predict their own h-index.
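
The definition is simple enough to compute in a couple of lines of R (a sketch; `h_index` is just an illustrative name):

```r
# h-index: the largest n such that n publications have at least n citations each
h_index <- function(citations) {
  citations <- sort(citations, decreasing = TRUE)
  sum(citations >= seq_along(citations))
}

h_index(c(10, 8, 5, 4, 3, 0))  # 4: four papers have at least 4 citations each
h_index(c(100, 2))             # 2: one blockbuster paper doesn't inflate it
```

The second call illustrates why the measure is elegant: a single highly-cited paper can raise h by at most one.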

Since hiring and tenure decisions are intended to reflect both past accomplishments and expectations of future success, this prediction model is potentially quite useful. Acuna et al. are appropriately circumspect about relying on a single measure for making such important decisions and they are aware that over-reliance on a single metric can produce "gaming" behavior. So the following is not meant as a criticism of their work, but two examples jumped to my mind: (1) Because number of distinct journals is positively associated with future h-index (presumably it is an indicator of breadth of impact), researchers may choose to send their manuscripts to less appropriate journals in order to increase the number of journals in which their work has appeared. Those journals, in turn, would be less able to provide appropriate peer review and the articles would be less visible to the relevant audience, so their impact would actually be lower. (2) The prestige of those top journals already leads them to be targets for falsified data -- Nature, Science, and PNAS are among the leading publishers of retractions (e.g., Liu, 2006). Formalizing and quantifying that prestige factor can only serve to increase the motivation for unethical scientific behavior.

That said, I enjoyed playing around with the simple prediction calculator on their website. I'd be wary if my employer wanted to use this model to evaluate me, but I think it's kind of a fun way to set goals for myself: the website gave me a statistical prediction for how my h-index will increase over the next 10 years, and now I'm going to try to beat that prediction. Since h-index is (I think) relatively hard to "game", this seems like a reasonably challenging goal.

Acuna, D. E., Allesina, S., & Kording, K. P. (2012). Predicting scientific success. Nature, 489(7415), 201-202. DOI: 10.1038/489201a
Liu, S. V. (2006). Top Journals’ Top Retraction Rates. Scientific Ethics, 1 (2), 91-93.

Tuesday, September 18, 2012

Aggregating data across trials of different durations

Note: This post is a summary of a more detailed technical report.

In a typical “visual world paradigm” (VWP) eye tracking study, a trial ends when the participant responds, which naturally leads to some trials being shorter than others. So when computing fixation proportions at later time points, we need to decide: should terminated trials be included or not? Based on informal discussions with other VWP researchers, I think three approaches are currently in use: (1) for each time bin, include all trials and count post-response frames as non-object fixations (i.e., the participant is done fixating all objects from this trial); (2) include all trials and count post-response frames as target fixations (i.e., if the participant selected the correct object, then consider all subsequent fixations to be on that object; note that, typically, any trials on which the participant made an incorrect response are excluded from analysis); (3) include only trials that are currently on-going and ignore any terminated trials, since there is no data for those trials.
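
As a concrete sketch (toy data; all names are illustrative, not from the technical report), the three schemes differ only in what they do with trials that have already ended by time t:

```r
# Toy data: one row per (trial, time) frame; trial 3 ends at 200 ms
frames <- data.frame(
  trial = c(1, 1, 1, 2, 2, 2, 3, 3),
  t     = c(100, 200, 300, 100, 200, 300, 100, 200),
  fix   = c("C", "T", "T", "U", "C", "T", "T", "T")
)
rt <- c(300, 300, 200)  # response times for trials 1-3

prop_fix <- function(frames, rt, t, obj,
                     scheme = c("nonobject", "target", "ongoing")) {
  scheme <- match.arg(scheme)
  obs <- frames$fix[frames$t == t] == obj  # observed frames at time t
  if (scheme == "ongoing") return(mean(obs))
  n_done <- sum(rt < t)  # trials that ended before time t
  # pad terminated trials as target ("T") fixations or as non-object fixations
  pad <- if (scheme == "target") obj == "T" else FALSE
  mean(c(obs, rep(pad, n_done)))
}

prop_fix(frames, rt, 300, "T", "ongoing")    # 1.00: terminated trial ignored
prop_fix(frames, rt, 300, "T", "target")     # 1.00: padded as target fixations
prop_fix(frames, rt, 300, "T", "nonobject")  # 0.67 (= 2/3): padded as non-object
```

With these toy numbers the "ongoing" and "target" schemes happen to agree; the selection bias arises in real data because which trials terminate early is not random.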

The problem with the third approach is that it is a form of selection bias because trials do not terminate at random, so as the time series progresses through the time window, the data move further and further from the complete, unbiased set of trials to a biased subset of only trials that required additional processing time. This bias will operate both between conditions (i.e., more trials from a condition with difficult stimuli than from a condition with easy stimuli) and within conditions (i.e., more of the trials that were difficult than that were easy within a condition). 

Here's an analogy to clarify this selection bias: imagine that we want to evaluate the response rate over time to a drug for a deadly disease. We enroll 100 participants in the trial and administer the drug. At first, only 50% of the participants respond to the drug. As the trial progresses, the non-responders begin to, unfortunately, die. After 6 months, only 75 participants are alive and participating in the trial and the same 50 are responding to the treatment. At this point, is the response rate the same 50% or has it risen to 67%? Would it be accurate to conclude that responsiveness to the treatment increases after 6 months?
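
The arithmetic of the analogy, as a sketch:

```r
enrolled   <- 100
responders <- 50   # respond to the drug from the start
died       <- 25   # non-responders lost by 6 months

responders / enrolled           # 0.50: true response rate
responders / (enrolled - died)  # ~0.67: apparent rate among survivors only
```

Nothing about the treatment changed; conditioning on survival alone makes the response rate appear to rise. Ignoring terminated eye-tracking trials conditions on "still going" in exactly the same way.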

Returning to eye-tracking data, the effect of this selection bias is to make differences appear more static. So, for target fixation data, you get the pattern below: considering only on-going trials makes it look like there is an asymptote difference between conditions, but "padding" the post-response frames with Target fixations correctly captures the processing speed difference. (These data are from a Monte Carlo simulation, so we know that the Target method is correct.)

For competitor fixations, ignoring terminated trials makes the competition effects look longer-lasting, as in the figure on the left. These data come from our recent study of taxonomic and thematic semantic competition, so you can see the selection bias play out in real VWP data. We also randomly dropped 10% and 20% of the data points to show that the effect of ignoring terminated trials is not just a matter of having fewer data points.

Whether post-response data are considered "Target" or "Non-object" fixations does not seem to have biasing effects, though it does affect how the data look, in the same way that probability distribution curves and cumulative distribution curves show the same underlying data in different ways. More details on all of this are available in our technical report.

Friday, September 7, 2012

More on fixed and random effects: Plotting and interpreting

In a recent post I showed how plotting model fits can help to interpret higher-order polynomial terms. The key comparison there was between models that did and did not have the higher-order fixed effect terms. If you're going to use this strategy, remember that fixed and random effects capture some of the same variance, so if you remove some fixed effects to visualize their contribution, you also need to remove the corresponding random effects. In that previous post, those higher-order random effects were not included in the model (more on this in a minute), so I could just talk about the fixed effects. Here's how it would look if I started with a full model that included all fixed and random effects and compared it to just removing the higher-order fixed effects…

Here are the models – they're the same as in the previous post, except that they include the higher-order random effects:
m.full <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 + ot3 + ot4 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
m.exSub <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj + (ot1 + ot2 + ot3 + ot4) * cond + (ot1 + ot2) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 + ot3 + ot4 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
And here are the model fits (thinner lines represent the model without the higher-order fixed effects):
[Figure: plot of chunk effect-sizes]

Not much of a difference, is there? If we also drop the higher-order random effects (thicker dashed lines in the graph below), then we can again see that the higher-order terms were capturing the Early-vs-Late difference:
m.exSub2 <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj + (ot1 + ot2 + ot3 + ot4) * cond + (ot1 + ot2) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
[Figure: plot of chunk unnamed-chunk-1]

This graphical example shows that random and fixed effects capture some of the same variance and this point is also important for deciding which random effects to include in the model in the first place. This decision is somewhat complicated and I don't think the field has completely settled on an answer (I'll be revisiting the issue in future posts). I generally try to include as many time terms in the random effects as possible before running into convergence errors. An important rule-of-thumb is to always include random effects that correspond to the fixed effects that you plan to interpret. Since random and fixed effects capture some of the same variance, you can get spuriously significant fixed effects if you omit the corresponding random effects. It won't affect the actual fixed effect parameter estimates (since the random effects are constrained to have a mean of 0), but omitting random effects tends to reduce the standard error of the corresponding fixed effect parameter estimates, which makes them look more statistically significant than they should be.

Here's how that looks in the context of the Early-vs-Late example:
coefs.full <-, check.names = FALSE)
coefs.full$p <- format.pval(2 * (1 - pnorm(abs(coefs.full[, "t value"]))))
coefs.full[grep("*objU:cond*", rownames(coefs.full), value = T), ]
##                     Estimate Std. Error t value       p
## objU:condEarly     -0.004164    0.01709 -0.2436 0.80751
## ot1:objU:condEarly  0.065878    0.07745  0.8506 0.39497
## ot2:objU:condEarly -0.047568    0.04362 -1.0906 0.27545
## ot3:objU:condEarly -0.156184    0.05181 -3.0145 0.00257
## ot4:objU:condEarly  0.075709    0.03308  2.2888 0.02209
m.ex <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
coefs.ex <-, check.names = FALSE)
coefs.ex$p <- format.pval(2 * (1 - pnorm(abs(coefs.ex[, "t value"]))))
coefs.ex[grep("*objU:cond*", rownames(coefs.ex), value = T), ]
##                     Estimate Std. Error t value       p
## objU:condEarly     -0.004164    0.01701 -0.2448 0.80664
## ot1:objU:condEarly  0.065878    0.07586  0.8685 0.38514
## ot2:objU:condEarly -0.047568    0.04184 -1.1370 0.25554
## ot3:objU:condEarly -0.156184    0.02327 -6.7119 1.9e-11
## ot4:objU:condEarly  0.075709    0.02327  3.2535 0.00114

As I said, the presence (m.full) vs. absence (m.ex) of the higher-order random effects does not affect the fixed effect parameter estimates, but it does affect their standard errors. Without the cubic and quartic random effects, those fixed effect standard errors are much smaller, which increases their apparent statistical significance. In this example, the cubic and quartic terms' p-values are < 0.05 either way, but you can see that the differences in the t-value were quite large, so it's not hard to imagine an effect that would look significant only without the corresponding random effect.

Wednesday, August 29, 2012

Plotting model fits

We all know that it is important to plot your data and explore the data visually to make sure you understand it. The same is true for your model fits. First, you want to make sure that the model is fitting the data relatively well, without any substantial systematic deviations. This is often evaluated by plotting residual errors, but I like to start with plotting the actual model fit.

Second, and this is particularly important when using orthogonal polynomials, you want to make sure that the statistically significant effects in the model truly correspond to the “interesting” (i.e., meaningful) effects in your data. For example, if your model had significant effects on higher-order terms like the cubic and quartic, you might want to conclude that this corresponds to a difference between early and late competition. Plotting the model fits with and without those terms can help confirm that interpretation.

The first step to plotting model fits is getting those model-predicted values. If you use lmer, these values are stored in the eta slot of the model object. It can be extracted using m@eta, where m is the model object. Let's look at an example based on eye-tracking data from Kalenine, Mirman, Middleton, & Buxbaum (2012).

##       Time           fixS           cond     obj          subj    
##  Min.   : 500   Min.   :0.0000   Late :765   T:510   21     :102  
##  1st Qu.: 700   1st Qu.:0.0625   Early:765   C:510   24     :102  
##  Median : 900   Median :0.1333               U:510   25     :102  
##  Mean   : 900   Mean   :0.2278                       27     :102  
##  3rd Qu.:1100   3rd Qu.:0.3113                       28     :102  
##  Max.   :1300   Max.   :1.0000                       40     :102  
##                                                      (Other):918  
##     timeBin        ot1              ot2               ot3        
##  Min.   : 1   Min.   :-0.396   Min.   :-0.2726   Min.   :-0.450  
##  1st Qu.: 5   1st Qu.:-0.198   1st Qu.:-0.2272   1st Qu.:-0.209  
##  Median : 9   Median : 0.000   Median :-0.0909   Median : 0.000  
##  Mean   : 9   Mean   : 0.000   Mean   : 0.0000   Mean   : 0.000  
##  3rd Qu.:13   3rd Qu.: 0.198   3rd Qu.: 0.1363   3rd Qu.: 0.209  
##  Max.   :17   Max.   : 0.396   Max.   : 0.4543   Max.   : 0.450  
##       ot4         
##  Min.   :-0.3009  
##  1st Qu.:-0.1852  
##  Median :-0.0231  
##  Mean   : 0.0000  
##  3rd Qu.: 0.2392  
##  Max.   : 0.4012  
ggplot(data.ex, aes(Time, fixS, color = obj)) + facet_wrap(~cond) + 
    stat_summary(fun.y = mean, geom = "line", size = 2)
[Figure: plot of chunk plot-data]
I've renamed the conditions "Late" and "Early" based on the timing of their competition effect: looking at fixation proportions for the related Competitor (green lines) relative to the Unrelated distractor, it looks like the “Late” condition had a later competition effect than the “Early” condition. We start by fitting the full model and plotting the model fit. For convenience, we'll make a new data frame that has the observed data and the model fit:
m.ex <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
data.ex.fits <- data.frame(subset(data.ex, obj != "T"), GCA_Full = m.ex@eta)
ggplot(data.ex.fits, aes(Time, fixS, color = obj)) + facet_wrap(~cond) + stat_summary( = mean_se, geom = "pointrange", size = 1) + stat_summary(aes(y = GCA_Full), fun.y = mean, geom = "line", size = 2) + labs(x = "Time Since Word Onset (ms)", y = "Fixation Proportion")
[Figure: plot of chunk fit-full-model]
The fit looks pretty good and the model seems to capture the early-vs.-late competition difference, so now we can use the normal approximation to get p-values for the object-by-condition interaction:
coefs.ex <-, check.names = FALSE)
coefs.ex$p <- format.pval(2 * (1 - pnorm(abs(coefs.ex[, "t value"]))))
coefs.ex[grep("*objU:cond*", rownames(coefs.ex), value = T), ]
##                     Estimate Std. Error t value       p
## objU:condEarly     -0.004164    0.01701 -0.2448 0.80664
## ot1:objU:condEarly  0.065878    0.07586  0.8685 0.38514
## ot2:objU:condEarly -0.047568    0.04184 -1.1370 0.25554
## ot3:objU:condEarly -0.156184    0.02327 -6.7119 1.9e-11
## ot4:objU:condEarly  0.075709    0.02327  3.2535 0.00114
There are significant object-by-condition interaction effects on the cubic and quartic terms, so that's where competition in the two conditions differed, but does that correspond to the early-vs.-late difference? To answer this question we can fit a model that does not have those cubic and quartic terms and visually compare it to the full model. We'll plot the data with pointrange, the full model with thick lines, and the smaller model with thinner lines.
m.exSub <- lmer(fixS ~ (ot1 + ot2 + ot3 + ot4) * obj + (ot1 + ot2 + ot3 + ot4) * cond + (ot1 + ot2) * obj * cond + (1 + ot1 + ot2 + ot3 + ot4 | subj) + (1 + ot1 + ot2 | subj:obj:cond), data = subset(data.ex, obj != "T"), REML = F)
data.ex.fits$GCA_Sub <- m.exSub@eta
ggplot(data.ex.fits, aes(Time, fixS, color = obj)) + facet_wrap(~cond) + stat_summary( = mean_se, geom = "pointrange", size = 1) + stat_summary(aes(y = GCA_Full), fun.y = mean, geom = "line", size = 2) + stat_summary(aes(y = GCA_Sub), fun.y = mean, geom = "line", size = 1) + labs(x = "Time Since Word Onset (ms)", y = "Fixation Proportion")
[Figure: plot of chunk sub-model]
Well, it sort of looks like the thinner lines have less early-late difference, but it is hard to see. It will be easier if we look directly at the competition effect size (that is, the difference between the competitor and unrelated fixation curves):
ES <- ddply(data.ex.fits, .(subj, Time, cond), summarize, Competition = fixS[obj == "C"] - fixS[obj == "U"], GCA_Full = GCA_Full[obj == "C"] - GCA_Full[obj == "U"], GCA_Sub = GCA_Sub[obj == "C"] - GCA_Sub[obj == "U"])
ES <- rename(ES, c(cond = "Condition"))
ggplot(ES, aes(Time, Competition, color = Condition)) + stat_summary(fun.y = mean, geom = "point", size = 4) + stat_summary(aes(y = GCA_Full), fun.y = mean, geom = "line", size = 2) + labs(x = "Time Since Word Onset (ms)", y = "Competition") + stat_summary(aes(y = GCA_Sub), fun.y = mean, geom = "line", size = 1)
[Figure: plot of chunk effect-sizes]
Now we can clearly see that the full model (thick lines) captures the early-vs.-late difference, but when we remove the cubic and quartic terms (thinner lines), that difference almost completely disappears. So that shows that those higher-order terms really were capturing the timing of the competition effect.

P.S.: For those that care about behind-the-scenes/under-the-hood things, this post was created (mostly) using knitr in RStudio.