Hi @Anna, thanks for your questions!
PPI can be quite confusing depending on what you read, because the concept (i.e., functional connectivity between two regions changing as a function of a task manipulation) can lead to a different intuition than the statistics (i.e., testing an interaction: whether the slope of one region in predicting Y changes as a function of the task indicator variable). I am by no means an expert in this technique, but I’m happy to share my opinions in response to your questions.
- Specifically I wondered if it would be necessary to “zero mean” the Psychological regressor after the convolution step? (This is at least what FSL calls it; it does not mean mean centering, but shifting the data so that zero lies between the min and max value of all data points.)
If your variable is normally distributed, then mean centering (i.e., `X - np.mean(X)`) will center the data at zero. There are lots of other ways to standardize your data, based on the median, range, etc., and I’m honestly not sure how much the specific choice really matters. The reason centering is important is that it is much easier to interpret an interaction term if both regressors are centered at zero (though this does not apply to binary indicator variables). There is a nice explanation of why you might want to transform your variables in Gelman and Hill, 2006, Chapter 4.
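As a rough illustration of the centering point, here is a minimal sketch on simulated data. The variable names and simulated values are my own assumptions and are not taken from the DartBrains notebook. It fits the same interaction model with a raw and a mean-centered continuous regressor. Centering leaves the slope and interaction coefficients unchanged, but the coefficient on the task regressor changes from "task effect when the seed signal equals zero" to "task effect at the average seed signal", which is usually the easier quantity to interpret.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical simulated data: a continuous seed signal with a large offset
# and a binary task indicator (illustrative values only).
seed = rng.normal(loc=100.0, scale=5.0, size=n)
task = np.tile([0.0, 1.0], n // 2)
y = 0.5 * seed + 2.0 * task + 0.3 * seed * task + rng.normal(size=n)


def fit(seed_col, task_col, y):
    """OLS fit of y on intercept, seed, task, and seed x task."""
    X = np.column_stack(
        [np.ones_like(seed_col), seed_col, task_col, seed_col * task_col]
    )
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas


# Mean center the continuous regressor only; the binary indicator is left alone.
seed_centered = seed - np.mean(seed)

betas_raw = fit(seed, task, y)
betas_centered = fit(seed_centered, task, y)

# Coefficient order: intercept, seed, task, seed x task
print("raw:     ", np.round(betas_raw, 3))
print("centered:", np.round(betas_centered, 3))
```

Comparing the two printed rows, the second and fourth coefficients should match, while the intercept and the task coefficient differ between the two fits.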
- And further if there is a need to detrend the Physiological regressor before standardizing it? Or is it generally incorrect to extract the Physiological regressor from an “only preprocessed” but not denoised dataset?
This is a very important step to consider, but there is no generally agreed upon order of operations. As with all functional connectivity analyses, it is very important to remove artifacts or trends that are present in both regions, as these can be correlated and lead you to falsely conclude that there is a meaningful relationship between the two regions. For example, imagine a participant moves their head, which creates a phasic spike in both regions. Unless the spike is removed from both regions, it will dramatically inflate the correlation between them. Similarly, if there is a large linear trend in both regions that you do not remove, you will spuriously find a relationship based on the common linear trend rather than the faster temporal fluctuations you are probably more interested in. Understanding what you want to do conceptually is the most important part, as there are a variety of practical ways to do this. I will not enumerate all of them, but will simply expand upon the example in the notebook. By including motion covariates, scanner spikes, linear trends, and/or high pass filters in your connectivity design matrix, you are essentially identifying the variance in your target region (Y) that is uniquely explained by your reference region (X), controlling for variance in either that may be associated with head motion, spikes, linear drift, or low frequency signals. You could also remove each of these independently beforehand, but that would privilege removing any variance associated with those signals, akin to orthogonalization, rather than considering everything simultaneously in a single regression step. Of course, you never know what spurious relationship might be lurking in your data, which is why it’s good to think about it ahead of time and to plot your data for clues.
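To make the shared-artifact problem concrete, here is a minimal sketch using simulated data. Everything in it (the signal names, the size of the drift and the spike) is a made-up assumption for illustration, not the notebook's actual pipeline. Two regions with no true coupling share a linear drift and a single motion spike. The raw correlation between them is large, but once the drift and the spike are entered as covariates in the same regression, the coefficient on the seed should fall close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical nuisance signals shared by both regions (illustrative only).
trend = np.linspace(-1.0, 1.0, n)  # slow linear drift
spike = np.zeros(n)
spike[150] = 1.0                   # one motion-related spike

# Two regions with NO true coupling: independent noise plus shared artifacts.
seed = rng.normal(size=n) + 3.0 * trend + 8.0 * spike
target = rng.normal(size=n) + 3.0 * trend + 8.0 * spike

# Naive approach: correlate the raw time series.
naive_r = np.corrcoef(seed, target)[0, 1]

# Single regression step: seed plus nuisance covariates in one design matrix.
X = np.column_stack([np.ones(n), seed, trend, spike])
betas, *_ = np.linalg.lstsq(X, target, rcond=None)
seed_beta = betas[1]

print(f"naive correlation:              {naive_r:.2f}")
print(f"seed beta with covariates in X: {seed_beta:.2f}")
```

In real data you would typically add the motion parameters and a set of high pass filter regressors as further columns of `X` in the same way.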
- And my last question is, why you decided to standardize the Physiological regressor instead of only demeaning it? If I understand correctly FSL only demeans the regressor. I tried both ways on my data and it seemed to make a considerable difference, the standardized version being a lot more powerful.
Again, there is no “correct” way to do this; it really depends on a lot of factors. In general, if one of your variables is scaled considerably larger than the other variables in your model, then that regressor will dominate the residual error. Remember that OLS regression finds the set of betas that, when multiplied by each respective regressor and summed, gives you a predicted Y. The residual error is the difference between Y and predicted Y. If one regressor is on a much larger scale than the rest of the variables, then the OLS estimator will overweight changes in the beta for that regressor relative to all of the other regressors. Because I have standardized the head motion regressors and linear trends, and the spikes and task indicator variables are binary, leaving the seed regressor of interest on the original scale could lead to largely ignoring the influence of the covariates. Again, I point you to Gelman and Hill (2006), Chapter 4, for a discussion of this topic. Because different imaging software packages scale their brain intensity signals in different ways (e.g., FSL rescales each 4D dataset to a grand mean of 10,000), this step will likely have a differential effect across software packages. In the example on DartBrains, I personally think it makes sense to standardize the seed regressor.
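For readers who want to see the mechanics, here is a small sketch of the difference between demeaning and standardizing a seed time series, followed by the OLS steps described above (betas, predicted Y, residual). The data are simulated and the names (`seed_raw`, `zscore`, and so on) are hypothetical, so treat it as an illustration under those assumptions rather than the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical seed time series on an arbitrary scanner scale (illustrative).
seed_raw = 10000.0 + 50.0 * rng.normal(size=n)
task = np.repeat([0.0, 1.0, 0.0, 1.0], n // 4)  # binary task indicator


def zscore(x):
    """Subtract the mean and divide by the standard deviation."""
    return (x - np.mean(x)) / np.std(x)


seed_demeaned = seed_raw - np.mean(seed_raw)  # centered, original units
seed_z = zscore(seed_raw)                     # centered, unit variance

print("std of demeaned seed:", round(float(np.std(seed_demeaned)), 2))
print("std of z-scored seed:", round(float(np.std(seed_z)), 2))

# PPI-style design matrix: intercept, seed, task, and seed x task interaction.
X = np.column_stack([np.ones(n), seed_z, task, seed_z * task])

# Simulated target region (illustrative coefficients only).
y = 0.4 * seed_z + 0.5 * seed_z * task + rng.normal(size=n)

# OLS: betas that minimize the squared difference between y and predicted y.
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ betas
residual = y - predicted
```

Swapping `seed_z` for `seed_demeaned` when building `X` is an easy way to compare the two choices on your own data.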
There are lots of great resources for learning more about the nuances of PPI. I highly recommend Jeannette Mumford’s two part video series (Part I, Part II) and Jill O’Reilly’s “Tools of the trade” tutorial in SCAN. Most importantly, as I hope is conveyed throughout the DartBrains course, I think it is best to learn as much as you can about the nuances of regression so that you can make your own informed decisions based on statistics rather than dogmatically following any prescriptive rule. One last important thing to note is that I largely ignored the whole convolution vs. deconvolution debate (Gitelman et al., 2003).
Thanks again for posting your questions on AskPBS.org. These types of questions, and the discussions they stimulate, are very important for everyone doing this type of work.