weixin_39636164
2020-12-09 12:51

Concerns regarding TE-(in)dependence metric calculation

Summary

I would like to better document and understand how the component-level model metrics are computed. I have concerns about several of the steps, especially how WTS is calculated and how the regressions in get_coeffs and computefeats2 are performed.

Additional Detail

  1. Compute optimal combination from T2* and echo data.
  2. Compute un-normalized weights for each component for the optimally combined data (WTS) with computefeats2 and the normalized mixing matrix (when provided; otherwise the regular mixing matrix).
    1. Use signs of skew values for weights (per component) to flip components and weights so that component time series positively align with the data.
      • PROBLEM: The global signal trend for the 5-echo data consistently comes out inverted (trending down) in the ICA step, although it goes up, as expected, in the PCA step. Nothing I've tried (betas, z-statistics, mean, skew) has fixed this, which makes me suspicious of the sign-determination method.
        • NOTE: This can be discussed in #316.
    2. Estimate normalized total variance explained per component from weights.
    3. PROBLEM: While the comments say "un-normalized weights", the function computefeats2 does not return un-normalized weights. It computes the weights using the data after it has been z-scored over time. It then caps the weights at +/- 0.999 and z-transforms them using Fisher's transform as if they were correlation coefficients.
      • SOLUTION: If we're interested in using z-values for each of the components to the optimally combined data, we can run a regression and compute those z-scores (as in this post).
      • NOTE: This can be discussed in #178.
    4. ~~PROBLEM: I'm pretty sure that the normalized mixing matrix, which is only used with PCA, is actually z-scored along the wrong axis.~~ FIXED WITH #228
      • SOLUTION: We can just normalize each component's time series over time (instead of each volume's values over components).
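The regression-based alternative proposed in the SOLUTION for 2.3 could look roughly like the sketch below. This is a hypothetical helper (regression_zscores is not an existing tedana function), assuming data is (n_voxels, n_trs) and mixing is (n_trs, n_components): fit an ordinary least-squares regression with an intercept, compute t-statistics for each component's beta, and convert them to z-statistics via matched two-sided p-values, instead of Fisher-transforming capped weights as if they were correlations.

```python
import numpy as np
from scipy import stats

def regression_zscores(data, mixing):
    """Hypothetical sketch: per-voxel z-statistics for each component's fit
    to the optimally combined data. data: (n_voxels, n_trs),
    mixing: (n_trs, n_components). Returns (n_voxels, n_components)."""
    n_trs, n_comps = mixing.shape
    X = np.column_stack([mixing, np.ones(n_trs)])  # add intercept column
    betas, _, _, _ = np.linalg.lstsq(X, data.T, rcond=None)  # (n_regressors, n_voxels)
    resid = data.T - X.dot(betas)
    df = n_trs - X.shape[1]
    mse = np.sum(resid ** 2, axis=0) / df  # residual variance per voxel
    # standard error of each beta: sqrt(mse * diag((X'X)^-1))
    xtx_inv = np.linalg.inv(X.T.dot(X))
    se = np.sqrt(np.outer(np.diag(xtx_inv), mse))
    tvals = betas / se
    # convert t to z through two-sided p-values, preserving sign
    pvals = 2 * stats.t.sf(np.abs(tvals), df)
    zvals = stats.norm.isf(pvals / 2) * np.sign(tvals)
    return zvals[:n_comps].T  # drop the intercept row
```

These z-values would directly answer "how strongly does this component load at this voxel" in interpretable units, without the 0.999 cap.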
  3. Compute un-normalized weights for each component for demeaned optimally combined data (tsoc_B) with get_coeffs and unnormalized mixing matrix.
    1. Convert weights to percent signal change units.
    2. Estimate total variance explained per component from weights.
    3. PROBLEM: Demeaning the data should only affect the weights if the mixing matrix is not itself demeaned; it should be, but sometimes isn't.
      • SOLUTION: Mean-center the mixing matrix/X within get_coeffs and drop the add_const argument (per #179).
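The SOLUTION above could look roughly like this sketch (get_coeffs_demeaned is a hypothetical stand-in, not the real get_coeffs). Once each regressor is mean-centered over time, its dot product with a constant is zero, so adding a constant to the data no longer changes the slope estimates and the add_const argument becomes unnecessary:

```python
import numpy as np

def get_coeffs_demeaned(data, X):
    """Hypothetical sketch of the proposed fix: mean-center the design
    matrix inside the fitting routine. data: (n_voxels, n_trs),
    X: (n_trs, n_components). Returns betas of shape (n_voxels, n_components)."""
    Xc = X - X.mean(axis=0)  # center each regressor over time
    # centered regressors are orthogonal to the mean, so no intercept is needed
    betas, _, _, _ = np.linalg.lstsq(Xc, data.T, rcond=None)
    return betas.T
```

With this, demeaned and non-demeaned data yield identical slopes, which is the invariance the PROBLEM above asks for.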
  4. Compute un-normalized weights for each component for each echo with get_coeffs.
  5. Compute voxel-wise means for each echo.
  6. Compute alpha map for each component: Squared echo coefficients from 4, summed over time.
    • QUESTION: What is alpha?
  7. Compute component's variance explained: Sum of squared un-normalized weights from 3 (tsoc_B) divided by total variance from 3b.
    • PROBLEM: The variance explained measures calculated within fitmodels_direct do not seem to align well with the variance explained values we get directly from PCA, which makes me doubt fitmodels_direct.
      • NOTE: The values may not be very similar (for example, for PCA the "real" varex might be 37.8%, while the estimated varex is 57.6% and the estimated normalized varex is 53.8%), but the values are highly correlated across components.
      • NOTE: This can be discussed in #317.
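For reference, the computation described in steps 7 and 8 has this simple form (a sketch of the description above, not the fitmodels_direct code itself; whether these values should match PCA's variance explained is the open question for #317):

```python
import numpy as np

def variance_explained(comp_betas, total_var):
    """Step 7/8 sketch: a component's variance explained as the sum of its
    squared un-normalized weights over voxels (tsoc_B for step 7, WTS for
    step 8), expressed as a percentage of the corresponding total variance."""
    return 100.0 * np.sum(comp_betas ** 2) / total_var
```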
  8. Compute component's normalized variance explained: Sum of squared un-normalized weights from 2 (WTS) divided by total variance from 2b.
  9. Compute model-wise pseudo-F-statistics for each component:
    1. Set up design matrices (X) for S0 and T2* models.
      • S0 model: Echo-specific weights follow echo-specific averages
      • T2* model: Weight peaks at T2*
    2. Compute model coefficient maps:
      • Sum of design matrix times component echo weights from 4 over time divided by design matrix squared summed over time.
    3. Compute predicted values for models:
      • Design matrix times model coefficient maps.
    4. Compute sum of squared error maps for models:
      • Echo coefficients minus predicted values, squared and summed over time.
    5. Compute pseudo-F-statistics:
      • Alpha minus SSE, multiplied by number of echoes minus one, all divided by SSE.
    6. Cap F-statistics at 500.
    7. PROBLEM: Shouldn't these F-statistics be calculated only for voxels with positive betas? Signs of component time series should be flipped to match the actual data, and I don't think anticorrelations are meaningful in this context. I could definitely be wrong though.
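Sub-steps 9.2 through 9.6 (together with the alpha map from step 6) can be sketched roughly as follows. pseudo_f is a hypothetical helper, assuming comp_betas and design are (n_voxels, n_echos) arrays for a single component and model, and that the sums described as "over time" above run over the echo dimension. The "F" is a pseudo-statistic: alpha plays the role of the total sum of squares and SSE the residual sum of squares.

```python
import numpy as np

def pseudo_f(comp_betas, design, n_echos):
    """Sketch of steps 9.2-9.6 for one component and one model."""
    alpha = np.sum(comp_betas ** 2, axis=1)                         # step 6: alpha map
    coeffs = np.sum(design * comp_betas, axis=1) / np.sum(design ** 2, axis=1)  # step 9.2
    pred = design * coeffs[:, np.newaxis]                           # step 9.3
    sse = np.sum((comp_betas - pred) ** 2, axis=1)                  # step 9.4
    f = (alpha - sse) * (n_echos - 1) / sse                         # step 9.5
    return np.minimum(f, 500.0)                                     # step 9.6: cap at 500
```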
  10. Compute averaging weight map for each component:
    1. Z-score component's un-normalized weights from 2 (WTS) over voxels.
      • PROBLEM: These values are already z-scored within computefeats2. Why would we do this again?
    2. Cap z-scores at +/- 8.
    3. Square z-score maps.
      • PROBLEM: By squaring the maps, we're treating negative weights as if they're equally as valid as positive weights. If the sign determination in 2.1 is accurate, then only positive weights should be used.
        • SOLUTION: Limit weighted averaging to positive values. Set negative values in this map to zero.
        • NOTE: This can be discussed in #318.
  11. Calculate Kappa and Rho metrics by weighted average of pseudo-F-statistics with weight map.
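Steps 10 and 11 can be sketched as follows (hypothetical helpers, assuming all maps are 1-D arrays over voxels; the positive_only flag implements the proposed SOLUTION of zeroing negative weights before squaring):

```python
import numpy as np
from scipy import stats

def averaging_weight_map(comp_weights, positive_only=False):
    """Step 10 sketch: z-score a component's weights over voxels,
    cap at +/-8, then square."""
    z = stats.zscore(comp_weights)   # step 10.1
    z = np.clip(z, -8, 8)            # step 10.2
    if positive_only:
        z = np.maximum(z, 0)         # proposed one-sided variant
    return z ** 2                    # step 10.3

def kappa_rho(f_r2, f_s0, weight_map):
    """Step 11 sketch: Kappa and Rho as weighted averages of the
    voxel-wise pseudo-F maps, weighted by the map from step 10."""
    return np.average(f_r2, weights=weight_map), np.average(f_s0, weights=weight_map)
```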
  12. If full metric calculation (full_sel) is desired, calculate additional metrics.
  13. Store several maps for each component in dictionary:
    1. Z-score maps (prior to squaring) originally used for weighted averaging
    2. Percent signal change of optimally combined data maps
    3. Pseudo-F-statistic maps
  14. Perform simple cluster-extent thresholding on pseudo-F-statistic maps using cluster-defining F-value threshold from getfbounds function and cluster size threshold based on number of voxels in mask.
    1. Store binarized maps of significant voxels in F_R2_clmaps and F_S0_clmaps arrays.
    2. Store number of voxels in significant clusters in countsigFR2 and countsigFS0 arrays.
  15. Perform simple cluster-extent thresholding on Z-score maps using cluster-defining threshold of 1.95 (p < 0.05, two-sided) and the same cluster-extent threshold as everything else.
    • NOTE: Cluster-extent thresholding is bidirectional, but mixes adjacent positive and negative values to define clusters. I think we should make it one-sided, based on my concerns regarding the signs of component weights.
  16. Perform simple cluster-extent thresholding on the rank-transformed, absolute-valued un-normalized weight maps from 3 (tsoc_B). The cluster-defining threshold is the number of voxels in the mask minus the number of significant voxels from countsigFR2/countsigFS0, and the cluster-extent threshold is the same one applied to the F_R2_clmaps/F_S0_clmaps arrays.
    • PROBLEM: This should initially result in the same number of significant voxels in Br_R2_clmaps/Br_S0_clmaps (these arrays) as there are in the corresponding component maps from F_R2_clmaps/F_S0_clmaps, but then after cluster-extent thresholding they would have fewer. Does the difference in number of significant voxels cause any problems?
    • NOTE: Br_R2_clmaps and Br_S0_clmaps should differ in terms of the number of voxels that survive thresholding because of the different cluster-defining thresholds determined by F_R2_clmaps/F_S0_clmaps, but the underlying data (tsoc_B) is exactly the same.
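The "simple cluster-extent thresholding" used in steps 14-16 can be sketched like this (a hypothetical helper, one-sided for simplicity; step 15's bidirectional variant would binarize on |z| instead, which relates to the NOTE in step 15 about mixing signs). It binarizes at the cluster-defining threshold, labels connected clusters, and keeps only clusters that meet the cluster-size threshold:

```python
import numpy as np
from scipy import ndimage

def cluster_extent_threshold(stat_map, thresh, min_cluster_size):
    """Sketch: keep voxels above thresh that belong to connected
    clusters of at least min_cluster_size voxels. stat_map is assumed 3D;
    ndimage.label uses face-connectivity by default."""
    binary = stat_map > thresh
    labeled, n_clusters = ndimage.label(binary)
    sizes = np.bincount(labeled.ravel())          # voxel count per label
    keep = np.zeros_like(sizes, dtype=bool)
    keep[1:] = sizes[1:] >= min_cluster_size      # label 0 is background
    return keep[labeled]                          # binarized surviving-cluster map
```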

Next Steps

  1. Could everyone who is familiar with the metrics and fitmodels_direct look through the code and my description to see if I have anything wrong and to address any of the issues I've raised?

This question originates from the open-source project: ME-ICA/tedana


11 replies

  • weixin_39640395 · 5 months ago

    This is a great write-up! I think it would be valuable to take some of these points to the next developers' call to discuss as a group. WDYT?

  • weixin_39636164 · 5 months ago

    Absolutely. Sounds like a plan.

  • weixin_39618169 · 5 months ago

    There has been very little discussion on this since its posting, and this is a pretty big issue. I'm afraid that others (including myself) look at the size of this issue and immediately start having heart palpitations. I would like to propose that we take all of the concerns above, many of which already have a solution spelled out, and turn them into their own issues. This is an excellent write-up, though, so maybe we should think about how to incorporate elements of it into the documentation or code comments.

  • weixin_39636164 · 5 months ago

    I can take certain concerns/problems and open them as separate issues, but some of them only make sense (at least to me) when considered within the overall metric calculation function.

  • weixin_39618169 · 5 months ago

    True. Perhaps this is a "hold out until the next hackathon" type of problem. WDYT?

  • weixin_39636164 · 5 months ago

    I've started opening specific issues and linking them here. I think this issue will still be useful, but we can focus discussion on those smaller issues. But I also agree that actually dealing with all of these issues will most likely have to wait until we have a hackathon.

  • weixin_39618169 · 5 months ago

    Sounds great to me! Thanks!

  • weixin_39875842 · 5 months ago

    This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana! :tada:

  • weixin_39618169 · 5 months ago

    Bot, you're right that this is a little stale, but please keep it open anyway. kthx

  • weixin_39875842 · 5 months ago

    This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana! :tada:

  • weixin_39636164 · 5 months ago

    I'm going to close this since there are multiple smaller issues open and this issue is stale.

