This is a great write-up! I think it would be valuable to take some of these points to the next developers' call to discuss as a group. WDYT?

# Concerns regarding TE-(in)dependence metric calculation

### Summary

I would like to better document and understand how model metrics are computed. I have some concerns about some of the steps, especially how `WTS` is calculated and how regressions with `get_coeffs` and `computefeats2` are performed.

### Additional Detail

- Compute optimal combination from T2* and echo data.
- Compute un-normalized weights for each component for optimally combined data (`WTS`) with `computefeats2` and the normalized mixing matrix (when provided; otherwise the regular mixing matrix).
  - Use signs of skew values for weights (per component) to flip components and weights so that component time series positively align with the data.
    **PROBLEM**: The global signal trend for the 5-echo data is consistently being rendered as going *down* in the ICA step (although it goes up, as expected, in the PCA step). Nothing I've tried, including using betas, z-statistics, mean, and skew, has fixed this, but it makes me suspicious of the sign determination method. **NOTE**: This can be discussed in #316.
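For reference, my understanding of the sign-determination step is something like the following sketch. The function name, argument shapes, and the zero-skew handling are my assumptions, not the actual implementation:

```python
import numpy as np
from scipy import stats

def flip_by_skew(weights, mmix):
    """Flip each component so its voxel-weight distribution is positively
    skewed, flipping the corresponding time series to match.

    weights : (n_voxels, n_components) component weight maps
    mmix    : (n_timepoints, n_components) mixing matrix
    """
    # sign of the skew of each component's weight distribution
    signs = np.sign(stats.skew(weights, axis=0))
    signs[signs == 0] = 1  # leave perfectly symmetric components untouched
    # apply the same sign to the weights and the time series
    return weights * signs, mmix * signs
```

If the sign determination misbehaves (as with the 5-echo global trend), it would show up here as a component whose skew-based sign disagrees with its actual alignment to the data.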

  - Estimate normalized total variance explained per component from weights.
    **PROBLEM**: While the comments say "un-normalized weights", the function `computefeats2` does not return un-normalized weights. It computes the weights using the data after it has been z-scored over time. It then caps the weights at +/- 0.999 and z-transforms them using Fisher's transform, as if they were correlation coefficients. **SOLUTION**: If we're interested in using z-values for each of the components to the optimally combined data, we can run a regression and compute those z-scores (as in this post). **NOTE**: This can be discussed in #178.
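One way the proposed regression-based z-scores could look (a sketch, not the referenced post's exact method; the function name and shapes are mine): fit OLS betas, form t-statistics, and convert them to z-statistics through matched p-values.

```python
import numpy as np
from scipy import stats

def regression_zscores(data, mmix):
    """Hypothetical alternative to computefeats2: OLS betas -> t -> z.

    data : (n_voxels, n_timepoints) optimally combined data
    mmix : (n_timepoints, n_components) mixing matrix
    """
    n_tp, n_comp = mmix.shape
    X = np.column_stack([np.ones(n_tp), mmix])  # add an intercept column
    betas, _, _, _ = np.linalg.lstsq(X, data.T, rcond=None)
    resid = data.T - X @ betas                  # (n_timepoints, n_voxels)
    dof = n_tp - X.shape[1]
    mse = (resid ** 2).sum(axis=0) / dof        # residual variance per voxel
    # standard errors of the component betas (skip the intercept row)
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(np.outer(np.diag(xtx_inv)[1:], mse))
    tstats = betas[1:] / se                     # (n_components, n_voxels)
    # convert t-statistics to z-statistics via their p-values
    pvals = stats.t.sf(np.abs(tstats), dof)
    zstats = stats.norm.isf(pvals) * np.sign(tstats)
    return zstats.T                             # (n_voxels, n_components)
```

This avoids both the z-scoring of the data and the Fisher transform of capped weights.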

  - ~~**PROBLEM**: I'm pretty sure that the normalized mixing matrix, which is only used with PCA, is actually z-scored along the wrong axis.~~ FIXED WITH #228. **SOLUTION**: We can just normalize each component's time series over time (instead of each volume's values over components).
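For anyone checking this, the difference between the two normalizations is just the axis passed to the z-score (assuming a timepoints-by-components mixing matrix):

```python
import numpy as np
from scipy import stats

# Toy mixing matrix: rows are timepoints, columns are components
mmix = np.arange(12, dtype=float).reshape(4, 3)

# Correct (per #228): normalize each component's time series over time
mmix_z_time = stats.zscore(mmix, axis=0)

# Wrong axis: normalizes each volume's values across components
mmix_z_comp = stats.zscore(mmix, axis=1)

# Each component now has zero mean and unit variance across time
print(mmix_z_time.mean(axis=0))  # → [0. 0. 0.]
```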

- Compute un-normalized weights for each component for *demeaned* optimally combined data (`tsoc_B`) with `get_coeffs` and the unnormalized mixing matrix.
  - Convert weights to percent signal change units.
  - Estimate total variance explained per component from weights.
    **PROBLEM**: Demeaning the data should only impact the weights if the mixing matrix is not demeaned, which it *should* be, but sometimes isn't. **SOLUTION**: Mean-center the mixing matrix/X *within* `get_coeffs` and drop the `add_const` argument (per #179).
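A minimal sketch of the proposed fix (the function name is mine, and this simplifies `get_coeffs` down to the relevant behavior): mean-center the design matrix inside the function, so the result no longer depends on whether the caller demeaned the mixing matrix.

```python
import numpy as np

def get_coeffs_centered(data, X):
    """Sketch of the #179 proposal: center X internally, no add_const flag.

    data : (n_voxels, n_timepoints) array
    X    : (n_timepoints, n_components) mixing matrix, demeaned or not
    """
    X_centered = X - X.mean(axis=0)  # guarantee demeaned regressors
    data_centered = data - data.mean(axis=1, keepdims=True)
    betas, _, _, _ = np.linalg.lstsq(X_centered, data_centered.T, rcond=None)
    return betas.T                   # (n_voxels, n_components)
```

With this, passing a mixing matrix with a nonzero mean yields the same betas as passing its demeaned version, which is exactly the invariance the PROBLEM above is about.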

- Compute un-normalized weights for each component for each echo with `get_coeffs`.
  - Compute voxel-wise means for each echo.
  - Compute alpha map for each component: squared echo coefficients from 4, summed over time.
    - QUESTION: What is alpha?

- Compute component's variance explained: sum of squared un-normalized weights from 3 (`tsoc_B`) divided by total variance from 3b.
  **PROBLEM**: The variance explained measures calculated within `fitmodels_direct` do not seem to align well with the variance explained values we get directly from PCA, which makes me doubt `fitmodels_direct`. **NOTE**: The values may not be very similar (for example, for PCA the "real" varex might be 37.8%, while the estimated varex is 57.6% and the estimated normalized varex is 53.8%), but the values are highly correlated across components. **NOTE**: This can be discussed in #317.
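As I read it, the variance explained calculation amounts to something like this (a simplified sketch; variable names are mine, and the total-variance definition is an assumption):

```python
import numpy as np

def varex_from_betas(tsoc_B, total_var):
    """Per-component percent variance explained.

    tsoc_B    : (n_voxels, n_components) un-normalized component weights
    total_var : scalar total variance of the demeaned optimally combined data
    """
    comp_var = (tsoc_B ** 2).sum(axis=0)  # sum of squared weights per component
    return 100 * comp_var / total_var      # percent of total variance
```

If the PCA varex and this estimate are computed against different total-variance denominators (or differently scaled weights), that alone could produce the offset-but-correlated pattern described above.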

- Compute component's normalized variance explained: sum of squared un-normalized weights from 2 (`WTS`) divided by total variance from 2b.
- Compute model-wise pseudo-F-statistics for each component:
  - Set up design matrices (`X`) for the S0 and T2* models.
    - S0 model: echo-specific weights follow echo-specific averages
    - T2* model: weight peaks at T2*
  - Compute model coefficient maps:
    - Sum of design matrix times component echo weights from 4 over time, divided by design matrix squared summed over time.
  - Compute predicted values for models:
    - Design matrix times model coefficient maps.
  - Compute sum of squared error maps for models:
    - Echo coefficients minus predicted values, squared and summed over time.
  - Compute pseudo-F-statistics:
    - Alpha minus SSE, multiplied by the number of echoes minus one, all divided by SSE.
  - Cap F-statistics at 500.
    **PROBLEM**: Shouldn't these F-statistics be calculated only for voxels with positive betas? Signs of component time series should be flipped to match the actual data, and I don't think anticorrelations are meaningful in this context. I could definitely be wrong though.
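To make the pseudo-F steps above concrete, here is my reading of them as a sketch. I'm assuming per-component arrays shaped `(n_echoes, n_voxels)`, so the sums run over echoes; the function and variable names are mine:

```python
import numpy as np

def pseudo_f(B, X, f_cap=500.0):
    """Pseudo-F-statistics for one component under one model (S0 or T2*).

    B : (n_echoes, n_voxels) component weights per echo
    X : (n_echoes, n_voxels) model design values for each echo/voxel
    """
    n_echoes = B.shape[0]
    # alpha map: squared echo coefficients, summed over the model axis
    alpha = (B ** 2).sum(axis=0)
    # model coefficients: sum(X * B) / sum(X ** 2)
    coeffs = (X * B).sum(axis=0) / (X ** 2).sum(axis=0)
    # predicted weights: design matrix times coefficient map
    pred = X * coeffs[np.newaxis, :]
    # sum of squared errors: (B - predicted)^2, summed over the model axis
    sse = ((B - pred) ** 2).sum(axis=0)
    # (alpha - SSE) * (n_echoes - 1) / SSE, capped
    f = (alpha - sse) * (n_echoes - 1) / sse
    return np.minimum(f, f_cap)
```

Note that nothing here restricts the fit to voxels with positive betas, which is the concern raised above: a voxel anticorrelated with the component contributes just as strongly as a correlated one.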

- Compute averaging weight map for each component:
  - Z-score the component's un-normalized weights from 2 (`WTS`) over voxels.
    **PROBLEM**: These values are already z-scored within `computefeats2`. Why would we do this again?
  - Cap z-scores at +/- 8.
  - Square the z-score maps.
    **PROBLEM**: By squaring the maps, we're treating negative weights as if they're equally as valid as positive weights. If the sign determination in 2.1 is accurate, then only positive weights should be used. **SOLUTION**: Limit weighted averaging to positive values. Set negative values in this map to zero. **NOTE**: This can be discussed in #318.
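With the proposed change, the weighted averaging would look something like this (a sketch; zeroing the negative z-values is the suggestion above, not current behavior, and the names are mine):

```python
import numpy as np

def weighted_kappa(F_map, Z_map, z_cap=8.0):
    """Weighted average of pseudo-F values, weighted by squared z-scores.

    F_map : (n_voxels,) pseudo-F-statistics for one component/model
    Z_map : (n_voxels,) z-scored component weights
    """
    z = np.clip(Z_map, -z_cap, z_cap)  # cap z-scores at +/- 8
    z = np.where(z > 0, z, 0.0)        # proposed: drop negative weights
    weights = z ** 2
    return (F_map * weights).sum() / weights.sum()
```

Under the current (squaring-only) behavior, a voxel with z = -3 pulls Kappa/Rho exactly as hard as one with z = +3; zeroing negatives removes that contribution.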

- Calculate Kappa and Rho metrics by a weighted average of the pseudo-F-statistics with the weight map.
- If full metric calculation (`full_sel`) is desired, calculate additional metrics.
  - Store several maps for each component in a dictionary:
    - Z-score maps (prior to squaring) originally used for weighted averaging
    - Percent signal change of optimally combined data maps
    - Pseudo-F-statistic maps
  - Perform simple cluster-extent thresholding on the pseudo-F-statistic maps, using the cluster-defining F-value threshold from the `getfbounds` function and a cluster size threshold based on the number of voxels in the mask.
    - Store binarized maps of significant voxels in the `F_R2_clmaps` and `F_S0_clmaps` arrays.
    - Store the number of voxels in significant clusters in the `countsigFR2` and `countsigFS0` arrays.

  - Perform simple cluster-extent thresholding on the Z-score maps, using a cluster-defining threshold of 1.95 (p < 0.05, two-sided) and the same cluster-extent threshold as everything else.
    **NOTE**: Cluster-extent thresholding is bidirectional, but mixes adjacent positive and negative values to define clusters. I think we should make it one-sided, based on my concerns regarding the signs of component weights.
  - Perform simple cluster-extent thresholding on rank-transformed, absolute-valued, un-normalized weight maps from 3 (`tsoc_B`), using a cluster-defining threshold of the number of voxels in the mask minus the number of significant voxels from the `countsigFR2`/`countsigFS0` variables, and the same cluster-extent threshold as is applied to the `F_R2_clmaps`/`F_S0_clmaps` arrays.
    **PROBLEM**: This should initially result in the same number of significant voxels in `Br_R2_clmaps`/`Br_S0_clmaps` (these arrays) as there are in the corresponding component maps from `F_R2_clmaps`/`F_S0_clmaps`, but then after cluster-extent thresholding they would have fewer. Does the difference in number of significant voxels cause any problems? **NOTE**: `Br_R2_clmaps` and `Br_S0_clmaps` should differ in terms of the number of voxels that survive thresholding because of the different cluster-defining thresholds determined by `F_R2_clmaps`/`F_S0_clmaps`, but the underlying data (`tsoc_B`) is exactly the same.
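For anyone less familiar with the thresholding steps being described, here is a minimal stand-in for "simple cluster-extent thresholding" (this is a simplified sketch using `scipy.ndimage.label` with default face connectivity, not the actual implementation):

```python
import numpy as np
from scipy import ndimage

def cluster_extent_threshold(stat_map, cdt, min_cluster_size):
    """Binarize a 3D stat map: keep voxels above the cluster-defining
    threshold (cdt) that belong to clusters of >= min_cluster_size voxels.
    """
    suprathreshold = stat_map > cdt
    labeled, n_clusters = ndimage.label(suprathreshold)
    out = np.zeros_like(suprathreshold)
    for i in range(1, n_clusters + 1):
        cluster = labeled == i
        if cluster.sum() >= min_cluster_size:
            out |= cluster  # keep only sufficiently large clusters
    return out.astype(int)
```

Note this version is one-sided (`stat_map > cdt`); the bidirectional behavior I flagged above would threshold on the absolute value instead, letting adjacent positive and negative voxels merge into one cluster.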

### Next Steps

- Could everyone who is familiar with the metrics and `fitmodels_direct` look through the code and my description to see if I have anything wrong and to address any of the issues I've raised?

This question comes from the open-source project ME-ICA/tedana.
