The stability of feature selection is defined as the robustness of the sets of selected features with respect to small variations in the data on which the feature selection is conducted. To quantify stability, several datasets from the same data generating process can be used. Alternatively, a single dataset can be split into parts by resampling. Either way, all datasets used for feature selection must contain exactly the same features. The feature selection method of interest is applied on all of the datasets and the sets of chosen features are recorded. The stability of the feature selection is assessed based on the sets of chosen features using stability measures.
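The resampling variant described above can be sketched in base R. This is a minimal illustration and not part of the package: the data are simulated, and `select_top_k` is a hypothetical filter that keeps the k features with the largest absolute correlation with the response. Any feature selection method could take its place.

```r
set.seed(1)

# Simulated data (assumption): 100 observations, 10 features, response y
n <- 100
p <- 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

# Hypothetical filter: indices of the k features most correlated with y
select_top_k <- function(X, y, k = 3) {
  scores <- abs(cor(X, y))[, 1]
  sort(order(scores, decreasing = TRUE)[seq_len(k)])
}

# Draw m subsamples of the same dataset and record the chosen features
m <- 5
feats <- lapply(seq_len(m), function(i) {
  idx <- sample(n, size = round(0.8 * n))
  select_top_k(X[idx, , drop = FALSE], y[idx])
})

str(feats)  # list of m integer vectors, one set of chosen features each
```

Every subsample contains the same 10 features, so the recorded sets are comparable, and the resulting list has the shape that the features argument expects.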
stabilityYu(
  features,
  sim.mat,
  threshold = 0.9,
  correction.for.chance = "estimate",
  N = 10000,
  impute.na = NULL
)
features 


sim.mat 

threshold 

correction.for.chance 

N 

impute.na 

numeric(1)
Stability value.
Let \(O_{ij}\) denote the number of features in \(V_i\) that are not shared with \(V_j\) but that have a highly similar feature in \(V_j\): $$O_{ij} = |\{ x \in (V_i \setminus V_j) : \exists y \in (V_j \setminus V_i) \ \text{with} \ Similarity(x,y) \geq threshold \}|.$$ Then the stability measure is defined as (see Notation) $$\frac{2}{m(m-1)}\sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \frac{I(V_i, V_j) - E(I(V_i, V_j))}{\frac{|V_i| + |V_j|}{2} - E(I(V_i, V_j))}$$ with $$I(V_i, V_j) = |V_i \cap V_j| + \frac{O_{ij} + O_{ji}}{2}.$$ Note that this definition differs slightly from the original in order to make it suitable for arbitrary datasets and similarity measures and applicable in situations with \(|V_i| \neq |V_j|\).
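To make the adjusted intersection concrete, the following base R sketch computes \(O_{ij}\) and \(I(V_i, V_j)\) for one pair of feature sets. The helper names `count_similar` and `adjusted_intersection` are illustrative and not part of the package, the similarity matrix is an assumed toy example, and the chance-correction term \(E(I(V_i, V_j))\) is left out.

```r
# Toy similarity matrix (assumption): similarity decays with index distance
sim <- 0.92 ^ abs(outer(1:10, 1:10, "-"))
threshold <- 0.9

# O_ij: features only in vi that have a similar feature only in vj
count_similar <- function(vi, vj, sim, threshold) {
  only_i <- setdiff(vi, vj)
  only_j <- setdiff(vj, vi)
  if (length(only_i) == 0 || length(only_j) == 0) return(0)
  sum(rowSums(sim[only_i, only_j, drop = FALSE] >= threshold) > 0)
}

# I(V_i, V_j) = |V_i intersect V_j| + (O_ij + O_ji) / 2
adjusted_intersection <- function(vi, vj, sim, threshold) {
  length(intersect(vi, vj)) +
    (count_similar(vi, vj, sim, threshold) +
       count_similar(vj, vi, sim, threshold)) / 2
}

v1 <- c(1, 2, 3)
v2 <- c(1, 2, 4)
I12 <- adjusted_intersection(v1, v2, sim, threshold)
I12                                    # 3: two shared features plus one similar pair
I12 / ((length(v1) + length(v2)) / 2)  # 1: summand without chance correction
```

Features 3 and 4 are not shared, but their similarity of 0.92 exceeds the threshold, so the pair counts as if it were a third common feature.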
For the definition of all stability measures in this package,
the following notation is used:
Let \(V_1, \ldots, V_m\) denote the sets of chosen features for the \(m\) datasets, i.e. features has length \(m\) and \(V_i\) is the set of features contained in the \(i\)-th entry of features.
Furthermore, let \(h_j\) denote the number of sets that contain feature
\(X_j\) so that \(h_j\) is the absolute frequency with which feature \(X_j\)
is chosen.
Analogously, let \(h_{ij}\) denote the number of sets that include both \(X_i\) and \(X_j\).
Also, let \(q = \sum_{j=1}^p h_j = \sum_{i=1}^m |V_i|\) and \(V = \bigcup_{i=1}^m V_i\).
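As a small illustration of this notation, the quantities can be computed in base R for three sets of chosen features. The total number of features `p = 10` is an assumption made for the example, and the variable names are illustrative only.

```r
# Three sets of chosen features (m = 3), given as index vectors
feats <- list(1:3, 1:4, 1:5)
p <- 10  # total number of features in the datasets (assumed here)

# h_j: number of sets that contain feature X_j
h <- tabulate(unlist(feats), nbins = p)

# q: total number of selections, which equals the sum of the set sizes
q <- sum(h)

# V: union of all chosen feature sets
V <- Reduce(union, feats)

# h_ij: number of sets that contain both X_i and X_j,
# obtained from the m x p incidence matrix Z
Z <- t(sapply(feats, function(v) as.integer(seq_len(p) %in% v)))
H <- crossprod(Z)  # H[i, j] = h_ij, and diag(H) = h

h        # 3 3 3 2 1 0 0 0 0 0
q        # 12
V        # 1 2 3 4 5
H[1, 4]  # 2: features 1 and 4 are chosen together in two sets
```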
Yu L, Han Y, Berens ME (2012). “Stable Gene Selection from Microarray Data via Sample Weighting.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(1), 262-272. doi: 10.1109/tcbb.2011.47.
Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z (2009). “Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes.” Bioinformatics, 25(13), 1662-1668. doi: 10.1093/bioinformatics/btp295.
Bommert A (2020). Integration of Feature Selection Stability in Model Fitting. Ph.D. thesis, TU Dortmund University, Germany. doi: 10.17877/DE290R-21906.
feats = list(1:3, 1:4, 1:5)
mat = 0.92 ^ abs(outer(1:10, 1:10, "-"))
stabilityYu(features = feats, sim.mat = mat, N = 1000)
#> [1] 0.5223179