Associate predictors (or features) with subtypes, such as diagnoses or cluster assignments. The function uses regression to return data frames intended for visualizing the most important relationships between features and subtypes. There are two approaches, and it is worth using both. The results can be combined with bootstrapping to give distributional visualizations.
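The two regression directions (the "features2subtypes" and "subtypes2features" options of associationType) can be pictured with a minimal base-R sketch. It assumes one-vs-rest coding of each subtype and reports t (or z) statistics as the effect measure. The helpers subtypes2features_sketch and features2subtypes_sketch are hypothetical and are not the package's actual implementation.

```r
# Hypothetical base-R sketch of the two association directions.
# Not the package implementation; data are simulated for illustration.
set.seed(1)
n <- 120
subtype <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
features <- data.frame(
  f1 = rnorm(n) + 1.5 * (subtype == "B"),  # f1 shifted upward in subtype B
  f2 = rnorm(n),                           # f2 unrelated to subtype
  f3 = rnorm(n) - 1.0 * (subtype == "C")   # f3 shifted downward in subtype C
)

# subtypes -> features: regress each feature on a one-vs-rest subtype
# indicator and keep the t statistic of that indicator.
subtypes2features_sketch <- function(features, subtype) {
  levs <- levels(subtype)
  out <- matrix(NA_real_, ncol(features), length(levs),
                dimnames = list(names(features), levs))
  for (f in names(features)) {
    for (s in levs) {
      ind <- as.numeric(subtype == s)
      fit <- lm(features[[f]] ~ ind)
      out[f, s] <- summary(fit)$coefficients["ind", "t value"]
    }
  }
  out
}

# features -> subtypes: logistic regression of each one-vs-rest subtype
# indicator on all features jointly; keep each feature's z statistic.
features2subtypes_sketch <- function(features, subtype) {
  levs <- levels(subtype)
  out <- matrix(NA_real_, ncol(features), length(levs),
                dimnames = list(names(features), levs))
  for (s in levs) {
    dat <- data.frame(ind = as.numeric(subtype == s), features)
    fit <- glm(ind ~ ., data = dat, family = binomial())
    out[, s] <- summary(fit)$coefficients[names(features), "z value"]
  }
  out
}

s2f <- subtypes2features_sketch(features, subtype)
f2s <- features2subtypes_sketch(features, subtype)
round(s2f, 2)
round(f2s, 2)
```

Because each direction fits a different model (one feature at a time versus all features jointly), the two tables need not agree exactly, which is one reason to look at both.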
featureImportanceForSubtypes(
  dataframein,
  subtypeLabels,
  featureMatrix,
  associationType = c("features2subtypes", "subtypes2features", "subjects"),
  covariates = "1",
  transform = "effect_sizes",
  significance_level = 0.001,
  visualize = FALSE
)
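dataframein: Input data frame containing all relevant data.
subtypeLabels: Input subtype assignments.
featureMatrix: Matrix or data frame whose columns define the features.
associationType: Either predict subtypes from features ("features2subtypes") or predict features from subtypes ("subtypes2features"). The two produce related but complementary results; in some cases, depending on the subtypes and the available degrees of freedom, only one will be appropriate. The third option ("subjects") reports the rownames of the data frame that best fit the related subtype.
covariates: Optional string of covariates; the default "1" adds no covariates.
transform: Optional transform; defaults to "effect_sizes".
significance_level: Significance level used to threshold effects.
visualize: Boolean; whether to visualize the results.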
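The covariates string and significance_level can be pictured as follows: the covariate string is pasted into a regression formula, and effects whose p-value does not pass the threshold are set to zero. This is a hypothetical sketch; thresholded_effects is an illustrative helper, the one-vs-rest t statistic is an assumed effect measure, and the package may handle covariates and thresholding differently.

```r
# Hypothetical sketch: combining a covariate string with a per-feature
# regression and thresholding the subtype effect at a significance level.
set.seed(2)
n <- 150
dat <- data.frame(
  subtype = factor(sample(c("A", "B"), n, replace = TRUE)),
  age = rnorm(n, 70, 5)
)
dat$f1 <- 0.05 * dat$age + 1.2 * (dat$subtype == "B") + rnorm(n)
dat$f2 <- rnorm(n)

thresholded_effects <- function(dat, feature_names, covariates = "1",
                                significance_level = 0.001) {
  eff <- setNames(numeric(length(feature_names)), feature_names)
  for (f in feature_names) {
    # Builds e.g. "f1 ~ subtype + 1" (intercept only) or "f1 ~ subtype + age"
    form <- as.formula(paste(f, "~ subtype +", covariates))
    co <- summary(lm(form, data = dat))$coefficients
    tval <- co["subtypeB", "t value"]
    pval <- co["subtypeB", "Pr(>|t|)"]
    # Keep the t statistic only if it passes the threshold; otherwise 0
    eff[f] <- if (pval < significance_level) tval else 0
  }
  eff
}

thresholded_effects(dat, c("f1", "f2"))
thresholded_effects(dat, c("f1", "f2"), covariates = "age")
```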
Data frames for visualization that show feature-to-subtype importance, e.g., via pheatmap.
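A features-by-subtypes table of this kind can be drawn as a heatmap. In this sketch, imp is a placeholder filled with random numbers standing in for a numeric data frame of importances; the exact structure of the object this function returns is not assumed. The code uses pheatmap when it is installed and falls back to base R's heatmap otherwise.

```r
# Sketch: draw a features-by-subtypes importance table as a heatmap.
# 'imp' is a random placeholder, not real output of the function.
set.seed(3)
imp <- as.data.frame(matrix(rnorm(5 * 3), nrow = 5,
  dimnames = list(paste0("feature", 1:5), c("A", "B", "C"))))

mat <- as.matrix(imp)  # heatmap functions expect a numeric matrix
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat, cluster_rows = FALSE, cluster_cols = FALSE)
} else {
  # Base R fallback; Rowv/Colv = NA suppress dendrograms, scale = "none"
  # keeps the raw values.
  heatmap(mat, Rowv = NA, Colv = NA, scale = "none")
}
```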
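# generate example data
mydf = generateSubtyperData( 100 )
# use the columns whose names contain "Random" as features
rbfnames = names(mydf)[grep("Random",names(mydf))]
# predict the features from the DX labels
fimp = featureImportanceForSubtypes( mydf, mydf$DX, mydf[,rbfnames], "subtypes2features" )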
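Since the results can be combined with bootstrapping, here is a minimal self-contained sketch of that idea: resample subjects with replacement, recompute a feature-subtype effect each time, and summarize the resulting distribution. It uses simulated data and a one-vs-rest t statistic from lm(); the helper effect_t is hypothetical and the statistic is not necessarily what the package computes.

```r
# Hypothetical sketch: bootstrap a feature-subtype effect to obtain a
# distribution instead of a single value. Simulated data.
set.seed(4)
n <- 100
subtype <- factor(sample(c("A", "B"), n, replace = TRUE))
feature <- rnorm(n) + 0.8 * (subtype == "B")

# One-vs-rest t statistic for the given subtype level
effect_t <- function(y, g, level = "B") {
  ind <- as.numeric(g == level)
  summary(lm(y ~ ind))$coefficients["ind", "t value"]
}

n_boot <- 200
boot_t <- replicate(n_boot, {
  idx <- sample.int(n, n, replace = TRUE)  # resample subjects with replacement
  effect_t(feature[idx], subtype[idx])
})

quantile(boot_t, c(0.025, 0.5, 0.975))
hist(boot_t, main = "Bootstrap distribution of the subtype B effect",
     xlab = "t value")
```

The same resampling loop could wrap any per-feature, per-subtype statistic, giving one distribution per cell of the importance table.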