Training module for subtype definition based on a data matrix. Supports clustering by Gaussian mixtures (GMM), k-means, or medoids, as selected by the method argument; the Gaussian mixture and medoid methods come from the ClusterR package. Training may be restricted to a specific subgroup, e.g. patients.
trainSubtypeClusterMulti(
  mxdfin,
  measureColumns,
  method = "kmeans",
  desiredk,
  maxk,
  groupVariable,
  group,
  frobNormThresh = 0.01,
  trainTestRatio = 0,
  distance_metric = NULL,
  flexweights = NULL,
  flexgroup = NULL,
  groupFun = NULL
)
mxdfin: input data frame
measureColumns: vector defining the data columns to be used for clustering. These methods may be sensitive to scaling, so the user may want to scale the columns accordingly.
method: string, one of GMM, kmeans, or medoid
desiredk: number of subtypes
maxk: maximum number of subtypes
groupVariable: name of the column that defines the group to use for training
group: string defining the subgroup, within groupVariable, on which to train
frobNormThresh: fractional value less than 1. The relative change in reconstruction error (measured by Frobenius norm) from the previous iteration, 1 - F_cur / F_prev, is compared against this threshold to determine the optimal number of clusters. Used for GMM clustering.
trainTestRatio: train/test split ratio used when finding the optimal number of clusters. Used for GMM clustering. If zero, the data are not split; otherwise, the reconstruction error is computed in the test data only.
distance_metric: see medoid methods in ClusterR
flexweights: optional weights
flexgroup: optional group
groupFun: optional function name to use in group-guided clustering, e.g. minSumClusters
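The relative-change rule described for frobNormThresh can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it swaps in stats::kmeans for the GMM fit, uses the built-in iris data, and reconError and chosenk are hypothetical names. It also assumes that the chosen k is the last one before the improvement drops below the threshold, which is one plausible reading of the argument description.

```r
# Illustrative sketch only: base-R kmeans stands in for the GMM fit.
set.seed(1)
x <- scale(as.matrix(iris[, 1:4]))  # scaled numeric matrix
maxk <- 6
frobNormThresh <- 0.01

# Reconstruction error: Frobenius norm of the data minus each row's
# assigned cluster center (hypothetical helper).
reconError <- function(x, k) {
  fit <- kmeans(x, centers = k, nstart = 10)
  recon <- fit$centers[fit$cluster, , drop = FALSE]
  norm(x - recon, type = "F")
}

errs <- sapply(1:maxk, function(k) reconError(x, k))

# relChange[i] is 1 - F_cur / F_prev when moving from k = i to k = i + 1.
relChange <- 1 - errs[-1] / errs[-length(errs)]

# Assumed rule: keep k = i when adding one more cluster improves the
# error by less than the threshold; fall back to maxk otherwise.
small <- which(relChange < frobNormThresh)
chosenk <- if (length(small) > 0) small[1] else maxk
```

With trainTestRatio greater than zero, the same error would presumably be computed on held-out rows rather than on the training rows.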
the clustering object
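# generate example data
mydf = generateSubtyperData( 100 )
# select the measurement columns whose names contain "Random"
rbfnames = names(mydf)[grep("Random",names(mydf))]
# train with the default method ("kmeans") and at most 4 subtypes
gmmcl = trainSubtypeClusterMulti( mydf, rbfnames, maxk=4 )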
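Because the clustering methods may be sensitive to scaling, one option is to standardize the measurement columns before training. The sketch below uses only base R on a made-up data frame; the column names volume and score, and the objects df, measureCols, and dfScaled, are hypothetical and not part of the package.

```r
# Hypothetical data with columns on very different scales.
set.seed(42)
df <- data.frame(
  id = 1:50,
  volume = rnorm(50, mean = 1000, sd = 200),
  score = rnorm(50, mean = 0.5, sd = 0.1)
)
measureCols <- c("volume", "score")

# Center each measurement column to mean 0 and scale it to sd 1,
# leaving the other columns untouched.
dfScaled <- df
dfScaled[, measureCols] <- scale(df[, measureCols])

colMeans(dfScaled[, measureCols])      # approximately 0 for each column
apply(dfScaled[, measureCols], 2, sd)  # approximately 1 for each column
```

A frame prepared this way could then be passed as mxdfin, with measureCols as measureColumns.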