Predict subtype from multivariate data — predictSubtypeClusterMulti • subtyper

This is the inference module for subtype definition based on a matrix of measurements. It currently supports only clustering based on Gaussian mixtures via the ClusterR package.

predictSubtypeClusterMulti(
  mxdfin,
  measureColumns,
  clusteringObject,
  clustername = "GMMClusters",
  idvar,
  visitName,
  baselineVisit,
  reorderingDataframe,
  distance_metric = "pearson_correlation"
)

Arguments

mxdfin: input data frame
measureColumns: vector naming the data columns to be used for clustering
clusteringObject: a clustering object used to predict clusters
clustername: column name for the identified clusters
idvar: name of the column holding the unique subject identifier
visitName: name of the column defining the visit variable
baselineVisit: string naming the baseline visit
reorderingDataframe: data frame mapping original to new variable names; the cluster names are reordered according to this mapping
distance_metric: distance metric; see the medoid methods in ClusterR
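
The description of reorderingDataframe suggests a lookup table from original to new cluster names. A minimal base-R sketch of that kind of relabeling follows. The column names originalname and newname, and the labels c1 to c3, are hypothetical and chosen only for illustration; the function may expect a different layout.

```r
# Hypothetical mapping from original to new cluster names (illustration only;
# these column names and labels are assumptions, not taken from subtyper).
reordering <- data.frame(
  originalname = c("c1", "c2", "c3"),
  newname      = c("c3", "c1", "c2"),
  stringsAsFactors = FALSE
)

# Example cluster assignments, as they might appear in the cluster column.
clusters <- c("c2", "c1", "c3", "c2")

# Look up each assignment in the mapping and replace it with its new name.
relabeled <- reordering$newname[match(clusters, reordering$originalname)]
```

Using match() keeps the relabeling vectorized and returns NA for any label missing from the mapping, which makes incomplete mappings easy to detect.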

Value

The input data frame with the predicted clusters attached; membership probabilities are also returned.
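
To make the notion of membership probabilities concrete, the base-R sketch below computes posterior membership probabilities for a one-dimensional, two-component Gaussian mixture. The parameters w, mu, and sigma and the helper gmm_posterior are hypothetical; this is not the implementation used by subtyper or ClusterR, only an illustration of the quantity being returned.

```r
# Hypothetical mixture parameters (illustration only, not from subtyper).
w     <- c(0.4, 0.6)   # mixing weights, sum to 1
mu    <- c(-2, 3)      # component means
sigma <- c(1, 1.5)     # component standard deviations

# Posterior probability of each component for each observation:
# p(k | x) = w_k * N(x | mu_k, sigma_k) / sum_j w_j * N(x | mu_j, sigma_j)
gmm_posterior <- function(x, w, mu, sigma) {
  # One column per component: weighted Gaussian densities.
  dens <- sapply(seq_along(w), function(k) w[k] * dnorm(x, mu[k], sigma[k]))
  dens <- matrix(dens, nrow = length(x))  # keep matrix shape when length(x) == 1
  dens / rowSums(dens)                    # normalize each row to sum to 1
}

x      <- c(-2.5, 1, 4)
probs  <- gmm_posterior(x, w, mu, sigma)
# Hard assignment: the most probable component for each observation.
labels <- max.col(probs, ties.method = "first")
```

Each row of probs sums to one, and the hard cluster label is simply the column with the largest probability.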

Author

Avants BB

Examples

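# generate example data
mydf = generateSubtyperData( 100 )
# select the measurement columns whose names contain "Random"
rbfnames = names(mydf)[grep("Random",names(mydf))]
# train the clustering model, then predict subtypes for the same data
gmmcl = trainSubtypeClusterMulti( mydf, rbfnames, maxk=4 )
gmmclp = predictSubtypeClusterMulti( mydf, rbfnames, gmmcl )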