Adjusts the specified features so that, within each site, the control-group mean of each feature matches the control-group mean at a reference site. The adjustment is estimated from controls only but is applied to every row in the site, regardless of diagnosis. Each feature is harmonized separately.

harmonize_sites(
  data,
  site_col,
  diagnosis_col,
  control_label,
  feature_cols,
  reference_site
)

Arguments

data

A data frame containing the site, diagnosis, and feature columns named in the other arguments.

site_col

A string indicating the column name for site identifiers.

diagnosis_col

A string indicating the column name for diagnosis identifiers.

control_label

The value in the diagnosis column that identifies the control group.

feature_cols

A character vector naming the feature columns to be harmonized.

reference_site

The site identifier (a value in the site column) of the reference site. Its control means are the target that every site's control means are adjusted to match.

Value

A list containing:

  • harmonized_data: the data frame with features adjusted across sites

  • summary_stats: a data frame with original control means by site and reference means for each feature

Examples

# harmonize_sites(df, site_col = "Site", diagnosis_col = "Diagnosis",
#                 control_label = "Control",
#                 feature_cols = c("Feature1", "Feature2"),
#                 reference_site = "SiteA")
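
The adjustment described at the top of this page can be illustrated with a small stand-alone sketch. It is not the implementation of harmonize_sites: the helper name harmonize_sites_sketch is hypothetical, and it assumes the adjustment is a simple additive shift, which this page does not state. It returns only the adjusted data frame, not the list documented under Value.

```r
# Hypothetical sketch, NOT the package's implementation.
# Assumption: the adjustment is an additive per-site offset estimated from controls.
harmonize_sites_sketch <- function(data, site_col, diagnosis_col, control_label,
                                   feature_cols, reference_site) {
  is_control <- data[[diagnosis_col]] == control_label
  sites <- as.character(data[[site_col]])
  out <- data
  for (feature in feature_cols) {
    # Control-group mean of this feature at each site (named numeric vector).
    control_means <- sapply(
      split(data[[feature]][is_control], sites[is_control]),
      mean
    )
    # Offset that moves each site's control mean onto the reference control mean.
    offsets <- control_means[[reference_site]] - control_means
    # Apply each site's offset to all of its rows, controls and non-controls alike.
    out[[feature]] <- data[[feature]] + unname(offsets[sites])
  }
  out
}

# Toy data: SiteB controls sit 10 units above SiteA controls.
df <- data.frame(
  Site      = c("SiteA", "SiteA", "SiteA", "SiteB", "SiteB", "SiteB"),
  Diagnosis = c("Control", "Control", "Patient", "Control", "Control", "Patient"),
  Feature1  = c(10, 12, 15, 20, 22, 30)
)

adjusted <- harmonize_sites_sketch(df, "Site", "Diagnosis", "Control",
                                   "Feature1", "SiteA")
adjusted$Feature1
```

In this toy data the SiteB control mean (21) is shifted onto the SiteA control mean (11), so every SiteB row, including the patient, moves by -10 while SiteA rows are unchanged.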