R/subtyper.R
match_cohort_pair.Rd
Generates random subsets of one data frame and returns the subset that minimizes the difference from a reference data frame over a specified set of columns, as measured by the t-statistic. Authored by Avants and Chat-GPT 3.5.
match_cohort_pair(
df1,
df2,
cols,
sample_size,
num_iterations = 1000,
restrict_df1 = 0.05,
option = "optimal",
verbose = TRUE
)
Data frame to be subsetted.
Data frame used as a reference for comparison.
Vector of column names used for matching.
Number of rows to sample from df1.
Number of random subsets to generate.
Numeric lower quantile used to restrict df1, based on the values of the first matching column, so that it matches the range of df2.
Either "random" or "optimal".
Logical (TRUE or FALSE).
Row names of the subset of df1 that minimizes the difference from df2 in terms of the t-statistic.
set.seed(123)
df1 <- data.frame(A = rnorm(100), B = factor(sample(1:3, 100, replace = TRUE)), C = rnorm(100))
df2 <- data.frame(A = rnorm(50), B = factor(sample(1:3, 50, replace = TRUE)), C = rnorm(50))
matching_cols <- c("A", "B")
# sample_size has no default in the usage above, so it must be supplied
# matched_subset <- match_cohort_pair(df1, df2, matching_cols, sample_size = 50)
# print(matched_subset)
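The random-subset search described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `sketch_match`, `pool_df`, and `ref_df` are hypothetical names, the score is assumed to be the sum of absolute Welch t-statistics over numeric columns, and neither `restrict_df1` nor the "optimal" option is modeled.

```r
# Hypothetical helper, not part of the package: random-subset matching
# scored by the summed absolute Welch t-statistic (an assumed scoring rule).
sketch_match <- function(df1, df2, cols, sample_size, num_iterations = 1000) {
  best_rows <- NULL
  best_score <- Inf
  for (i in seq_len(num_iterations)) {
    # Draw a candidate subset of df1 rows without replacement.
    idx <- sample(nrow(df1), sample_size)
    # Sum |t| across the (numeric) matching columns.
    score <- sum(vapply(cols, function(col) {
      abs(unname(t.test(df1[idx, col], df2[[col]])$statistic))
    }, numeric(1)))
    # Keep the subset with the smallest score seen so far.
    if (score < best_score) {
      best_score <- score
      best_rows <- rownames(df1)[idx]
    }
  }
  best_rows
}

set.seed(123)
pool_df <- data.frame(A = rnorm(100, mean = 0.5), C = rnorm(100))
ref_df <- data.frame(A = rnorm(50), C = rnorm(50))
rows <- sketch_match(pool_df, ref_df, c("A", "C"),
                     sample_size = 50, num_iterations = 200)
# The returned row names index back into the original data frame.
matched <- pool_df[rows, ]
```

Because the result is a vector of row names, it can be used directly to subset the first data frame, as in the last line. Factor columns such as `B` in the example above would need a different comparison than `t.test`, which this sketch does not attempt.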