This function identifies the top k rows of a data frame by ranking the values in a specified column, taking either the largest values (maximize = TRUE) or the smallest (maximize = FALSE). It does not subset the data frame; it returns a logical vector marking which rows are among the top k.

Usage

topk(df, column, k, maximize = TRUE)

Arguments

df

A data frame from which to select the top rows.

column

A string giving the name of the column in df used to rank the rows.

k

An integer specifying the number of top rows to select.

maximize

A logical value indicating whether to select the rows with the largest values (TRUE) or the smallest values (FALSE) of column. Default is TRUE.

Value

A logical vector with one element per row of df, in row order. TRUE indicates that the row is one of the top k; FALSE indicates that it is not.

Examples

df <- data.frame(
  id = 1:10,
  value = c(5, 2, 9, 4, 7, 3, 6, 10, 8, 1)
)

# Select the top 3 rows based on maximizing the 'value' column
topk(df, "value", 3, maximize = TRUE)
#>  [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE

# Select the top 3 rows based on minimizing the 'value' column
topk(df, "value", 3, maximize = FALSE)
#>  [1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
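
The following is a minimal sketch of how a function with this interface could be written in base R, followed by the usual way to turn the logical vector into a subset of rows. It is not the package's source: the name topk_sketch is hypothetical, and the handling of ties (broken by row order) and of missing values (ranked last) are assumptions of this sketch rather than documented behaviour of topk.

```r
# Hypothetical re-implementation for illustration only; not the actual topk source.
topk_sketch <- function(df, column, k, maximize = TRUE) {
  stopifnot(is.data.frame(df), column %in% names(df), k >= 0)
  values <- df[[column]]
  # order() returns row indices sorted by value; decreasing = TRUE puts the
  # largest values first. Assumption: NAs are ranked last (order()'s default).
  ranked <- order(values, decreasing = maximize)
  # Keep the first k indices (or all of them if k exceeds the row count).
  keep <- head(ranked, k)
  # Convert the kept indices into a logical vector with one element per row.
  seq_len(nrow(df)) %in% keep
}

df <- data.frame(
  id = 1:10,
  value = c(5, 2, 9, 4, 7, 3, 6, 10, 8, 1)
)

# Flag the 3 rows with the largest 'value', then subset with the flags.
flags <- topk_sketch(df, "value", 3)
top_rows <- df[flags, ]  # rows with id 3, 8 and 9, in their original order
```

Because the result is a logical vector and not a set of row indices, the selected rows keep their original order in df and are not sorted by value. The same vector can also be negated (!flags) to obtain all remaining rows.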