Skip to contents

This function filters a given data frame to select the most representative documents for a specified topic based on their probability scores. The documents are sorted by relevance in descending order, and the top n documents are returned.

Usage

get_most_representative_docs(df, topic_nr, n_docs = 5)

Arguments

df

A data frame containing at least the columns 'Topic', 'Document', and 'probs'.

topic_nr

An integer specifying the topic number to filter the documents.

n_docs

An integer specifying the number of top representative documents to return. Defaults to 5.

Value

A vector of the most representative documents corresponding to the specified topic. If the number of documents available is less than n_docs, all available documents are returned.

Examples

if (FALSE) { # \dontrun{
# Assuming `df_docs` is a data frame with columns `Topic`, `Document`, and `probs`
get_most_representative_docs(df_docs, topic_nr = 3, n_docs = 5)
} # }