Building a Topic Model Shiny App from Scratch
=====================================================
Analysts often want to turn an existing R script into a user-friendly Shiny application. In this article, we’ll explore the process of converting a topic model script into a functional Shiny app that lets colleagues without R training upload their data and download the results.
Prerequisites
Before diving into the code, ensure you have the following installed:
- R
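- Shiny (install using install.packages("shiny"))
- The readxl, tm, and topicmodels packages, which the server code relies on for reading Excel files, building the document-term matrix, and fitting the LDA model
- RStudio or another IDE of your choice for development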
Familiarize yourself with basic Shiny concepts and syntax. This article will cover the necessary material to help you get started.
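As a quick check before starting, the snippet below reports which of the required packages are not yet installed. It is a minimal sketch: the helper name missing_packages is hypothetical, and the package list assumes the functions used later in the article (read_excel, DocumentTermMatrix, LDA) come from readxl, tm, and topicmodels.

```r
# Packages assumed to be needed by the app (an assumption based on the
# functions called in the server code, not an official requirements list)
required <- c("shiny", "readxl", "tm", "topicmodels")

# Hypothetical helper: returns the subset of pkgs that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The install call is left commented out so that running the check never changes your library without you deciding to.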
Setting Up the Project Structure
Create a new directory for your project, then create the following files inside it:
- ui.R (user interface)
- server.R (server-side logic)
The app reads the uploaded Excel file and produces its results as CSV output, so no data files need to be created in advance. Keeping both scripts in the same directory lets Shiny find them when the app is launched.
Building the User Interface
In your ui.R file, define a simple interface for inputting the Excel file and outputting the results:
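# ui.R
library(shiny)

# Define UI
ui <- fluidPage(
  sidebarLayout(
    # Input section
    sidebarPanel(
      fileInput("file", "Select Excel File")
    ),
    # Output section
    mainPanel(
      # downloadButton (not actionButton) is the widget that pairs with
      # a downloadHandler() on the server side
      downloadButton("download", "Download Results"),
      # The ID must match output$table in server.R
      tableOutput("table")
    )
  )
)
This code creates a basic interface with a file input for selecting the Excel file, a download button for saving the results, and a table that displays the data.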
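If you want the file picker to suggest only Excel workbooks, fileInput() takes an accept argument. The sketch below assumes your colleagues upload .xlsx or .xls files; adjust the extensions to whatever your script actually reads.

```r
library(shiny)

# Same input ID and label as in ui.R, restricted to Excel extensions
# (the extension list is an assumption about the expected uploads)
upload <- fileInput("file", "Select Excel File",
                    accept = c(".xlsx", ".xls"))

# On the server, input$file is a data frame with the columns
# name, size, type, and datapath; datapath is the temporary file to read.
```

The accept filter is only a hint to the browser's file dialog, so the server code should still be prepared for a file it cannot read.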
Building the Server-Side Logic
In your server.R file, define the server-side logic that processes the user input:
# server.R
library(shiny)
library(readxl)
library(tm)           # Corpus(), DocumentTermMatrix(), stopwords()
library(topicmodels)  # LDA(), topics(), terms()

# Define server function
server <- function(input, output) {
  K <- 10  # number of topics

  # Input processing: input$file can only be read inside a reactive context
  feedback <- reactive({
    req(input$file)  # wait until a file has been uploaded
    df <- read_excel(input$file$datapath,
                     col_names = TRUE,
                     skip = 1)
    # Preprocessing steps remain the same as in your R script
    df$charsinfeedback <- nchar(df$Text)
    df$wordsinfeedback <- lengths(strsplit(df$Text, "\\s+"))
    df
  })

  # Topic model: fitted on demand for the currently uploaded file
  topic_probabilities <- reactive({
    df <- feedback()
    extendedstopwords <- c("a", "amp", "hark", "day", "via", "harkiv",
                           "music", '"', "'")
    extendedstopwords <- c(extendedstopwords,
                           gsub("'", "", grep("'", extendedstopwords, value = TRUE)))
    dtm.control <- list(
      tolower = TRUE,
      removePunctuation = TRUE,
      removeNumbers = TRUE,
      stopwords = c(stopwords("english"), extendedstopwords),
      stemming = FALSE,
      wordLengths = c(3, Inf),
      weighting = weightTf
    )
    # Create document-term matrix (DTM)
    dtm <- DocumentTermMatrix(Corpus(VectorSource(df$Text)), control = dtm.control)
    # Drop documents with no remaining terms; LDA() rejects all-zero rows
    rowTotals <- slam::row_sums(dtm)
    dtm <- dtm[rowTotals > 0, ]
    # Set up LDA model
    burnin <- 4000
    iter <- 2000
    thin <- 500
    seed <- list(2003, 5, 63, 10001, 765)  # one seed per start
    nstart <- 5
    ldaOut3 <- LDA(dtm, K, method = "Gibbs", control = list(
      nstart = nstart,
      seed = seed,
      best = TRUE,
      burnin = burnin,
      iter = iter,
      thin = thin
    ))
    # Extract topics, terms, and probabilities and write them to CSV files
    # in the app's working directory
    ldaOut3.topics <- as.matrix(topics(ldaOut3))
    write.csv(ldaOut3.topics, file = paste("LDAGibbs", K, "K3DocsToTopics.csv"))
    ldaOut3.terms <- as.matrix(terms(ldaOut3, 10))
    write.csv(ldaOut3.terms, file = paste("LDAGibbs", K, "TopicsToTerms.csv"))
    topicProbabilities3 <- as.data.frame(ldaOut3@gamma)
    write.csv(topicProbabilities3, file = paste("LDAGibbs", K, "TopicProbabilities.csv"))
    topicProbabilities3
  })

  # Download button action: requires downloadButton("download", ...) in ui.R
  output$download <- downloadHandler(
    filename = function() paste0("LDAGibbs_", K, "_TopicProbabilities.csv"),
    content = function(file) write.csv(topic_probabilities(), file)
  )

  # Output table
  output$table <- renderTable({
    feedback()
  })
}
In this code:
- A reactive expression reads the uploaded Excel file; req() holds it back until a file has been selected.
- The preprocessing steps (character counts, word counts, and stopword removal) match those in your original R script. Stemming is switched off.
- We fit an LDA model to the document-term matrix and write the document-to-topic assignments, the top terms per topic, and the topic probabilities to CSV files.
- A downloadHandler serves the topic probabilities as a CSV file when the download button is clicked.
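To see what the two preprocessing columns contain, the following sketch applies the same counting logic to a toy data frame in plain R. The three example strings are made up for illustration; only the Text column name comes from the article.

```r
# Toy stand-in for the uploaded sheet (illustrative data, not real feedback)
df <- data.frame(
  Text = c("Great service, very helpful", "Slow", ""),
  stringsAsFactors = FALSE
)

# Characters per comment
df$charsinfeedback <- nchar(df$Text)

# Words per comment: split on runs of whitespace and count the pieces
df$wordsinfeedback <- lengths(strsplit(df$Text, "\\s+"))

# Comments with no words cannot contribute to a topic model
df_nonempty <- df[df$wordsinfeedback > 0, ]

print(df)
```

The empty comment gets a word count of zero. Documents like this end up as all-zero rows in the document-term matrix, which is why such rows need to be dropped before the LDA model is fitted.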
Deploying the Shiny App
To deploy the app:
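- Open RStudio or another suitable environment.
- Create a new project containing the ui.R and server.R code above.
- To share the app with colleagues, host it on a server (local, remote, or cloud-based), for example with a deployment platform such as ShinyProxy.
- Run shiny::runApp() from the project directory to launch your Shiny app. If both objects are defined in the same session, shinyApp(ui = ui, server = server) works as well; note that ui and server are passed as objects, not called as functions.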
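For a quick local test, the UI and server can also live in a single script. The sketch below is a deliberately reduced app, not the topic model app itself: it only echoes the name of the uploaded file, and the output ID info is a hypothetical name chosen for this example.

```r
library(shiny)

# Reduced UI: one file input and one text output
ui <- fluidPage(
  fileInput("file", "Select Excel File"),
  verbatimTextOutput("info")
)

# Reduced server: print the uploaded file's name once a file is chosen
server <- function(input, output) {
  output$info <- renderPrint({
    req(input$file)
    input$file$name
  })
}

# Build the app object; ui and server are passed as objects, not called
app <- shinyApp(ui = ui, server = server)

# runApp(app)  # uncomment to launch the app in a browser
```

Building the app object and launching it are separate steps here, which makes it easy to check that the script runs cleanly before opening a browser.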
User Experience
Once launched, users can upload their Excel file and view the data in a table:
- Select an Excel file using the “Select Excel File” input.
- Click the “Download Results” button to download a CSV file containing the output data.
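A colleague who opens the topic probabilities output will usually want to know which topic dominates each document. The sketch below assumes a table with one row per document and one probability column per topic, as in the topic probabilities file the server code writes. The numbers are invented for illustration and use three topics instead of ten.

```r
# Illustrative probabilities: 3 documents x 3 topics, each row sums to 1
probs <- data.frame(
  V1 = c(0.7, 0.2, 0.1),
  V2 = c(0.2, 0.5, 0.3),
  V3 = c(0.1, 0.3, 0.6)
)

# Index of the most probable topic for each document
dominant <- max.col(as.matrix(probs), ties.method = "first")

# Attach the result so it can be read alongside the probabilities
probs$dominant_topic <- dominant
print(probs)
```

If the table is read back with read.csv(), the first column holds the row names written by write.csv() and should be dropped before calling max.col().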
This concludes our exploration of building a topic model Shiny app from scratch. You’ve successfully converted your existing R script into a functional, user-friendly application that allows non-R trained colleagues to easily input and output data.
Last modified on 2024-06-26