Structural Topic Modeling with R — Part II
In Structural Topic Modeling with R — Part I, I covered STM basics, including libraries, modeling, and finding an optimal number of topics, all done in RStudio.
We set up our final STM (Structural Topic Model) with 20 topics and 75 iterations, with prevalence set on publisher and date. This is what we have in the data panel in RStudio so far:
Let’s see what our model came up with! The following tools can be used to evaluate the model:
1. labelTopics gives the top words for each topic,
2. findThoughts gives the top documents for each topic (the documents with the highest proportion of each topic).
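As a minimal sketch of the two tools above, the helper below calls both for a chosen set of topics. The helper name `inspect_topics` and the object names in the usage comments (`stm_model`, `out`, and the `text` column) are assumptions for illustration; swap in whatever you named them in Part I.

```r
library(stm)

# Collect the top words and top documents for a set of topics.
# `model` is a fitted STM; `texts` is a character vector with one entry per
# document, in the same order as the documents used to fit the model.
inspect_topics <- function(model, texts, topics, n_words = 10, n_docs = 3) {
  list(
    words = labelTopics(model, topics = topics, n = n_words),
    docs  = findThoughts(model, texts = texts, topics = topics, n = n_docs)
  )
}

# Hypothetical usage (object and column names are assumed, not from Part I):
# res <- inspect_topics(stm_model, texts = out$meta$text, topics = 1:20)
# res$words   # prints the top words per topic
# res$docs    # prints the highest-proportion documents per topic
```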
A sample of the console output is provided below:
Console output for top documents by topic:
Sometimes, I find it useful to see how the topics correlate with one another. The topicCorr function in stm estimates these correlations from the fitted model.
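One way to do this, sketched under the assumption that the fitted model from Part I is available and that the igraph package is installed (stm needs it to draw the graph), is shown here. The helper name `plot_topic_correlation` and the object name `stm_model` are hypothetical.

```r
library(stm)

# Estimate the topic correlation graph and plot it.
# Plotting a topicCorr object requires the igraph package.
plot_topic_correlation <- function(model, cutoff = 0.01) {
  corr <- topicCorr(model, method = "simple", cutoff = cutoff)
  plot(corr)
  invisible(corr)  # return the object so the matrices can be inspected
}

# Hypothetical usage (`stm_model` is an assumed name):
# corr <- plot_topic_correlation(stm_model)
# corr$posadj   # adjacency matrix of positively correlated topics
```

Topics that are linked in the graph tend to appear together in the same documents.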
At this point, it would be a good idea to look at the convergence plot.
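A fitted stm object keeps the value of its approximate bound at every iteration in `convergence$bound`, so a simple line plot is enough. This is a minimal sketch; the helper name `plot_convergence` and the object name `stm_model` are assumptions.

```r
# Plot the approximate bound recorded at each EM iteration.
# A curve that flattens out suggests the model has converged.
plot_convergence <- function(model) {
  bound <- model$convergence$bound
  plot(seq_along(bound), bound, type = "l",
       xlab = "EM iteration", ylab = "Approximate bound",
       main = "Convergence")
  invisible(bound)
}

# Hypothetical usage (`stm_model` is an assumed name):
# plot_convergence(stm_model)
```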
We can also use the wordcloud library to review the word distribution for a specific topic.
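The stm package wraps wordcloud in its own `cloud` function, so a sketch can be as short as the one below. It assumes the wordcloud package is installed; the helper name `topic_cloud`, the object name `stm_model`, and the topic number in the usage comment are picked only for illustration.

```r
library(stm)

# Draw a word cloud of the most probable words in one topic.
# cloud() relies on the wordcloud package being installed.
topic_cloud <- function(model, topic, max_words = 50) {
  cloud(model, topic = topic, max.words = max_words)
}

# Hypothetical usage (`stm_model` and topic 7 are assumptions):
# topic_cloud(stm_model, topic = 7)
```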
Now, we will jump into working with metadata.
First, we will estimate the effect of the covariates on topic prevalence and take a look at the publisher effect. Please note that we need to add a small prior (a ridge penalty) for numerical stability.
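A rough sketch of this step with `estimateEffect` follows. The ridge value `1e-5`, the helper name `publisher_effect`, and the object names in the usage comments are assumptions, and only publisher is included in the formula to keep the example short.

```r
library(stm)

# Regress topic proportions on a metadata covariate.
# `prior` adds a small ridge penalty to the regression for stability.
publisher_effect <- function(model, meta, covariate = "publisher",
                             ridge = 1e-5) {
  k <- ncol(model$theta)  # number of topics (20 in this series)
  # estimateEffect reads the topic numbers from the left-hand side,
  # so the formula is built with literal values, e.g. 1:20 ~ publisher.
  f <- as.formula(sprintf("1:%d ~ %s", k, covariate))
  estimateEffect(f, stmobj = model, metadata = meta,
                 uncertainty = "Global", prior = ridge)
}

# Hypothetical usage (`stm_model` and `out$meta` are assumed names):
# effect <- publisher_effect(stm_model, out$meta)
# plot(effect, covariate = "publisher", topics = 1:3,
#      model = stm_model, method = "pointestimate")
```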
We can take a look at another set of publishers and compare the estimated effect.
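One way to compare a pair of publishers is the `difference` method of the plot for estimated effects. This is only a sketch: it expects an object returned by `estimateEffect`, and the helper name `compare_publishers`, the object names, and the publisher names in the usage comment are placeholders, not values from the data.

```r
library(stm)

# Plot the difference in topic prevalence between two covariate levels.
# Positive values mean the topic is more prevalent for `level_a`.
compare_publishers <- function(effect, model, level_a, level_b,
                               covariate = "publisher",
                               topics = effect$topics) {
  plot(effect, covariate = covariate, topics = topics, model = model,
       method = "difference",
       cov.value1 = level_a, cov.value2 = level_b,
       xlab = paste("More", level_b, "...", "More", level_a),
       main = paste(level_a, "vs.", level_b))
}

# Hypothetical usage (all names below are placeholders):
# compare_publishers(effect, stm_model, "Publisher A", "Publisher B")
```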
We can also compare two topics or a single topic across two covariate levels to see how the terms differ.
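For the two-topic case, a minimal sketch uses the `perspectives` plot type. Comparing one topic across covariate levels requires a model fitted with a content covariate, which is not what this series describes, so only the two-topic comparison is shown. The helper name `compare_topic_words`, the object name `stm_model`, and the topic numbers are assumptions.

```r
library(stm)

# Contrast the vocabulary of two topics in a single plot.
# Words are placed closer to the topic they are more associated with.
compare_topic_words <- function(model, topic_a, topic_b, n_words = 30) {
  plot(model, type = "perspectives",
       topics = c(topic_a, topic_b), n = n_words)
}

# Hypothetical usage (`stm_model` and the topic numbers are assumptions):
# compare_topic_words(stm_model, 3, 12)
```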
Next, we will review the topic proportions within the documents for 20 topics.
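A possible way to do this, assuming the fitted model from Part I, is the `hist` plot type, which draws one histogram of document-level proportions per topic. The sketch below also returns the mean proportion of each topic. The helper name `topic_proportions` and the object name `stm_model` are hypothetical.

```r
library(stm)

# Histogram of document-topic proportions for the selected topics,
# plus the average proportion of every topic across the corpus.
topic_proportions <- function(model, topics = seq_len(ncol(model$theta))) {
  plot(model, type = "hist", topics = topics)
  k <- ncol(model$theta)
  setNames(colMeans(model$theta), paste0("Topic_", seq_len(k)))
}

# Hypothetical usage (`stm_model` is an assumed name):
# props <- topic_proportions(stm_model)
# sort(props, decreasing = TRUE)
```

With 20 panels the plotting window needs to be fairly large; passing a subset such as `topics = 1:9` can be easier to read.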
Lastly in this series on structural topic modeling in R, we will look at topic quality and plot it with each topic labeled by its number.
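The stm function `topicQuality` plots semantic coherence against exclusivity and labels each point with its topic number by default. Here is a short sketch; the helper name `plot_topic_quality` and the object names in the usage comment are assumptions, and the documents should be the same ones used to fit the model.

```r
library(stm)

# Plot semantic coherence (x-axis) against exclusivity (y-axis),
# with each point labeled by its topic number.
plot_topic_quality <- function(model, documents) {
  topicQuality(model = model, documents = documents)
}

# Hypothetical usage (`stm_model` and `out$documents` are assumed names):
# plot_topic_quality(stm_model, out$documents)
```

Topics toward the upper right of the plot score well on both measures.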
I hope you enjoyed setting up and reviewing Structural Topic Modeling in R.
Until next time!