Structural Topic Modeling with R — Part II

May 22, 2021

In Structural Topic Modeling with R — Part I, I covered STM basics, including libraries, modeling, and finding an optimal number of topics, all done in RStudio.

We set up our final STM (Structural Topic Model) with 20 topics and 75 iterations, and modeled topic prevalence as a function of publisher and date. This is what we have in the data panel in RStudio so far:
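For readers who skipped Part I, here is a minimal sketch of a model with the same settings. The article's news data are not included here, so this assumes stm's bundled `poliblog5k` blog corpus as a stand-in, with its `blog` column playing the role of publisher and `day` playing the role of date. The seed and the spectral initialization are also assumptions, not necessarily what Part I used.

```r
library(stm)

# Assumption: stm's bundled poliblog5k corpus stands in for our news data.
# `blog` plays the role of publisher and `day` (day of the year) the role of date.
set.seed(2021)
model <- stm(
  documents  = poliblog5k.docs,
  vocab      = poliblog5k.voc,
  K          = 20,               # 20 topics
  prevalence = ~ blog + s(day),  # prevalence varies by source and smoothly over time
  data       = poliblog5k.meta,
  max.em.its = 75,               # cap on the number of EM iterations
  init.type  = "Spectral",
  verbose    = FALSE
)

# Document-topic proportions: one row per document, one column per topic
dim(model$theta)
```

Fitting 20 topics on 5,000 posts can take a few minutes, so it is worth saving the result with `saveRDS()` before moving on.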

Let’s see what our model came up with! The following tools can be used to evaluate the model:
1. labelTopics lists the top words for each topic under four weightings: highest probability, FREX, Lift, and Score.

2. findThoughts returns the most representative documents for each topic, i.e., the documents with the highest estimated proportion of that topic.
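Both functions can be tried out in a minimal sketch. Since the article's data are not available here, it assumes stm's bundled example model `gadarianFit` (a 3-topic model fit on the open-ended survey responses in `gadarian`) as a stand-in for our 20-topic model; swap in your own model object and raw texts.

```r
library(stm)

# Assumption: the bundled example model `gadarianFit` stands in for our own fit.

# 1. Top words per topic under four weightings:
#    highest probability, FREX, Lift, and Score
labels <- labelTopics(gadarianFit, topics = 1:3, n = 7)
print(labels)

# 2. The three documents with the highest proportion of topics 1 and 2.
#    `texts` must be in the same order as the documents the model was fit on.
thoughts <- findThoughts(gadarianFit,
                         texts  = gadarian$open.ended.response,
                         topics = c(1, 2),
                         n      = 3)
print(thoughts)
```

For our own model, `texts` would be the original article texts, in the same row order as the documents passed to `stm()`.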

A sample of the console output is provided below:

Console output for top documents by topic:

Sometimes I find it useful to look at how topics correlate with one another. For this purpose, you can use the topicCorr function from the stm package.
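A minimal sketch of the topic-correlation step, again assuming the bundled `gadarianFit` model as a stand-in for our own fit. With `method = "simple"`, topicCorr correlates the estimated document-topic proportions and keeps only correlations above `cutoff` as edges; 0.01 is the package default, used here for illustration.

```r
library(stm)

# Assumption: the bundled `gadarianFit` model stands in for our own fit.
# "simple" correlates the estimated topic proportions (theta) across documents
# and treats correlations above `cutoff` as edges between topics.
corr <- topicCorr(gadarianFit, method = "simple", cutoff = 0.01)

print(round(corr$cor, 2))  # K x K matrix of topic correlations
plot(corr)                 # network of positively correlated topics
```

With 20 topics the network plot is usually more informative than the raw matrix, since clusters of related topics show up as connected groups.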

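At this point, it is also a good idea to look at the convergence plot, which shows how the model's approximate objective (its variational lower bound) changed across EM iterations.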
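A minimal sketch of the convergence plot, assuming the bundled `gadarianFit` model as a stand-in for our own fit. stm stores the value of the approximate objective after each EM iteration in `convergence$bound`; plotting it shows whether the fit has levelled off or stopped early because it hit the iteration cap.

```r
library(stm)

# Assumption: the bundled `gadarianFit` model stands in for our own fit.
# One value of the approximate objective per EM iteration; it typically
# rises quickly and then flattens out as the model converges.
bound <- gadarianFit$convergence$bound

plot(bound, type = "l",
     xlab = "EM iteration", ylab = "Approximate objective",
     main = "Convergence")
```

If the curve is still climbing at the last iteration, raising `max.em.its` when refitting is worth considering.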

We can also use the wordcloud package to visualize the word distribution of a specific topic.
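A minimal sketch of a per-topic word cloud, assuming the bundled `gadarianFit` model as a stand-in and topic 1 as an arbitrary choice. The word sizes come from the topic's word probabilities, stored on the log scale in `beta$logbeta`. Because these probabilities are all below 1, `min.freq` has to be lowered from its default of 3, or nothing is drawn.

```r
library(stm)
library(wordcloud)

# Assumption: the bundled `gadarianFit` model stands in for our own fit.
topic <- 1

# beta$logbeta[[1]] is a K x V matrix of log word probabilities (one row per topic)
word_probs <- exp(gadarianFit$beta$logbeta[[1]][topic, ])

wordcloud(words = gadarianFit$vocab,
          freq = word_probs,
          max.words = 40,
          min.freq = 0,          # probabilities are < 1, so the default of 3 would drop every word
          random.order = FALSE)  # place the most probable words in the center
```

stm also ships a `cloud()` helper, e.g. `cloud(gadarianFit, topic = 1)`, which wraps the wordcloud package in a similar way.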

Now, let's move on to working with the metadata.

First, we will estimate the effect of the covariates on topic prevalence and take a look at the publishers' effect. Note that we need to add a small prior for numerical stability.
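A minimal sketch of this step with stm's `estimateEffect`, again assuming the bundled `gadarianFit`/`gadarian` example as a stand-in. Its `treatment` variable plays the role of publisher, and the formula mirrors the prevalence formula that model was fit with. The `prior` argument adds a small ridge penalty to the underlying regressions for numerical stability; 1e-5 is an illustrative value, not necessarily the one used in this analysis.

```r
library(stm)

# Assumption: `gadarianFit` and its metadata `gadarian` stand in for our model
# and metadata; `treatment` plays the role of publisher.
prep <- estimateEffect(1:3 ~ treatment + s(pid_rep),
                       gadarianFit,
                       metadata    = gadarian,
                       uncertainty = "Global",  # propagate uncertainty in the topic proportions
                       prior       = 1e-5)      # small ridge penalty for numerical stability

# Regression table for topic 1
summary(prep, topics = 1)

# Expected topic proportions at each value of the covariate
plot(prep, covariate = "treatment", model = gadarianFit,
     method = "pointestimate")
```

For the model in this article, the formula would look something like `1:20 ~ publisher + s(date)`, with the column names being whatever your metadata actually uses.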