Structural Topic Modeling with R — Part II

Jovan Trajceski
May 22, 2021


In Structural Topic Modeling with R — Part I, I covered STM basics, including libraries, modeling, and finding an optimal number of topics, all done in RStudio.

We fit our final Structural Topic Model (STM) with 20 topics and 75 iterations, using publisher and date as prevalence covariates. This is what we have in the data panel in RStudio so far:

Screenshot of RStudio

Let’s see what our model came up with! The following tools can be used to evaluate the model:
1. labelTopics gives the top words for each topic;
2. findThoughts gives the top documents for each topic (the documents with the highest proportion of that topic).
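As a minimal sketch of these two calls, the snippet below uses `gadarianFit`, the small 3-topic example model that ships with the stm package, as a stand-in for our 20-topic model, and `gadarian$open.ended.response` as a stand-in for the article texts. Both stand-ins are assumptions for illustration; swap in your own fitted model and raw documents.

```r
library(stm)

# Stand-in objects (assumption): example model and texts shipped with stm
fit   <- gadarianFit
texts <- as.character(gadarian$open.ended.response)

# Top 7 words per topic under four weightings (prob, FREX, lift, score)
labels <- labelTopics(fit, n = 7)
print(labels)

# The 3 documents with the highest proportion of topics 1 and 2
thoughts <- findThoughts(fit, texts = texts, topics = c(1, 2), n = 3)
print(thoughts)
```

For the model in this article, `topics = 1:15` and `n = 3` would return the top 3 paragraphs for topics 1 to 15.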

A sample of the console output is provided below:

Top words per topic.

Console output for top documents by topic:

Top documents per topic.
Top 3 paragraphs for Topics 1 to 15

I sometimes find it useful to look at how the topics correlate with one another. The stm package provides the topicCorr function for this purpose, and its result can be plotted as a network of topics.
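A minimal sketch of a topic correlation check could look like the following. It again uses the bundled `gadarianFit` model as a stand-in (an assumption), and the `cutoff` value is simply the function's default rather than a tuned choice.

```r
library(stm)

fit <- gadarianFit  # stand-in model (assumption); use your own STM fit

# Estimate which topics are positively correlated across documents.
# method = "simple" thresholds the correlations at `cutoff`.
corr <- topicCorr(fit, method = "simple", cutoff = 0.01)

# Correlation matrix with negative values set to zero
print(round(corr$poscor, 2))

# Adjacency matrix: 1 marks a pair of positively correlated topics
print(corr$posadj)
```

Calling `plot(corr)` on the result draws the network, where an edge links two positively correlated topics; this step needs the igraph package. With only three topics the stand-in model may show few or no edges, whereas a 20-topic model usually gives a more informative graph.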

Topic correlation.

At this point, it is a good idea to look at the convergence plot to check whether the model's approximate objective leveled off within the 75 iterations.
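One way to draw such a plot is to read the bound stored on the fitted model, as sketched below. The `gadarianFit` stand-in and the relative-change check at the end are my own illustrative assumptions, not a formal convergence test.

```r
library(stm)

fit <- gadarianFit  # stand-in model (assumption)

# Approximate variational bound recorded after each EM iteration
bound <- fit$convergence$bound

plot(bound, type = "l",
     xlab = "EM iteration", ylab = "Approximate objective",
     main = "Convergence")

# Relative change over the final step; a value near zero suggests
# the bound has leveled off
rel_change <- abs(diff(tail(bound, 2)) / tail(bound, 1))
print(rel_change)
```

If the curve is still climbing steeply at the last iteration, raising `max.em.its` when fitting the model is the usual remedy.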

Convergence plot.

We can also use the wordcloud library to review the word distribution of a specific topic.
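A short sketch using stm's `cloud` function, which draws on the wordcloud package, is shown below. The stand-in model has only three topics, so the example plots topic 1; for the model in this article the call would use `topic = 17`.

```r
library(stm)

fit <- gadarianFit  # stand-in model (assumption)

# cloud() relies on the wordcloud package for the drawing
if (requireNamespace("wordcloud", quietly = TRUE)) {
  # Word size is proportional to the word's probability within the topic
  cloud(fit, topic = 1, max.words = 50)
}
```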

Wordcloud for topic 17.

Now, we will jump into working with metadata.

First, we will estimate the effect of the covariates on topic prevalence and take a look at the publisher effect. Note that we need to add a small prior for numerical stability.
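The sketch below shows the general shape of this step with `estimateEffect` and its `prior` argument. It is an illustration under assumptions: the bundled `gadarianFit` model and `gadarian` metadata stand in for our data, the binary `treatment` column stands in for the publisher covariate, and the prior value `1e-5` is an arbitrary small number.

```r
library(stm)

fit  <- gadarianFit  # stand-in model (assumption)
meta <- gadarian     # stand-in metadata with a binary `treatment` column

# Regress the proportions of topics 1 to 3 on the covariate.
# `prior` adds a small ridge penalty for numerical stability.
effects <- estimateEffect(1:3 ~ treatment, fit, metadata = meta,
                          uncertainty = "Global", prior = 1e-5)
print(summary(effects, topics = 1))

# Difference in topic prevalence between two covariate levels
plot(effects, covariate = "treatment", topics = 1:3, model = fit,
     method = "difference", cov.value1 = 1, cov.value2 = 0,
     xlab = "More control ... more treatment",
     main = "Effect of treatment vs. control")
```

With a categorical covariate such as a publisher column, `cov.value1` and `cov.value2` would take the two level names to compare, for example "Zacks" and "Seeking Alpha" (the column name itself depends on how your metadata is set up).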

Effect of Zacks vs. Seeking Alpha publishers.

We can take a look at another set of publishers and compare the estimated effect.

Effect of ‘TalkMarkets’ vs. ‘Investopedia’ publishers.

We can also compare two topics or a single topic across two covariate levels to see how the terms differ.
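A two-topic comparison can be sketched with the `perspectives` plot type, as below. The stand-in `gadarianFit` model (an assumption) has three topics, so the example contrasts topics 1 and 2; for this article the call would use `topics = c(17, 12)`.

```r
library(stm)

fit <- gadarianFit  # stand-in model (assumption)

# Words are placed along the horizontal axis according to which of the
# two topics favors them more strongly
plot(fit, type = "perspectives", topics = c(1, 2), n = 25)
```

Comparing a single topic across two covariate levels works the same way, but it requires a model that was fit with a content covariate.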

Comparing content in topic 17 and topic 12.

Next, we will review the topic proportions within the documents for 20 topics.
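The document-level topic proportions live in the `theta` matrix of the fitted model. The sketch below inspects it directly and then draws two built-in plots; which plot type suits best is a matter of taste, and the `gadarianFit` stand-in is again an assumption.

```r
library(stm)

fit <- gadarianFit  # stand-in model (assumption)

# theta is a documents-by-topics matrix; each row sums to 1
theta <- fit$theta
print(round(head(theta, 3), 3))

# Expected share of each topic across the whole corpus
print(round(colMeans(theta), 3))

# Summary plot: topics ranked by expected proportion, with top words
plot(fit, type = "summary", n = 5)

# Histograms of the per-document proportions for each topic
plot(fit, type = "hist")
```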

Topic proportions.

Lastly in this series on structural topic modeling in R, we will look at topic quality and plot the quality scores, with each point labeled by its topic number.
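One common way to assess topic quality in stm is to plot semantic coherence against exclusivity with `topicQuality`. Because this function needs the prepared documents as well as the model, the sketch below fits a small fresh model on the `gadarian` example data; the data, `K = 3`, the seed, and the iteration cap are all illustrative assumptions, and `textProcessor` needs the tm and SnowballC packages installed.

```r
library(stm)

# Prepare the stand-in corpus (assumption: gadarian example data)
processed <- textProcessor(as.character(gadarian$open.ended.response),
                           metadata = gadarian, verbose = FALSE)
out <- prepDocuments(processed$documents, processed$vocab,
                     processed$meta, verbose = FALSE)

# Small model, for illustration only
fit <- stm(out$documents, out$vocab, K = 3,
           prevalence = ~ treatment, data = out$meta,
           max.em.its = 25, verbose = FALSE, seed = 2021)

# Per-topic scores: higher is better on both axes
coherence <- semanticCoherence(fit, out$documents)
excl      <- exclusivity(fit)
print(data.frame(topic = 1:3, coherence = coherence, exclusivity = excl))

# Scatter plot of the two scores, each point labeled by topic number
topicQuality(model = fit, documents = out$documents)
```

Topics toward the upper right of the plot score well on both measures.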

Topic quality — Topic 16 and topic 20.

I hope you enjoyed setting up and reviewing Structural Topic Modeling in R.

Until next time!


Jovan Trajceski

MSc. Data Analytics (HKBU, Hong Kong) and CPA (Toronto, Canada) designated professional with a passion for Data Science, Digital Transformation, and Analytics.