Latent Dirichlet Allocation

Blueprint for the LDA Machine

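The Dirichlet distribution has a parameter `alpha`. Depending on the value of alpha we get different distributions: with a symmetric alpha below 1 the points concentrate near the corners, with alpha equal to 1 they are spread uniformly, and with alpha above 1 they concentrate toward the center.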
Dirichlet distributions based on alpha values
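To see the effect of alpha in numbers, here is a minimal sketch that draws points from a symmetric three-dimensional Dirichlet for a few alpha values. The helper `sample_dirichlet` and the chosen alpha values are illustrative assumptions, not part of the original notes; the sampler uses the standard construction of normalizing independent Gamma draws.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one point from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
avg_largest = {}  # alpha value -> average size of the largest component
for a in (0.1, 1.0, 10.0):
    samples = [sample_dirichlet([a, a, a], rng) for _ in range(1000)]
    avg_largest[a] = sum(max(s) for s in samples) / len(samples)
    print(f"alpha={a}: average largest component = {avg_largest[a]:.2f}")
```

A largest component close to 1 means the point sits near a corner (one topic dominates); a value close to 1/3 means the point sits near the center (topics are mixed evenly).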
We consider two Dirichlet distributions: one that associates documents with topics and another that associates topics with words.
Two Dirichlet distributions
These two Dirichlet distributions are the parameters (the knobs) in the blueprint. We generate different documents by choosing different points in these two distributions.
Two Dirichlet distributions as knobs in the blueprint
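The blueprint can be sketched as a small generator. This is only an illustration under stated assumptions: the toy vocabulary, the function name `generate_corpus`, and the name `beta` for the topic-word parameter are all hypothetical (the notes only name alpha).

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one point from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha, beta, seed=0):
    rng = random.Random(seed)
    # Second knob: one distribution over words per topic, drawn from Dirichlet(beta).
    topic_word = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # First knob: one mixture over topics per document, drawn from Dirichlet(alpha).
        doc_topic = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            # Pick a topic for this word slot, then pick a word from that topic.
            topic = rng.choices(range(n_topics), weights=doc_topic)[0]
            words.append(rng.choices(vocab, weights=topic_word[topic])[0])
        corpus.append(words)
    return corpus, topic_word

# Hypothetical toy vocabulary, chosen only for illustration.
vocab = ["ball", "goal", "vote", "law", "star", "planet"]
corpus, topic_word = generate_corpus(
    n_docs=3, doc_len=8, vocab=vocab, n_topics=3, alpha=0.5, beta=0.5
)
for doc in corpus:
    print(" ".join(doc))
```

Changing `alpha` or `beta` moves the points drawn from the two Dirichlet distributions, which in turn changes the kind of documents the machine produces.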

LDA in action
We generate documents by giving each document a mixture of topics and each topic a distribution over words. The probability of generating exactly the same article as in the training data will be very low.
Generating documents by assigning topics to documents and topics to words
Comparing the probability of the generated document with the ground truth

In the context of LDA, Gibbs sampling tries to make each word as monochromatic as possible (all of its occurrences tagged with a single topic) and each document as monochromatic as possible (all of its words tagged with a single topic).
Gibbs Sampling
Ensuring all the topics are considered in Gibbs sampling
Assigning topics to documents based on the topics assigned to words
Maximizing the probability in the LDA equation directly is very difficult. Hence we use Gibbs sampling.
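One common way to do this is collapsed Gibbs sampling, sketched below. It is an assumption that this is the variant the notes have in mind; the function name `lda_gibbs`, the default `alpha` and `beta` values, and the toy corpus are hypothetical. Each word is re-tagged with a topic drawn with weight proportional to (how much the document already uses that topic) times (how much that topic already uses the word), which is what pushes both documents and words toward being monochromatic.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """docs is a list of documents, each a list of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # words in doc d tagged with topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word w is tagged with topic k
    topic_total = [0] * n_topics                              # total words tagged with topic k
    assignments = []

    # Start from a random topic tag for every word.
    for d, doc in enumerate(docs):
        tags = []
        for w in doc:
            k = rng.randrange(n_topics)
            tags.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(tags)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current tag of this word from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight of each topic given all the other tags.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Sample a new tag and put it back into the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assignments, doc_topic, topic_word

# Toy corpus: word ids 0-1 appear only in the first two documents, 2-3 only in the last two.
docs = [[0, 1, 0, 1], [1, 0, 0, 1], [2, 3, 2, 3], [3, 2, 2, 3]]
assignments, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=4)
for d, counts in enumerate(doc_topic):
    print(f"document {d}: topic counts = {counts}")
```

The small `alpha` and `beta` keep every topic possible for every word, so no topic is ever ruled out completely while sampling.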

Gibbs sampling is a statistical algorithm used to generate samples from a probability distribution that might be too complex to calculate directly. It is often used in Bayesian inference, where the goal is to estimate the unknown parameters of a model given some observed data.
The idea behind Gibbs sampling is to iteratively sample from the conditional distributions of each variable in the model, while holding all other variables fixed. This means that we generate a sample for one variable at a time, based on the values of the other variables in the model.
The process starts with some initial values for all the variables in the model. Then, for each iteration of the algorithm, we randomly select one of the variables and update it by sampling a new value from its conditional distribution given the current values of the other variables. We keep doing this until we have generated enough samples.
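The procedure described above can be sketched on a simple target that is not LDA: a standard bivariate normal with correlation `rho`, where each conditional is a one-dimensional normal with mean `rho` times the other variable and variance `1 - rho**2`. The function name `gibbs_bivariate_normal`, the burn-in length, and the choice of target are assumptions made only for illustration.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=1000, seed=0):
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # standard deviation of each conditional
    x, y = 0.0, 0.0                       # initial values for all variables
    samples = []
    for step in range(burn_in + n_samples):
        # Randomly select one variable and resample it given the other.
        if rng.random() < 0.5:
            x = rng.gauss(rho * y, cond_sd)
        else:
            y = rng.gauss(rho * x, cond_sd)
        # Discard the early steps, which still depend on the starting point.
        if step >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
n = len(samples)
mean_x = sum(x for x, _ in samples) / n
mean_y = sum(y for _, y in samples) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
corr = cov / math.sqrt(var_x * var_y)
print(f"estimated correlation = {corr:.2f}")
```

The sampler never evaluates the joint distribution; it only draws from the two conditionals, yet the collected samples should show a correlation close to `rho`.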