machine learning - Similarity between LDA results over two different numbers of topics?
Suppose I choose 20 topics in LDA, and then choose 30 topics. My question is: do the two results intersect on the 20 topics, that is, do they produce similar results?
Short answer: no. LDA is commonly fit with a Gibbs sampler, with Dirichlet distributions on the document vectors. The topic allocations are made on a sample, so they differ between runs both because of sampling randomness and because of allocation uncertainty, unless you define an explicit random seed and run with the same number of topics k. Take a look at the original paper (Blei et al., 2003) to see how k is defined.
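To make the point about seeds concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is a toy written for illustration, not the implementation of any particular library: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the tiny corpus of word ids are all assumptions.

```python
import random

def lda_gibbs(docs, vocab_size, k, iters=50, alpha=0.1, beta=0.01, seed=None):
    """Toy collapsed Gibbs sampler; returns k topic-word distributions."""
    rng = random.Random(seed)  # all randomness flows through this generator
    doc_topic = [[0] * k for _ in docs]                 # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(k)]   # tokens per (topic, word)
    topic_total = [0] * k                               # tokens per topic
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(k)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token from the counts, then resample its topic.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(k)
                ]
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic-word probabilities; each row sums to 1.
    return [
        [(topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
         for w in range(vocab_size)]
        for t in range(k)
    ]

# Hypothetical toy corpus: each document is a list of word ids in range(6).
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]

run_a = lda_gibbs(docs, vocab_size=6, k=2, seed=42)
run_b = lda_gibbs(docs, vocab_size=6, k=2, seed=42)  # same seed, same k
run_c = lda_gibbs(docs, vocab_size=6, k=3, seed=42)  # same seed, different k

print(run_a == run_b)          # identical runs: same seed and same k
print(len(run_a), len(run_c))  # a different k gives a different set of topics
```

Fixing the seed only makes a run with the same k repeatable. Changing k changes the sequence of sampling decisions from the very first token, so the seed does not carry topics over from one k to another.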
Update (with regard to the comment): hierarchical LDA (hLDA) tries to solve the problem of retaining topics and subtopics by constructing levels of topics following the nested Chinese restaurant process. It's still not perfect.
The way flat LDA works, however, is that it looks at documents rather than topics to produce further results. Say there is topic 0 (the first table in the restaurant) and all the documents try to sit there. There is not enough space, so you create topic 1 where some docs feel more comfortable, and so on. That is right from the point of view of how these tables are created. But there is one big thing that's critical: topic 0 changes when you create the new table/topic 1, because some documents have left the first table and taken their words (or the probabilities of co-occurrence thereof) with them to the new table, and the words in topic 0 get reshuffled given the new situation. The same happens when you create more tables/topics: the previous ones are re-estimated. Hence, you will never get the same 20 topics when rerunning with 30.
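If you still want to see how far two runs agree, one common heuristic is to compare the topic-word distributions of the two models and pair each topic with its closest counterpart. The sketch below uses cosine similarity in plain Python. The helper names `cosine` and `best_matches` are hypothetical, and the numbers are made up for illustration, not real LDA output.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic-word vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def best_matches(topics_small, topics_large):
    """For each topic of the smaller model, find the most similar topic
    of the larger model. Returns a list of (index, similarity) pairs."""
    result = []
    for p in topics_small:
        sims = [cosine(p, q) for q in topics_large]
        j = max(range(len(sims)), key=sims.__getitem__)
        result.append((j, sims[j]))
    return result

# Made-up distributions over a 4-word vocabulary (illustrative only).
topics_k2 = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
topics_k3 = [
    [0.0, 0.0, 0.6, 0.4],
    [0.9, 0.1, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
]

matches = best_matches(topics_k2, topics_k3)
for i, (j, sim) in enumerate(matches):
    print(f"k=2 topic {i} is closest to k=3 topic {j} (cosine {sim:.3f})")
```

In this made-up example the first topic of the smaller model has effectively been split across two topics of the larger one, so its best match is similar but not identical. That is the reshuffling effect described above: matched topics can be close, but you should not expect them to coincide.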