[MADLIB-1351] Add stopping criteria on perplexity to LDA - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: v1.17
Component/s: Module: Parallel Latent Dirichlet Allocation
Labels: None

Description

In the LDA module
http://madlib.apache.org/docs/latest/group__grp__lda.html
add a stopping criterion based on perplexity, rather than stopping only after a fixed number of iterations.

The suggested approach is to follow what scikit-learn does, which exposes the two parameters quoted below:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html

evaluate_every : int, optional (default=0)
How often to evaluate perplexity. Set it to 0 or negative number to not evaluate perplexity in training at all. Evaluating perplexity can help you check convergence in training process, but it will also increase total training time. Evaluating perplexity in every iteration might increase training time up to two-fold.

perplexity_tol : float, optional (default=1e-1)
Perplexity tolerance to stop iterating. Only used when evaluate_every is greater than 0.
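
A minimal sketch of how these two parameters could combine into a stopping rule, assuming a generic iterative trainer. The names `train_with_perplexity_stop`, `step`, and `perplexity` are hypothetical and are not part of MADlib or scikit-learn; the comparison of consecutive perplexity evaluations against the tolerance is one plausible reading of the parameter descriptions above, not the implementation merged for this issue.

```python
from typing import Callable, Optional, Tuple


def train_with_perplexity_stop(
    step: Callable[[int], None],
    perplexity: Callable[[], float],
    max_iter: int,
    evaluate_every: int = 0,
    perplexity_tol: float = 1e-1,
) -> Tuple[int, Optional[float]]:
    """Run up to max_iter training steps, stopping early on perplexity.

    step(i) performs one training iteration (hypothetical callback).
    perplexity() returns the current model perplexity (hypothetical callback).
    Returns (iterations_run, last_evaluated_perplexity_or_None).
    """
    last: Optional[float] = None
    for i in range(max_iter):
        step(i)
        # Perplexity is only evaluated every `evaluate_every` iterations;
        # 0 or a negative value disables evaluation entirely.
        if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
            current = perplexity()
            # Stop once two consecutive evaluations differ by less than the tolerance.
            if last is not None and abs(last - current) < perplexity_tol:
                return i + 1, current
            last = current
    return max_iter, last
```

With `evaluate_every=0` the loop reduces to the existing behaviour of running a fixed number of iterations, so the extra cost of computing perplexity is only paid when the caller opts in.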

Issue Links

links to: GitHub Pull Request #432

People

Assignee: Himanshu Pandey

Reporter: Frank McQuillan

Votes: 1

Watchers: 3

Dates

Created: 23/May/19 19:18

Updated: 18/Dec/19 22:06

Resolved: 18/Dec/19 22:06

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 14h 50m