Details
- Improvement
- Status: Open
- Minor
- Resolution: Unresolved
- 0.5.0
- None
Description
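LDA training throughput is too low for a production workload. In the log below, 20 iterations over 341,047 records (259.4 MiB) took about 25 hours, with roughly 75 to 100 minutes between consecutive perplexity reports.
We should profile the trainer and improve the training throughput. (cc: nzw)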
2018-04-18 06:32:01,410 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Wrote 341047 records to a temporary file for iterative training: /mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/18/appcache/application_1522730964147_209083/container_1522730964147_209083_01_000004/tmp/hivemall_topicmodel8295452490442575792.sgmt (259.4 MiB)
2018-04-18 07:50:41,979 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4437.724
2018-04-18 09:05:55,765 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4579.0825
2018-04-18 10:21:48,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4651.425
2018-04-18 11:37:47,772 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4711.779
2018-04-18 12:58:02,262 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4739.12
2018-04-18 14:15:19,689 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4774.822
2018-04-18 15:30:12,067 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4788.2305
2018-04-18 16:51:48,425 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4808.8013
2018-04-18 18:31:14,548 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4826.866
2018-04-18 19:49:41,266 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4834.5537
2018-04-18 21:13:19,976 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4850.7837
2018-04-18 22:29:45,115 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4848.2095
2018-04-18 23:48:47,483 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4867.3945
2018-04-19 01:09:23,242 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4861.012
2018-04-19 02:24:50,819 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.5796
2018-04-19 03:42:27,052 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.4126
2018-04-19 04:57:24,786 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4880.2183
2018-04-19 06:12:14,056 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4889.6064
2018-04-19 07:27:26,864 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4885.1523
2018-04-19 07:27:26,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Performed 20 iterations of 341,047 training examples on a secondary storage (thus 6,820,940 training updates in total)
2018-04-19 07:27:27,078 WARN [Thread-5] org.apache.hadoop.hive.ql.exec.GroupByOperator: Disable Hash Aggr: #hash table = 99999 #total = 100000 reduction = 0.0 minReduction = 0.5
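To put a number on the throughput, the timestamps in the log above can be turned into rough figures. The following is only a sketch, not part of Hivemall: the helper names `interval_stats` and `overall_updates_per_sec` are made up for illustration, and it assumes that the "Wrote 341047 records" entry marks the start of iterative training and that each interval between consecutive entries corresponds to about one pass over the data. The log has 19 perplexity entries for 20 iterations, so the per-interval mapping is approximate.

```python
from datetime import datetime

# Timestamp layout used by the log entries, e.g. "2018-04-18 07:50:41,979".
FMT = "%Y-%m-%d %H:%M:%S,%f"

RECORDS = 341047          # "Wrote 341047 records" in the log
TOTAL_UPDATES = 6820940   # "6,820,940 training updates in total" in the log

# Start of training plus the first three perplexity reports, copied from the log.
timestamps = [
    "2018-04-18 06:32:01,410",
    "2018-04-18 07:50:41,979",
    "2018-04-18 09:05:55,765",
    "2018-04-18 10:21:48,865",
]


def interval_stats(stamps, records):
    """Return (minutes, records_per_second) for each consecutive pair of stamps.

    Assumes each interval covers roughly one pass over `records` examples.
    """
    times = [datetime.strptime(s, FMT) for s in stamps]
    stats = []
    for start, end in zip(times, times[1:]):
        seconds = (end - start).total_seconds()
        stats.append((seconds / 60.0, records / seconds))
    return stats


def overall_updates_per_sec(first, last, total_updates):
    """Average training updates per second between two log timestamps."""
    elapsed = datetime.strptime(last, FMT) - datetime.strptime(first, FMT)
    return total_updates / elapsed.total_seconds()


stats = interval_stats(timestamps, RECORDS)
for minutes, rate in stats:
    print(f"{minutes:.1f} min, {rate:.1f} records/s")

overall = overall_updates_per_sec(
    "2018-04-18 06:32:01,410", "2018-04-19 07:27:26,864", TOTAL_UPDATES
)
print(f"overall: {overall:.1f} updates/s")
```

Running the same calculation before and after a profiling-driven change gives a simple way to compare throughput, as long as the input size and number of iterations are kept the same.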
Issue Links
- relates to
- HIVEMALL-199 Reduce memory usage of lda_predict
- Open