  Hivemall / HIVEMALL-194

Improve the throughput of LDA training

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.7.0
    • None

    Description

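      LDA training throughput is too low for a production workload. In the log below,
      about 25 hours elapsed between writing the temporary file (341,047 records, 259.4 MiB)
      and finishing 20 iterations, with roughly 75 to 100 minutes between consecutive
      perplexity reports. We should profile the trainer and improve the training
      throughput. (cc: nzw)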

      2018-04-18 06:32:01,410 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Wrote 341047 records to a temporary file for iterative training: /mnt4/hadoop/yarn/cache/yarn/nm-local-dir/usercache/18/appcache/application_1522730964147_209083/container_1522730964147_209083_01_000004/tmp/hivemall_topicmodel8295452490442575792.sgmt (259.4 MiB)
      2018-04-18 07:50:41,979 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4437.724
      2018-04-18 09:05:55,765 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4579.0825
      2018-04-18 10:21:48,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4651.425
      2018-04-18 11:37:47,772 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4711.779
      2018-04-18 12:58:02,262 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4739.12
      2018-04-18 14:15:19,689 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4774.822
      2018-04-18 15:30:12,067 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4788.2305
      2018-04-18 16:51:48,425 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4808.8013
      2018-04-18 18:31:14,548 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4826.866
      2018-04-18 19:49:41,266 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4834.5537
      2018-04-18 21:13:19,976 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4850.7837
      2018-04-18 22:29:45,115 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4848.2095
      2018-04-18 23:48:47,483 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4867.3945
      2018-04-19 01:09:23,242 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4861.012
      2018-04-19 02:24:50,819 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.5796
      2018-04-19 03:42:27,052 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4873.4126
      2018-04-19 04:57:24,786 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4880.2183
      2018-04-19 06:12:14,056 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4889.6064
      2018-04-19 07:27:26,864 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4885.1523
      2018-04-19 07:27:26,865 INFO [Thread-5] hivemall.topicmodel.ProbabilisticTopicModelBaseUDTF: Performed 20 iterations of 341,047 training examples on a secondary storage (thus 6,820,940 training updates in total)
      2018-04-19 07:27:27,078 WARN [Thread-5] org.apache.hadoop.hive.ql.exec.GroupByOperator: Disable Hash Aggr: #hash table = 99999 #total = 100000 reduction = 0.0 minReduction = 0.5
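
The elapsed time between consecutive log entries can be checked with a small script. The following is only a sketch for reading the log above, not part of Hivemall: the function name `pass_durations_minutes` and the `SAMPLE` variable are hypothetical, and the sample lines are abridged from the log (class name and file path shortened). It assumes each gap between the "Wrote ... records" line and the following "Mean perplexity" lines approximates one training pass.

```python
import re
from datetime import datetime

# Matches the leading log timestamp, e.g. "2018-04-18 06:32:01,410".
TIMESTAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def pass_durations_minutes(log_text):
    """Return the minutes elapsed between consecutive training milestones.

    A milestone is a line containing "Wrote" (temporary file written) or
    "Mean perplexity" (end-of-pass report). This is an assumption about how
    to read the log, not something the log states explicitly.
    """
    stamps = []
    for line in log_text.splitlines():
        if "Wrote" in line or "Mean perplexity" in line:
            match = TIMESTAMP.match(line)
            if match:
                stamps.append(
                    datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
                )
    # Pair each milestone with the next one and convert the gap to minutes.
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]


# First three milestone lines, abridged from the log above.
SAMPLE = """\
2018-04-18 06:32:01,410 INFO [Thread-5] ProbabilisticTopicModelBaseUDTF: Wrote 341047 records to a temporary file
2018-04-18 07:50:41,979 INFO [Thread-5] ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4437.724
2018-04-18 09:05:55,765 INFO [Thread-5] ProbabilisticTopicModelBaseUDTF: Mean perplexity over mini-batches: 4579.0825
"""

durations = pass_durations_minutes(SAMPLE)
print([round(d, 1) for d in durations])  # prints [78.7, 75.2]
```

Running the same function over the full log gives one duration per perplexity report, which makes slow passes such as the 16:51 to 18:31 gap easy to spot before profiling.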
      

            People

              Assignee: myui Makoto Yui
              Reporter: myui Makoto Yui