HIVE-4913: Put deterministic ordering in the top-K ngrams output of UDF context_ngrams()

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0, 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: UDF
    • Labels: None

    Description

      Currently, the UDF context_ngrams() outputs the top-K ngrams in order of descending frequency. When there are ties, i.e., phrases with the same frequency value, their relative order is nondeterministic. A tie breaker is needed so that the output is deterministic.
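
      The idea can be illustrated with a minimal sketch. This is not the code from the attached patch, and the issue does not say which tie breaker was chosen. The sketch assumes ties are broken by comparing the ngram's words in ascending lexicographic order. The function name top_k_ngrams and the sample data are hypothetical, and Python is used only for brevity.

```python
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]


def top_k_ngrams(counts: Dict[Ngram, float], k: int) -> List[Tuple[Ngram, float]]:
    """Return the k most frequent ngrams in a deterministic order.

    Primary key: frequency, descending.
    Tie breaker (an assumption for this sketch): the ngram's words,
    compared lexicographically in ascending order.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical sample data: two ngrams share the frequency 3.0.
sample = {
    ("was", "good"): 3.0,
    ("is", "bad"): 3.0,
    ("is", "great"): 5.0,
    ("was", "fine"): 1.0,
}

result = top_k_ngrams(sample, 3)
# result == [(("is", "great"), 5.0), (("is", "bad"), 3.0), (("was", "good"), 3.0)]
```

      Because the sort key is total (no two distinct ngrams compare equal), the result does not depend on the order in which the ngrams were counted or stored.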

      Attachments

        1. HIVE-4913.patch
          8 kB
          Xuefu Zhang
        2. HIVE-4913.patch
          7 kB
          Xuefu Zhang

            People

              Assignee: Xuefu Zhang (xuefuz)
              Reporter: Xuefu Zhang (xuefuz)
              Votes: 0
              Watchers: 3
