Hive / HIVE-28196

Preserve column stats when applying UDF upper/lower.


Details

    Description

      Hive currently re-estimates column statistics (including avgColLen) whenever it encounters a UDF.
      For upper and lower, Hive sets avgColLen to hive.stats.max.variable.length.
      However, these UDFs do not change column statistics, and the default value (100) is too high for the string-typed key columns to which upper/lower are usually applied.

      This patch keeps the input data's avgColLen after applying the upper/lower UDFs, so that the optimizer can produce a better query plan.
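
The effect described above can be illustrated with a small sketch. This is a simplified model, not Hive's actual code: the `ColStats` class, the `stats_after_udf` and `estimated_size` helpers, the `preserve` flag, and the "rows times average length" size formula are all assumptions made for illustration. Only the UDF names, `avgColLen`, and the default of 100 for `hive.stats.max.variable.length` come from the description.

```python
from dataclasses import dataclass, replace

# Default of hive.stats.max.variable.length, as stated in the description.
MAX_VARIABLE_LENGTH = 100

# UDFs that this sketch treats as leaving avgColLen unchanged.
LENGTH_PRESERVING_UDFS = {"upper", "lower"}


@dataclass(frozen=True)
class ColStats:
    """Hypothetical, minimal stand-in for a column's statistics."""
    num_rows: int
    avg_col_len: float


def stats_after_udf(udf_name: str, input_stats: ColStats, preserve: bool) -> ColStats:
    """Return the column stats assumed for the output of a string UDF.

    preserve=False models the behavior before the patch: every UDF falls
    back to the maximum variable length. preserve=True models the patched
    behavior: upper/lower keep the input column's avgColLen.
    """
    if preserve and udf_name in LENGTH_PRESERVING_UDFS:
        return input_stats
    return replace(input_stats, avg_col_len=MAX_VARIABLE_LENGTH)


def estimated_size(stats: ColStats) -> float:
    """Simplified size estimate: number of rows times average length."""
    return stats.num_rows * stats.avg_col_len


# A string key column with short values, e.g. 10 characters on average.
key_column = ColStats(num_rows=1_000_000, avg_col_len=10.0)

before_patch = stats_after_udf("upper", key_column, preserve=False)
after_patch = stats_after_udf("upper", key_column, preserve=True)

print("before patch:", estimated_size(before_patch))
print("after patch: ", estimated_size(after_patch))
```

In this model, the pre-patch estimate for the short key column is ten times larger than the post-patch one, which is the kind of overestimate that can steer the optimizer toward a worse plan.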


          People

            Seonggon Namgung (seonggon)
            Votes: 0
            Watchers: 3
