Hive / HIVE-3917

Support noscan operation for analyze command


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.11.0
    • Component/s: Statistics
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Hive supports the ANALYZE command to gather statistics for existing tables and partitions: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

      It collects:
      1. Number of Rows
      2. Number of files
      3. Size in Bytes

      If the table or partition is large, the operation takes a long time because it opens every file and scans all of the data.

      It would be nice to support a fast operation that gathers only the statistics that do not require opening any files:
      1. Number of files
      2. Size in Bytes
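The difference between the two modes can be sketched as follows. This is only an illustration on a local directory, not Hive's implementation: the function names `noscan_stats` and `full_scan_stats` are hypothetical, a local directory stands in for a table or partition location, and one line is assumed to equal one row.

```python
import os
import tempfile


def noscan_stats(directory):
    """Return (num_files, total_size_bytes) from directory metadata only."""
    num_files = 0
    total_size = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():  # subdirectories are not counted
                num_files += 1
                # stat() reads file metadata; the file itself is never opened.
                total_size += entry.stat().st_size
    return num_files, total_size


def full_scan_stats(directory):
    """Return (num_files, total_size_bytes, num_rows); reads every file."""
    num_files, total_size = noscan_stats(directory)
    num_rows = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # Counting rows requires opening and reading each file in full.
                with open(entry.path, "rb") as f:
                    num_rows += sum(1 for _ in f)
    return num_files, total_size, num_rows


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "part-0"), "wb") as f:
            f.write(b"row1\nrow2\n")
        print(noscan_stats(d))
        print(full_scan_stats(d))
```

The cost of `noscan_stats` grows with the number of files, while `full_scan_stats` grows with the total amount of data, which is why the row count is left out of the fast path.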

      The proposed syntax is:
      ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];
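To make the optional parts of the syntax above concrete, here is a small sketch that renders statements of that shape. The helper `analyze_statement` and the table name `page_views` are hypothetical and used only for illustration. The helper assumes partition values are quoted as strings and performs no escaping.

```python
def analyze_statement(table, partition=None, noscan=False):
    """Render an ANALYZE TABLE statement in the proposed syntax.

    partition maps a partition column to its value; a value of None
    emits the bare column name (the [=val] part is optional).
    """
    sql = f"ANALYZE TABLE {table}"
    if partition:
        specs = []
        for col, val in partition.items():
            # Assumption: values are rendered as quoted string literals.
            specs.append(col if val is None else f"{col}='{val}'")
        sql += " PARTITION(" + ", ".join(specs) + ")"
    sql += " COMPUTE STATISTICS"
    if noscan:
        sql += " noscan"  # the optional keyword proposed in this issue
    return sql + ";"


if __name__ == "__main__":
    print(analyze_statement("page_views"))
    print(analyze_statement("page_views", {"ds": "2013-01-01", "hr": None}, noscan=True))
```

Omitting `noscan` leaves the existing full-scan behavior unchanged, so the keyword is purely additive.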

      In the future, any statistic that can be computed without a scan could be gathered through this optional parameter.

      Attachments

        1. HIVE-3917.patch.1
          62 kB
          Gang Tim Liu
        2. HIVE-3917.patch.2
          70 kB
          Gang Tim Liu
        3. HIVE-3917.patch.3
          70 kB
          Gang Tim Liu
        4. HIVE-3917.patch.4
          87 kB
          Gang Tim Liu


          People

            Assignee: Gang Tim Liu (gangtimliu)
            Reporter: Gang Tim Liu (gangtimliu)
            Votes: 1
            Watchers: 7
