Hive / HIVE-3958

support partial scan for analyze command - RCFile


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • None
    • Hadoop Flags: Reviewed

    Description

      The ANALYZE command lets us collect statistics on existing tables and partitions. It works well but can be slow, since it scans every file in full.

      There are two ways to speed it up:
      1. Collect stats without a file scan. This may not collect every statistic, but it is good and fast enough for the use case. HIVE-3917 addresses this approach.
      2. Collect stats via a partial file scan: rather than reading the full content of each file, read only the part needed to get file metadata. Examples include RCFile (https://cwiki.apache.org/Hive/rcfilecat.html; edit: that link should be https://cwiki.apache.org/confluence/display/Hive/RCFileCat), ORC (HIVE-3874), and HBase's HFile.
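      To make approach 1 concrete, here is a minimal Java sketch under stated assumptions. It is not Hive's implementation or the HIVE-3917 patch: the class `NoScanStats` and the record `FileEntry` are hypothetical, with `FileEntry` standing in for the name and length a file-system directory listing returns. Because no file is ever opened, only cheap statistics such as file count and total size can be derived; row counts are out of reach.

```java
import java.util.List;

/**
 * Hypothetical sketch of approach 1 ("no scan"): derive statistics from a
 * directory listing alone. Not Hive's API; FileEntry stands in for the
 * name/length metadata a file-system listing returns.
 */
final class NoScanStats {
    /** One listing entry: file name and length in bytes (hypothetical type). */
    record FileEntry(String name, long length) {}

    final long numFiles;
    final long totalSize;

    private NoScanStats(long numFiles, long totalSize) {
        this.numFiles = numFiles;
        this.totalSize = totalSize;
    }

    /** Counts files and sums their lengths; no file contents are read. */
    static NoScanStats fromListing(List<FileEntry> listing) {
        long files = 0;
        long bytes = 0;
        for (FileEntry entry : listing) {
            files++;
            bytes += entry.length();
        }
        // Row counts would require opening the files, so they are not available here.
        return new NoScanStats(files, bytes);
    }
}
```

      For example, a listing of two files of 10 and 5 bytes yields `numFiles == 2` and `totalSize == 15`.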

      This JIRA targets approach #2, specifically for the RCFile format.
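      The idea behind a partial scan can be sketched in Java on a toy columnar format. The sketch assumes, loosely like RCFile (whose records are split into a key part holding metadata such as the row count and column lengths, and a value part holding the column data), that every row group begins with a header giving its row count and the byte length of each column. The format, the class `PartialScanSketch`, and its methods are hypothetical and do not reflect RCFile's real on-disk layout or Hive's code; records require Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical partial-scan sketch on a toy format (NOT the real RCFile layout).
 * Each row group is laid out as:
 *   [int rowCount][int numColumns][int colLength x numColumns][column bytes...]
 * The header plays the role of a key part; the column bytes play the value part.
 */
final class PartialScanSketch {

    /** One toy row group: a row count plus the bytes of each column. */
    record RowGroup(int rowCount, byte[][] columns) {}

    /** Collected statistics (toy format is uncompressed, so raw size = stored size). */
    record Stats(long numRows, long rawDataSize) {}

    /** Serializes row groups into the toy format; only used to build sample input. */
    static byte[] write(RowGroup... groups) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (RowGroup group : groups) {
                out.writeInt(group.rowCount());
                out.writeInt(group.columns().length);
                for (byte[] column : group.columns()) {
                    out.writeInt(column.length);
                }
                for (byte[] column : group.columns()) {
                    out.write(column);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every row-group header but skips over the column data. */
    static Stats partialScan(byte[] file) {
        long rows = 0;
        long rawSize = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            // available() is exact here because the input is an in-memory byte array.
            while (in.available() > 0) {
                int rowCount = in.readInt();
                int numColumns = in.readInt();
                long groupBytes = 0;
                for (int c = 0; c < numColumns; c++) {
                    groupBytes += in.readInt();
                }
                rows += rowCount;
                rawSize += groupBytes;
                skipFully(in, groupBytes); // jump past the values without reading them
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Stats(rows, rawSize);
    }

    /** Skips exactly n bytes, failing if the stream ends early. */
    private static void skipFully(DataInputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                throw new EOFException("truncated row group");
            }
            n -= skipped;
        }
    }
}
```

      With two row groups of 3 and 2 rows whose columns hold 12 + 30 and 8 + 20 bytes, `partialScan` reports 5 rows and 70 bytes without reading any column data. On a seekable file, skipping the value bytes is what makes the cost grow with the number of row groups rather than with the data volume.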

      Attachments

        1. HIVE-3958.patch.1
          65 kB
          Gang Tim Liu
        2. HIVE-3958.patch.2
          64 kB
          Gang Tim Liu
        3. HIVE-3958.patch.3
          67 kB
          Gang Tim Liu
        4. HIVE-3958.patch.4
          76 kB
          Gang Tim Liu
        5. HIVE-3958.patch.5
          75 kB
          Gang Tim Liu
        6. HIVE-3958.patch.6
          75 kB
          Gang Tim Liu


          People

            Assignee: Gang Tim Liu
            Reporter: Gang Tim Liu
            Votes: 0
            Watchers: 5
