Hive / HIVE-12877

Queries that use a Hive index lose data when the queried file is compressed.

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels:
      None
    • Environment:

      This problem exists in all Hive versions, regardless of platform.

      Description

      When a file is compressed, Hive builds the index using the file's extracted (uncompressed) length.
      However, when dividing the data into splits for MapReduce, Hive compares the actual (compressed) file length with that extracted length.
      If the two lengths do not match, Hive filters the file out, so the query loses data.
      I modified the source code so that the Hive index can be used when the files are compressed. Please test it.
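
      The mismatch described above can be illustrated with a small toy model. This is only a sketch of the reported behavior, not Hive's actual code: the function names `build_index_entry` and `keep_file` and the dictionary key `indexed_length` are hypothetical and chosen for illustration.

```python
import gzip

# Hypothetical toy model of the reported behavior; none of these
# names come from Hive's source code.

def build_index_entry(raw: bytes) -> dict:
    """Record the uncompressed length, as the report says index creation does."""
    return {"indexed_length": len(raw)}

def keep_file(on_disk_length: int, entry: dict) -> bool:
    """Model of the reported check: keep the file only if the lengths match."""
    return on_disk_length == entry["indexed_length"]

raw = b"key,value\n" * 10_000          # sample table data
compressed = gzip.compress(raw)         # what is actually stored on disk
entry = build_index_entry(raw)

plain_kept = keep_file(len(raw), entry)              # uncompressed file
compressed_kept = keep_file(len(compressed), entry)  # compressed file

print(plain_kept, compressed_kept)
```

      In this model the uncompressed file passes the length check while the compressed file is dropped, which is the data loss the report describes.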

        Attachments

        1. index_query_compressed_file_failure.q
          2 kB
          yangfang
        2. HIVE-12877.patch
          1 kB
          yangfang
        3. HIVE-12877.1.patch
          2 kB
          yangfang
        4. 19-index_compressed_file.gz
          14.41 MB
          yangfang

            People

            • Assignee:
              Unassigned
            • Reporter:
              yangfang
            • Votes:
              0
            • Watchers:
              1
