Hive / HIVE-1343

Add an interface in RCFile to support concatenation of two files without (de)compression

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.8.0
    • Labels:
      None

      Description

      To concatenate two files, we currently have to read every record from each file and write it back to the destination file. The I/O cost is mostly unavoidable because HDFS lacks append functionality. However, the CPU cost could be reduced significantly by avoiding decompression and recompression of the files.

      The file format layer should provide an API that implements block-level concatenation.
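
To make the request concrete, here is a minimal sketch of block-level concatenation on a toy format. It is not RCFile's on-disk layout and not an existing Hive API: the class `BlockConcatSketch`, the `MAGIC` header, and the length-prefixed block framing are all assumptions made for illustration. The point is only that `appendBlocksRaw` copies compressed blocks byte for byte and never invokes `Inflater` or `Deflater`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy format, NOT the RCFile layout (assumption for illustration):
// [int MAGIC] followed by repeated blocks of [int compressedLength][compressed bytes].
final class BlockConcatSketch {
    static final int MAGIC = 0x424C4B31;

    // Builds an in-memory "file" holding one compressed block per payload.
    static byte[] fileOf(String... payloads) throws IOException {
        ByteArrayOutputStream dest = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(dest);
        out.writeInt(MAGIC);
        byte[] buf = new byte[4096];
        for (String payload : payloads) {
            Deflater deflater = new Deflater();
            deflater.setInput(payload.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                compressed.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            out.writeInt(compressed.size());
            compressed.writeTo(out);
        }
        out.flush();
        return dest.toByteArray();
    }

    // Copies every block of one source file to out without decompressing it.
    // Returns the number of blocks copied.
    static int appendBlocksRaw(byte[] source, DataOutputStream out) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(source));
        if (in.readInt() != MAGIC) throw new IOException("not a toy block file");
        int blocks = 0;
        while (in.available() > 0) {
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            out.writeInt(raw.length);
            out.write(raw);
            blocks++;
        }
        return blocks;
    }

    // Concatenates two files: one header, then the raw blocks of both inputs.
    static byte[] concat(byte[] first, byte[] second) throws IOException {
        ByteArrayOutputStream dest = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(dest);
        out.writeInt(MAGIC);
        appendBlocksRaw(first, out);
        appendBlocksRaw(second, out);
        out.flush();
        return dest.toByteArray();
    }

    // Decompressing reader, used here only to check the merged file.
    static List<String> readAll(byte[] file) throws IOException, DataFormatException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        if (in.readInt() != MAGIC) throw new IOException("not a toy block file");
        List<String> payloads = new ArrayList<>();
        byte[] buf = new byte[4096];
        while (in.available() > 0) {
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            Inflater inflater = new Inflater();
            inflater.setInput(raw);
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            while (!inflater.finished()) {
                plain.write(buf, 0, inflater.inflate(buf));
            }
            inflater.end();
            payloads.add(new String(plain.toByteArray(), StandardCharsets.UTF_8));
        }
        return payloads;
    }

    public static void main(String[] args) throws Exception {
        byte[] merged = concat(fileOf("row-a", "row-b"), fileOf("row-c"));
        System.out.println(readAll(merged));
    }
}
```

In this toy format the merged file is the two inputs back to back minus one duplicate header. A real format with per-file metadata or sync markers would need extra care at the file boundaries.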

        Issue Links

          Activity

          Ning Zhang created issue
          He Yongqiang made changes -
          Field Original Value New Value
          Attachment HIVE-1343.1.patch [ 12444499 ]
          Ning Zhang added a comment:

          Yongqiang, this patch only exposes the FileInputReader to the client, and the client has to merge the files locally. This won't scale. What we should do is run the merge as a map-only job so that it can run in parallel.

          I talked with Dhruba, and he thinks it would be possible to make it a map-only job. The idea is to define a new RecordReader that does not decompress the data and does not iterate over records. Instead, it iterates over whole blocks without decompressing them.
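
The idea in this comment can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `RawBlockReaderSketch` is a hypothetical stand-in, not Hadoop's `RecordReader` interface and not the reader that the final patch added. The length-prefixed framing is invented, `mapTask` imitates one map task, and a parallel stream imitates independent tasks running side by side with no reduce phase.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;

// Hypothetical reader: yields whole blocks as opaque bytes, never decompressing them.
// Assumed framing (not RCFile's): repeated [int length][length bytes].
final class RawBlockReaderSketch implements Iterator<byte[]> {
    private final DataInputStream in;
    private byte[] nextBlock;

    RawBlockReaderSketch(InputStream source) {
        this.in = new DataInputStream(source);
        advance();
    }

    // Reads the next block eagerly; end of input (or a truncated block) ends iteration.
    private void advance() {
        try {
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            nextBlock = raw;
        } catch (EOFException end) {
            nextBlock = null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextBlock != null;
    }

    @Override
    public byte[] next() {
        if (nextBlock == null) throw new NoSuchElementException();
        byte[] current = nextBlock;
        advance();
        return current;
    }

    // Builds an in-memory file from already-compressed (opaque) blocks.
    static byte[] file(byte[]... blocks) {
        ByteArrayOutputStream dest = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(dest)) {
            for (byte[] block : blocks) {
                out.writeInt(block.length);
                out.write(block);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dest.toByteArray();
    }

    // Stand-in for one map task: merges its assigned files into one output, block by block.
    static byte[] mapTask(List<byte[]> assignedFiles) {
        ByteArrayOutputStream dest = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(dest)) {
            for (byte[] input : assignedFiles) {
                Iterator<byte[]> blocks = new RawBlockReaderSketch(new ByteArrayInputStream(input));
                while (blocks.hasNext()) {
                    byte[] raw = blocks.next();
                    out.writeInt(raw.length);
                    out.write(raw);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dest.toByteArray();
    }

    // Runs the tasks independently and in parallel; there is no reduce step.
    static List<byte[]> runMapOnly(List<List<byte[]>> splits) {
        return splits.parallelStream()
                .map(RawBlockReaderSketch::mapTask)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        byte[] small1 = file(new byte[] {1, 2, 3}, new byte[] {4});
        byte[] small2 = file(new byte[] {5, 6});
        byte[] small3 = file(new byte[] {7});
        List<byte[]> outputs = runMapOnly(List.of(List.of(small1, small2), List.of(small3)));
        System.out.println("merged files written: " + outputs.size());
    }
}
```

Because each task only moves opaque blocks, the per-task CPU cost is dominated by copying rather than by the codec, and adding more tasks spreads the I/O across the cluster instead of funneling it through one client.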

          John Sichi made changes -
          Fix Version/s 0.6.0 [ 12314524 ]
          Carl Steinbach made changes -
          Component/s Serializers/Deserializers [ 12312585 ]
          He Yongqiang made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Carl Steinbach made changes -
          Fix Version/s 0.8.0 [ 12316178 ]
          Jeff Hammerbacher made changes -
          Link This issue relates to HIVE-1950 [ HIVE-1950 ]
          Carl Steinbach made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee: He Yongqiang
            • Reporter: Ning Zhang
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated:
                Resolved:
