HIVE-2126

Hive's symlink text input format should be able to work with CombineHiveInputFormat

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      At compile time, if a partition's file format is SymlinkTextInputFormat, Hive will replace the symlink path with the paths listed in the symlink file. This way, it will work with Hive's CombineHiveInputFormat.

      We do this at compile time for two reasons:
      1) At run time, the input path is used not only to get the record reader but also for Hive to look up the aliases and thus the operator tree. CombineHiveInputFormat can have multiple paths in each split, and when it switches paths it also sets the new input file name on the job. It therefore always requires a real input path name, which cannot be faked.
      2) Writing a new input format would duplicate a lot of work already done in CombineHiveInputFormat.
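
      The compile-time substitution described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from the patch: the class name SymlinkPathRewrite, the rewrite method, and the two maps (input path to aliases, symlink path to the target paths read from its symlink file) are all hypothetical stand-ins, and reading the symlink file itself is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hive or the patch.
public class SymlinkPathRewrite {

    /**
     * Returns a copy of pathToAliases in which every path that has an entry in
     * symlinkTargets is replaced by the target paths listed for it.
     * Paths without an entry are copied unchanged.
     */
    static Map<String, List<String>> rewrite(
            Map<String, List<String>> pathToAliases,
            Map<String, List<String>> symlinkTargets) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : pathToAliases.entrySet()) {
            List<String> targets = symlinkTargets.get(entry.getKey());
            if (targets == null) {
                // Not a symlink partition: keep the original path.
                result.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                      .addAll(entry.getValue());
                continue;
            }
            for (String target : targets) {
                // Each real path inherits the aliases of the symlink path it replaces,
                // so a run-time lookup by real file name still finds the aliases.
                result.computeIfAbsent(target, k -> new ArrayList<>())
                      .addAll(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pathToAliases = new LinkedHashMap<>();
        pathToAliases.put("/warehouse/t/ds=1", Arrays.asList("t"));
        pathToAliases.put("/warehouse/u", Arrays.asList("u"));

        // Assumed to be the lines already read from the symlink file of ds=1.
        Map<String, List<String>> symlinkTargets = new LinkedHashMap<>();
        symlinkTargets.put("/warehouse/t/ds=1", Arrays.asList("/data/a.txt", "/data/b.txt"));

        System.out.println(rewrite(pathToAliases, symlinkTargets));
    }
}
```

      Because the plan only ever contains real paths after this step, a combining input format can switch between the files of one split without needing any symlink-specific handling.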

      1. HIVE-2126.2.patch
        21 kB
        He Yongqiang
      2. HIVE-2126.1.patch
        18 kB
        He Yongqiang

        Activity

        He Yongqiang created issue -
        He Yongqiang made changes -
        Field Original Value New Value
        Attachment HIVE-2126.1.patch [ 12477283 ]
        He Yongqiang made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        He Yongqiang added a comment -

        Review board: https://reviews.apache.org/r/653/
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/653/
        -----------------------------------------------------------

        Review request for hive.

        Summary
        -------

        Hive's symlink text input format should be able to work with CombineHiveInputFormat

        This addresses bug hive-2126.
        https://issues.apache.org/jira/browse/hive-2126

        Diffs


        trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1096093
        trunk/conf/hive-default.xml 1096093
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1096093
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1096093
        trunk/ql/src/java/org/apache/hadoop/hive/ql/io/ReworkMapredInputFormat.java PRE-CREATION
        trunk/ql/src/java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java 1096093
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 1096093
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1096093
        trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 1096093

        Diff: https://reviews.apache.org/r/653/diff

        Testing
        -------

        Thanks,

        Yongqiang

        Namit Jain added a comment -

        I haven't taken a look at the code, but I have a high-level question.
        Should we call it SymbolicInputFormat instead?
        I mean, it should work for all kinds of files, not just text files.
        For backward compatibility, we can make SymlinkTextInputFormat extend
        SymbolicInputFormat.

        He Yongqiang added a comment -

        The reason for using "ReworkMapredInputFormat" is that the "reworkMapred" interface can also be used by other formats in the future, for example if some other file format also wants to change the mapred work depending on the input.
        What do you think?

        Namit Jain added a comment -

        It might be simpler if HiveInputFormat (or a new interface which extends InputFormat) adds
        this new method.

        All Hive input formats will implement the above interface. The default implementation does nothing.

        You don't need code like the following:
        if (partDesc.getInputFileFormatClass().equals(SymlinkTextInputFormat.class)) {
        //change to TextInputFormat

        You always call the new method, which is a no-op for all other input formats right now.
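
        A rough sketch of that shape, for illustration only: every name below (ReworkHookSketch, Plan, ReworkablePlanFormat, rework, PlainTextFormat, SymlinkFormat) is hypothetical and is not taken from the patch or from Hive. It also uses a Java 8 default method for brevity; an abstract base class with an empty method body would give the same effect on older Java versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from Hive or the patch.
public class ReworkHookSketch {

    /** Stand-in for the plan object that holds the job's input paths. */
    static class Plan {
        final List<String> inputPaths = new ArrayList<>();
    }

    /** Hook that every input format exposes; the default does nothing. */
    interface ReworkablePlanFormat {
        default void rework(Plan plan) {
            // No-op for formats that do not need to change the plan.
        }
    }

    /** An ordinary format: inherits the no-op. */
    static class PlainTextFormat implements ReworkablePlanFormat { }

    /** A symlink-style format: swaps the plan's paths for the real target paths. */
    static class SymlinkFormat implements ReworkablePlanFormat {
        private final List<String> targets;

        SymlinkFormat(List<String> targets) {
            this.targets = targets;
        }

        @Override
        public void rework(Plan plan) {
            plan.inputPaths.clear();
            plan.inputPaths.addAll(targets);
        }
    }

    public static void main(String[] args) {
        Plan plan = new Plan();
        plan.inputPaths.add("/warehouse/t/ds=1");

        // The caller invokes the hook unconditionally, with no class comparison.
        List<ReworkablePlanFormat> formats = Arrays.asList(
                new PlainTextFormat(),
                new SymlinkFormat(Arrays.asList("/data/a.txt", "/data/b.txt")));
        for (ReworkablePlanFormat format : formats) {
            format.rework(plan);
        }
        System.out.println(plan.inputPaths);
    }
}
```

        The point of the design is that the caller stays free of per-format branches: adding another format that needs to adjust the plan means overriding one method rather than adding another equals check.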

        He Yongqiang made changes -
        Attachment HIVE-2126.2.patch [ 12477334 ]
        Namit Jain added a comment -

        Can you update the review board?

        jiraposter@reviews.apache.org added a comment -


        (Updated 2011-04-25 20:48:09.176419)

        Review request for hive.

        Changes
        -------

        Added a new class, SymbolicInputFormat, and moved reworkMapred to this new class.
        Moved the new code from SemanticAnalyzer to a Utilities method.

        Summary
        -------

        Hive's symlink text input format should be able to work with CombineHiveInputFormat

        This addresses bug hive-2126.
        https://issues.apache.org/jira/browse/hive-2126

        Diffs (updated)


        trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1096548
        trunk/conf/hive-default.xml 1096548
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1096548
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1096548
        trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1096548
        trunk/ql/src/java/org/apache/hadoop/hive/ql/io/ReworkMapredInputFormat.java PRE-CREATION
        trunk/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java PRE-CREATION
        trunk/ql/src/java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java 1096548
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 1096548
        trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1096548
        trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 1096548

        Diff: https://reviews.apache.org/r/653/diff

        Testing
        -------

        Thanks,

        Yongqiang

        Namit Jain added a comment -

        +1

        Namit Jain added a comment -

        Committed. Thanks Yongqiang

        Namit Jain made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Carl Steinbach made changes -
        Fix Version/s 0.8.0 [ 12316178 ]
        Carl Steinbach made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            He Yongqiang
            Reporter:
            He Yongqiang
          • Votes:
            0
            Watchers:
            0
