HIVE-1978: Hive SymlinkTextInputFormat does not estimate input size correctly

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Attachments

      1. HIVE-1978.1.patch
        9 kB
        He Yongqiang
      2. HIVE-1978.2.patch
        9 kB
        He Yongqiang

      Activity

        Namit Jain added a comment -

        It might be simpler to add a .q file test case.
        Just load two files (say a1.q and a2.q) into an HDFS directory.
        Then load a new file, say foo, for the table 'T'. The contents of the file 'foo' are:

        a1.q
        a2.q

        Then 'T' can be queried.
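
To make the size problem in the issue title concrete, here is a minimal sketch in plain Java (local files via java.nio, not Hadoop or Hive APIs). It assumes, as in the comment above, that the symlink file 'foo' lists one target per line, and it further assumes the targets sit next to 'foo'. The class and method names are hypothetical and are not taken from the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class SymlinkSizeSketch {
    /** Naive estimate: the size of the symlink file itself (a few bytes of path names). */
    static long naiveSize(Path symlinkFile) throws IOException {
        return Files.size(symlinkFile);
    }

    /** Resolved estimate: the summed size of every file the symlink file lists. */
    static long resolvedSize(Path symlinkFile) throws IOException {
        long total = 0;
        for (String line : Files.readAllLines(symlinkFile)) {
            String target = line.trim();
            if (target.isEmpty()) {
                continue; // skip blank lines
            }
            // Assumption: targets are named relative to the symlink file's directory.
            total += Files.size(symlinkFile.resolveSibling(target));
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-sketch");
        Files.write(dir.resolve("a1.q"), new byte[1000]);
        Files.write(dir.resolve("a2.q"), new byte[2000]);
        Path foo = dir.resolve("foo");
        Files.write(foo, Arrays.asList("a1.q", "a2.q"));

        System.out.println("naive    = " + naiveSize(foo));    // only the bytes of the two names
        System.out.println("resolved = " + resolvedSize(foo)); // 3000
    }
}
```

The gap between the two numbers is what matters for planning: an estimate based on the symlink file alone is far smaller than the data that will actually be read.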

        Namit Jain added a comment -

        Also, it might be simpler to add the new function 'getContentSummary' to all existing
        input formats.

        You can create an abstract dummy class that all other Hive input formats (other than SymlinkTextInputFormat) extend.
        The existing definition can live in that abstract class:

        FileSystem fs = p.getFileSystem(ctx.getConf());
        cs = fs.getContentSummary(p);

        That way, you don't need any special checking in Utilities.java: it calls getContentSummary(),
        which is implemented by all input formats that Hive supports.
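
The base-class proposal above could look roughly like the following sketch. It uses plain Java stand-ins rather than Hadoop types: two in-memory maps play the role of the file system and of the symlink file contents, and a long replaces ContentSummary. All class names are hypothetical, and letting the symlink format extend the same base class and override the method is an assumption about how the proposal would be wired up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BaseClassSketch {
    // Stand-in for a file system: path -> size in bytes.
    static final Map<String, Long> SIZES = new HashMap<>();
    // Stand-in for symlink file contents: symlink file -> listed target paths.
    static final Map<String, List<String>> LINKS = new HashMap<>();

    /** Hypothetical shared base class carrying the existing default definition. */
    abstract static class BaseInputFormat {
        long getContentSummary(String path) {
            return SIZES.getOrDefault(path, 0L);
        }
    }

    /** An ordinary format inherits the default unchanged. */
    static class PlainTextFormat extends BaseInputFormat { }

    /** The symlink format overrides the default and sums the sizes of its targets. */
    static class SymlinkFormat extends BaseInputFormat {
        @Override
        long getContentSummary(String path) {
            long total = 0;
            for (String target : LINKS.getOrDefault(path, Collections.<String>emptyList())) {
                total += SIZES.getOrDefault(target, 0L);
            }
            return total;
        }
    }

    /** The caller makes one uniform call and needs no per-format check. */
    static long estimateInputSize(BaseInputFormat format, String path) {
        return format.getContentSummary(path);
    }

    public static void main(String[] args) {
        SIZES.put("a1.q", 1000L);
        SIZES.put("a2.q", 2000L);
        SIZES.put("foo", 10L);
        LINKS.put("foo", Arrays.asList("a1.q", "a2.q"));
        System.out.println(estimateInputSize(new PlainTextFormat(), "a1.q")); // 1000
        System.out.println(estimateInputSize(new SymlinkFormat(), "foo"));    // 3000
    }
}
```

The caller stays simple, but only because every format it receives is assumed to extend the shared base class.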

        He Yongqiang added a comment -

        Namit, a .q test file cannot cover what this JIRA does. From a .q file, it is very difficult to tell whether SymlinkTextInputFormat gets the input size correctly.

        > 'getContentSummary' in all existing input formats.
        There is no guarantee that the input format comes from Hive. It is very difficult to change every input format.
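
The objection above suggests an opt-in design: formats that can size their own input implement a small interface, and the caller falls back to the default for everything else. The sketch below illustrates that trade-off with plain Java stand-ins (in-memory maps instead of a file system, a long instead of ContentSummary). The interface and class names are hypothetical, and the code is not taken from the attached patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OptInSketch {
    // Stand-in for a file system: path -> size in bytes.
    static final Map<String, Long> SIZES = new HashMap<>();
    // Stand-in for symlink file contents: symlink file -> listed target paths.
    static final Map<String, List<String>> LINKS = new HashMap<>();

    /** Hypothetical opt-in interface for formats that can size their own input. */
    interface ContentSummaryProvider {
        long getContentSummary(String path);
    }

    /** Stand-in for a third-party format that cannot be changed. */
    static class ThirdPartyFormat { }

    /** The symlink format opts in and sums the sizes of its targets. */
    static class SymlinkFormat implements ContentSummaryProvider {
        @Override
        public long getContentSummary(String path) {
            long total = 0;
            for (String target : LINKS.getOrDefault(path, Collections.<String>emptyList())) {
                total += SIZES.getOrDefault(target, 0L);
            }
            return total;
        }
    }

    /** The caller checks for the interface and otherwise uses the default sizing. */
    static long estimateInputSize(Object format, String path) {
        if (format instanceof ContentSummaryProvider) {
            return ((ContentSummaryProvider) format).getContentSummary(path);
        }
        return SIZES.getOrDefault(path, 0L);
    }

    public static void main(String[] args) {
        SIZES.put("a1.q", 1000L);
        SIZES.put("a2.q", 2000L);
        SIZES.put("foo", 10L);
        LINKS.put("foo", Arrays.asList("a1.q", "a2.q"));
        System.out.println(estimateInputSize(new ThirdPartyFormat(), "a1.q")); // 1000
        System.out.println(estimateInputSize(new SymlinkFormat(), "foo"));     // 3000
    }
}
```

The cost is one type check in the caller; the benefit is that formats written outside Hive keep working without modification.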

        He Yongqiang added a comment -

        Fixed a typo.

        Ning Zhang added a comment -

        +1

        Ning Zhang added a comment -

        Committed. Thanks Yongqiang!


        People

          Assignee: He Yongqiang
          Reporter: He Yongqiang