Hive / HIVE-1978

Hive SymlinkTextInputFormat does not estimate input size correctly

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None
    1. HIVE-1978.1.patch
      9 kB
      He Yongqiang
    2. HIVE-1978.2.patch
      9 kB
      He Yongqiang

      Activity

      Namit Jain added a comment -

      It might be simpler to add a .q file testcase.
      Just load 2 files (say a1.q and a2.q) into an HDFS directory.
      Then load a new file, say foo, for the table 'T'. The contents of the file 'foo' are:

      a1.q
      a2.q

      Then 'T' can be queried.

      Namit Jain added a comment -

      Also, it might be simpler to add the new function 'getContentSummary' to all existing
      input formats.

      You can create an abstract dummy class that all other Hive input formats (other than SymlinkTextInputFormat) extend.
      The existing definition can live in that abstract class:

      FileSystem fs = p.getFileSystem(ctx.getConf());
      cs = fs.getContentSummary(p);

      That way, you don't need any special checking in Utilities.java - it calls getContentSummary(),
      which is implemented by all input formats that Hive supports.
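
The base-class idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `BaseFormat`, `PlainFormat`, `SymlinkFormat`, and `ContentSummarySketch` are hypothetical names, and `java.nio.file` stands in for the Hadoop `FileSystem` call so that the sketch runs on its own. It is not the code in the attached patches.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

// Demo driver (hypothetical), declared first so the file can be run directly.
class ContentSummarySketch {
    // Builds a table directory whose only file, foo, lists two data files, then
    // returns {size seen by the default format, size seen by the symlink format}.
    static long[] demo() {
        try {
            Path dataDir = Files.createTempDirectory("data");
            Path a1 = Files.write(dataDir.resolve("a1.q"), new byte[10_000]);
            Path a2 = Files.write(dataDir.resolve("a2.q"), new byte[20_000]);
            Path tableDir = Files.createTempDirectory("table_T");
            Files.write(tableDir.resolve("foo"), Arrays.asList(a1.toString(), a2.toString()));
            return new long[] {
                new PlainFormat().getContentSummary(tableDir),
                new SymlinkFormat().getContentSummary(tableDir)
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demo();
        System.out.println("default estimate: " + sizes[0] + " bytes");
        System.out.println("symlink estimate: " + sizes[1] + " bytes");
    }
}

// Hypothetical "dummy" base class holding the existing definition.
abstract class BaseFormat {
    // Default: total bytes of all regular files under p.
    long getContentSummary(Path p) {
        try (Stream<Path> files = Files.walk(p)) {
            return files.filter(Files::isRegularFile).mapToLong(BaseFormat::sizeOf).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Ordinary formats inherit the default unchanged.
class PlainFormat extends BaseFormat {}

// The symlink-style format sums the sizes of the files its symlink files point to.
class SymlinkFormat extends BaseFormat {
    @Override
    long getContentSummary(Path p) {
        try (Stream<Path> files = Files.walk(p)) {
            return files.filter(Files::isRegularFile)
                    .flatMap(SymlinkFormat::targetsOf)
                    .mapToLong(BaseFormat::sizeOf)
                    .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each non-blank line of a symlink file is assumed to be the path of one data file.
    static Stream<Path> targetsOf(Path symlinkFile) {
        try {
            return Files.readAllLines(symlinkFile).stream()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .map(line -> Paths.get(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout the caller never inspects the concrete format; the default estimate only counts the small file `foo`, while the override counts the data files it lists.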

      He Yongqiang added a comment -

      Namit, a .q test file cannot cover what this JIRA does. From a .q file, it is very difficult to tell whether SymlinkTextInputFormat gets the input size correctly.

      >> 'getContentSummary' in all existing input formats.
      There is no guarantee that the input format comes from Hive. It is very difficult to change every input format.
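
One possible way to accommodate input formats that do not come from Hive is an optional interface with a fallback to the file system. The sketch below is an assumption made for illustration and is not confirmed by this thread as what the patch does; `ContentSummaryAware`, `ThirdPartyFormat`, `SymlinkLikeFormat`, and `InputSizeEstimator` are hypothetical names, and a plain function stands in for the file system summary.

```java
import java.util.function.ToLongFunction;

// Hypothetical caller, standing in for the size-estimation code.
class InputSizeEstimator {
    // Ask the format for its own summary if it offers one; otherwise
    // fall back to whatever the file system reports for the path.
    static long estimate(Object inputFormat, String path,
                         ToLongFunction<String> fileSystemSummary) {
        if (inputFormat instanceof ContentSummaryAware) {
            return ((ContentSummaryAware) inputFormat).getContentSummary(path);
        }
        return fileSystemSummary.applyAsLong(path);
    }

    public static void main(String[] args) {
        // Pretend the table directory holds a single 64-byte file.
        ToLongFunction<String> fs = p -> 64L;
        System.out.println(estimate(new ThirdPartyFormat(), "/warehouse/t", fs));
        System.out.println(estimate(new SymlinkLikeFormat(3000L), "/warehouse/t", fs));
    }
}

// Optional interface: only formats that can report their own input size implement it.
interface ContentSummaryAware {
    long getContentSummary(String path);
}

// Stands in for an input format that cannot be modified.
class ThirdPartyFormat {}

// Stands in for a symlink-style format whose real data lives elsewhere.
class SymlinkLikeFormat implements ContentSummaryAware {
    private final long targetBytes;

    SymlinkLikeFormat(long targetBytes) {
        this.targetBytes = targetBytes;
    }

    @Override
    public long getContentSummary(String path) {
        return targetBytes;
    }
}
```

The trade-off is one type check in the caller in exchange for leaving every other input format untouched.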

      He Yongqiang added a comment -

      fixed a typo

      Ning Zhang added a comment -

      +1

      Ning Zhang added a comment -

      Committed. Thanks Yongqiang!


        People

        • Assignee:
          He Yongqiang
          Reporter:
          He Yongqiang