HIVE-524

ExecDriver adds 0 byte file to input paths

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.5.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      In the addInputPaths method in ExecDriver:
      If the input path of a partition cannot be found or contains no files with data in them, a 0 byte file is created and added to the job instead. This causes our custom InputFormat to throw an exception since it is asked to process an unknown file format (not an lzo file).
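The behaviour described above can be sketched in plain Java. This is only an illustration of the reported symptom, not the actual ExecDriver code: the class `EmptyInputSketch`, the method `resolveInputs`, and the `emptyFile-N` naming are all hypothetical, and local `java.nio.file` paths stand in for HDFS paths.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class EmptyInputSketch {

    /** True if dir exists and holds at least one regular file with size > 0. */
    static boolean hasData(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.anyMatch(p -> {
                try {
                    return Files.isRegularFile(p) && Files.size(p) > 0;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /**
     * Hypothetical stand-in for the reported logic: a partition directory with
     * data is added as-is; a missing or empty one is replaced by a freshly
     * created 0 byte placeholder file in scratchDir.
     */
    static List<Path> resolveInputs(List<Path> partitionDirs, Path scratchDir)
            throws IOException {
        List<Path> inputs = new ArrayList<>();
        int placeholders = 0;
        for (Path dir : partitionDirs) {
            if (hasData(dir)) {
                inputs.add(dir);
            } else {
                Path placeholder = scratchDir.resolve("emptyFile-" + placeholders++);
                Files.createFile(placeholder); // 0 bytes, no format header at all
                inputs.add(placeholder);
            }
        }
        return inputs;
    }
}
```

The placeholder carries no format-specific header, so any reader that validates its input before reading records is handed a file it cannot recognise.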

          Activity

          Johan Oskarsson added a comment -

          What is the purpose of the empty files? Could we simply skip adding anything if the input partition directory is missing? Alternatively, we could raise an error to the user.

          Namit Jain added a comment -

          The problem is that downstream map-reduce jobs can run into problems.

          For example, consider the query:

          select .... from
          (query 1 union all query 2);

          It will result in 3 map-reduce jobs: query 1, query 2, and the outer query, which depends on query 1 and query 2.

          If query 2 had empty partitions and we disallowed them, the outer query would fail because the output for query 2 would not have been created.

          That's why we create a dummy file.
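The dependency problem described above can be mimicked with a toy model. This is a sketch under loud assumptions: `UnionDependencySketch`, `runSubquery`, `runOuter`, and the `alwaysCreateOutput` flag are hypothetical names, local directories stand in for job output directories, and no Hive or Hadoop API is involved.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class UnionDependencySketch {

    /**
     * Toy subquery job. With no rows and alwaysCreateOutput == false the job
     * is effectively never run, so its output directory is never created.
     */
    static void runSubquery(Path outputDir, List<String> rows, boolean alwaysCreateOutput)
            throws IOException {
        if (rows.isEmpty() && !alwaysCreateOutput) {
            return;
        }
        Files.createDirectories(outputDir);
        if (!rows.isEmpty()) {
            Files.write(outputDir.resolve("part-0"), rows);
        }
    }

    /** Toy outer job for "union all": every subquery output directory must exist. */
    static List<String> runOuter(List<Path> subqueryOutputs) throws IOException {
        List<String> result = new ArrayList<>();
        for (Path dir : subqueryOutputs) {
            if (!Files.isDirectory(dir)) {
                throw new NoSuchFileException(dir.toString());
            }
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path f : files) {
                    result.addAll(Files.readAllLines(f));
                }
            }
        }
        return result;
    }
}
```

In this model, running the second subquery with no rows and `alwaysCreateOutput == false` makes `runOuter` fail on the missing directory, while forcing the output to exist lets the outer step return only the rows of the first subquery.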

          The correct fix would be to create a file based on the table descriptor instead of some hard-coded value. Then, the custom input format can be attached to the table descriptor and will work fine.
          I am already in the process of implementing that as part of map-join, and will merge it in soon.
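One way to picture the proposed fix is to let the table's own format produce the empty file. The sketch below is an assumption-laden illustration, not the HIVE-195 patch: `TableFormat` is a hypothetical stand-in for whatever the table descriptor exposes, and gzip from the JDK is swapped in for lzo purely because it is available without extra libraries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FormatAwarePlaceholder {

    /** Hypothetical stand-in for the format knowledge attached to a table descriptor. */
    interface TableFormat {
        void writeEmptyFile(Path target) throws IOException;

        int countRecords(Path file) throws IOException;
    }

    /** Plain text: a 0 byte file is already a valid file with no records. */
    static final TableFormat PLAIN_TEXT = new TableFormat() {
        public void writeEmptyFile(Path target) throws IOException {
            Files.createFile(target);
        }

        public int countRecords(Path file) throws IOException {
            return Files.readAllLines(file).size();
        }
    };

    /** Gzip-compressed text (stand-in for lzo): an empty file still needs a header and trailer. */
    static final TableFormat GZIP_TEXT = new TableFormat() {
        public void writeEmptyFile(Path target) throws IOException {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                // No payload: closing the stream writes a valid, empty gzip member.
            }
        }

        public int countRecords(Path file) throws IOException {
            // The constructor reads the gzip header and fails on a 0 byte file.
            try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
                int lines = 0;
                int b;
                while ((b = in.read()) != -1) {
                    if (b == '\n') {
                        lines++;
                    }
                }
                return lines;
            }
        }
    };
}
```

Handing `GZIP_TEXT.countRecords` a file written by `PLAIN_TEXT.writeEmptyFile` reproduces the shape of the reported failure (the reader rejects the 0 byte file), whereas a file written by `GZIP_TEXT.writeEmptyFile` is accepted and reports zero records.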

          Namit Jain added a comment -

          The fix should be in: https://issues.apache.org/jira/browse/HIVE-195

          Zheng Shao added a comment -

          Namit, is it convenient to split the patch for this one? That map-side join is a big patch but this one seems a really small fix.

          Joydeep Sen Sarma added a comment -

          Maybe we should get rid of the assumption that the subqueries need to produce output directories?

          Johan Oskarsson added a comment -

          It seems HIVE-195 didn't fix this issue? I just ran into it again with the latest trunk checkout.


            People

            • Assignee: Unassigned
            • Reporter: Johan Oskarsson
            • Votes: 0
            • Watchers: 3
