HCATALOG-580: Optimizations in HCAT-538 break e2e tests


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.5
    • Component/s: None
    • Labels: None
    • Environment: RH 5.8 (on AWS)
      Hadoop 1.1.2.17 (build)
      HCat 0.5 (build)

    Description

      The optimizations introduced by HCATALOG-538 break dynamic partitioning in the e2e tests. They assume that the children of a directory are homogeneous: if the first child is a directory, all the remaining children are directories, and if the first child is a file, all the remaining children are files. That assumption is incorrect.

      (Admittedly, one half of that assumption, namely that a directory whose first child is a file is a leaf directory, is not unreasonable in premise, although it is still incorrect.)
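      To make the difference concrete, here is a hypothetical, self-contained Java model of two ways to decide which destination directories must be created before files are moved. It is not HCatalog's actual code and does not use the Hadoop FileSystem API; the Node class and the method names firstChildShortcut and perChildCheck are invented for illustration. The first method inspects only the first child of each directory, as the assumption above does; the second inspects every child, so the order in which a listing returns children no longer matters. The sample tree assumes a partition directory whose first listed child is an empty _temporary directory, followed by part files.

```java
import java.util.ArrayList;
import java.util.List;

public class LeafDirSketch {

    // Hypothetical in-memory stand-in for a directory tree (not a Hadoop API).
    static final class Node {
        final String name;
        final boolean isDir;
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isDir) {
            this.name = name;
            this.isDir = isDir;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Shortcut: only the FIRST child is inspected, and its type is assumed to
    // hold for every sibling. Records the directories that would be mkdir-ed.
    static void firstChildShortcut(Node dir, String path, List<String> leafDirs) {
        if (dir.children.isEmpty()) {
            return; // empty directory: nothing to move, so no mkdir is recorded
        }
        if (dir.children.get(0).isDir) {
            // Assumes all siblings are directories and recurses into each one.
            for (Node child : dir.children) {
                if (child.isDir) {
                    firstChildShortcut(child, path + "/" + child.name, leafDirs);
                }
                // A file sibling falls through here: it would be renamed
                // without its destination parent ever being created.
            }
        } else {
            leafDirs.add(path); // first child is a file: treat as a leaf directory
        }
    }

    // Per-child check: any directory that directly contains at least one file
    // is recorded, regardless of the order of its children.
    static void perChildCheck(Node dir, String path, List<String> leafDirs) {
        boolean hasFile = false;
        for (Node child : dir.children) {
            if (child.isDir) {
                perChildCheck(child, path + "/" + child.name, leafDirs);
            } else {
                hasFile = true;
            }
        }
        if (hasFile) {
            leafDirs.add(path);
        }
    }

    public static void main(String[] args) {
        // ds=1 lists an empty _temporary directory before its part files.
        Node ds1 = new Node("ds=1", true)
                .add(new Node("_temporary", true))
                .add(new Node("part-m-00000", false))
                .add(new Node("part-m-00001", false));
        Node root = new Node("pig_checkin_7", true).add(ds1);

        List<String> shortcut = new ArrayList<>();
        firstChildShortcut(root, "", shortcut);
        List<String> perChild = new ArrayList<>();
        perChildCheck(root, "", perChild);

        System.out.println("shortcut:  " + shortcut); // []
        System.out.println("per-child: " + perChild); // [/ds=1]
    }
}
```

      With the same files listed in the opposite order (part files first), both methods would return [/ds=1], which is why the shortcut can pass some tests and fail others. A real implementation would probably also need an explicit policy for bookkeeping directories such as _temporary; that choice is left out of this sketch.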

      The problem is that the behaviour of the underlying FileOutputCommitter and OutputFormat determines whether a directory contains files, subdirectories, or both, and whether any _temporary directories are still left behind, for example.

      In the case I tested, a "leaf" directory contains a _temporary directory followed by part files. The optimization sees the _temporary directory first, finds nothing inside it, and therefore does not create any parent directory. It then concludes that the remaining children are also directories, reaches the first part file, and tries to rename it directly without first creating (mkdir-ing) its parent directory.
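      The last step of that walk, renaming a file into a directory that does not exist yet, can be reproduced on a local filesystem with java.nio.file. This is only an analogy: HDFS is accessed through Hadoop's FileSystem API, whose rename returns a boolean and may signal failure by returning false rather than by throwing. The helper name moveCreatingParents is hypothetical; it sketches the "create the parent, then rename" order that the optimization skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MoveWithParentSketch {

    // Hypothetical helper: create the destination's parent directories first,
    // then move the file. createDirectories is a no-op if they already exist.
    static void moveCreatingParents(Path src, Path dst) throws IOException {
        Path parent = dst.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.move(src, dst);
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("hcat580");
        Path part = Files.createFile(work.resolve("part-m-00000"));
        Path dst = work.resolve("warehouse").resolve("ds=1").resolve("part-m-00000");

        try {
            // The parent directory warehouse/ds=1 does not exist yet.
            Files.move(part, dst);
            System.out.println("unexpected: plain move succeeded");
        } catch (IOException e) {
            // Typically a NoSuchFileException on common platforms.
            System.out.println("plain move failed: " + e.getClass().getSimpleName());
        }

        moveCreatingParents(part, dst);
        System.out.println("moved: " + Files.exists(dst));
    }
}
```

      The temporary directory is not cleaned up, to keep the sketch short.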

      The e2e test configuration in question is Pig_Checkin_7:

      {
          'num' => 7
          ,'hcat_prep' => q\drop table if exists pig_checkin_7;
      create table pig_checkin_7 (name string, age int) partitioned by (ds string) STORED AS TEXTFILE;\
          ,'pig' => q\a = load 'studentparttab30k' using org.apache.hcatalog.pig.HCatLoader();
      b = foreach a generate name, age, ds;
      store b into 'pig_checkin_7' using org.apache.hcatalog.pig.HCatStorer();\
          ,'result_table' => 'pig_checkin_7'
          ,'sql' => "select name, age, ds from studentparttab30k;"
          ,'floatpostprocess' => 1
          ,'delimiter' => '       '
      }
      

      Attachments

        1. HCATALOG-580-3.patch (6 kB, Daniel Dai)
        2. HCATALOG-580-2.patch (6 kB, Daniel Dai)
        3. HCATALOG-580-1.patch (5 kB, Daniel Dai)


            People

              Assignee: Daniel Dai (daijy)
              Reporter: Sushanth Sowmyan (sushanth)
              Votes: 0
              Watchers: 5
