Hive / HIVE-4885

Alternative object serialization for execution plan in hive testing

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0, 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: CLI
    • Labels:
      None

      Description

      Many test cases currently compare execution plans, such as those in the TestParse suite. XMLEncoder (java.beans.XMLEncoder) is used to serialize the plan generated by Hive and store it in a file for file-diff comparison. However, XMLEncoder's output depends on the JDK implementation, which may change from version to version, so upgrading the JDK can produce many spurious test failures. The following is an example of the diff generated when running Hive with JDK 7:

      Begin query: case_sensitivity.q
      diff -a /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
      diff -a -b /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
      3c3
      <  <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask0">
      ---
      >  <object id="MapRedTask0" class="org.apache.hadoop.hive.ql.exec.MapRedTask"> 
      12c12
      <        <object class="java.util.ArrayList" id="ArrayList0">
      ---
      >        <object id="ArrayList0" class="java.util.ArrayList"> 
      14c14
      <          <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask0">
      ---
      >          <object id="MoveTask0" class="org.apache.hadoop.hive.ql.exec.MoveTask"> 
      18c18
      <              <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask1">
      ---
      >              <object id="MoveTask1" class="org.apache.hadoop.hive.ql.exec.MoveTask"> 
      22c22
      <                  <object class="org.apache.hadoop.hive.ql.exec.StatsTask" id="StatsTask0">
      ---
      >                  <object id="StatsTask0" class="org.apache.hadoop.hive.ql.exec.StatsTask"> 
      60c60
      <                  <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask1">
      ---
      >                  <object id="MapRedTask1" class="org.apache.hadoop.hive.ql.exec.MapRedTask"> 
      
      

      As can be seen, the only difference is the order of the attributes in the serialized XML document, yet it causes 50+ test failures in Hive.

      We need either a better plan comparison or a different object serialization to improve the situation.
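
One way to picture the "better plan comparison" option is to compare the two XML files as DOM trees instead of as text, since DOM equality does not depend on attribute order. The following is only a minimal sketch under that assumption; it is not taken from HIVE-4885.patch, and the class and method names (PlanXmlCompare, samePlan, stripBlankText) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive: compares two serialized plans
// structurally so that attribute order and indentation are ignored.
class PlanXmlCompare {

    // Parse an XML string into a DOM tree, dropping comments.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Remove whitespace-only text nodes so indentation does not affect equality.
    static void stripBlankText(Node node) {
        NodeList children = node.getChildNodes();
        List<Node> blanks = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                blanks.add(child);
            } else {
                stripBlankText(child);
            }
        }
        for (Node blank : blanks) {
            node.removeChild(blank);
        }
    }

    // True when both documents have the same elements, attributes and text.
    // Node.isEqualNode compares attribute sets without regard to their order.
    static boolean samePlan(String expectedXml, String actualXml) {
        try {
            Document expected = parse(expectedXml);
            Document actual = parse(actualXml);
            stripBlankText(expected);
            stripBlankText(actual);
            return expected.getDocumentElement()
                    .isEqualNode(actual.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse plan XML", e);
        }
    }

    public static void main(String[] args) {
        String jdk6 = "<java><object id=\"MapRedTask0\" "
                + "class=\"org.apache.hadoop.hive.ql.exec.MapRedTask\"/></java>";
        String jdk7 = "<java><object class=\"org.apache.hadoop.hive.ql.exec.MapRedTask\" "
                + "id=\"MapRedTask0\"/></java>";
        System.out.println(samePlan(jdk6, jdk7));
    }
}
```

With a comparison like this, the six hunks in the diff above would no longer count as differences, while a changed id, class name, or child element still would. The trade-off is that a failing test reports only "not equal" unless extra code is written to locate the first differing node.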

        Attachments

        1. HIVE-4885.patch
          4 kB
          Xuefu Zhang

              People

              • Assignee: Xuefu Zhang (xuefuz)
              • Reporter: Xuefu Zhang (xuefuz)
              • Votes: 0
              • Watchers: 6
