Hive
  1. Hive
  2. HIVE-1675

SAXParseException on plan.xml during local mode.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      When hive switches to local mode (hive.exec.mode.local.auto=true) I receive a sax parser exception on the plan.xml
      If I set hive.exec.mode.local.auto=false I get the correct results.

      1. local_10005_plan.xml
        50 kB
        Bennie Schut
      2. local_10006_plan.xml
        138 kB
        Bennie Schut
      3. HIVE-1675.patch
        0.6 kB
        Bennie Schut

        Activity

        Hide
        Bennie Schut added a comment -

        Added HIVE-2073 for the different error we now get on local mode.

        Show
        Bennie Schut added a comment - Added HIVE-2073 for the different error we now get on local mode.
        Hide
        Bennie Schut added a comment -

        I'm unable to reproduce this on the current trunk. I still get errors on local mode but it's something different "java.lang.IllegalStateException(Shutdown in progress)" after witch the server shuts down. I'll make a new jira for that and I think we can close this one.

        Show
        Bennie Schut added a comment - I'm unable to reproduce this on the current trunk. I still get errors on local mode but it's something different "java.lang.IllegalStateException(Shutdown in progress)" after witch the server shuts down. I'll make a new jira for that and I think we can close this one.
        Hide
        Joydeep Sen Sarma added a comment -

        the combination of auto local and parallel had a bug that was recently fixed (HIVE-1776)

        The stack reported above is weird - we should not be entering this code path. as u mentioned - HIVE-1846 will fix this - but currently, the getMapRedWork call invoked from different places in the stack should be satisfied from the an in-memory cache (and shouldn't hit the file). the file is not written out for local mode (because there's an assumption that everything runs in the same jvm).

        Unable to explain this - i think this is worth fixing because the plan file processing takes some time and it's better to retrieve it from memory where possible.

        Show
        Joydeep Sen Sarma added a comment - the combination of auto local and parallel had a bug that was recently fixed ( HIVE-1776 ) The stack reported above is weird - we should not be entering this code path. as u mentioned - HIVE-1846 will fix this - but currently, the getMapRedWork call invoked from different places in the stack should be satisfied from the an in-memory cache (and shouldn't hit the file). the file is not written out for local mode (because there's an assumption that everything runs in the same jvm). Unable to explain this - i think this is worth fixing because the plan file processing takes some time and it's better to retrieve it from memory where possible.
        Hide
        Bennie Schut added a comment -

        manage to cause this with parallel=false so perhaps not such an interesting angle

        I've added some more logging to better understand the cause of this:

        2010-12-07 15:49:44,697 INFO exec.Utilities (Utilities.java:getMapRedWork(154)) - Getting jobid:9c2eeba4-a602-4d4b-ba0b-60ce815c4ea7 from cache.
        2010-12-07 15:49:44,703 INFO lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library
        2010-12-07 15:49:44,705 INFO lzo.LzoCodec (LzoCodec.java:<clinit>(72)) - Successfully loaded & initialized native-lzo library [hadoop-lzo rev c7acdaa96a7ce04538c0716fe699ffaf11836c70]
        2010-12-07 15:49:44,712 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(192)) - Total input paths to process : 1
        2010-12-07 15:49:44,880 INFO exec.Utilities (Utilities.java:getMapRedWork(154)) - Getting jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache.
        2010-12-07 15:49:44,882 INFO exec.ExecDriver (SessionState.java:printInfo(268)) - Job running in-process (local Hadoop)
        2010-12-07 15:49:44,882 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001
        java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:166)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:238)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:244)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:139)
        Caused by: java.io.FileNotFoundException: HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:157)
        ... 3 more

        First thing I noticed which is different from a successful job would be that it's trying to get a different jobid from the cache "Getting jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache"
        I'm still confused.

        Show
        Bennie Schut added a comment - manage to cause this with parallel=false so perhaps not such an interesting angle I've added some more logging to better understand the cause of this: 2010-12-07 15:49:44,697 INFO exec.Utilities (Utilities.java:getMapRedWork(154)) - Getting jobid:9c2eeba4-a602-4d4b-ba0b-60ce815c4ea7 from cache. 2010-12-07 15:49:44,703 INFO lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library 2010-12-07 15:49:44,705 INFO lzo.LzoCodec (LzoCodec.java:<clinit>(72)) - Successfully loaded & initialized native-lzo library [hadoop-lzo rev c7acdaa96a7ce04538c0716fe699ffaf11836c70] 2010-12-07 15:49:44,712 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(192)) - Total input paths to process : 1 2010-12-07 15:49:44,880 INFO exec.Utilities (Utilities.java:getMapRedWork(154)) - Getting jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache. 2010-12-07 15:49:44,882 INFO exec.ExecDriver (SessionState.java:printInfo(268)) - Job running in-process (local Hadoop) 2010-12-07 15:49:44,882 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001 java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory) at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:166) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:238) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:244) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:139) Caused by: java.io.FileNotFoundException: HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:106) at java.io.FileInputStream.<init>(FileInputStream.java:66) at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:157) ... 3 more First thing I noticed which is different from a successful job would be that it's trying to get a different jobid from the cache "Getting jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache" I'm still confused.
        Hide
        Bennie Schut added a comment -

        The good news is I'm unable to reproduce this once I "set hive.exec.parallel=false". Perhaps it's cleaning uo some things during one of the steps while the other steps still need it?
        hive.exec.parallel works great for non local stuff for us but the combination of local and parallel seems like it has some issues.
        I was getting all kinds of different exceptions on local mode like the saxparser but also:

        org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/file.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:50)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:193)

        and:
        java.io.FileNotFoundException: HIVE_PLAN1fe0a487-2e46-44e4-a557-9b3c9dddac07 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:155)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:238)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:244)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:139)

        I only get these issues when enabling parallel and local.
        At least an angle to look at.

        Show
        Bennie Schut added a comment - The good news is I'm unable to reproduce this once I "set hive.exec.parallel=false". Perhaps it's cleaning uo some things during one of the steps while the other steps still need it? hive.exec.parallel works great for non local stuff for us but the combination of local and parallel seems like it has some issues. I was getting all kinds of different exceptions on local mode like the saxparser but also: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/file.out in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138) at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:50) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:193) and: java.io.FileNotFoundException: HIVE_PLAN1fe0a487-2e46-44e4-a557-9b3c9dddac07 (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:106) at java.io.FileInputStream.<init>(FileInputStream.java:66) at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:155) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:238) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:244) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:139) I only get these issues when enabling parallel and local. At least an angle to look at.
        Hide
        Bennie Schut added a comment -

        doesn't look like it actually got solved. Testing it a bit more.

        Show
        Bennie Schut added a comment - doesn't look like it actually got solved. Testing it a bit more.
        Hide
        Bennie Schut added a comment -

        We needed to close the OutputStream (or flush it) before another process can read the data. This fixed the SAXParserException we got.

        Show
        Bennie Schut added a comment - We needed to close the OutputStream (or flush it) before another process can read the data. This fixed the SAXParserException we got.
        Hide
        Bennie Schut added a comment -

        It's happening on 0.20.2

        Show
        Bennie Schut added a comment - It's happening on 0.20.2
        Hide
        Joydeep Sen Sarma added a comment -

        what version of hadoop is this happening against?

        Show
        Joydeep Sen Sarma added a comment - what version of hadoop is this happening against?
        Hide
        Bennie Schut added a comment -

        I've looked at the plan.xml files but don't see any issue with them. Any ideas on what could cause this?

        Show
        Bennie Schut added a comment - I've looked at the plan.xml files but don't see any issue with them. Any ideas on what could cause this?
        Hide
        Bennie Schut added a comment -

        plan files which produced the parse exception.

        Show
        Bennie Schut added a comment - plan files which produced the parse exception.
        Hide
        Bennie Schut added a comment -

        The query I used:
        INSERT OVERWRITE TABLE servers
        select * from (
        select *
        from servers_load
        union all
        select s.*
        from servers s
        left outer join servers_load l on (l.server_id = s.server_id)
        where l.server_id is null

        Full stacktrace:

        plan = file:/tmp/hadoop/hive_2010-09-30_10-41-39_345_c2bbcfc5-e1d2-4e33-b41f-244ef289b41f/-local-10005/plan.xml
        plan = file:/tmp/hadoop/hive_2010-09-30_10-41-39_345_c2bbcfc5-e1d2-4e33-b41f-244ef289b41f/-local-10006/plan.xml
        [Fatal Error] :181:44: XML document structures must start and end within the same entity.
        java.lang.RuntimeException: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1168)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1040)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:382)
        at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1662)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:215)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:93)
        at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:606)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1029)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
        Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1092)
        ... 16 more
        Job Submission failed with exception 'java.lang.RuntimeException(org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.)'
        Job running in-process (local Hadoop)

        Show
        Bennie Schut added a comment - The query I used: INSERT OVERWRITE TABLE servers select * from ( select * from servers_load union all select s.* from servers s left outer join servers_load l on (l.server_id = s.server_id) where l.server_id is null Full stacktrace: plan = file:/tmp/hadoop/hive_2010-09-30_10-41-39_345_c2bbcfc5-e1d2-4e33-b41f-244ef289b41f/-local-10005/plan.xml plan = file:/tmp/hadoop/hive_2010-09-30_10-41-39_345_c2bbcfc5-e1d2-4e33-b41f-244ef289b41f/-local-10006/plan.xml [Fatal Error] :181:44: XML document structures must start and end within the same entity. java.lang.RuntimeException: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1168) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1040) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980) at org.apache.hadoop.conf.Configuration.get(Configuration.java:382) at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1662) at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:215) at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:93) at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:373) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:800) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:606) at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1029) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1092) ... 16 more Job Submission failed with exception 'java.lang.RuntimeException(org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.)' Job running in-process (local Hadoop)

          People

          • Assignee:
            Bennie Schut
            Reporter:
            Bennie Schut
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development