HIVE-2988

Use of XMLEncoder to serialize MapredWork causes OOM in hive cli

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: CLI
    • Labels:

      Description

      When running queries on tables with 6000 partitions, the Hive CLI runs into an OOM if configured with 128M of heap. A heap dump showed 37MB occupied by a single XMLEncoder object while the MapredWork itself was only about 500K, which is highly inefficient. We should switch to something more efficient, such as XStream.
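      For illustration, a minimal sketch of the two serialization paths, assuming a simple JavaBean stand-in for MapredWork (the Plan class and method names here are hypothetical, not Hive's actual code):

      ```java
      import java.beans.XMLEncoder;
      import java.io.ByteArrayOutputStream;
      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;
      import java.util.List;

      import com.thoughtworks.xstream.XStream;
      import com.thoughtworks.xstream.io.xml.DomDriver;

      public class PlanSerializationSketch {

          // Hypothetical stand-in for MapredWork: a JavaBean with getters/setters.
          public static class Plan {
              private List<String> partitionPaths = new ArrayList<>();
              public List<String> getPartitionPaths() { return partitionPaths; }
              public void setPartitionPaths(List<String> p) { this.partitionPaths = p; }
          }

          // Current approach: java.beans.XMLEncoder walks the bean graph reflectively
          // and buffers its bookkeeping until close(), which is the suspected source
          // of the transient garbage seen in the heap dump.
          static byte[] withXmlEncoder(Plan plan) {
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              XMLEncoder enc = new XMLEncoder(out);
              enc.writeObject(plan);
              enc.close();  // the XML is actually written out on close()
              return out.toByteArray();
          }

          // Proposed alternative: XStream converts the object graph to XML directly.
          static byte[] withXStream(Plan plan) {
              XStream xstream = new XStream(new DomDriver());
              return xstream.toXML(plan).getBytes(StandardCharsets.UTF_8);
          }
      }
      ```

      Either way the result is an XML blob; the difference that matters here is the amount of intermediate object churn, which would need to be measured rather than assumed.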

        Activity

        Ashutosh Chauhan added a comment -

        I agree with Rohini Palaniswamy that increasing the heap size is not the ideal solution. We should switch to a better implementation than XMLEncoder. I don't know much about XStream. Its license is BSD, so it's compatible. Does it have a better memory footprint, or are there other libraries we should look at too?

        Rohini Palaniswamy added a comment -

        I ran with 128M to investigate the OOM. We have resorted to running with 1G as Xmx because we keep hitting OOMs with bigger tables in Hive. There were other things that contributed to the memory usage, mostly Path objects because of the high number of partitions, but those are absolutely needed. XMLEncoder is something that created too much garbage in a very short span and triggered GC. That would be something easy to change/fix without having to touch the core logic.

        We should be looking at fixing the root cause of the problem instead of continually increasing the memory requirements. Ours is a highly multi-tenant system and there are a lot of other programs (Pig, etc.) running on the gateway too, so being able to run with lower memory (256-512MB) would help.

        Found two other reports of this issue:
        http://mail-archives.apache.org/mod_mbox/hive-user/201106.mbox/%3CBANLkTik4THLNkxV87UygvqhoLri3UL9R3Q@mail.gmail.com%3E

        https://issues.apache.org/jira/browse/HIVE-1316

        • That fix increased the max heap size of the CLI client and disabled the GC overhead limit.
        Edward Capriolo added a comment -

        I think the patch is great, but the JDK heap default is 512MB in recent JDKs. I know 4K of RAM put a man on the moon, but why are you running with such a low default?

        Ashutosh Chauhan added a comment -

        HIVE-2738 also reports a problem with XMLEncoder, though it looks unrelated.

        Philip Tromans added a comment -

        This might not be related, but I've also seen an intermittent StackOverflowError (when Hive is serializing tasks at the beginning of a job) where most of the stack trace is within the XMLEncoder as well. Has anyone else had a problem with this?
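        If the overflow really does come from XMLEncoder recursing over a very deep plan/operator graph, a hedged reproduction sketch might look like the following (the Node bean is hypothetical, and the depth needed in practice depends on the thread stack size):

        ```java
        import java.beans.XMLEncoder;
        import java.io.ByteArrayOutputStream;

        public class DeepGraphSketch {

            // Hypothetical bean standing in for a long chain of tasks/operators.
            public static class Node {
                private Node child;
                public Node getChild() { return child; }
                public void setChild(Node child) { this.child = child; }
            }

            public static void main(String[] args) {
                Node root = new Node();
                Node current = root;
                for (int i = 0; i < 200000; i++) {   // build a very deep chain
                    Node next = new Node();
                    current.setChild(next);
                    current = next;
                }
                XMLEncoder enc = new XMLEncoder(new ByteArrayOutputStream());
                enc.writeObject(root);  // XMLEncoder mirrors the graph recursively; a deep
                enc.close();            // enough chain can overflow the stack during encoding
            }
        }
        ```

        Whether this matches the stack traces seen above is only a guess; it just shows that a sufficiently deep bean graph is enough to overflow while encoding.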


          People

          • Assignee: Unassigned
          • Reporter: Rohini Palaniswamy
          • Votes: 0
          • Watchers: 2
