Hive / HIVE-3253

ArrayIndexOutOfBounds exception for deeply nested structs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.10.0
    • Fix Version/s: 0.12.0
    • Labels:
      None
    • Release Note:
      This change increases the number of nesting levels supported in Hive select queries. The serialization format used by the File Output Operator for these queries (LazySimpleSerde) previously limited nesting to 8 levels; the limit has now been extended to 24. For these queries, the extended nesting is turned on by default.

      This change also increases the number of nesting levels that you can use with tables that use LazySimpleSerde. It does so by using additional control characters as delimiters. This means that your data must not contain these characters, or you need to escape them. Because this introduces a new requirement on how the data has been written, it is not backward compatible and is therefore not enabled by default. To enable it, set the serde property hive.serialization.extend.nesting.levels to true.

      See the 'ESCAPED BY' documentation for CREATE TABLE to learn how to enable escaping of the delimiter characters: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
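
The release note says that the extra nesting levels come from additional control characters, and that data containing those characters must be escaped. The sketch below illustrates that idea only. The exact bytes Hive picks are not stated in the note, so the class name, the method names, and the choice of which control characters to skip are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Hive's actual implementation.
public class ExtendedSeparatorsSketch {

    /** Builds `levels` separator bytes from ASCII control characters 1..31. */
    public static byte[] buildSeparators(int levels) {
        List<Byte> picked = new ArrayList<>();
        for (byte b = 1; b <= 31 && picked.size() < levels; b++) {
            // Assumption: skip tab (9), LF (10), FF (12), CR (13) and ESC (27),
            // because ordinary text data is likely to contain them.
            if (b == 9 || b == 10 || b == 12 || b == 13 || b == 27) {
                continue;
            }
            picked.add(b);
        }
        if (picked.size() < levels) {
            throw new IllegalArgumentException(
                "not enough control characters for " + levels + " levels");
        }
        byte[] out = new byte[levels];
        for (int i = 0; i < levels; i++) {
            out[i] = picked.get(i);
        }
        return out;
    }

    /** Returns true if the value contains any separator byte and so would need escaping. */
    public static boolean needsEscaping(byte[] value, byte[] separators) {
        for (byte v : value) {
            for (byte s : separators) {
                if (v == s) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A check like `needsEscaping` could be run over existing data before turning on hive.serialization.extend.nesting.levels, since the note warns that the setting changes which bytes are treated as delimiters.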

      Description

      Creating a table with deeply nested structs can throw this exception:

      java.lang.ArrayIndexOutOfBoundsException: 9
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:281)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:263)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyObjectInspector(LazyFactory.java:276)
      	at org.apache.hadoop.hive.serde2.lazy.LazyFactory.createLazyStructInspector(LazyFactory.java:354)
      

      The cause is that the separators array is currently hardcoded to size 8 in LazySimpleSerde:

      // Read the separators: We use 8 levels of separators by default, but we
      // should change this when we allow users to specify more than 10 levels
      // of separators through DDL.
      serdeParams.separators = new byte[8];
      

      If possible, we should increase this size or at least make it configurable to properly handle deeply nested structs.
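
To make the failure mode concrete, here is a minimal sketch of how a recursive walk that reads one separator per nesting level runs off the end of a fixed-size array. It is not the LazyFactory code: the `Type` class, the `visit` and `nestedStruct` helpers, and the level numbering (starting at 0) are hypothetical and chosen only to show the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the bug pattern; not Hive's actual code.
public class SeparatorDepthSketch {

    /** Stand-in for a nested type: no children means a primitive. */
    public static final class Type {
        final List<Type> children = new ArrayList<>();

        static Type primitive() {
            return new Type();
        }

        static Type structOf(Type child) {
            Type t = new Type();
            t.children.add(child);
            return t;
        }
    }

    /** Walks the type tree, reading one separator per level; returns the deepest level reached. */
    public static int visit(Type type, byte[] separators, int level) {
        if (type.children.isEmpty()) {
            return level; // primitives need no separator
        }
        // Throws ArrayIndexOutOfBoundsException once level >= separators.length.
        byte separator = separators[level];
        int deepest = level;
        for (Type child : type.children) {
            deepest = Math.max(deepest, visit(child, separators, level + 1));
        }
        return deepest;
    }

    /** Builds a primitive wrapped in `depth` layers of single-field structs. */
    public static Type nestedStruct(int depth) {
        Type t = Type.primitive();
        for (int i = 0; i < depth; i++) {
            t = Type.structOf(t);
        }
        return t;
    }

    public static void main(String[] args) {
        byte[] separators = new byte[8]; // same fixed size as in the snippet above
        System.out.println("depth 8 ok, deepest level = " + visit(nestedStruct(8), separators, 0));
        try {
            visit(nestedStruct(9), separators, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("depth 9 failed: " + e.getMessage());
        }
    }
}
```

In this sketch, making the array larger (or sizing it from configuration) is enough to let the deeper type through, which is the direction the issue proposes.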

      1. jsonout.hive
        16 kB
        Chuck Connell
      2. HIVE-3253.3.patch
        158 kB
        Thejas M Nair
      3. HIVE-3253.2.patch
        139 kB
        Thejas M Nair
      4. HIVE-3253_moar_nesting.1.patch
        0.9 kB
        Travis Crawford

        Issue Links

          Activity

          Aihua Xu made changes -
          Link This issue is related to HIVE-9500 [ HIVE-9500 ]
          Ashutosh Chauhan made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Thejas M Nair made changes -
          Release Note set (same text as the Release Note under Details above)
          Ashutosh Chauhan made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.12.0 [ 12324312 ]
          Resolution Fixed [ 1 ]
          Thejas M Nair made changes -
          Attachment HIVE-3253.3.patch [ 12590506 ]
          Brock Noland made changes -
          Link This issue is duplicated by HIVE-4571 [ HIVE-4571 ]
          Thejas M Nair made changes -
          Assignee Travis Crawford [ traviscrawford ] Thejas M Nair [ thejas ]
          Thejas M Nair made changes -
          Attachment HIVE-3253.2.patch [ 12587513 ]
          Chuck Connell made changes -
          Attachment jsonout.hive [ 12548436 ]
          Travis Crawford made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.10.0 [ 12320745 ]
          Travis Crawford made changes -
          Attachment HIVE-3253_moar_nesting.1.patch [ 12538485 ]
          Travis Crawford made changes -
          Assignee Travis Crawford [ traviscrawford ]
          Swarnim Kulkarni made changes -
          Description edited (three revisions; the final text appears under Description above)
          Swarnim Kulkarni created issue -

            People

            • Assignee: Thejas M Nair
            • Reporter: Swarnim Kulkarni
            • Votes: 2
            • Watchers: 12
