IMPALA-4902

Concurrent DDL may fail with a ConcurrentModificationException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Catalog

      Description

      Concurrent DDL operations may sometimes fail with a ConcurrentModificationException.
      Commands that modify the tblproperties map of a table or partition are affected, for example:

      • COMPUTE STATS
      • COMPUTE INCREMENTAL STATS
      • DROP STATS
      • ALTER TABLE SET TBLPROPERTIES
      • ALTER TABLE SET CACHED
      • Possibly others

      The logs will contain a stack trace like the one below where the exception is thrown in thrift-generated code while iterating over a HashMap.
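
      The failure mode can be reproduced in isolation. The following is a minimal sketch, not Impala code: the class name CmeDemo and the map keys are hypothetical. It modifies a HashMap from inside the loop that iterates over it, which makes the failure deterministic on a single thread; in the reported bug the modification would come from a second thread, so the failure is intermittent.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Impala.
class CmeDemo {
  // Returns true if adding a key while iterating over the map raises
  // ConcurrentModificationException.
  static boolean failsWhenModifiedDuringIteration() {
    Map<String, String> params = new HashMap<>();
    params.put("numRows", "100");
    params.put("totalSize", "4096");
    params.put("numFiles", "2");
    try {
      for (Map.Entry<String, String> entry : params.entrySet()) {
        // Structural modification while an iterator is live. The
        // iterator detects it on its next call to next().
        params.put("new_key", entry.getValue());
      }
    } catch (ConcurrentModificationException e) {
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(failsWhenModifiedDuringIteration());
  }
}
```

      HashMap iterators are fail-fast on a best-effort basis, so a cross-thread race does not always raise the exception, which is consistent with the low failure rate reported here.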

      Original Description
      Recently, the stress test was modified to include COMPUTE STATS statements with a variety of MT_DOP options.

      Occasionally those fail with:

      I0208 03:06:26.634202 67490 jni-util.cc:169] java.util.ConcurrentModificationException
              at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
              at java.util.HashMap$EntryIterator.next(HashMap.java:962)
              at java.util.HashMap$EntryIterator.next(HashMap.java:960)
              at org.apache.impala.thrift.THdfsPartition$THdfsPartitionStandardScheme.write(THdfsPartition.java:1832)
              at org.apache.impala.thrift.THdfsPartition$THdfsPartitionStandardScheme.write(THdfsPartition.java:1546)
              at org.apache.impala.thrift.THdfsPartition.write(THdfsPartition.java:1389)
              at org.apache.impala.thrift.THdfsTable$THdfsTableStandardScheme.write(THdfsTable.java:1243)
              at org.apache.impala.thrift.THdfsTable$THdfsTableStandardScheme.write(THdfsTable.java:1071)
              at org.apache.impala.thrift.THdfsTable.write(THdfsTable.java:940)
              at org.apache.impala.thrift.TTable$TTableStandardScheme.write(TTable.java:1628)
              at org.apache.impala.thrift.TTable$TTableStandardScheme.write(TTable.java:1399)
              at org.apache.impala.thrift.TTable.write(TTable.java:1208)
              at org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write(TCatalogObject.java:1241)
              at org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write(TCatalogObject.java:1098)
              at org.apache.impala.thrift.TCatalogObject.write(TCatalogObject.java:938)
              at org.apache.impala.thrift.TGetAllCatalogObjectsResponse$TGetAllCatalogObjectsResponseStandardScheme.write(TGetAllCatalogObjectsResponse.java:487)
              at org.apache.impala.thrift.TGetAllCatalogObjectsResponse$TGetAllCatalogObjectsResponseStandardScheme.write(TGetAllCatalogObjectsResponse.java:421)
              at org.apache.impala.thrift.TGetAllCatalogObjectsResponse.write(TGetAllCatalogObjectsResponse.java:365)
              at org.apache.thrift.TSerializer.serialize(TSerializer.java:79)
              at org.apache.impala.service.JniCatalog.getCatalogObjects(JniCatalog.java:124)
      I0208 03:06:26.653929 67490 status.cc:114] ConcurrentModificationException: null
          @          0x11c5241  (unknown)
          @          0x1597bfe  (unknown)
          @          0x11b50b3  (unknown)
          @          0x1177fbb  (unknown)
          @          0x1192c33  (unknown)
          @          0x1191ba2  (unknown)
          @          0x1190ed9  (unknown)
          @          0x118fedb  (unknown)
          @          0x13430ce  (unknown)
          @          0x15e804b  (unknown)
          @          0x15ef024  (unknown)
          @          0x15eef67  (unknown)
          @          0x15eeec2  (unknown)
          @          0x1a5661a  (unknown)
          @       0x3b81a079d1  (unknown)
          @       0x3b816e88fd  (unknown)
      E0208 03:06:26.654156 67490 catalog-server.cc:282] ConcurrentModificationException: null
      

      (To quantify "occasionally": last night's run had 3 such failures out of 2,031 COMPUTE STATS statements, among 10,000 total statements.)

      I haven't yet been able to figure out whether some MT_DOP setting makes this more likely, whether it requires multiple COMPUTE STATS statements running concurrently, or something else.

          Activity

          mikesbrown Michael Brown added a comment -

          To make this easier the next time, I filed IMPALA-4903.

          alex.behm Alexander Behm added a comment -

          commit a71636847fe742a9d0eb770516aff34ff16bbca1
          Author: Alex Behm <alex.behm@cloudera.com>
          Date: Fri Feb 17 10:00:55 2017 -0800

          IMPALA-4902: Copy parameters map in HdfsPartition.toThrift().

          The bug: When generating the toThrift() of an HdfsTable,
          each THdfsPartition used to contain a reference to its
          partition's parameters map. As a result, one thread trying
          to serialize a thrift table returned by toThrift() could
          conflict with another thread updating the parameters maps of
          the table partitions. Here are a few examples of operations
          that may modify the parameters map:
          COMPUTE [INCREMENTAL] STATS, DROP STATS,
          ALTER TABLE SET TBLPROPERTIES, ALTER TABLE SET CACHED, etc.

          The fix: Create a shallow copy of the parameters map in
          HdfsPartition.toThrift(). This means that toThrift() itself
          must be protected from concurrent modifications to the
          parameters map. Callers of toThrift() are now required
          to hold the table lock. One place where the lock was not
          already held needed to be adjusted.

          Testing:

          • I was unable to reproduce the issue locally, but the stacks
            from the JIRAs point directly to the parameters map, and
            the races are pretty obvious from looking at the code.
          • Passed a core/hdfs private run.

          Change-Id: Ic11277ad5512d2431cd3cc791715917c95395ddf
          Reviewed-on: http://gerrit.cloudera.org:8080/6127
          Reviewed-by: Alex Behm <alex.behm@cloudera.com>
          Tested-by: Impala Public Jenkins
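
          The approach described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual Impala change: PartitionSketch, putParameter() and snapshotParameters() are hypothetical names, and for brevity the sketch acquires the lock inside the method, whereas the commit requires callers of toThrift() to hold the table lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a partition; not the actual Impala classes.
class PartitionSketch {
  // Stand-in for the table lock mentioned in the commit message.
  private final ReentrantLock tableLock = new ReentrantLock();
  private final Map<String, String> parameters = new HashMap<>();

  // Writers (e.g. a stats update) modify the map under the lock.
  void putParameter(String key, String value) {
    tableLock.lock();
    try {
      parameters.put(key, value);
    } finally {
      tableLock.unlock();
    }
  }

  // Stand-in for the map handed to the thrift object: a shallow copy
  // taken under the lock, so later serialization iterates over a map
  // that no writer can touch.
  Map<String, String> snapshotParameters() {
    tableLock.lock();
    try {
      return new HashMap<>(parameters);
    } finally {
      tableLock.unlock();
    }
  }
}
```

          A shallow copy is enough in this sketch because the keys and values are immutable strings; only the map structure needs to be isolated from writers. The copy itself still iterates over the original map, which is why it has to happen under the lock.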


            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              mikesbrown Michael Brown