HIVE-24492

SharedCache not able to estimate size for null field of TableWrapper


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component: HiveServer2

    Description

      The following message appears repeatedly in the logs, indicating an error while estimating the size of a field of TableWrapper:

      2020-12-04T15:54:18,551 ERROR [CachedStore-CacheUpdateService: Thread-266] cache.SharedCache: Not able to estimate size
      java.lang.NullPointerException: null
              at sun.reflect.UnsafeFieldAccessorImpl.ensureObj(UnsafeFieldAccessorImpl.java:57) ~[?:1.8.0_261]
              at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.get(UnsafeQualifiedObjectFieldAccessorImpl.java:38) ~[?:1.8.0_261]
              at java.lang.reflect.Field.get(Field.java:393) ~[?:1.8.0_261]
              at org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:399) ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.util.IncrementalObjectSizeEstimator$ObjectEstimator.estimate(IncrementalObjectSizeEstimator.java:386) ~[hive-storage-api-2.7.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.getTableWrapperSizeWithoutMaps(SharedCache.java:348) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.SharedCache$TableWrapper.<init>(SharedCache.java:321) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.SharedCache.createTableWrapper(SharedCache.java:1893) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.SharedCache.populateTableInCache(SharedCache.java:1754) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.CachedStore.prewarm(CachedStore.java:577) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.CachedStore.triggerPreWarm(CachedStore.java:161) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.CachedStore.access$600(CachedStore.java:90) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.run(CachedStore.java:767) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_261]
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_261]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_261]
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_261]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]

      The message appears many times when running the TPC-DS perf tests:

      mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver

      From the stack trace, it seems that we cannot estimate the size of a field because it is null.
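
The top frames of the trace (Field.get followed by ensureObj) match the documented behavior of java.lang.reflect.Field#get, which throws a NullPointerException when an instance field is read through a null target object. The following is a minimal sketch of that behavior only; the Descriptor class and the method name are hypothetical stand-ins and are not taken from Hive.

```java
import java.lang.reflect.Field;

final class NullFieldAccessDemo {
    // Hypothetical stand-in for a nested object such as a storage descriptor.
    static final class Descriptor {
        String location;
    }

    /** Returns true if reading an instance field through a null target throws an NPE. */
    static boolean readThroughNullThrowsNpe() {
        try {
            Field field = Descriptor.class.getDeclaredField("location");
            field.setAccessible(true);
            field.get(null); // instance field read with a null target object
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readThroughNullThrowsNpe()); // prints "true"
    }
}
```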

      If the value of a field is null, we shouldn't attempt to estimate its size, since doing so always leads to an NPE. There is also nothing to estimate: a null field can simply be counted as zero.
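
As an illustration of the "count null as zero" idea, here is a simplified reflective estimator that checks for null before descending into a value. It is only a sketch under stated assumptions: it is not the IncrementalObjectSizeEstimator code, the Wrapper and Descriptor classes are hypothetical, and the per-character cost is an arbitrary placeholder rather than a real memory figure.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class NullSafeSizeSketch {
    // Hypothetical stand-ins; not the Hive TableWrapper or StorageDescriptor classes.
    static final class Descriptor {
        final String location;
        Descriptor(String location) { this.location = location; }
    }

    static final class Wrapper {
        final String name;
        final Descriptor sd;
        Wrapper(String name, Descriptor sd) { this.name = name; this.sd = sd; }
    }

    /**
     * Sums a placeholder cost over all reachable String fields.
     * A null value contributes zero and is never passed to Field.get as a target.
     * Assumes an acyclic object graph made of the classes above and Strings.
     */
    static long estimate(Object obj) {
        if (obj == null) {
            return 0L; // nothing to measure, and no reflective access on null
        }
        if (obj instanceof String) {
            return 2L * ((String) obj).length(); // arbitrary cost: 2 per character
        }
        long total = 0L;
        for (Field field : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.getType().isPrimitive()) {
                continue; // primitives and statics are ignored in this sketch
            }
            field.setAccessible(true);
            try {
                total += estimate(field.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A wrapper whose descriptor has no location, like the "version" row.
        System.out.println(estimate(new Wrapper("version", new Descriptor(null))));
    }
}
```

With this shape, a table whose location is null is still measured for its other fields instead of aborting the whole estimate.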

      Looking a bit deeper at this case, the field that causes the NPE is TableWrapper#location, which comes from the storage descriptor (SDS table in the metastore). So should this field be null in the first place?

      The content of the metastore shows that the location is empty for technical tables such as version, schemata, tables, table_privileges, etc.:

       version                   | 
       db_version                | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/db_version
       funcs                     | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/funcs
       key_constraints           | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/key_constraints
       table_stats_view          | 
       columns                   | 
       web_site                  | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/web_site
       inventory_i               | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_30000.db/inventory_i
       partition_stats_view      | 
       wm_resourceplans          | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_resourceplans
       wm_triggers               | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_triggers
       wm_pools                  | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools
       wm_pools_to_triggers      | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_pools_to_triggers
       wm_mappings               | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/wm_mappings
       scheduled_queries         | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_queries
       scheduled_executions      | hdfs://localhost:40889/clusters/env-6cwwgq/warehouse-1580339123-xdmn/warehouse/tablespace/external/hive/sys.db/scheduled_executions
       schemata                  | 
       tables                    | 
       table_privileges          | 
       column_privileges         | 
       views                     | 
       scheduled_queries         | 
      

      However, I didn't investigate how we end up in this situation.

       


          People

            Assignee: Stamatis Zampetakis (zabetak)
            Reporter: Stamatis Zampetakis (zabetak)