HIVE-21028

get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException


Details

    Description

      The getTableMeta call retrieves the tables, then loops over them and, for each table, retrieves the database object to get the containing database's name. DataNucleus loads related objects lazily, so the first call that fetches all the tables does not retrieve the database objects.

      When this query is executed

      query = pm.newQuery(MTable.class, filterBuilder.toString());
      

      it loads all the tables, and when you do

      table.getDatabase().getName()
      

      it then goes and retrieves the database object.

      However, another thread may have deleted the database in the meantime. If this happens, we end up with exceptions such as

      2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: [pool-7-thread-191]: Object with id "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
      2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: Exception thrown by StateManager.isLoaded
      No such database row
      org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
      

      We see this especially with calls that retrieve all the tables in all the databases (that is, a call to get_table_meta with dbNames="*" and tableNames="*").
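      The race described above can be illustrated without DataNucleus at all. The following is a minimal, single-threaded sketch in plain Java; every name in it (LazyFetchRaceSketch, LazyTable, EagerTable, queryLazy, queryEager) is hypothetical and only stands in for the real metastore classes. The "concurrent" drop is simulated by calling dropDatabase between the query and the name lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class LazyFetchRaceSketch {
    // Stand-in for the datastore: database id -> database name.
    private final Map<Integer, String> databases = new HashMap<>();

    void createDatabase(int id, String name) { databases.put(id, name); }

    void dropDatabase(int id) { databases.remove(id); }

    // Stand-in for a lazily loaded table: it keeps only the database id and
    // goes back to the store when the database name is requested.
    class LazyTable {
        final String name;
        final int dbId;

        LazyTable(String name, int dbId) { this.name = name; this.dbId = dbId; }

        String getDatabaseName() {
            String dbName = databases.get(dbId);
            if (dbName == null) {
                throw new NoSuchElementException("No such database row: " + dbId);
            }
            return dbName;
        }
    }

    // Stand-in for a table fetched together with its database: the name is
    // copied while the query runs, so no second lookup happens later.
    static class EagerTable {
        final String name;
        final String dbName;

        EagerTable(String name, String dbName) { this.name = name; this.dbName = dbName; }
    }

    LazyTable queryLazy(String tableName, int dbId) {
        return new LazyTable(tableName, dbId);
    }

    EagerTable queryEager(String tableName, int dbId) {
        String dbName = databases.get(dbId);
        if (dbName == null) {
            throw new NoSuchElementException("No such database row: " + dbId);
        }
        return new EagerTable(tableName, dbName);
    }

    public static void main(String[] args) {
        LazyFetchRaceSketch store = new LazyFetchRaceSketch();
        store.createDatabase(1, "sales");

        LazyTable lazy = store.queryLazy("orders", 1);
        EagerTable eager = store.queryEager("orders", 1);

        // Simulates another thread dropping the database after the query ran.
        store.dropDatabase(1);

        System.out.println("eager: " + eager.dbName);
        try {
            System.out.println("lazy: " + lazy.getDatabaseName());
        } catch (NoSuchElementException e) {
            System.out.println("lazy lookup failed: " + e.getMessage());
        }
    }
}
```

      In this sketch the eager variant holds a snapshot of the name taken at query time, which may be stale once the database is gone, but the loop over the results no longer triggers a second retrieval that can fail.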

      To avoid this, we can define a custom fetch group and activate it only for the get_table_meta query. With this fetch group active, the database object is fetched along with the MTable object.

      We would first define the fetch group on the PersistenceManagerFactory (pmf)

      pmf.getFetchGroup(MTable.class, "mtable_db_fetch_group").addMember("database");
      

      Then we add it to the fetch plan just before running the query

      pm.getFetchPlan().addGroup("mtable_db_fetch_group");
      query = pm.newQuery(MTable.class, filterBuilder.toString());
      Collection<MTable> tables = (Collection<MTable>) query.executeWithArray(...);
      ...
      

      Before the API call ends, we can remove the fetch group from the fetch plan with

      pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
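
      One way to make sure the group is always removed, even when the query throws, is to wrap the add/remove pair in try/finally. The sketch below is an assumption about how that could look, not the code in the attached patches. To stay self-contained it uses a hypothetical FetchPlanLike interface that mirrors only the addGroup and removeGroup calls shown above; in the metastore, pm.getFetchPlan() would take its place. withFetchGroup and InMemoryFetchPlan are likewise illustrative names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

public class ScopedFetchGroupSketch {
    // Hypothetical stand-in for the two fetch plan calls used in the description.
    interface FetchPlanLike {
        void addGroup(String group);
        void removeGroup(String group);
    }

    // Trivial in-memory implementation so the sketch can run on its own.
    static class InMemoryFetchPlan implements FetchPlanLike {
        final Set<String> groups = new LinkedHashSet<>();

        @Override
        public void addGroup(String group) { groups.add(group); }

        @Override
        public void removeGroup(String group) { groups.remove(group); }
    }

    // Activates the group, runs the body, and always deactivates the group
    // afterwards, whether the body returns normally or throws.
    static <T> T withFetchGroup(FetchPlanLike plan, String group, Supplier<T> body) {
        plan.addGroup(group);
        try {
            return body.get();
        } finally {
            plan.removeGroup(group);
        }
    }

    public static void main(String[] args) {
        InMemoryFetchPlan plan = new InMemoryFetchPlan();

        // The body stands in for building and executing the MTable query.
        boolean activeDuringQuery = withFetchGroup(
                plan, "mtable_db_fetch_group",
                () -> plan.groups.contains("mtable_db_fetch_group"));

        System.out.println("active during query: " + activeDuringQuery);
        System.out.println("active after query: " + plan.groups.contains("mtable_db_fetch_group"));
    }
}
```

      Scoping the group this way keeps other queries that run on the same persistence manager from fetching the database object when they do not need it.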
      

      Attachments

        1. HIVE-21028.1.patch
          22 kB
          Karthik Manamcheri
        2. HIVE-21028.2.patch
          22 kB
          Karthik Manamcheri
        3. HIVE-21028.3.patch
          22 kB
          Karthik Manamcheri
        4. HIVE-21028.4.patch
          22 kB
          Karthik Manamcheri
        5. HIVE-21028.5.patch
          22 kB
          Karthik Manamcheri
        6. HIVE-21028.branch-3.patch
          23 kB
          Karthik Manamcheri

            People

              karthik.manamcheri Karthik Manamcheri