HIVE-21028: get_table_meta should use a fetch plan to avoid race conditions ending up in NucleusObjectNotFoundException


Details

    Description

      The getTableMeta call retrieves the tables and then loops over them, fetching each table's database object to get the containing database name. DataNucleus loads related objects lazily, so the first call, which retrieves all the tables, does not retrieve the database objects.

      When this query is executed

      query = pm.newQuery(MTable.class, filterBuilder.toString());
      

      it loads all the tables, and when you do

      table.getDatabase().getName()
      

      it then goes and retrieves the database object.

      However, another thread may have deleted the database in the meantime. If this happens, we end up with exceptions such as

      2018-12-04 22:25:06,525 INFO  DataNucleus.Datastore.Retrieve: [pool-7-thread-191]: Object with id "6930391[OID]org.apache.hadoop.hive.metastore.model.MTable" not found !
      2018-12-04 22:25:06,527 WARN  DataNucleus.Persistence: [pool-7-thread-191]: Exception thrown by StateManager.isLoaded
      No such database row
      org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
      

      We see this especially with calls that retrieve all the tables in all the databases (that is, a call to get_table_meta with dbNames="*" and tableNames="*").
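
The race described above can be reproduced in miniature without a metastore. The following is a plain-Java simulation under stated assumptions, not DataNucleus or Hive code: `DATABASES`, `LazyTable`, `EagerTable`, `loadLazily` and `loadEagerly` are hypothetical stand-ins. It only illustrates why resolving the database name in a second, later lookup can fail when the row is deleted in between, while a value captured at load time cannot.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java simulation of the lazy-load race; all names here are hypothetical.
class LazyLoadRaceDemo {
    // Stand-in for the datastore: database id -> database name.
    static final Map<Long, String> DATABASES = new ConcurrentHashMap<>();

    // Stand-in for a table whose database is resolved only when asked for.
    static final class LazyTable {
        final String tableName;
        final long databaseId;

        LazyTable(String tableName, long databaseId) {
            this.tableName = tableName;
            this.databaseId = databaseId;
        }

        // Mimics table.getDatabase().getName(): a second lookup at access time.
        String databaseName() {
            String name = DATABASES.get(databaseId);
            if (name == null) {
                throw new IllegalStateException("No such database row: " + databaseId);
            }
            return name;
        }
    }

    // Stand-in for a table loaded together with its database name.
    static final class EagerTable {
        final String tableName;
        final String databaseName;

        EagerTable(String tableName, String databaseName) {
            this.tableName = tableName;
            this.databaseName = databaseName;
        }
    }

    // Loads only the table; the database lookup is deferred.
    static LazyTable loadLazily(String tableName, long databaseId) {
        return new LazyTable(tableName, databaseId);
    }

    // Loads the table and resolves the database name in the same step.
    static EagerTable loadEagerly(String tableName, long databaseId) {
        String name = DATABASES.get(databaseId);
        if (name == null) {
            throw new IllegalStateException("No such database row: " + databaseId);
        }
        return new EagerTable(tableName, name);
    }

    public static void main(String[] args) {
        DATABASES.put(1L, "sales");
        LazyTable lazy = loadLazily("orders", 1L);
        EagerTable eager = loadEagerly("orders", 1L);

        // Simulates another thread dropping the database after the tables were loaded.
        DATABASES.remove(1L);

        System.out.println("eager: " + eager.databaseName);
        try {
            System.out.println("lazy: " + lazy.databaseName());
        } catch (IllegalStateException e) {
            System.out.println("lazy lookup failed: " + e.getMessage());
        }
    }
}
```

In the simulation the eager variant still has the name it captured at load time, whereas the lazy variant fails on access, which mirrors the failure mode reported in the log above.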

      To avoid this, we can define a custom fetch group and add it to the fetch plan only for the get_table_meta query. With this group active, the database object is fetched along with the MTable object.

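      We would first define the fetch group on the PersistenceManagerFactory (pmf). In JDO, getFetchGroup returns an unscoped group, so it also has to be registered with addFetchGroups before a fetch plan can use it

      FetchGroup fetchGroup = pmf.getFetchGroup(MTable.class, "mtable_db_fetch_group").addMember("database");
      pmf.addFetchGroups(fetchGroup);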
      

      Then we add it to the PersistenceManager's fetch plan just before running the query

      pm.getFetchPlan().addGroup("mtable_db_fetch_group");
      query = pm.newQuery(MTable.class, filterBuilder.toString());
      Collection<MTable> tables = (Collection<MTable>) query.executeWithArray(...);
      ...
      

      Before the API call ends, we can remove the fetch group from the fetch plan with

      pm.getFetchPlan().removeGroup("mtable_db_fetch_group");
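
One way to make sure the group is always removed, even when the query throws, is to wrap the add and remove calls in try/finally. The sketch below is an assumption about how this could be structured, not the code from the attached patches: `StubFetchPlan` is a hypothetical stand-in for the JDO fetch plan (only its `addGroup`/`removeGroup` calls are mirrored), and `withFetchGroup` is an illustrative helper name.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

// Sketch of scoping a fetch group to a single query; all names are hypothetical.
class ScopedFetchGroupDemo {
    // Minimal stand-in that only tracks which groups are currently active.
    static final class StubFetchPlan {
        private final Set<String> groups = new LinkedHashSet<>();

        StubFetchPlan addGroup(String name) {
            groups.add(name);
            return this;
        }

        StubFetchPlan removeGroup(String name) {
            groups.remove(name);
            return this;
        }

        boolean hasGroup(String name) {
            return groups.contains(name);
        }
    }

    // Activates the group, runs the query, and always deactivates the group afterwards.
    static <T> T withFetchGroup(StubFetchPlan plan, String group, Supplier<T> query) {
        plan.addGroup(group);
        try {
            return query.get();
        } finally {
            // Runs on both normal return and exception, so the group never leaks.
            plan.removeGroup(group);
        }
    }

    public static void main(String[] args) {
        StubFetchPlan plan = new StubFetchPlan();
        String group = "mtable_db_fetch_group";

        boolean activeDuringQuery = withFetchGroup(plan, group, () -> plan.hasGroup(group));

        System.out.println("active during query: " + activeDuringQuery);
        System.out.println("active after query: " + plan.hasGroup(group));
    }
}
```

Keeping the removal in a finally block means a failed query cannot leave the extra group active on a fetch plan that later, unrelated queries might reuse.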
      

      Attachments

        1. HIVE-21028.branch-3.patch
          23 kB
          Karthik Manamcheri
        2. HIVE-21028.5.patch
          22 kB
          Karthik Manamcheri
        3. HIVE-21028.4.patch
          22 kB
          Karthik Manamcheri
        4. HIVE-21028.3.patch
          22 kB
          Karthik Manamcheri
        5. HIVE-21028.2.patch
          22 kB
          Karthik Manamcheri
        6. HIVE-21028.1.patch
          22 kB
          Karthik Manamcheri
