Apache AsterixDB / ASTERIXDB-1264

Feed didn't release locks when ingestion hit exceptions

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ING - Ingestion

    Description

      This issue was discussed on the mailing list. I am copying it here to make it more trackable and shareable.

      I hit a weird issue that is reproducible, but only if the data contains duplicates and is large enough. Let me explain it step by step:

      1. The dataset is very simple: it has only two fields. The AQL DDL:

      drop dataverse test if exists;
      create dataverse test;
      use dataverse test;
      
      create type t_test as closed {
        fa: int64,
        fb: int64
      }
      
      create dataset ds_test(t_test) primary key fa;
      
      create feed fd_test using socket_adapter
      (
          ("sockets"="nc1:10001"),
          ("address-type"="nc"),
          ("type-name"="t_test"),
          ("format"="adm"),
          ("duration"="1200")
      );
      
      set wait-for-completion-feed "false";
      connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;
      

      The AdvancedFT_Discard policy ignores exceptions raised during insertion and keeps ingesting.
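
      For comparison, a connect statement without an explicit policy clause looks like the following and falls back to the system's default ingestion policy; it is shown here only to highlight what the using policy clause changes:

      connect feed fd_test to dataset ds_test;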

      2. Ingest the data with a very simple socket adapter client which reads records one by one from an ADM file; a minimal sketch of what such a client does is shown after the flag list below. The source is here: https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
      The data and the app package are provided here: https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
      To feed the data you can run:

      ./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm

      -u for the server URL
      -p for the server port
      -c for the number of lines to ingest
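
      For context, here is a minimal sketch of what such a client does. This is an illustrative stand-in, not the actual FileFeedSocketAdapterClient; argument parsing and error handling are simplified:

      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.OutputStream;
      import java.net.Socket;
      import java.nio.charset.StandardCharsets;

      // Reads ADM records line by line from a file and pushes them to the
      // feed's socket endpoint (the host/port configured in the feed DDL).
      public class SimpleFeedClient {
          public static void main(String[] args) throws Exception {
              String host = args[0];                      // e.g. 172.17.0.2
              int port = Integer.parseInt(args[1]);       // e.g. 10001
              int maxRecords = Integer.parseInt(args[2]); // e.g. 5000000
              String admFile = args[3];

              try (Socket socket = new Socket(host, port);
                   BufferedReader reader = new BufferedReader(new FileReader(admFile))) {
                  OutputStream out = socket.getOutputStream();
                  String line;
                  int sent = 0;
                  while ((line = reader.readLine()) != null && sent < maxRecords) {
                      out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                      sent++;
                  }
                  out.flush();
              }
          }
      }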

      3. After ingestion, all requests that touch ds_test hang: there is no exception and no response for hours. However, the system still answers queries on other datasets, such as the Metadata datasets.
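
      For example, even a trivial scan such as the following never returns (this is the kind of query meant above, assuming the DDL from step 1; any query touching ds_test behaves the same way):

      use dataverse test;

      count(for $x in dataset ds_test return $x);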

      The data contains some duplicate records, which should trigger insert exceptions. If I lower the count from 5000000 to, say, 3000000, there is no problem, even though that prefix of the data also contains duplicates.

      Answer from amoudi:
      I know exactly what is going on here. The problem you pointed out is
      caused by the duplicate keys. If I remember correctly, the main issue is
      that the locks that are placed on the primary keys are not released.
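
      In other words, the suspected pattern is the classic lock leak sketched below. This is illustrative Java only, with hypothetical names, not AsterixDB's actual lock manager or ingestion code: when the insert throws on a duplicate key, the unlock on the exception path is skipped, and every later operation on that primary key blocks forever.

      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;
      import java.util.concurrent.locks.ReentrantLock;

      public class LockLeakSketch {
          // One lock per primary key, as in primary-key-level locking.
          private final Map<Long, ReentrantLock> pkLocks = new ConcurrentHashMap<>();

          private ReentrantLock lockFor(long pk) {
              return pkLocks.computeIfAbsent(pk, k -> new ReentrantLock());
          }

          // Buggy shape: if insert() throws (e.g. on a duplicate key), unlock()
          // is never reached and the primary-key lock leaks.
          void insertBuggy(long pk) {
              lockFor(pk).lock();
              insert(pk);            // may throw on a duplicate key
              lockFor(pk).unlock();  // skipped on the exception path
          }

          // Correct shape: the lock is released on all paths.
          void insertSafe(long pk) {
              ReentrantLock lock = lockFor(pk);
              lock.lock();
              try {
                  insert(pk);
              } finally {
                  lock.unlock();
              }
          }

          private void insert(long pk) {
              // Stand-in for the real index insert, which throws on duplicates.
          }
      }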

      Attachments

        1. nc.log (22 kB, Jianfeng Jia)
        2. cc.log (50 kB, Jianfeng Jia)
        3. nc.stack (228 kB, Jianfeng Jia)

          People

            Assignee: Jianfeng Jia (javierjia)
            Reporter: Jianfeng Jia (javierjia)
            Votes: 0
            Watchers: 4
