Uploaded image for project: 'Apache Apex Malhar'
  1. Apache Apex Malhar
  2. APEXMALHAR-2197

TimeBasedPriorityQueue.removeLRU throws NoSuchElementException

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 3.5.0
    • None
    • None

    Description

      When there are lots of keys and large throughput, the following exception will be throw.
      java.util.NoSuchElementException
      at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1113)
      at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
      at org.apache.apex.malhar.lib.state.spillable.TimeBasedPriorityQueue.removeLRU(TimeBasedPriorityQueue.java:75)
      at org.apache.apex.malhar.lib.state.spillable.WindowBoundedMapCache.endWindow(WindowBoundedMapCache.java:119)
      at org.apache.apex.malhar.lib.state.spillable.SpillableByteArrayListMultimapImpl.endWindow(SpillableByteArrayListMultimapImpl.java:281)
      at com.datatorrent.benchmark.spillable.SpillableDSBenchmarkTest.testSpillableMutimap(SpillableDSBenchmarkTest.java:139)

      The reason is as following:
      In TimeBasedPriorityQueue.TimeWrapper<T>, when implement compareTo(), only compare by time. While in equals(), compare both time and key.

      Generally, "compareTo() == 0” means “equals() == true”. But in TimeBasedPriorityQueue, compareTo() was used by “sortedTimestamp” and it would be OK only compare by time.
      But it would cause problem in removeLRU() if lots of key written in same time. In this case, one time(even different key) is one entry of sortedTimestamp, but the counter and count was calculated by key.
      So the counter could be much larger than the number of entry of sortedTimestamp. And NoSuchElementException would throw when reach the end of the iterator.

      Solutions:
      Add key as the less important factor could solve this problem. But probably need to limit key to support compareTo()

      Another idea I think also work is use two maps, the benefit is don’t need to compare by key( only compare by time)

      • timeToKeys: multiple list map from time to keys
      • keyToRecentTime: key to most recent time
        implementation:
      • upSert(T key): get time from keyToRecentTime map; if don’t have time, put (key, time) to both keyToRecentTime and timeToKeys; if have time and time is different, remove key from timeToKeys, update keyToRecentTime
      • removeLRU(int count): get time sortedTimestamp, and then get and remove keys from timeToKeys, and remove entry from keyToRecentTime.

      Another quick solution is also add key as the less important factor of compareTo(), but instead of add a limitation to the key( key should comparable), use equals() to compare the key( we care about equality of key, but in fact don't care the order of the key).

      Attachments

        Issue Links

          Activity

            People

              brightchen Bright Chen
              brightchen Bright Chen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 120h
                  120h
                  Remaining:
                  Remaining Estimate - 120h
                  120h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified