Accumulo / ACCUMULO-665

large values, complex iterator stacks, and RFile readers can consume a surprising amount of memory

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.1
    • Component/s: tserver
    • Labels:
      None
    • Environment:

      large cluster

      Description

      On a production cluster, with a complex iterator tree, a large value (~350M) was causing a 4G tserver to fail with out-of-memory.

      There were several factors contributing to the problem:

      1. a bug: the query should not have been reading the large value at all
      2. complex iterator tree, causing many copies of the data to be held at the same time
      3. RFile doubles the buffer it uses to load values, and continues to use that large buffer for future values

      This ticket is for the last point. If we know we're not even going to look at the value, we can read past it without storing it in memory. It is surprising that skipping past a large value would cause the server to run out of memory, especially since it should fit into memory enough times to be returned to the caller.
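To illustrate the difference between loading a value and reading past it, here is a minimal sketch. It assumes a toy length-prefixed value format, which is not the actual RFile layout, and the names ValueSkipSketch, readValue, skipValue, and encode are hypothetical. The point is only that a reader which knows the value length can advance the stream without allocating or growing a buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Toy length-prefixed value stream. This is an assumption for illustration,
// NOT the real RFile format or reader.
class ValueSkipSketch {

    /** Loads the next value into a new array sized exactly to the value. */
    static byte[] readValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] value = new byte[len];
        in.readFully(value);
        return value;
    }

    /** Advances past the next value without buffering it; returns its length. */
    static int skipValue(DataInputStream in) throws IOException {
        int len = in.readInt();
        int remaining = len;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                // skipBytes may skip nothing without being at end of stream,
                // so consume one byte to tell the two cases apart.
                if (in.read() < 0) {
                    throw new EOFException("stream ended inside a value");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return len;
    }

    /** Writes each value as a 4-byte length followed by the raw bytes. */
    static byte[] encode(byte[]... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (byte[] v : values) {
            out.writeInt(v.length);
            out.write(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] big = new byte[1 << 20];   // stand-in for the large value
        byte[] small = {1, 2, 3};
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(encode(big, small)));
        int skipped = skipValue(in);      // no 1 MiB buffer is allocated here
        byte[] next = readValue(in);
        System.out.println("skipped " + skipped + " bytes, next value has "
            + next.length + " bytes");
    }
}
```

In this sketch the reader's memory use while skipping does not depend on the value size, whereas a reader that reuses one doubling buffer keeps the largest buffer it has ever needed.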

      The provided iterators inside core/org/apache/accumulo/iterators should be revisited to ensure that they properly set the seekColumnFamilies where necessary, specifically the IntersectingIterator.

        Activity

        Josh Elser added a comment -

        The main place this shows up is the seekColumnFamilies argument to the SortedKeyValueIterator#seek() method. With the IntersectingIterator in the core module, you can run into "excessive" memory usage.

        Say you're intersecting over the two terms "foo" and "bar" and that you have some column "documents" which sorts directly before the term "foo". Assume the keys in the "documents" column are in their own locality group and have very large Values associated with them. The IntersectingIterator passes along any seek column families it is given but does not set any itself. As a result, even though the "documents" column is in its own section of the RFile, the underlying Accumulo code will still open all locality groups (the one for "documents" and the default locality group), because the seek column families for each term are not restricted to just the term itself.
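The effect described above can be sketched with a toy model. This is not Accumulo's RFile reader: the class LocalityGroupSketch and its methods are hypothetical, and the default locality group is simplified to a fixed set of families. The two arguments mirror the columnFamilies and inclusive parameters of SortedKeyValueIterator#seek(), where an empty collection with inclusive set to false means no restriction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of choosing which locality groups a seek must open.
// Hypothetical names; a simplification, NOT Accumulo's implementation.
class LocalityGroupSketch {

    /** Example layout: locality group name -> column families stored in it. */
    static Map<String, Set<String>> exampleFile() {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        groups.put("docs", new HashSet<>(Arrays.asList("documents")));
        groups.put("default", new HashSet<>(Arrays.asList("foo", "bar")));
        return groups;
    }

    /**
     * Returns the names of the groups that could contain a requested family.
     * inclusive == true: only the listed families are wanted.
     * inclusive == false: every family except the listed ones is wanted,
     * so an empty set means "everything".
     */
    static List<String> groupsToOpen(Map<String, Set<String>> groups,
                                     Set<String> families, boolean inclusive) {
        List<String> opened = new ArrayList<>();
        for (Map.Entry<String, Set<String>> group : groups.entrySet()) {
            boolean needed;
            if (inclusive) {
                // Needed if the group holds at least one requested family.
                needed = !Collections.disjoint(group.getValue(), families);
            } else {
                // Needed unless every family in the group is excluded.
                needed = !families.containsAll(group.getValue());
            }
            if (needed) {
                opened.add(group.getKey());
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> file = exampleFile();

        // No families specified: the group with the large values is opened too.
        System.out.println(groupsToOpen(file, new HashSet<>(), false));

        // Restricting the seek to the term "foo" leaves the "docs" group closed.
        System.out.println(
            groupsToOpen(file, new HashSet<>(Arrays.asList("foo")), true));
    }
}
```

In this model the unrestricted seek lists both groups, while the seek restricted to "foo" lists only the default group, which is the behavior the comment asks the IntersectingIterator to request per term.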

        The javadoc for SortedKeyValueIterator should also be updated to inform other users of the implications that (not) setting the seekColumnFamilies has.

        Josh Elser added a comment -

        Taking a look at the OrIterator and IntersectingIterator, they are subject to the same faults this ticket describes. The attached patch corrects the usage of the columnFamilies passed to the seek() method, and makes the appropriate changes to the IndexedDocIterator, which extends the IntersectingIterator.

        Josh Elser added a comment -

        Update to SortedKeyValueIterator#seek javadoc. Changes to IntersectingIterator, OrIterator, and IndexedDocIterator to avoid confusion about the columnFamilies argument to the seek() method.

        Eric Newton added a comment -

        Josh, isn't there something that needs to change in the AndIterator in the wikisearch example, too?

        Billie Rinaldi added a comment -

        Was this fixed in 1.4.1, or should we change the fix version of this ticket?

        Josh Elser added a comment -

        Regarding Eric's comment, it looks like the AndIterator in the wikisearch code is correct (r1359639).

        Billie, as far as I can tell, this ticket is complete. I'll flip the version back to 1.4.1 and close it. We can re-open/rev-bump if we determine that it's not the case.

        Josh Elser added a comment -

        Patch applied by Eric.


          People

          • Assignee:
            Eric Newton
            Reporter:
            Eric Newton
          • Votes:
            0
            Watchers:
            3
