Accumulo / ACCUMULO-4744

Using RFile API with cache and multiple files hides data

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.8.0, 1.8.1
    • Fix Version/s: 1.9.0
    • Component/s: None

      Description

      I noticed this bug in the source code while working on ACCUMULO-4641. When the RFile API introduced in 1.8 is used to read from multiple files with caching enabled, some data may not be visible. This happens because the code internally assigns every input source the same cache ID, so index and data blocks from different files collide in the cache.

      This bug does not occur when reading data through a tserver; it affects only reads made through the RFile API.

        Scanner scanner =
             RFile.newScanner()
                 .from(file1, file2, file3)   // multiple input files
                 .withFileSystem(localFs)
                 .withIndexCache(1000000)     // index cache enabled
                 .withDataCache(10000000)     // data cache enabled
                 .build();
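
The collision described above can be illustrated with a toy model. This is only a sketch and not Accumulo's cache implementation: the class `CacheCollisionSketch`, the `BlockCache` type, and the methods `getOrLoad` and `readFirstBlocks` are hypothetical names invented for illustration. It assumes a cache keyed by cache ID plus block offset, so that sources sharing one ID also share entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheCollisionSketch {

    /** Toy block cache keyed by "cacheId:blockOffset" (hypothetical, not Accumulo's real cache). */
    static final class BlockCache {
        private final Map<String, String> blocks = new HashMap<>();

        String getOrLoad(String cacheId, long offset, String blockOnDisk) {
            // On a cache hit the cached block is returned and the file's own block is ignored.
            return blocks.computeIfAbsent(cacheId + ":" + offset, k -> blockOnDisk);
        }
    }

    /** Reads the block at offset 0 of each file through one shared cache, using the given cache IDs. */
    static List<String> readFirstBlocks(String[] fileBlocks, String[] cacheIds) {
        BlockCache cache = new BlockCache();
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < fileBlocks.length; i++) {
            seen.add(cache.getOrLoad(cacheIds[i], 0L, fileBlocks[i]));
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] blocks = {"data-from-file1", "data-from-file2", "data-from-file3"};

        // Same ID for every source: file2 and file3 are served file1's cached block.
        System.out.println(readFirstBlocks(blocks, new String[] {"id", "id", "id"}));

        // Distinct ID per source: each file's own block is returned.
        System.out.println(readFirstBlocks(blocks, new String[] {"id-1", "id-2", "id-3"}));
    }
}
```

In this toy model the first call returns the first file's block three times, which mirrors how data from the other files becomes hidden, while the second call returns each file's own block.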
      

              People

              • Assignee: Keith Turner (kturner)
              • Reporter: Keith Turner (kturner)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 1h 20m