Accumulo / ACCUMULO-4744

Using RFile API with cache and multiple files hides data

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.8.0, 1.8.1
    • Fix Version/s: 1.9.0
    • None

    Description

      I noticed this bug in the source code while working on ACCUMULO-4641. When the RFile API introduced in 1.8 is used to read from multiple files with caching enabled, some data may not be returned. This happens because the code internally assigns every input source the same cache id, so index and data blocks from different files collide in the cache.

      This bug does not occur when reading data through a tserver; it only affects the RFile API.

        Scanner scanner =
             RFile.newScanner()
                 .from(file1, file2, file3)   // multiple input files
                 .withFileSystem(localFs)
                 .withIndexCache(1000000)     // index cache enabled
                 .withDataCache(10000000)     // data cache enabled
                 .build();
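
      To illustrate why a shared cache id can hide data, here is a minimal toy model in plain Java. It is only a sketch and does not use Accumulo's actual classes: ToyFile, CachedReader, the "source" id strings, and the id-plus-offset cache key are all hypothetical names and assumptions made for illustration. Two readers share one cache; when both use the same id, the second reader is served the first file's block.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a block cache shared by several file readers (not Accumulo code). */
class CacheIdCollisionDemo {

    /** Hypothetical stand-in for a file: blocks addressed by offset. */
    static final class ToyFile {
        final Map<Long, String> blocks = new HashMap<>();
    }

    /** Hypothetical reader that consults a shared cache before reading its own file. */
    static final class CachedReader {
        private final String cacheId;
        private final ToyFile file;
        private final Map<String, String> sharedCache;

        CachedReader(String cacheId, ToyFile file, Map<String, String> sharedCache) {
            this.cacheId = cacheId;
            this.file = file;
            this.sharedCache = sharedCache;
        }

        String readBlock(long offset) {
            // Assumed key scheme: cache id plus block offset.
            String key = cacheId + ":" + offset;
            // On a cache hit the file is never consulted.
            return sharedCache.computeIfAbsent(key, k -> file.blocks.get(offset));
        }
    }

    /** Reads block 0 of two different files through one shared cache. */
    static String[] readFirstBlocks(boolean uniqueIds) {
        ToyFile file1 = new ToyFile();
        file1.blocks.put(0L, "rows a-m");
        ToyFile file2 = new ToyFile();
        file2.blocks.put(0L, "rows n-z");

        Map<String, String> cache = new HashMap<>();
        CachedReader r1 = new CachedReader(uniqueIds ? "source-0" : "source", file1, cache);
        CachedReader r2 = new CachedReader(uniqueIds ? "source-1" : "source", file2, cache);
        return new String[] {r1.readBlock(0), r2.readBlock(0)};
    }

    public static void main(String[] args) {
        // Shared id: the second reader gets file1's block, so file2's data is hidden.
        System.out.println(String.join(", ", readFirstBlocks(false)));
        // Unique ids: each reader sees its own file's block.
        System.out.println(String.join(", ", readFirstBlocks(true)));
    }
}
```

      In this sketch, giving each input source its own id is enough to keep the entries apart. Whether the actual fix takes that form is not stated in this report.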
      

            People

              Assignee: Keith Turner (kturner)
              Reporter: Keith Turner (kturner)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m