Cassandra / CASSANDRA-2988

Improve SSTableReader.load() when loading index files


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • 1.0.0, 1.1.0
    • None
    • None

    Description

      • When we create a BufferedRandomAccessFile, we pass skipCache=true. This hurts read performance because we always process the index files sequentially. A simple fix would be to set it to false.
      • Multiple index files of a single column family can be loaded in parallel. This helps considerably when a column family has several very large index files.
      • We may also change how we buffer. With BufferedRandomAccessFile, every read performs a series of checks, such as:
        • Do we need to rebuffer?
        • isEOF()?
        • assertions
        These checks can be simplified to some extent. We can blindly buffer the index file in chunks and process each buffer until a key straddles the boundary of a chunk. Then we rebuffer, starting from the beginning of the partially read key. Conceptually, this is the same as what BRAF does, but without the overhead in BRAF's read*() methods.
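
The parallel-loading suggestion in the second bullet can be illustrated with a minimal Java sketch. It assumes a hypothetical per-file `loader` function and a fixed-size thread pool; the class name `ParallelIndexLoader`, the method `loadAll`, and the `threads` parameter are all illustrative. This is not the code from the attached patches, and the real SSTableReader loading path is not reproduced here. Results are collected in submission order, so callers see them in the same order as the input list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs a per-file loader over several index files concurrently. */
public final class ParallelIndexLoader {
    private ParallelIndexLoader() {}

    public static <F, R> List<R> loadAll(List<F> indexFiles, Function<F, R> loader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            // Submit one task per index file so large files are processed side by side.
            List<Future<R>> futures = new ArrayList<>();
            for (F file : indexFiles) {
                Callable<R> task = () -> loader.apply(file);
                futures.add(pool.submit(task));
            }
            // Wait for each task in submission order, preserving the input ordering.
            List<R> results = new ArrayList<>(futures.size());
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            // Surface the loader's own failure rather than the executor wrapper.
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this pays off depends on whether loading is limited by single-thread CPU work or by disk throughput, so the thread count is a tuning assumption rather than a recommendation.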

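The chunked buffering idea in the third bullet can be sketched as follows. The on-disk entry layout (a 2-byte unsigned key length, the key bytes, then an 8-byte data position) is an assumption made for illustration, as are the names `ChunkedIndexScanner`, `scan`, `encode`, and `Entry`. The parser checks once per entry whether the whole entry is already in the buffer, instead of checking on every field read. When an entry straddles the chunk boundary, `ByteBuffer.compact()` moves the partial entry to the front and the next read refills behind it, so parsing resumes at the start of the partially read key.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner for an assumed entry layout: u16 key length, key bytes, i64 position. */
public final class ChunkedIndexScanner {
    public record Entry(String key, long position) {}

    public static List<Entry> scan(ReadableByteChannel in, int chunkSize) throws IOException {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be positive");
        List<Entry> entries = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocate(chunkSize);
        boolean eof = false;
        while (true) {
            // Blindly fill the rest of the buffer (after any carried-over partial entry).
            while (!eof && buf.hasRemaining()) {
                if (in.read(buf) < 0) eof = true;
            }
            buf.flip();
            // Parse complete entries; one bounds check per entry, none per field.
            while (buf.remaining() >= 2) {
                int start = buf.position();
                int keyLen = buf.getShort(start) & 0xFFFF;
                if (buf.remaining() < 2 + keyLen + 8) break; // entry crosses the chunk boundary
                byte[] key = new byte[keyLen];
                buf.position(start + 2);
                buf.get(key);
                long position = buf.getLong();
                entries.add(new Entry(new String(key, StandardCharsets.UTF_8), position));
            }
            if (eof) {
                if (buf.hasRemaining()) throw new IOException("truncated index entry");
                return entries;
            }
            // Rebuffer from the beginning of the partially read key.
            buf.compact();
            if (!buf.hasRemaining()) {
                // A single entry is larger than the buffer: grow it instead of failing.
                ByteBuffer bigger = ByteBuffer.allocate(buf.capacity() * 2);
                buf.flip();
                bigger.put(buf);
                buf = bigger;
            }
        }
    }

    /** Writes entries in the same assumed layout, for testing the scanner. */
    public static byte[] encode(List<Entry> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Entry e : entries) {
                byte[] key = e.key().getBytes(StandardCharsets.UTF_8);
                out.writeShort(key.length);
                out.write(key);
                out.writeLong(e.position());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Because `ByteBuffer` and `DataOutputStream` are both big-endian by default, the encoder and scanner agree on byte order without extra configuration.
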
      Attachments

        1. c2988-parallel-load-sstables.patch
          7 kB
          Michael Wu
        2. c2988-modified-buffer.patch
          8 kB
          Michael Wu
        3. 2988-parallel-v2.txt
          8 kB
          Jonathan Ellis
        4. c2988-2-v2
          7 kB
          Michael Wu
        5. 2988-2-cleaned.txt
          7 kB
          Jonathan Ellis
        6. 2988-2-v2.txt
          3 kB
          Jonathan Ellis


          People

            Michael Wu
            Michael Wu
            Michael Wu
            Jonathan Ellis
            Votes: 1
            Watchers: 1
