Cassandra / CASSANDRA-3003

Trunk single-pass streaming doesn't handle large rows correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version/s: 1.0.0
    • Component/s: Core
    • Labels:

      Description

      For a normal column family, trunk streaming always buffers the whole row in memory. It uses

        ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
      

      on the input bytes.
      We must avoid this for rows that don't fit in the inMemoryLimit.

      Note that for regular column families there is no need, for a given row, to recreate the bloom filter or column index, nor to deserialize the columns. It is enough to read the key and row size to feed the index writer, and then simply dump the rest to disk directly. This would make streaming more efficient, avoid a lot of object creation, and avoid the pitfall of big rows.
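
      The pass-through idea above could be sketched roughly as follows. This is only an illustration, not the patch attached to this issue: the class RowPassThrough, the IndexEntry holder, and the wire layout (a UTF key, a long row size, then that many opaque bytes) are all assumptions made for the example and do not reflect Cassandra's actual SSTable or streaming format.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch, not Cassandra code: copy one streamed row to its
// destination without deserializing any column.
class RowPassThrough {
    // Hypothetical stand-in for what an index writer would need per row.
    static final class IndexEntry {
        final String key;
        final long position;

        IndexEntry(String key, long position) {
            this.key = key;
            this.position = position;
        }
    }

    // Assumed layout (illustrative only): UTF key, long rowSize, rowSize bytes.
    static IndexEntry copyRow(DataInputStream in, DataOutputStream out) throws IOException {
        // Offset of this row in the output; DataOutputStream.size() is an int
        // counter, which is enough for a sketch but not for very large files.
        long position = out.size();

        // Only the key and the row size are decoded.
        String key = in.readUTF();
        long rowSize = in.readLong();
        out.writeUTF(key);
        out.writeLong(rowSize);

        // The row body is copied through a fixed-size buffer, so memory use
        // does not depend on how large the row is.
        byte[] buffer = new byte[64 * 1024];
        long remaining = rowSize;
        while (remaining > 0) {
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes of row left");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return new IndexEntry(key, position);
    }
}
```

      Because the body is never materialized as column objects, the cost per row is one small header read plus a bounded-memory byte copy, whatever the row size.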

      Counter column families are unfortunately trickier, because each column needs to be deserialized (to mark it as 'fromRemote'). However, we don't need the double pass of LazilyCompactedRow for that. We can simply use an SSTableIdentityIterator and deserialize/reserialize the input as it comes.
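
      A rough sketch of that single-pass deserialize/reserialize loop is shown below. It does not use the real SSTableIdentityIterator or Cassandra's counter column classes: CounterRowRestreamer, Column, the boolean fromRemote flag, and the per-column layout (UTF name, long value, boolean flag) are hypothetical names and formats chosen only to show that one column at a time is held in memory.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch, not Cassandra code: rewrite a counter row column by
// column, marking each column as coming from a remote node.
class CounterRowRestreamer {
    // Hypothetical column: the real counter column is more involved.
    static final class Column {
        final String name;
        final long value;
        final boolean fromRemote;

        Column(String name, long value, boolean fromRemote) {
            this.name = name;
            this.value = value;
            this.fromRemote = fromRemote;
        }

        Column markFromRemote() {
            return new Column(name, value, true);
        }
    }

    // Lazily decodes `count` columns from the stream, one per next() call.
    static Iterator<Column> columns(final DataInputStream in, final int count) {
        return new Iterator<Column>() {
            private int read = 0;

            @Override
            public boolean hasNext() {
                return read < count;
            }

            @Override
            public Column next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    Column c = new Column(in.readUTF(), in.readLong(), in.readBoolean());
                    read++;
                    return c;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }

    // Assumed layout (illustrative only): int columnCount, then the columns.
    static int restreamRow(DataInputStream in, DataOutputStream out) throws IOException {
        int count = in.readInt();
        out.writeInt(count);
        Iterator<Column> it = columns(in, count);
        while (it.hasNext()) {
            // Single pass: decode, mark, and immediately write back out.
            Column c = it.next().markFromRemote();
            out.writeUTF(c.name);
            out.writeLong(c.value);
            out.writeBoolean(c.fromRemote);
        }
        return count;
    }
}
```

      Each column becomes garbage as soon as it has been written, so a large counter row is never buffered whole and no second pass over the input is required.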

      1. v3003-v4.txt
        11 kB
        Yuki Morishita
      2. ASF.LICENSE.NOT.GRANTED--3003-v2.txt
        10 kB
        Yuki Morishita
      3. ASF.LICENSE.NOT.GRANTED--3003-v1.txt
        9 kB
        Yuki Morishita
      4. 3003-v5.txt
        11 kB
        Yuki Morishita
      5. 3003-v3.txt
        11 kB
        Yuki Morishita

        Activity

        Sylvain Lebresne created issue -
        Sylvain Lebresne made changes -
        Field Original Value New Value
        Priority Major [ 3 ] Critical [ 2 ]
        Yuki Morishita made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Jonathan Ellis made changes -
        Fix Version/s 1.0 [ 12316349 ]
        Affects Version/s 1.0 [ 12316349 ]
        Yuki Morishita made changes -
        Attachment 3003-v1.txt [ 12491091 ]
        Yuki Morishita made changes -
        Attachment mylyn-context.zip [ 12491092 ]
        Yuki Morishita made changes -
        Attachment mylyn-context.zip [ 12491092 ]
        Yuki Morishita made changes -
        Status In Progress [ 3 ] Open [ 1 ]
        Yuki Morishita made changes -
        Attachment 3003-v2.txt [ 12491798 ]
        Yuki Morishita made changes -
        Attachment mylyn-context.zip [ 12491799 ]
        Yuki Morishita made changes -
        Attachment mylyn-context.zip [ 12491799 ]
        Yuki Morishita made changes -
        Attachment 3003-v3.txt [ 12492255 ]
        Jonathan Ellis made changes -
        Reviewer slebresne
        Yuki Morishita made changes -
        Attachment v3003-v4.txt [ 12492476 ]
        Yuki Morishita made changes -
        Attachment 3003-v5.txt [ 12492974 ]
        Sylvain Lebresne made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12626538 ] patch-available, re-open possible [ 12752939 ]
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12752939 ] reopen-resolved, no closed status, patch-avail, testing [ 12758544 ]

          People

          • Assignee: Yuki Morishita
          • Reporter: Sylvain Lebresne
          • Reviewer: Sylvain Lebresne
          • Votes: 0
          • Watchers: 1