  1. James Server
  2. JAMES-3937

Improve CassandraThreadIdGuessingAlgorithm

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.8.0
    • Fix Version/s: 3.9.0
    • Component/s: cassandra, mailbox
    • Labels: None

    Description

      Why?

      The CassandraThreadIdGuessingAlgorithm tables occupy a non-negligible amount of space.

      Out of a 20 GB database on one of my production platforms:

      		Table: threadlookuptable
      		SSTable count: 4
      		Space used (total): 360 263 739
      
      		Table: threadtable
      		SSTable count: 8
      		Space used (total): 1 050 590 715
      

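      Together these two tables account for roughly 1.4 GB (assuming the figures are in bytes), about 7% of the database, which is non-negligible.

      The goal is to reduce the database space used by thread allocation.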

      Other concerns

      Storing raw subjects linked to usernames is likely problematic in terms of privacy.

      How?

      The thread guessing algorithm does not need raw values to operate; it works equally well with hashes, as demonstrated by https://issues.apache.org/jira/browse/JAMES-3937.
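
A minimal sketch of the idea, not the actual James implementation: only hashes of the subject and of the MIME message IDs are kept, and matching is done by comparing hashes. The class name `ThreadHashSketch`, the choice of 64-bit FNV-1a, and the matching rule (one shared message ID plus an equal subject) are assumptions made for illustration; the ticket does not name a specific hash function.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: illustrates thread matching on hashes instead of raw values.
final class ThreadHashSketch {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // 64-bit FNV-1a, a simple non-cryptographic hash used here as a stand-in.
    static long hash(String value) {
        long h = FNV_OFFSET_BASIS;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    // Hashes every MIME message ID (Message-ID, In-Reply-To, References).
    static Set<Long> hashAll(List<String> mimeMessageIds) {
        Set<Long> hashes = new LinkedHashSet<>();
        for (String id : mimeMessageIds) {
            hashes.add(hash(id));
        }
        return hashes;
    }

    // Assumed rule: same thread when at least one message ID hash is shared
    // and the subject hashes are equal. No raw value is needed for the check.
    static boolean sameThread(Set<Long> storedIdHashes, long storedSubjectHash,
                              Set<Long> incomingIdHashes, long incomingSubjectHash) {
        boolean sharesId = incomingIdHashes.stream().anyMatch(storedIdHashes::contains);
        return sharesId && storedSubjectHash == incomingSubjectHash;
    }

    public static void main(String[] args) {
        Set<Long> stored = hashAll(List.of("<a@example.org>"));
        Set<Long> incoming = hashAll(List.of("<b@example.org>", "<a@example.org>"));
        System.out.println(sameThread(stored, hash("Meeting"), incoming, hash("Meeting")));
    }
}
```

With this layout each stored value is a fixed 8 bytes regardless of subject or message ID length, which is where the space reduction would come from, and the stored rows no longer reveal the subject text.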

      Impact?

      As threads are partitioned by user, the risk of collision is extremely low, and false positives can only result in incorrect thread grouping, which makes this use case insensitive to hash collisions. Use of non-cryptographic hash functions is thus acceptable.
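
To give a feel for "extremely low", the birthday bound can be evaluated per user partition. This is a rough sketch under stated assumptions: the hash is treated as uniformly distributed, and both the 64-bit width and the figure of 100,000 hashed values for one user are illustrative, not taken from the ticket. `CollisionEstimate` is a hypothetical name.

```java
// Hypothetical helper: birthday-bound estimate of a hash collision within one user's partition.
final class CollisionEstimate {
    // Upper-bound approximation: (number of pairs) / (number of possible hash values).
    static double collisionProbability(long entries, int hashBits) {
        double pairs = (double) entries * (entries - 1) / 2.0;
        return Math.min(1.0, pairs / Math.pow(2.0, hashBits));
    }

    public static void main(String[] args) {
        // Assumed workload: 100,000 hashed values for a single user, 64-bit hash.
        System.out.println(collisionProbability(100_000L, 64));
    }
}
```

Under these assumptions the estimate stays below one in a billion, and because partitions are per user, a collision could only merge two threads within the same account.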

      We expect a significant space reduction.

      Migration: We will simply create a new table and drop the old one. This will cause a discontinuity in thread allocation: a conversation that spans the migration will be split into two threads instead of one. This seems acceptable, and preferable to a complex migration, in our eyes.

          People

            Assignee: Unassigned
            Reporter: Benoit Tellier (btellier)

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 40m