[FLINK-801] Serialized String comparison, Unicode support

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: pre-apache
    • Component/s: None

    Description

      The StringComparator now works on serialized data.

      To this end, new string read/write/copy/compare methods were introduced; they use a variable-length encoding for the characters.

      Key points:

      • The most significant bits are written/read first.
      • The first 2 bits of each encoded character encode its size in bytes.
      • An encoded character is at most 3 bytes long.
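
      The key points above can be sketched roughly as follows. This is not the code from the pull request; it is a minimal illustration that assumes the 2 size bits store (length - 1) in the top bits of the first byte and that the shortest possible form is always chosen. The class name `VarLenCharCodec` and its methods are hypothetical.

```java
/** Hypothetical sketch of a big-endian character encoding with a 2-bit size prefix. */
final class VarLenCharCodec {

    /** Encodes one code point into 1 to 3 bytes; the top 2 bits of the first byte hold (length - 1). */
    static byte[] encode(int codePoint) {
        if (codePoint < 0 || codePoint >= (1 << 22)) {
            throw new IllegalArgumentException("value does not fit into 22 bits: " + codePoint);
        }
        if (codePoint < (1 << 6)) {
            // Size bits 00: one byte, 6 payload bits.
            return new byte[] {(byte) codePoint};
        } else if (codePoint < (1 << 14)) {
            // Size bits 01: two bytes, 14 payload bits, most significant bits first.
            return new byte[] {(byte) (0x40 | (codePoint >>> 8)), (byte) codePoint};
        } else {
            // Size bits 10: three bytes, 22 payload bits, most significant bits first.
            return new byte[] {
                (byte) (0x80 | (codePoint >>> 16)), (byte) (codePoint >>> 8), (byte) codePoint
            };
        }
    }

    /** Decodes the character that starts at the given offset. */
    static int decode(byte[] data, int offset) {
        int first = data[offset] & 0xFF;
        int sizeBits = first >>> 6;
        if (sizeBits == 3) {
            // Size bits 11 are unused in this sketch.
            throw new IllegalArgumentException("invalid size prefix");
        }
        int value = first & 0x3F; // payload bits of the first byte
        for (int i = 1; i <= sizeBits; i++) {
            value = (value << 8) | (data[offset + i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        int[] samples = {'A', 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            byte[] bytes = encode(cp);
            System.out.printf("U+%04X -> %d byte(s), decoded U+%04X%n",
                    cp, bytes.length, decode(bytes, 0));
        }
    }
}
```

      Under these assumptions, a larger code point never gets a smaller size prefix, so an unsigned byte-by-byte comparison of two encoded characters agrees with the order of their code points. This is the property a comparator on serialized data would rely on.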

      Additionally, the StringSerializer now has full Unicode support. I couldn't find a Unicode character that uses more than 22 bits (the highest code point, U+10FFFF, needs 21), so 3 bytes, which leave 22 payload bits after the 2 size bits, should be sufficient.
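
      The size argument can be checked with a few lines of arithmetic. The sketch below assumes that exactly 2 bits of every encoded character are spent on the size and that the rest is payload; the class `PayloadBits` and its helpers are hypothetical names used only for illustration.

```java
/** Hypothetical helper that compares the bits Unicode needs with the bits 3 bytes provide. */
final class PayloadBits {

    /** Number of bits needed to represent a non-negative value (at least 1). */
    static int bitsNeeded(int value) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(value));
    }

    /** Payload bits left in an encoding of the given byte length after the 2 size bits. */
    static int payloadBits(int byteLength) {
        return 8 * byteLength - 2;
    }

    public static void main(String[] args) {
        // Character.MAX_CODE_POINT is U+10FFFF, the highest Unicode code point.
        System.out.println(bitsNeeded(Character.MAX_CODE_POINT)); // prints 21
        System.out.println(payloadBits(3));                       // prints 22
    }
}
```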

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/stratosphere/stratosphere/pull/801
      Created by: zentol
      Labels:
      Created at: Tue May 13 18:06:22 CEST 2014
      State: open

          People

            Assignee: Unassigned
            Reporter: GitHub Import (github-import)