HBase / HBASE-15352

FST BlockEncoder


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels: None

    Description

      We could improve on the existing PREFIX_TREE block encoder by upgrading the persistent data structure from a trie to a finite state transducer (FST). In theory, this would let us reuse bytes not just for rowkey prefixes, but for infixes and suffixes as well. My reading of the literature suggests we may be able to encode values too, further reducing storage size when values are repeated (e.g., a "customer id" field with very low cardinality, which probably happens a lot in our denormalized world). There's a really nice blog post about this data structure, and apparently our siblings in Lucene make heavy use of their implementation.
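
      To make the prefix-versus-suffix distinction concrete, here is a minimal sketch in Java. It is not the PREFIX_TREE encoder and not Lucene's FST; the class and method names (SuffixSharingSketch, countSharedStates, and so on) are hypothetical and exist only for illustration. It builds a plain trie, which shares prefixes only, and then counts how many states remain if identical subtrees are merged, which is the suffix sharing an automaton gives us. It covers only the key side; attaching outputs (values) to transitions, which is what makes the structure a transducer, is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hypothetical names, not HBase or Lucene code.
final class SuffixSharingSketch {

    /** One trie node; edges are kept sorted so subtree signatures are deterministic. */
    static final class Node {
        final TreeMap<Character, Node> edges = new TreeMap<>();
        boolean terminal; // true if some key ends at this node
    }

    /** Inserts a key into the trie, sharing only common prefixes. */
    static void insert(Node root, String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.edges.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    /** Counts the nodes of the plain trie. */
    static int countTrieNodes(Node node) {
        int total = 1;
        for (Node child : node.edges.values()) {
            total += countTrieNodes(child);
        }
        return total;
    }

    /** Counts distinct subtrees, i.e. the states left once identical suffixes are merged. */
    static int countSharedStates(Node root) {
        Map<String, Integer> registry = new HashMap<>();
        canonicalId(root, registry);
        return registry.size();
    }

    /** Assigns the same id to every subtree with the same shape and labels. */
    private static int canonicalId(Node node, Map<String, Integer> registry) {
        StringBuilder signature = new StringBuilder(node.terminal ? "T" : "N");
        for (Map.Entry<Character, Node> edge : node.edges.entrySet()) {
            signature.append(edge.getKey()).append(':')
                     .append(canonicalId(edge.getValue(), registry)).append(';');
        }
        String key = signature.toString();
        Integer id = registry.get(key);
        if (id == null) {
            id = registry.size();
            registry.put(key, id);
        }
        return id;
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String key : new String[] {"tap", "taps", "top", "tops"}) {
            insert(root, key);
        }
        System.out.println("trie nodes:    " + countTrieNodes(root));    // 8
        System.out.println("shared states: " + countSharedStates(root)); // 5
    }
}
```

      For these four keys the trie needs 8 nodes, while merging the common "p" / "ps" tails leaves 5 states. Whether real rowkeys show comparable savings would have to be measured; this only shows where the savings would come from.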


          People

            Assignee: Unassigned
            Reporter: Nick Dimiduk (ndimiduk)
            Votes: 0
            Watchers: 9
