Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-1034

Faster bitmap indexes with Roaring bitmaps

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • Future
    • v1.1
    • Others

    Description

      Kylin is using ConciseSet for bitmap indexing. It was found that Roaring bitmaps are often much better than ConciseSet (e.g., see experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The compression is often better and the speed difference can be substantial. This is even more so with version 0.5 of Roaring.

      There is a high quality Java implementation that is used by Apache Spark and Druid.io. The Druid people found that switching to Roaring bitmaps could improve real-world performance by 30% or more.

      Source code:
      https://github.com/lemire/RoaringBitmap/

      Importing from Maven:

      http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/

      <dependencies>
      <dependency>
      <groupId>org.roaringbitmap</groupId>
      <artifactId>RoaringBitmap</artifactId>
      <version>[0.5,)</version>
      </dependency>
      </dependencies>

      Online API :

      http://lemire.me/docs/RoaringBitmap/

      JavaDoc Jar:

      http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/RoaringBitmap-0.5.3-javadoc.jar

      When desired, the library supports memory file mapping, so that out-of-JVM heap memory is used instead. This can greatly improve IO issues. The library is available under the Apache license and is patent-free.

      There is an extensive real-data benchmark framework which you can run for yourself to compare Roaring with competitive alternatives such as ConciseSet :

      https://github.com/lemire/RoaringBitmap/tree/master/jmh

      Running such a benchmark can be as simple as launching a script.

      For Druid, the bitmap format was made "pluggable" so you can switch from one format to the other using a configuration flag. This is implemented through simple wrappers, e.g., see

      https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap

      So it can be really easy to make it possible to switch the format while preserving backward compatibility if needed...

      It is probably not difficult work to integrate Roaring in Kylin (maybe a day or so of programming) and it could potentially improve performance while reducing memory storage.

      Note: Roaring bitmaps were also adopted in Apache Lucene, though they have their own implementation, see
      https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps

      Attachments

        1. roaring-0001.patch
          32 kB
          Daniel Lemire

        Activity

          People

            liyang.gmt8@gmail.com liyang
            lemire Daniel Lemire
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: