Had an offline discussion with Namit on this JIRA.
The basic question is how to use this bitmap index. With millions of rows in one block, a block will contain every distinct value the column has, so the bitmap index will not be very useful for block pruning. A possible use case is a bitmap AND/OR, e.g. finding all records about Male in Japan, where Male and Japan are both bitmap indexed. What we can do today is first do a JOIN and a BITMAP AND operation on the two index tables, and then find all the matching blocks. That is OK, but it requires a join operation. If we can support a bitmap index with more than one index column, it will help in this case. I mean each index column in the index table has its own bitmap, e.g. FILE_NAME, BLK_OFFSET, GENDER, bitmapForGENDER, COUNTRY, bitmapForCountry. bitmapForGENDER will have two bitmaps internally, one for Male and one for Female, and bitmapForCountry will have one bitmap for each country.
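To illustrate the multi-column layout described above, here is a minimal Python sketch. It is not Hive code: the `IndexEntry` class, its field names, and the use of Python integers as bitmaps are all assumptions made for illustration. Each index row keeps one bitmap per distinct value of each indexed column, so the Male-in-Japan lookup becomes a bitwise AND inside one index row instead of a join of two index tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexEntry:
    """Hypothetical index row: (FILE_NAME, BLK_OFFSET) plus per-column bitmaps."""
    file_name: str
    blk_offset: int
    num_rows: int = 0
    # column name -> {distinct value -> bitmap}; bit i is set when row i holds that value
    bitmaps: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def add_row(self, row_id: int, values: Dict[str, str]) -> None:
        """Record the indexed column values of one row of the block."""
        self.num_rows = max(self.num_rows, row_id + 1)
        for column, value in values.items():
            per_value = self.bitmaps.setdefault(column, {})
            per_value[value] = per_value.get(value, 0) | (1 << row_id)

    def match(self, predicate: Dict[str, str]) -> int:
        """AND together the bitmaps for every column = value pair in the predicate."""
        result = (1 << self.num_rows) - 1  # start with every row in the block selected
        for column, value in predicate.items():
            result &= self.bitmaps.get(column, {}).get(value, 0)
        return result


def row_ids(bitmap: int) -> List[int]:
    """Expand a non-negative bitmap into the sorted row positions whose bit is set."""
    ids, pos = [], 0
    while bitmap > 0:
        if bitmap & 1:
            ids.append(pos)
        bitmap >>= 1
        pos += 1
    return ids


# Illustrative data only: four rows in one block.
entry = IndexEntry("part-00000", 0)
sample = [("Male", "Japan"), ("Female", "Japan"), ("Male", "China"), ("Male", "Japan")]
for i, (gender, country) in enumerate(sample):
    entry.add_row(i, {"GENDER": gender, "COUNTRY": country})

matched = entry.match({"GENDER": "Male", "COUNTRY": "Japan"})
print(row_ids(matched))  # prints [0, 3]
```

A block whose combined bitmap comes out as zero can be pruned without ever touching the second index table.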
And if Hive can support skipping rows, the bitmap index will be very useful. I mean that with bitmap indexing, block pruning alone may not be good enough. For example, suppose that in a block only row1, row3, and lastRow satisfy the predicate. We can then skip row2 and row4 through lastRow-1.
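The row-skipping idea can be sketched in Python as follows. This assumes a hypothetical reader that can seek past rows inside a block, which is the capability Hive would need to add; `matching_rows` and `skip_plan` are made-up names, and the bitmap is a Python integer with bit i set when row i (0-based) satisfies the predicate.

```python
from typing import Iterator, List, Tuple


def matching_rows(bitmap: int) -> Iterator[int]:
    """Yield the 0-based positions of the set bits, in ascending order."""
    pos = 0
    while bitmap > 0:
        if bitmap & 1:
            yield pos
        bitmap >>= 1
        pos += 1


def skip_plan(bitmap: int) -> List[Tuple[str, int, int]]:
    """Turn a bitmap into (operation, start_row, row_count) steps for a reader.

    Runs of consecutive matching rows are merged into one "read" step, and
    every gap between matches becomes one "skip" step.
    """
    plan: List[Tuple[str, int, int]] = []
    next_row = 0  # first row not yet covered by the plan
    for row in matching_rows(bitmap):
        if row > next_row:
            plan.append(("skip", next_row, row - next_row))
        if plan and plan[-1][0] == "read":
            # The previous step ended right before this row, so extend it.
            op, start, count = plan[-1]
            plan[-1] = (op, start, count + 1)
        else:
            plan.append(("read", row, 1))
        next_row = row + 1
    return plan


# An 8-row block where only the first, third, and last rows match
# (row1, row3, and lastRow in the 1-based wording above).
bitmap = (1 << 0) | (1 << 2) | (1 << 7)
for step in skip_plan(bitmap):
    print(step)
```

For this block the plan reads one row, skips one, reads one, skips four, and reads the last row, so five of the eight rows are never deserialized.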
What do you think?