Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-2934

HBaseStorage filter optimizations

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
    • Release Note:
      HBaseStorage filter performance improvements

      Description

      Our HBase pal/guru Gary Helmling was kind enough to do a code review of HBaseStorage. He suggested some good filter optimizations:

      • when using the "lt*" and "gt*" options, set the start/stop rows on the Scan instance, at least in addition to the RowFilters. Without this you're doing a full table scan, regardless of the RowFilters.
      • when selecting specific columns or entire families to return, it would be more efficient to set the family + columns on the Scan object (addFamily(), addColumn()), instead of using a FilterList. I'm not familiar with the family:prefix handling you mention, but that would still seem to require filters. But if that's not being used, it would be better to avoid the FilterList for columns. At minimum, we should probably call Scan.addFamily() with the distinct families, so we can skip entire column families that are not being used. In the case of a table with 4 CFs, if, say, only 1 is being used, this could be a big gain.

        Attachments

          Activity

            People

            • Assignee:
              billgraham Bill Graham
              Reporter:
              billgraham Bill Graham
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: