HBase / HBASE-5416

Filter on one CF and if a match, then load and return full row (WAS: Improve performance of scans with some kind of filters)

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      A new method is added to Filter that lets a filter specify which column families (CFs) it needs for its operation:

      public boolean isFamilyEssential(byte[] name);

      When a new row is considered, only the data for the essential families is loaded and the filter is applied. The rest of the row's data is loaded only if the filter accepts the row.

      This feature is off by default. Use Scan.setLoadColumnFamiliesOnDemand() to enable it on a per-Scan basis. If it is not set on the Scan, the boolean value of "hbase.hregion.scan.loadColumnFamiliesOnDemand" is used (defaults to false).
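
The precedence described in the release note (an explicit per-Scan setting wins, otherwise the configuration value applies) can be sketched roughly as follows. This is a standalone illustration that does not use the HBase classes; the class name OnDemandSettingSketch and the method effectiveOnDemand are hypothetical, and a null Boolean is assumed here to stand for "not indicated for the Scan".

```java
public class OnDemandSettingSketch {
    /**
     * Resolve whether on-demand CF loading is active for one scan.
     * perScanValue is null when the scan did not state a preference
     * (an assumption of this sketch, not a statement about HBase internals).
     */
    static boolean effectiveOnDemand(Boolean perScanValue, boolean configDefault) {
        return perScanValue != null ? perScanValue : configDefault;
    }

    public static void main(String[] args) {
        // Stand-in for the value of "hbase.hregion.scan.loadColumnFamiliesOnDemand",
        // which the release note says defaults to false.
        boolean configDefault = false;

        // Scan left unset: falls back to the configuration value.
        System.out.println(effectiveOnDemand(null, configDefault));
        // Scan explicitly enabled: overrides the configuration value.
        System.out.println(effectiveOnDemand(Boolean.TRUE, configDefault));
        // Scan explicitly disabled: overrides even a cluster-wide "true".
        System.out.println(effectiveOnDemand(Boolean.FALSE, true));
    }
}
```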

      Description

      When a scan is performed, the whole row is loaded into the result list, and only then is the filter (if any) applied to decide whether the row is needed.

      But when a scan covers several CFs and the filter checks data from only a subset of them, the data from the CFs the filter does not check is not needed at the filtering stage. It is needed only once we have decided to include the current row. In that case we can significantly reduce the amount of IO a scan performs by loading only the values the filter actually checks.

      For example, suppose we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (tens of GB) and is quite costly to scan. If we need only rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, even though only a small subset is needed.

      The attached patch adds one routine to the Filter interface that lets a filter specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed for the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter is applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed.
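
The two-phase idea above can be illustrated with a small standalone simulation. This is only a sketch under simplifying assumptions, not the patch's HRegion code: the table is an in-memory map, each CF holds a single string value per row, and the names JoinedScanSketch, readFamily and familyReads are hypothetical. The familyReads counter is a crude stand-in for IO.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class JoinedScanSketch {
    // Number of per-family reads performed so far; a crude stand-in for IO cost.
    static int familyReads = 0;

    // Hypothetical storage read: fetch one family's value for one row.
    static String readFamily(Map<String, Map<String, String>> table, String row, String family) {
        familyReads++;
        return table.get(row).get(family);
    }

    // Scan all rows, reading the essential family first and the rest only on a match.
    static List<Map<String, String>> scan(Map<String, Map<String, String>> table,
                                          List<String> families,
                                          String essentialFamily,
                                          Predicate<String> filter) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String row : table.keySet()) {
            // Phase 1: load only the family the filter needs.
            String essentialValue = readFamily(table, row, essentialFamily);
            if (!filter.test(essentialValue)) {
                continue; // row rejected; the other families are never read
            }
            // Phase 2: the row is accepted, so load the remaining ("joined") families.
            Map<String, String> fullRow = new LinkedHashMap<>();
            fullRow.put(essentialFamily, essentialValue);
            for (String family : families) {
                if (!family.equals(essentialFamily)) {
                    fullRow.put(family, readFamily(table, row, family));
                }
            }
            results.add(fullRow);
        }
        return results;
    }

    public static void main(String[] args) {
        // Toy data echoing the flags/snap example: flags is small, snap is large.
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("row1", Map.of("flags", "keep", "snap", "large-blob-1"));
        table.put("row2", Map.of("flags", "skip", "snap", "large-blob-2"));
        table.put("row3", Map.of("flags", "skip", "snap", "large-blob-3"));

        List<Map<String, String>> rows =
            scan(table, List.of("flags", "snap"), "flags", "keep"::equals);
        System.out.println(rows.size() + " row(s) returned, " + familyReads + " family reads");
    }
}
```

In this toy run the snap family is read only for the one row whose flag matches, whereas loading whole rows up front would read it for every row.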

      1. 5416-TestJoinedScanners-0.94.txt
        7 kB
        Ted Yu
      2. 5416-drop-new-method-from-filter.txt
        5 kB
        Ted Yu
      3. 5416-0.94-v3.txt
        33 kB
        Lars Hofhansl
      4. org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt
        4.23 MB
        Ted Yu
      5. 5416-v16.patch
        59 kB
        Ted Yu
      6. 5416-0.94-v2.txt
        33 kB
        Lars Hofhansl
      7. 5416-v15.patch
        59 kB
        Ted Yu
      8. 5416-v14.patch
        59 kB
        Ted Yu
      9. 5416-0.94-v1.txt
        33 kB
        Lars Hofhansl
      10. 5416-v13.patch
        59 kB
        Ted Yu
      11. HBASE-5416-v12.patch
        64 kB
        Sergey Shelukhin
      12. HBASE-5416-v12.patch
        64 kB
        Sergey Shelukhin
      13. HBASE-5416-v11.patch
        60 kB
        Sergey Shelukhin
      14. HBASE-5416-v10.patch
        60 kB
        Sergey Shelukhin
      15. HBASE-5416-v9.patch
        59 kB
        Sergey Shelukhin
      16. HBASE-5416-v8.patch
        59 kB
        Sergey Shelukhin
      17. HBASE-5416-v7-rebased.patch
        32 kB
        Sergey Shelukhin
      18. Filtered_scans_v7.patch
        32 kB
        Max Lapan
      19. 5416-Filtered_scans_v6.patch
        22 kB
        Ted Yu
      20. Filtered_scans_v5.1.patch
        23 kB
        Max Lapan
      21. Filtered_scans_v5.patch
        22 kB
        Max Lapan
      22. 5416-v6.txt
        15 kB
        Ted Yu
      23. 5416-v5.txt
        16 kB
        Ted Yu
      24. Filtered_scans_v4.patch
        16 kB
        Max Lapan
      25. Filtered_scans_v3.patch
        16 kB
        Max Lapan
      26. Filtered_scans_v2.patch
        10 kB
        Max Lapan
      27. Filtered_scans.patch
        8 kB
        Max Lapan

          Activity

          Max Lapan created issue -
          Max Lapan made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.90.4 [ 12316406 ]
          Max Lapan made changes -
          Attachment 0001-Optimization-of-scans-using-filters.patch [ 12514861 ]
          Max Lapan made changes -
          Affects Version/s 0.90.4 [ 12316406 ]
          Affects Version/s 0.94.0 [ 12316419 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Fix Version/s 0.90.4 [ 12316406 ]
          Max Lapan made changes -
          Attachment 0001-Optimization-of-scans-using-filters.patch [ 12514861 ]
          Max Lapan made changes -
          Attachment Filtered-scans_0.90.4.patch [ 12514882 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered-scans_trunk.patch [ 12514884 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered-scans_0.90.4.patch [ 12514882 ]
          Max Lapan made changes -
          Attachment Filtered-scans_trunk.patch [ 12514884 ]
          Max Lapan made changes -
          Attachment Filtered-scans_0.90.4.patch [ 12514891 ]
          Max Lapan made changes -
          Attachment Filtered-scans_trunk.patch [ 12514892 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered-scans_0.90.4.patch [ 12514891 ]
          Max Lapan made changes -
          Attachment Filtered-scans_trunk.patch [ 12514892 ]
          Max Lapan made changes -
          Attachment Filtered_scans.patch [ 12515249 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v2.patch [ 12515558 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v3.patch [ 12515904 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v4.patch [ 12515906 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ted Yu made changes -
          Attachment 5416-v5.txt [ 12515923 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop Flags Reviewed [ 10343 ]
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ted Yu made changes -
          Attachment 5416-v6.txt [ 12515929 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v5.patch [ 12529061 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v5.patch [ 12529061 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v5.patch [ 12529065 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v5.patch [ 12529065 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v5.patch [ 12529071 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Max Lapan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v5.1.patch [ 12529706 ]
          Max Lapan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop Flags Reviewed [ 10343 ]
          Ted Yu made changes -
          Attachment 5416-Filtered_scans_v6.patch [ 12530695 ]
          Ted Yu made changes -
          Attachment 5416-Filtered_scans_v6.patch [ 12530695 ]
          Ted Yu made changes -
          Attachment 5416-Filtered_scans_v6.patch [ 12530699 ]
          Max Lapan made changes -
          Attachment Filtered_scans_v7.patch [ 12534189 ]
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Max Lapan made changes -
          Link This issue is broken by HBASE-6499 [ HBASE-6499 ]
          Otis Gospodnetic made changes -
          Link This issue relates to HBASE-74 [ HBASE-74 ]
          Sergey Shelukhin made changes -
          Attachment HBASE-5416-v7-rebased.patch [ 12560912 ]
          Sergey Shelukhin made changes -
          Attachment HBASE-5416-v8.patch [ 12561070 ]
          Sergey Shelukhin made changes -
          Attachment HBASE-5416-v9.patch [ 12561329 ]
          Sergey Shelukhin made changes -
          Link This issue blocks HBASE-7383 [ HBASE-7383 ]
          Sergey Shelukhin made changes -
          Attachment HBASE-5416-v10.patch [ 12561614 ]
          Sergey Shelukhin made changes -
          Assignee Max Lapan [ shmuma ] Sergey Shelukhin [ sershe ]
          Sergey Shelukhin made changes -
          Attachment HBASE-5416-v11.patch [ 12561806 ]
          Sergey Shelukhin made changes -
          Attachment HBASE-5416-v12.patch [ 12562021 ]
          Sergey Shelukhin made changes -
          Attachment HBASE-5416-v12.patch [ 12562033 ]
          Ted Yu made changes -
          Attachment 5416-v13.patch [ 12562493 ]
          Lars Hofhansl made changes -
          Attachment 5416-0.94-v1.txt [ 12562737 ]
          Ted Yu made changes -
          Attachment 5416-v14.patch [ 12562752 ]
          Ted Yu made changes -
          Attachment 5416-v15.patch [ 12562757 ]
          Lars Hofhansl made changes -
          Attachment 5416-0.94-v2.txt [ 12562766 ]
          Ted Yu made changes -
          Attachment 5416-v16.patch [ 12562796 ]
          Ted Yu made changes -
          Attachment 5416-v16.patch [ 12562796 ]
          Ted Yu made changes -
          Attachment 5416-v16.patch [ 12563428 ]
          Ted Yu made changes -
          Attachment 5416-v16.patch [ 12563428 ]
          Ted Yu made changes -
          Attachment 5416-v16.patch [ 12563434 ]
          Ted Yu made changes -
          Hadoop Flags Reviewed [ 10343 ]
          Release Note New method is added to Filter which allows filter to specify which CF is needed to it's operation.

          public boolean isFamilyEssential(byte[] name);

          When new row is considered, only data for essential family is loaded and filter applied. And only if filter accepts the row, rest of data is loaded.

          This feature is off by default. You can use Scan.setLoadColumnFamiliesOnDemand() to enable it on a per Scan basis. If not indicated for the Scan, boolean value for "hbase.hregion.scan.loadColumnFamiliesOnDemand" would be used (default to false).
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Ted Yu made changes -
          Lars Hofhansl made changes -
          Fix Version/s 0.94.5 [ 12323874 ]
          Lars Hofhansl made changes -
          Attachment 5416-0.94-v3.txt [ 12564340 ]
          Lars Hofhansl made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Lars Hofhansl made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Ted Yu made changes -
          Attachment 5416-drop-new-method-from-filter.txt [ 12570620 ]
          Ted Yu made changes -
          Link This issue relates to HBASE-7920 [ HBASE-7920 ]
          stack made changes -
          Fix Version/s 0.95.0 [ 12324094 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Fix Version/s 0.94.5 [ 12323874 ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.5 [ 12323874 ]
          Ted Yu made changes -
          Attachment 5416-TestJoinedScanners-0.94.txt [ 12577471 ]
          Ted Yu made changes -
          Link This issue is related to HBASE-8334 [ HBASE-8334 ]
          stack made changes -
          Summary Improve performance of scans with some kind of filters. Filter on one CF and if a match, then load and return full row (WAS: Improve performance of scans with some kind of filters)

            People

            • Assignee:
              Sergey Shelukhin
              Reporter:
              Max Lapan
            • Votes:
              0
              Watchers:
              30