HBase / HBASE-11558

Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

    Details

    • Release Note:
      TableMapReduceUtil now restores the option to set scanner caching by setting it on the Scan object that is passed in. The priority order for choosing the scanner caching is as follows:

      1. Caching set on the scan object.
      2. Caching specified via the config "hbase.client.scanner.caching", which can either be set manually on the conf or via the helper method TableMapReduceUtil.setScannerCaching().
      3. The default value HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING, which is set to 100 currently.
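
      The priority order above can be sketched as a small stand-alone resolver. This is a minimal model, not the code from the patch: the class ScannerCachingPriority, the method resolve, and the use of a plain Map in place of the job Configuration are all hypothetical, and it assumes that a non-positive caching value means "not set on the Scan". Only the config key and the default of 100 come from the release note.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-alone model of the caching priority order from the release note.
 * Nothing here calls HBase; a plain Map stands in for the job Configuration.
 */
final class ScannerCachingPriority {
    // Config key quoted in the release note.
    static final String CACHING_KEY = "hbase.client.scanner.caching";
    // Default quoted in the release note (HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING).
    static final int DEFAULT_CACHING = 100;
    // Assumption: a non-positive value means caching was never set on the Scan.
    static final int UNSET = -1;

    /** Returns the caching value a mapper-side scanner would use in this model. */
    static int resolve(int scanCaching, Map<String, String> conf) {
        // 1. Caching set on the scan object wins.
        if (scanCaching > 0) {
            return scanCaching;
        }
        // 2. Otherwise fall back to the value in the configuration, if present.
        String fromConf = conf.get(CACHING_KEY);
        if (fromConf != null) {
            return Integer.parseInt(fromConf.trim());
        }
        // 3. Otherwise use the default.
        return DEFAULT_CACHING;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolve(UNSET, conf)); // nothing set: default
        conf.put(CACHING_KEY, "250");
        System.out.println(resolve(UNSET, conf)); // only the conf is set
        System.out.println(resolve(500, conf));   // scan value overrides the conf
    }
}
```

      In this model, a job that sets both the Scan caching and the config key gets the Scan value, which matches item 1 taking precedence over item 2.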

      Description

      In 0.94 and earlier, if one sets caching on the Scan object for the job by calling scan.setCaching(int) and passes it to TableMapReduceUtil, the value is correctly read and used by the mappers during a MapReduce job. This works because Scan.write serializes caching, and TableMapReduceUtil uses Scan.write internally to serialize and transfer the Scan object to the mappers.

      In 0.95+, after the move to protobuf, ProtobufUtil.toScan no longer preserves caching because ClientProtos.Scan does not have a caching field. Caching is passed to the server via the ScanRequest object, so it is not needed in the Scan object. However, this breaks application code that relies on the earlier behavior, and it leads to a sudden degradation in scan performance in 0.96+ for users relying on that behavior.
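
      The loss described above can be illustrated with a toy round trip. This is only a sketch of the mechanism: ToyScan and ToyScanMessage are hypothetical stand-ins for Scan and ClientProtos.Scan, the single startRow field is chosen purely for illustration, and -1 is an assumed "unset" marker. No HBase or protobuf classes are used.

```java
/**
 * Toy model of the 0.95+ behaviour: the wire message has no caching field,
 * so the value set on the client-side scan does not survive serialization.
 * ToyScan and ToyScanMessage are hypothetical, not HBase classes.
 */
final class CachingRoundTrip {
    // Assumption: -1 marks "caching not set".
    static final int UNSET = -1;

    /** Client-side scan with a caching hint. */
    static final class ToyScan {
        final String startRow;
        final int caching;

        ToyScan(String startRow, int caching) {
            this.startRow = startRow;
            this.caching = caching;
        }
    }

    /** Wire form of the scan; deliberately has no caching field. */
    static final class ToyScanMessage {
        final String startRow;

        ToyScanMessage(String startRow) {
            this.startRow = startRow;
        }
    }

    /** Serialization copies only the fields the message defines. */
    static ToyScanMessage toMessage(ToyScan scan) {
        return new ToyScanMessage(scan.startRow);
    }

    /** Deserialization has no caching to read, so it comes back unset. */
    static ToyScan fromMessage(ToyScanMessage message) {
        return new ToyScan(message.startRow, UNSET);
    }

    static ToyScan roundTrip(ToyScan scan) {
        return fromMessage(toMessage(scan));
    }

    public static void main(String[] args) {
        ToyScan original = new ToyScan("row-0", 500);
        ToyScan onMapper = roundTrip(original);
        System.out.println("caching before: " + original.caching);
        System.out.println("caching after:  " + onMapper.caching);
    }
}
```

      In this model the start row survives but the caching value comes back unset, so the mapper side would fall back to whatever the configuration or the default provides.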

      There are 2 options here:
      1. Add caching to Scan object, adding an extra int to the payload for the Scan object which is really not needed in the general case.
      2. Document and preach that TableMapReduceUtil.setScannerCaching must be called by the client.

        Attachments

        1. HBASE_11558_v2.patch
          18 kB
          Andrew Purtell
        2. HBASE_11558-0.98_v2.patch
          17 kB
          Ishan Chhabra
        3. HBASE_11558-0.96_v2.patch
          17 kB
          Ishan Chhabra
        4. HBASE_11558_v2.patch
          18 kB
          Ishan Chhabra
        5. HBASE_11558.patch
          17 kB
          Ishan Chhabra
        6. HBASE_11558-0.98.patch
          17 kB
          Ishan Chhabra
        7. HBASE_11558-0.96.patch
          16 kB
          Ishan Chhabra

              People

              • Assignee: Ishan Chhabra (ishanc)
              • Reporter: Ishan Chhabra (ishanc)