HBase / HBASE-16973

Revisiting default value for hbase.client.scanner.caching


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      We are observing the following log for a long-running scan:

      2016-10-30 08:51:41,692 WARN  [B.defaultRpcServer.handler=50,queue=12,port=16020] ipc.RpcServer:
      (responseTooSlow-LongProcessTime): {"processingtimems":24329,
      "call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)",
      "client":"11.251.157.108:50415","scandetails":"table: ae_product_image region: ae_product_image,494:
      ,1476872321454.33171a04a683c4404717c43ea4eb8978.","param":"scanner_id: 5333521 number_of_rows: 2147483647
      close_scanner: false next_call_seq: 8 client_handles_partials: true client_handles_heartbeats: true",
      "starttimems":1477788677363,"queuetimems":0,"class":"HRegionServer","responsesize":818,"method":"Scan"}
      

      From this log we can see that "number_of_rows" is as large as Integer.MAX_VALUE (2147483647).

      We also observed a long filter list on the customized scan. After checking the application code, we confirmed that neither Scan.setCaching nor hbase.client.scanner.caching is set on the client side. So with the default value, the caching for the Scan is Integer.MAX_VALUE, which is quite a surprise.

      After checking the code and commit history, I found that HBASE-11544 changed HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING from 100 to Integer.MAX_VALUE. Its release note states:

      Scan caching default has been changed to Integer.Max_Value 
      This value works together with the new maxResultSize value from HBASE-12976 (defaults to 2MB) 
      Results returned from server on basis of size rather than number of rows 
      Provides better use of network since row size varies amongst tables
      

      I'm afraid this change did not consider scans with filters, which may examine many rows but return only a small result.
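      A simplified, hypothetical model makes the concern concrete. It assumes that one scan RPC keeps reading rows until it has returned `caching` rows, has returned at least maxResultSize bytes, or has reached the end of the region. It ignores heartbeats, time limits and partial results, and the class name, region size, row size and filter selectivity are illustrative assumptions, not HBase code.

      ```java
      import java.util.function.IntPredicate;

      /**
       * Hypothetical, simplified model of when one scan RPC stops filling a batch.
       * NOT the actual RegionScanner code: heartbeats, time limits and partial
       * results are ignored.
       */
      public class ScanBatchModel {

          /** Outcome of one simulated batch. */
          record Batch(int rowsExamined, int rowsReturned, long bytesReturned) {}

          static Batch nextBatch(int regionRows, int rowSizeBytes, IntPredicate filter,
                                 int caching, long maxResultSize) {
              int examined = 0;
              int returned = 0;
              long bytes = 0;
              for (int row = 0; row < regionRows; row++) {
                  examined++;
                  if (!filter.test(row)) {
                      continue; // filtered out on the server: costs time, adds no bytes
                  }
                  returned++;
                  bytes += rowSizeBytes;
                  // The batch ends once either the row limit or the size limit is hit.
                  if (returned >= caching || bytes >= maxResultSize) {
                      break;
                  }
              }
              return new Batch(examined, returned, bytes);
          }

          public static void main(String[] args) {
              int regionRows = 5_000_000;             // assumed region size
              int rowSizeBytes = 100;                 // assumed (small) row size
              long maxResultSize = 2L * 1024 * 1024;  // 2MB, the HBASE-12976 default quoted above
              IntPredicate selective = row -> row % 10_000 == 0; // assumed: 1 in 10,000 rows passes

              System.out.println("caching=MAX_VALUE: "
                  + nextBatch(regionRows, rowSizeBytes, selective, Integer.MAX_VALUE, maxResultSize));
              System.out.println("caching=128:       "
                  + nextBatch(regionRows, rowSizeBytes, selective, 128, maxResultSize));
          }
      }
      ```

      In this model, with caching at Integer.MAX_VALUE and a selective filter, the 2MB size limit is never reached, so the batch ends only at the end of the region. With caching at 128 the batch ends after 128 matching rows. A filter that matches nothing would still walk the whole region in either case.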

      Moreover, Scan.java still contains the following comment and code:

        /*
         * -1 means no caching
         */
        private int caching = -1;
      

      However, the implementation does not follow this comment: instead of no caching, we end up caching Integer.MAX_VALUE rows.
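      The behavior described above (a Scan left at caching = -1, with no client setting, ends up with Integer.MAX_VALUE) can be sketched as a fallback chain. This is a hedged sketch: the class and method names and the use of a plain Map for configuration are hypothetical, and the assumption that any non-positive scan caching triggers the fallback is ours rather than something this issue confirms.

      ```java
      import java.util.Map;

      /**
       * Hypothetical sketch of how a client could resolve the effective caching
       * for a scan. Names are illustrative, not the HBase client API.
       */
      public class CachingResolution {

          static final String CACHING_KEY = "hbase.client.scanner.caching";

          // Default after HBASE-11544; it was 100 before that change.
          static final int DEFAULT_CACHING = Integer.MAX_VALUE;

          /**
           * Scan-level caching wins when positive; otherwise fall back to the
           * client configuration, then to the compiled-in default. Note that -1
           * ("no caching" per the Scan.java comment) takes the fallback path.
           */
          static int effectiveCaching(int scanCaching, Map<String, String> conf) {
              if (scanCaching > 0) {
                  return scanCaching;
              }
              String configured = conf.get(CACHING_KEY);
              return configured != null ? Integer.parseInt(configured) : DEFAULT_CACHING;
          }

          public static void main(String[] args) {
              // Neither Scan.setCaching nor the config key is set, as in this issue.
              System.out.println(effectiveCaching(-1, Map.of()));                     // 2147483647
              System.out.println(effectiveCaching(-1, Map.of(CACHING_KEY, "100")));  // 100
              System.out.println(effectiveCaching(500, Map.of(CACHING_KEY, "100"))); // 500
          }
      }
      ```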

      So I'd like to raise two points:
      1. Change the default value of HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING back to a small value such as 128.
      2. Reinforce the semantics of "no caching".


          People

            Assignee: Yu Li (liyu)
            Reporter: Yu Li (liyu)
