Apache Gora / GORA-117

gora hbase does not have a mechanism to set the caching on a scanner, which makes for poor performance on map/reduce jobs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.4
    • Component/s: gora-hbase
    • Labels:
      None

      Description

      goraci runs a map/reduce job over all the data that it generates. The hbase storage uses a scanner that doesn't cache rows, which means every fetch requires an RPC call. I experimented with

      scan.setCaching(1000);

      and goraci Verify ran about 30x faster.
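The effect described above can be illustrated with a small model. This is a sketch under simplifying assumptions, not HBase code: the class `ScanRpcModel` and the table size are hypothetical, and the model counts only row-fetching round trips, ignoring scanner open/close calls and any other overhead. It assumes each fetch returns at most `caching` rows.

```java
// Simplified model (not HBase code): counts the row-fetching round trips
// a client needs to scan a table when each fetch returns up to `caching` rows.
final class ScanRpcModel {

    /** Round trips needed to fetch totalRows rows, `caching` rows per fetch. */
    static long rpcCount(long totalRows, int caching) {
        if (totalRows < 0 || caching < 1) {
            throw new IllegalArgumentException("totalRows must be >= 0 and caching >= 1");
        }
        // Ceiling division: a partially filled last batch still costs one round trip.
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size
        System.out.println("caching=1    -> " + rpcCount(rows, 1) + " round trips");
        System.out.println("caching=1000 -> " + rpcCount(rows, 1000) + " round trips");
    }
}
```

Under this model, one million rows cost 1,000,000 round trips with one row per fetch and 1,000 round trips with `caching` set to 1000. The measured speedup is smaller than the ratio of round trips because per-row processing time does not shrink.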

      1. GORA-117.patch
        4 kB
        Alfonso Nishikawa

        Activity

        Hudson added a comment -

        SUCCESS: Integrated in gora-trunk #968 (See https://builds.apache.org/job/gora-trunk/968/)
        update CHANGES.txt for GORA-117 (lewismc: http://svn.apache.org/viewvc/gora/trunk/?view=rev&rev=1556218)

        • /gora/trunk/CHANGES.txt

        Lewis John McGibbney added a comment -

        Committed at revisions 1556216 and 1556217 in the GORA_94 branch.
        Thank you.

        Hudson added a comment -

        SUCCESS: Integrated in gora-trunk #967 (See https://builds.apache.org/job/gora-trunk/967/)
        GORA-117 gora hbase does not have a mechanism to set the caching on a scanner, which makes for poor performance on map/reduce jobs (alfonsonishikawa: http://svn.apache.org/viewvc/gora/trunk/?view=rev&rev=1556044)

        • /gora/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java
        • /gora/trunk/gora-hbase/src/test/conf/gora.properties
        • /gora/trunk/gora-hbase/src/test/java/org/apache/gora/hbase/store/TestHBaseStore.java

        Alfonso Nishikawa added a comment - - edited

        Committed GORA-117.patch to /trunk as r1556044

        Lewis John McGibbney added a comment -

        It would be great to test it with goraci, but since we are busy, I am +1 to commit and move on.
        I am going to spin up an HBase cluster on Amazon with Whirr and try goraci soon enough.
        Once I've done that I will collate and present the results and we can take it from there, folks.

        Henry Saputra added a comment -

        Alfonso Nishikawa, per Lewis's recommendation, please push the patch if no more reviews are coming. We could let ASF Jenkins build it a couple of times.

        Lewis John McGibbney added a comment -

        Can someone who has tested this commit it to trunk please? Please upload the patch you are testing with and I will port the change to GORA_94 branch.

        Henry Saputra added a comment -

        +1

        Works in my env and just scanning the log for testing looks like it returns the right data.
        Once GoraCI is validated we should push this. Thanks Alfonso.

        Alfonso Nishikawa added a comment - - edited

        Let's push this, then.

        Uploaded proposal GORA-117.patch.
        Can anyone check it with goraci? I don't have it configured and I am too busy to set it up for the first time.
        The tests pass without problems, but I don't know if it works as expected.

        It adds the option to gora.properties:

        gora.hbasestore.scanner.caching=1000
        

        with a default value of 0, and adds get/setScannerCaching() to HBaseStore.
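Reading such an option could look roughly like the following. This is a minimal sketch and not the code from GORA-117.patch: the class `ScannerCachingConfig` and the method `readScannerCaching` are hypothetical names, and only the property key and the default of 0 come from the comment above. Falling back to the default on missing, malformed, or negative values is an assumption made for the example.

```java
import java.util.Properties;

// Hypothetical helper, not taken from the patch: reads the scanner caching
// option from a gora.properties-style Properties object.
final class ScannerCachingConfig {

    // Property key and default value as quoted in the comment above.
    static final String SCANNER_CACHING_KEY = "gora.hbasestore.scanner.caching";
    static final int SCANNER_CACHING_DEFAULT = 0;

    /** Returns the configured value, or the default if it is missing or invalid. */
    static int readScannerCaching(Properties props) {
        String raw = props.getProperty(SCANNER_CACHING_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return SCANNER_CACHING_DEFAULT;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumption: negative values are treated as "not configured".
            return value < 0 ? SCANNER_CACHING_DEFAULT : value;
        } catch (NumberFormatException e) {
            // Assumption: malformed values fall back to the default.
            return SCANNER_CACHING_DEFAULT;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(SCANNER_CACHING_KEY, "1000");
        System.out.println(readScannerCaching(props)); // prints 1000
    }
}
```

How the store interprets a value of 0 is not stated in the comment. One plausible reading is that 0 leaves the scanner's caching untouched, so the HBase client setting applies.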

        Otis Gospodnetic added a comment -

        Should this get committed? I see Julien Nioche mentioned this in the Nutch draft for the board... Thanks.

        Lewis John McGibbney added a comment -

        Set and classify

        stack added a comment -

        @Ferdy True. And looking at goraci, it implements Tool. I just tried passing args on the command line to goraci and it seems to work:

        $ PATH=/export1/stack/hadoop-1.0.2/bin:$PATH ./goraci.sh Verify -Dhbase.client.scanner.caching=1000 -Dmapred.map.tasks.speculative.execution=false v5 100
        

        Scan rate goes from 20k/second to 500k/second.

        So, there is no issue here? Or maybe we should add a bit of doc on it (Where would you suggest?)?

        I'll send the goraci lads a README patch that adds the above command-line options, for the Verify step at least.

        Ferdy Galema added a comment -

        Never mind the comment about the hbase.client.scanner.caching property. (Of course this is something you already know.) But it's not fully clear to me yet why this property has no effect in the current store implementation.

        Thanks for raising this issue. I'll await Stack's suggestions.

        Ferdy Galema added a comment -

        Caching certainly makes scanning a lot faster. However, it is already fully configurable, namely via the HBase property hbase.client.scanner.caching.

        I do propose making it obvious to the user that this is one of the tweaks that is really worth configuring, for example by logging a line during HBaseStore initialization, something like:
        autoflush=..., scannercaching=...

        stack added a comment -

        Let me have a go at making this at least configurable and start w/ something a little sensible.


          People

          • Assignee: stack
          • Reporter: Eric Newton
          • Votes: 1
          • Watchers: 7
