Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: kv
    • Labels:

      Description

      RocksDB java now has the option to expose stats, this can be exposed for graphs and monitoring purposes. We can implement this in our RocksDB kv store implementation to expose these metrics.

      1. hs_err_pid14766.log
        38 kB
        Gustavo Anatoly
      2. hs_err_pid4332.log
        38 kB
        Gustavo Anatoly
      3. SAMZA-449.patch
        3 kB
        Gustavo Anatoly
      4. SAMZA-449-v1.patch
        17 kB
        Gustavo Anatoly
      5. SAMZA-449-v2.patch
        26 kB
        Gustavo Anatoly
      6. SAMZA-449-v3.patch
        21 kB
        Gustavo Anatoly
      7. SAMZA-449-v4.patch
        21 kB
        Gustavo Anatoly
      8. SAMZA-449-V5.patch
        36 kB
        Gustavo Anatoly

        Issue Links

          Activity

          Hide
          criccomini Chris Riccomini added a comment -

          I think the way we'd been thinking about this was to expose the RocksDB stats through Samza's metrics API, rather than via trace() logging. This would allow us to pipe RocksDB metrics into whatever reporting framework you're using, rather than just logging the metrics to a file. Not sure how RocksDB exposes the metrics, though. I was assuming they'd expose per-metrics information that we could get() and just update counters/gauges appropriately.

          Show
          criccomini Chris Riccomini added a comment - I think the way we'd been thinking about this was to expose the RocksDB stats through Samza's metrics API, rather than via trace() logging. This would allow us to pipe RocksDB metrics into whatever reporting framework you're using, rather than just logging the metrics to a file. Not sure how RocksDB exposes the metrics, though. I was assuming they'd expose per-metrics information that we could get() and just update counters/gauges appropriately.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Chris Riccomini

          Thanks for the comments. I'm going to work to expose the RocksDB statistic through metrics.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Chris Riccomini Thanks for the comments. I'm going to work to expose the RocksDB statistic through metrics.
          Hide
          closeuris Yan Fang added a comment -
          Show
          closeuris Yan Fang added a comment - RB link: https://reviews.apache.org/r/35933/
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Thanks, Yan Fang.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Thanks, Yan Fang .
          Hide
          closeuris Yan Fang added a comment -

          Hi Gustavo Anatoly, thanks for the patch. It looks good. Left a few comments in the RB, mainly about the name convention. Thank you.

          Show
          closeuris Yan Fang added a comment - Hi Gustavo Anatoly , thanks for the patch. It looks good. Left a few comments in the RB, mainly about the name convention. Thank you.
          Hide
          closeuris Yan Fang added a comment -

          Two further comments in RB. Thanks.

          Show
          closeuris Yan Fang added a comment - Two further comments in RB. Thanks.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Thanks, Yan Fang for the comments. I'm working on it.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Thanks, Yan Fang for the comments. I'm working on it.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Yan Fang

          The changes is available: https://reviews.apache.org/r/35933/diff/2/

          Thanks.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Yan Fang The changes is available: https://reviews.apache.org/r/35933/diff/2/ Thanks.
          Hide
          closeuris Yan Fang added a comment -

          Gustavo Anatoly, thanks. The patch looks good. There seems a few redundant code I pointed in the RB. Except that, LGTM.

          Show
          closeuris Yan Fang added a comment - Gustavo Anatoly , thanks. The patch looks good. There seems a few redundant code I pointed in the RB. Except that, LGTM.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Could you please review again and test in your environment the patch? I ask you because I found a supposed bug in statistic running over Mac OSX. The patch runs fine on Linux. The log has been attached to share with other devs.

          Thanks.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Could you please review again and test in your environment the patch? I ask you because I found a supposed bug in statistic running over Mac OSX. The patch runs fine on Linux. The log has been attached to share with other devs. Thanks.
          Hide
          closeuris Yan Fang added a comment -

          Hi Gustavo Anatoly, sorry for the late reply. Was busy with other work. I did a simple test, did not see the errors like that in your log.

          Does this error come out immediately after starting the job? Thanks.

          Show
          closeuris Yan Fang added a comment - Hi Gustavo Anatoly , sorry for the late reply. Was busy with other work. I did a simple test, did not see the errors like that in your log. Does this error come out immediately after starting the job? Thanks.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Yan Fang

          Thanks for test the patch, the error come up when the unit test runs, but this happens only Mac OSX on linux works fine. I'm happy that the tests works in your environment.

          Thanks again.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Yan Fang Thanks for test the patch, the error come up when the unit test runs, but this happens only Mac OSX on linux works fine. I'm happy that the tests works in your environment. Thanks again.
          Hide
          closeuris Yan Fang added a comment -

          Which version of the Mac are you using?

          Show
          closeuris Yan Fang added a comment - Which version of the Mac are you using?
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          I can tell you when arrive at home, because I'm not with the notebook at this moment.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - I can tell you when arrive at home, because I'm not with the notebook at this moment.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment - - edited

          OSX Yosemite: 10.10.3

          Show
          gustavoanatoly Gustavo Anatoly added a comment - - edited OSX Yosemite: 10.10.3
          Hide
          closeuris Yan Fang added a comment -

          My OS version is 10.9.5. Retested it. Not see any issue and I can see the metrics in the jmx.

          Can anyone in the community help test this patch, especially with the same Yosemite (10.10.3) version? Thank you.

          Show
          closeuris Yan Fang added a comment - My OS version is 10.9.5. Retested it. Not see any issue and I can see the metrics in the jmx. Can anyone in the community help test this patch, especially with the same Yosemite (10.10.3) version? Thank you.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Yan Fang Thanks.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Yan Fang Thanks.
          Hide
          closeuris Yan Fang added a comment -

          Just curious, does this error occur when you only run the metrics? If yes, maybe related to the RocksDB, and if we can reproduce it simply ( a few lines ), worth shooting a message in RocksDB.

          A little conservative to commit this path because it may cause the JVM to crash.

          Show
          closeuris Yan Fang added a comment - Just curious, does this error occur when you only run the metrics? If yes, maybe related to the RocksDB, and if we can reproduce it simply ( a few lines ), worth shooting a message in RocksDB. A little conservative to commit this path because it may cause the JVM to crash.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Tonight I can reproduce it and attach a new log file and specify better how reproduce it, that is basically run tests from command line to cause this error.
          Also looked in the RocksDB community to check out if something was reported, but nothing was found and agree with you, it's better wait before to commit the patch and investigate a little bit more.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Tonight I can reproduce it and attach a new log file and specify better how reproduce it, that is basically run tests from command line to cause this error. Also looked in the RocksDB community to check out if something was reported, but nothing was found and agree with you, it's better wait before to commit the patch and investigate a little bit more.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          I did make two tests one applying patch v3 and second one without a patch, and both the error come up. But the interest point now it's that the log shows exactly problematic frame:

          # Problematic frame:
          # C  [librocksdbjni1383296511323822335..jnilib+0x54ac]  rocksdb::MergeOperators::CreateFromStringId(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)+0xb1c
          

          The error was reproduced, calling the tests from:

           $./gradlew test
          Show
          gustavoanatoly Gustavo Anatoly added a comment - I did make two tests one applying patch v3 and second one without a patch, and both the error come up. But the interest point now it's that the log shows exactly problematic frame: # Problematic frame: # C [librocksdbjni1383296511323822335..jnilib+0x54ac] rocksdb::MergeOperators::CreateFromStringId(std::__1::basic_string< char , std::__1::char_traits< char >, std::__1::allocator< char > > const &)+0xb1c The error was reproduced, calling the tests from: $./gradlew test
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Yan Fang

          Let me infom you about discussion, generated by the exception reported here and reported to RocksDB community: https://github.com/facebook/rocksdb/issues/658#issuecomment-123863480

          Thanks.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Yan Fang Let me infom you about discussion, generated by the exception reported here and reported to RocksDB community: https://github.com/facebook/rocksdb/issues/658#issuecomment-123863480 Thanks.
          Hide
          closeuris Yan Fang added a comment -

          That's great! Thank you, Gustavo.

          Show
          closeuris Yan Fang added a comment - That's great! Thank you, Gustavo.
          Hide
          closeuris Yan Fang added a comment -

          Lets revisit it after SAMZA-747 ( after upgrading to rocksdb 3.11)

          Show
          closeuris Yan Fang added a comment - Lets revisit it after SAMZA-747 ( after upgrading to rocksdb 3.11)
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Interesting. Thanks Yan Fang.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Interesting. Thanks Yan Fang .
          Hide
          navina Navina Ramesh added a comment -

          Hi Gustavo Anatoly ,
          Just upgraded RocksDB and committed the changes. Shall we resume work on your patch?

          Thanks!

          Show
          navina Navina Ramesh added a comment - Hi Gustavo Anatoly , Just upgraded RocksDB and committed the changes. Shall we resume work on your patch? Thanks!
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Navina Ramesh

          Yeah, I will do it. I think the patch V3 looks good, but I going to test again and verify if the error come up.

          Thanks.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Navina Ramesh Yeah, I will do it. I think the patch V3 looks good, but I going to test again and verify if the error come up. Thanks.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Navina Ramesh

          The patch (v4) is available and I updated (numberOfOperationsAdded) on TestRocksDbKeyValueStore.testFlushStatistic.

          The tests passed successfully.

          https://reviews.apache.org/r/39422/

          Thanks.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Navina Ramesh The patch (v4) is available and I updated (numberOfOperationsAdded) on TestRocksDbKeyValueStore.testFlushStatistic. The tests passed successfully. https://reviews.apache.org/r/39422/ Thanks.
          Hide
          navina Navina Ramesh added a comment -

          Thanks for uploading, Gustavo Anatoly !
          Left some comments in the RB.

          Show
          navina Navina Ramesh added a comment - Thanks for uploading, Gustavo Anatoly ! Left some comments in the RB.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Navina Ramesh

          Thanks for review the patch. I have a doubt where to put control to turn on some statistics, so what do you think about to put it on in a configuration file?

          Cheers.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Navina Ramesh Thanks for review the patch. I have a doubt where to put control to turn on some statistics, so what do you think about to put it on in a configuration file? Cheers.
          Hide
          navina Navina Ramesh added a comment -

          Gustavo Anatoly : I feel a config file to enable a bunch of metrics is an overkill.

          It's good to have the read / write metrics as default. For all other metrics, I was suggesting more like a single config such as :
          stores.<store-name>.rocksdb.metrics.enable = [ all, none (default) ].

          This is the simplest I can think of.

          Another alternative is to let the user specify a list of additional configs. For example:
          stores.<store-name>.rocksdb.metrics-list=<metric-name-1>,<metric-name2> ... <metric-name-i>
          The above solution will only make sense if we can provide a comprehensive list of metrics that rocksdb exposes or the link to an existing list.

          There are some other ways we can cut down on metrics:

          • Check if the store has enabled WAL. If it has, add metrics related to that. Otherwise, skip it.
          • Until we add multi-get api to Samza, you can remove the multi-get related metrics

          I am ok with solution like above. But don't want to add the overhead of specifying this in a separate config file, if that's what you meant in your comment.

          Show
          navina Navina Ramesh added a comment - Gustavo Anatoly : I feel a config file to enable a bunch of metrics is an overkill. It's good to have the read / write metrics as default. For all other metrics, I was suggesting more like a single config such as : stores.<store-name>.rocksdb.metrics.enable = [ all, none (default) ]. This is the simplest I can think of. Another alternative is to let the user specify a list of additional configs. For example: stores.<store-name>.rocksdb.metrics-list=<metric-name-1>,<metric-name2> ... <metric-name-i> The above solution will only make sense if we can provide a comprehensive list of metrics that rocksdb exposes or the link to an existing list. There are some other ways we can cut down on metrics: Check if the store has enabled WAL. If it has, add metrics related to that. Otherwise, skip it. Until we add multi-get api to Samza, you can remove the multi-get related metrics I am ok with solution like above. But don't want to add the overhead of specifying this in a separate config file, if that's what you meant in your comment.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Navina Ramesh, I agree entirely with you, but I was thinking to seize some file to put the configurations mentioned in your comment. I going to work on that.

          Thanks, for share the ideas.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Navina Ramesh , I agree entirely with you, but I was thinking to seize some file to put the configurations mentioned in your comment. I going to work on that. Thanks, for share the ideas.
          Hide
          navina Navina Ramesh added a comment -

          Gustavo Anatoly Any updates here?

          Show
          navina Navina Ramesh added a comment - Gustavo Anatoly Any updates here?
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Navina Ramesh Sorry for the late reply, I was a little bit busy. I going deliver the new patch soon as a possible.

          Thanks for help

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Navina Ramesh Sorry for the late reply, I was a little bit busy. I going deliver the new patch soon as a possible. Thanks for help
          Hide
          TaoFeng Tao Feng added a comment - - edited

          Hi Gustavo Anatoly, any update for this? We want to add this patch to expose RocksDB stats for performance monitoring for RocksDB. If you don't have cycle, do you think I could help on this patch? Thanks.

          Show
          TaoFeng Tao Feng added a comment - - edited Hi Gustavo Anatoly , any update for this? We want to add this patch to expose RocksDB stats for performance monitoring for RocksDB. If you don't have cycle, do you think I could help on this patch? Thanks.
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi, Tao Feng

          Thanks for the concern and actually I'm in debit with the community. I going to finish it in this weekend and sorry for delay to deliver it.

          Thanks again

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, Tao Feng Thanks for the concern and actually I'm in debit with the community. I going to finish it in this weekend and sorry for delay to deliver it. Thanks again
          Hide
          gustavoanatoly Gustavo Anatoly added a comment -

          Hi,
          New patch is available and for review: https://reviews.apache.org/r/41416/diff/1/

          Thanks.

          Show
          gustavoanatoly Gustavo Anatoly added a comment - Hi, New patch is available and for review: https://reviews.apache.org/r/41416/diff/1/ Thanks.

            People

            • Assignee:
              gustavoanatoly Gustavo Anatoly
              Reporter:
              naveenatceg Naveen Somasundaram
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Development