Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Issue Links

        Activity

        ananthg.apex Ananth added a comment -

        Integration with the xgboost Python package gives the following readings.

        The XGBoost tree ensemble was generated at four depths (resulting in a varying number of trees). Readings are given for all four of these modelling configurations.

        • 2012 MacBook Pro (2.6 GHz Intel Core i7, 16 GB RAM); no GPU was used for either modelling or scoring
        • The model performs classification on the Iris data set
        • The source code for the modelling and the binary version of the model are in the resources folder of the git project (link in the second comment)
        • Readings are in microseconds

        Result "github.ananthc.sampleapps.apex.xgboost.XGBoostJepBenchMarkDepth3.testXGBoostPredictIrisDepth3 ( 60 trees )":
        475.027 ±(99.9%) 5.441 us/op [Average]
        (min, avg, max) = (428.774, 475.027, 567.648), stdev = 23.037
        CI (99.9%): [469.586, 480.468] (assumes normal distribution)

        # Run complete. Total time: 00:08:28

        Benchmark Mode Cnt Score Error Units
        XGBoostJepBenchMarkDepth3.testXGBoostPredictIrisDepth3 avgt 200 475.027 ± 5.441 us/op

        Result "github.ananthc.sampleapps.apex.xgboost.XGBoostJepBenchMarkDepth9.testXGBoostPredictIrisDepth9 ( 120 trees )":
        479.907 ±(99.9%) 6.342 us/op [Average]
        (min, avg, max) = (427.637, 479.907, 576.946), stdev = 26.852
        CI (99.9%): [473.565, 486.249] (assumes normal distribution)

        # Run complete. Total time: 00:08:31

        Benchmark Mode Cnt Score Error Units
        XGBoostJepBenchMarkDepth9.testXGBoostPredictIrisDepth9 avgt 200 479.907 ± 6.342 us/op

        Result "github.ananthc.sampleapps.apex.xgboost.XGBoostJepBenchMarkDepth27.testXGBoostPredictIrisDepth27 ( 300 trees )":
        524.516 ±(99.9%) 13.392 us/op [Average]
        (min, avg, max) = (423.894, 524.516, 838.232), stdev = 56.701
        CI (99.9%): [511.124, 537.908] (assumes normal distribution)

        # Run complete. Total time: 00:08:30

        Benchmark Mode Cnt Score Error Units
        XGBoostJepBenchMarkDepth27.testXGBoostPredictIrisDepth27 avgt 200 524.516 ± 13.392 us/op

        Result "github.ananthc.sampleapps.apex.xgboost.XGBoostJepBenchMarkDepth125.testXGBoostPredictIrisDepth125 ( 900 trees )":
        519.460 ±(99.9%) 10.647 us/op [Average]
        (min, avg, max) = (458.625, 519.460, 693.956), stdev = 45.082
        CI (99.9%): [508.812, 530.107] (assumes normal distribution)

        # Run complete. Total time: 00:08:35

        Benchmark Mode Cnt Score Error Units
        XGBoostJepBenchMarkDepth125.testXGBoostPredictIrisDepth125 avgt 200 519.460 ± 10.647 us/op

        ananthg.apex Ananth added a comment - edited

        TensorFlow integration with Keras as the wrapper gave the following readings:

        • 2012 MacBook Pro (2.6 GHz Intel Core i7, 16 GB RAM); no GPU was used for either modelling or scoring
        • The model performs MNIST digit recognition
        • The source code for the modelling and the binary version of the model are in the resources folder of the git project (link in the second comment)
        • Readings are in microseconds

        Result "github.ananthc.sampleapps.apex.keras.KerasMnistJepBenchMark.testMNISTKerasWithTF":
        1694.491 ±(99.9%) 25.518 us/op [Average]
        (min, avg, max) = (1615.805, 1694.491, 2127.963), stdev = 108.046
        CI (99.9%): [1668.973, 1720.009] (assumes normal distribution)

        # Run complete. Total time: 00:08:31

        Benchmark Mode Cnt Score Error Units
        KerasMnistJepBenchMark.testMNISTKerasWithTF avgt 200 1694.491 ± 25.518 us/op

        ananthg.apex Ananth added a comment - edited

        Scoring a scikit-learn SVM model on a 2012 MacBook Pro gave the following readings. The model is a standard SVM trained on the Iris data set.

        The test measures latency as opposed to throughput. All readings are in microseconds.

        Result "github.ananthc.sampleapps.apex.scikitlearn.ScikitLearnSVMJepBenchmark.testSVMIrisPredict":
        150.427 ±(99.9%) 7.332 us/op [Average]
        (min, avg, max) = (117.998, 150.427, 278.607), stdev = 31.045
        CI (99.9%): [143.095, 157.760] (assumes normal distribution)

        # Run complete. Total time: 00:08:44

        Benchmark Mode Cnt Score Error Units
        ScikitLearnSVMJepBenchmark.testSVMIrisPredict avgt 200 150.427 ± 7.332 us/op

        ananthg.apex Ananth added a comment - edited

        A design discussion thread will follow on the mailing lists regarding the Python execution operator in Malhar. In the meantime, here is some analysis of the indicative execution latencies for this operator. Note that actual latencies for this operator will be slightly higher, as the benchmark measures only the scoring component and not data formatting, writing to the output ports, etc.

        The following three comments are based on JMH benchmarks written using JEP (Java Embedded Python). The source code for the benchmarks is here
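
        For orientation, below is a minimal sketch of how such a JEP-based JMH benchmark can be structured. This is illustrative only and not the actual benchmark source (the real sources are linked above); the model path and the scoring expression are hypothetical, and JEP interpreters must be used on the thread that created them, hence Scope.Thread.

        import java.util.concurrent.TimeUnit;

        import org.openjdk.jmh.annotations.Benchmark;
        import org.openjdk.jmh.annotations.BenchmarkMode;
        import org.openjdk.jmh.annotations.Mode;
        import org.openjdk.jmh.annotations.OutputTimeUnit;
        import org.openjdk.jmh.annotations.Scope;
        import org.openjdk.jmh.annotations.Setup;
        import org.openjdk.jmh.annotations.State;
        import org.openjdk.jmh.annotations.TearDown;

        import jep.Jep;

        // Illustrative sketch only: measures the latency of a single Python scoring call
        // made through an embedded JEP interpreter, mirroring the "avgt ... us/op" readings above.
        @State(Scope.Thread)
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.MICROSECONDS)
        public class PythonScoringBenchmarkSketch
        {
          private Jep jep;

          @Setup
          public void setUp() throws Exception
          {
            // Interpreter startup and model deserialization happen once, outside the
            // measured region, so the reported score reflects scoring latency only.
            jep = new Jep();
            jep.eval("import pickle");
            jep.eval("model = pickle.load(open('/path/to/model.pkl', 'rb'))"); // hypothetical path
          }

          @Benchmark
          public Object scoreOneTuple() throws Exception
          {
            // One prediction per invocation; JMH reports the average time per call (us/op).
            jep.set("features", new double[] {5.1, 3.5, 1.4, 0.2});
            return jep.getValue("float(model.predict([list(features)])[0])");
          }

          @TearDown
          public void tearDown() throws Exception
          {
            jep.close();
          }
        }

        With this structure, the "avgt ... 200 ... us/op" rows above correspond to the average time of one scoring call over 200 measured iterations.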

        ananthg.apex Ananth added a comment - edited

        Thanks for the comment, Vikram Patil. My understanding of the requirement is that https://issues.apache.org/jira/browse/APEXMALHAR-2261 is about having the ability to use Apex from a Python environment, i.e. the streaming application is launched via Python, whereas this JIRA [2260] is about invoking Python code from a Java Apex application. I see a lot of value in both of these use cases.

        I glanced at pull request 613 earlier, and it looked like it addresses APEXMALHAR-2261 in its entirety rather than APEXMALHAR-2260. The use case I am trying to solve is the latter: invoking a Python function for scoring, with the data points extracted and streamed from an upstream operator, in an application that is primarily coded in Java. The pain point this use case addresses is the following situation: a data scientist develops a model and pickles it into a repository, and the model is then pulled in by this operator (or an operator derived from it) to execute and collect back a score. The parameters to the Python scoring function may come from an upstream operator, say a Cassandra read operator, with basic feature engineering done in the current operator before it invokes the configured Python function. Another aspect I would like to see is the use of a virtualenv construct for this operator so that multiple versions of Python libraries can coexist on the data node where the operator is executing.
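
        To make the intended execution flow concrete, here is a rough, illustrative sketch of what such an operator could look like using JEP. The class, port, and property names below are hypothetical and not a proposed API; the pickled model path and the predict expression are assumptions for illustration.

        import com.datatorrent.api.Context.OperatorContext;
        import com.datatorrent.api.DefaultInputPort;
        import com.datatorrent.api.DefaultOutputPort;
        import com.datatorrent.common.util.BaseOperator;

        import jep.Jep;

        // Illustrative only: unpickles a model published by the data scientist and
        // invokes a Python scoring expression for each incoming tuple via JEP.
        public class PythonScoringOperatorSketch extends BaseOperator
        {
          private String modelPath; // e.g. pulled from a model repository (hypothetical property)
          private transient Jep jep;

          public final transient DefaultOutputPort<Double> scores = new DefaultOutputPort<>();

          public final transient DefaultInputPort<double[]> features = new DefaultInputPort<double[]>()
          {
            @Override
            public void process(double[] tuple)
            {
              try {
                // Basic feature engineering could happen here before the Python call.
                jep.set("features", tuple);
                Number score = (Number)jep.getValue("float(model.predict([list(features)])[0])");
                scores.emit(score.doubleValue());
              } catch (Exception e) {
                throw new RuntimeException("Python scoring failed", e);
              }
            }
          };

          @Override
          public void setup(OperatorContext context)
          {
            try {
              // A virtualenv-specific interpreter/library path could be selected here so that
              // multiple Python library versions can coexist on the same data node.
              jep = new Jep();
              jep.eval("import pickle");
              jep.set("modelPath", modelPath);
              jep.eval("model = pickle.load(open(modelPath, 'rb'))");
            } catch (Exception e) {
              throw new RuntimeException("Failed to initialize the embedded Python interpreter", e);
            }
          }

          @Override
          public void teardown()
          {
            try {
              if (jep != null) {
                jep.close();
              }
            } catch (Exception e) {
              // best effort on shutdown
            }
          }

          public void setModelPath(String modelPath)
          {
            this.modelPath = modelPath;
          }
        }

        Keeping the interpreter and the unpickled model in setup() means the per-tuple cost is only the scoring call itself, which is what the benchmark numbers above approximate.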

        Happy to collaborate and discuss pull request 613, but I wanted to confirm the above thinking before the task is taken up.

        vikram25 Vikram Patil added a comment -

        Hi Ananth,

        A PR is open for this task (rather, for https://issues.apache.org/jira/browse/APEXMALHAR-2261):
        https://github.com/apache/apex-malhar/pull/613

        You are welcome to discuss and collaborate on this one.

        ananthg.apex Ananth added a comment -

        Hello Thomas Weise - Can I work on this if no one else is working on it?


          People

          • Assignee:
            ananthg.apex Ananth
            Reporter:
            thw Thomas Weise
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:

              Development