Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Issue Links

        Activity

        ananthg.apex Ananth added a comment -

        Hello Thomas Weise, can I work on this if no one else is already working on it?

        vikram25 Vikram Patil added a comment -

        Hi Ananth,

        A pull request has been opened for this task (or rather, for https://issues.apache.org/jira/browse/APEXMALHAR-2261):
        https://github.com/apache/apex-malhar/pull/613

        You are welcome to discuss and collaborate on this one.

        ananthg.apex Ananth added a comment - - edited

        Thanks for the comment, Vikram Patil. My understanding is that https://issues.apache.org/jira/browse/APEXMALHAR-2261 is about using Apex from a Python environment, i.e. the streaming application is launched from Python, whereas this JIRA (APEXMALHAR-2260) is about invoking Python code from a Java Apex application. I see a lot of value in both use cases.

        I glanced at pull request 613 earlier, and it looked like it addresses APEXMALHAR-2261 in its entirety and not APEXMALHAR-2260. The use case I am trying to solve is the latter: the application is primarily written in Java, and we want to invoke a Python function for scoring on data points extracted and streamed from an upstream operator. The pain point it addresses is the following situation. A data scientist develops a model and pickles it into a repo; the model is then pulled in by this operator, or by an operator derived from it, which executes it and collects back a score. The parameters to the Python scoring function would typically come from an upstream operator, say a Cassandra read operator, with basic feature engineering done in the current operator before it invokes the configured Python function. Another aspect I would like to see is a virtualenv construct for this operator, so that multiple versions of Python libraries can coexist on the data node where the operator is executing.

        Happy to collaborate and discuss pull request 613, but I wanted to confirm the above thinking before the task is taken up.
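
        To make the scoring use case above concrete, here is a minimal sketch of what the Python side could look like. Everything in it is an assumption for illustration, not part of Apex or of pull request 613: the function names (load_model, engineer_features, score_lines), the JSON-lines exchange format, and the "model" itself, which is just a pickled dict of linear coefficients standing in for whatever a data scientist would actually pickle. The idea is that the Java operator hands each tuple over as one JSON line and reads one JSON result back.

        ```python
        import io
        import json
        import pickle


        def load_model(stream):
            """Unpickle a model from a binary stream, e.g. a file pulled from a repo.

            Assumption: the pickled object is a dict {"weights": {...}, "bias": float}.
            Only unpickle data from a trusted source.
            """
            return pickle.load(stream)


        def engineer_features(record, feature_names):
            """Hypothetical feature engineering: pick the model's inputs as floats."""
            return {name: float(record[name]) for name in feature_names}


        def score(model, features):
            """Linear score: bias + sum(weight * feature value)."""
            total = model["bias"]
            for name, weight in model["weights"].items():
                total += weight * features[name]
            return total


        def score_lines(model, lines):
            """Score one JSON-encoded tuple per line; yield one JSON result per tuple."""
            feature_names = list(model["weights"])
            for line in lines:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines between tuples
                record = json.loads(line)
                features = engineer_features(record, feature_names)
                yield json.dumps({"id": record.get("id"), "score": score(model, features)})


        # Example: pickle a toy model in memory, load it back, and score two tuples.
        model_bytes = pickle.dumps({"weights": {"x": 0.5, "y": 2.0}, "bias": -1.0})
        model = load_model(io.BytesIO(model_bytes))
        tuples = ['{"id": "a", "x": 2, "y": 1}', '{"id": "b", "x": 0, "y": 0}']
        results = list(score_lines(model, tuples))
        ```

        One way the virtualenv idea could fit in, again as an assumption, is for the operator to launch a script like this with the interpreter located inside the chosen virtualenv, so that each operator instance sees its own set of library versions.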


          People

          • Assignee:
            Unassigned
            Reporter:
            thw Thomas Weise
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
