Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels: None

      Description

      Streaming LDA would be a natural extension of online LDA.

      First, however, we need to settle the implementation of LDA prediction so that the streaming version can support the predictOn method.

          Activity

          redsofa Rene Richard added a comment -

          Hello,

          We'd like to use Online LDA for something like change point detection. To do that, we need access to the intermediate topic lists after each new batch is processed, so that we can see how the topics change over time. As far as I understand, the current OnlineLDA implementation in MLlib doesn't expose the intermediate topics after each mini-batch. Will the predictOn method give us access to topics as they evolve with new data? I am relatively new to Spark, and I find having two APIs (spark.ml and spark.mllib) a bit confusing. Will these be merged in the future?

          mrmorgan Matthew Morgan added a comment -

          Here's my request for it; I'd like to use it for a workflow that analyzes a stream of newly published public documents as they become available.

          josephkb Joseph K. Bradley added a comment -

          I have not actually heard many requests for it. Have you, or do you have an important use case?

          yuhaoyan yuhao yang added a comment -

          Hi Joseph K. Bradley, I have a prototype of this. Is this a desirable feature? We can certainly put it on hold if more evaluation is required.


            People

            • Assignee: Unassigned
            • Reporter: yuhaoyan yuhao yang
            • Votes: 3
            • Watchers: 13

              Dates

              • Created:
                Updated:
