  1. SAMOA
  2. SAMOA-40

Add Kafka stream reader modules to consume data from Kafka framework

Details

    • Type: Task
    • Status: To Do
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: Infrastructure, SAMOA-API
    • Environment: OS X Version 10.10.3
    • Patch

    Description

      Apache SAMOA is designed to process streaming data and to develop streaming machine learning
      algorithms. Currently, the SAMOA framework can read stream data from ARFF files only.
      As a result, when SAMOA is used as a streaming machine learning component in real-time use cases,
      data must first be written to files and then read back, which is slow and inefficient.

      A single Kafka broker can handle hundreds of megabytes of reads and writes per second
      from thousands of clients. The ability to read data directly from Apache Kafka into SAMOA will
      not only improve performance but also make SAMOA pluggable into many real-time machine
      learning use cases, such as the Internet of Things (IoT).

      GOAL:
      Add code that enables SAMOA to read data from Apache Kafka as a data stream.
      The Kafka stream reader supports the following streaming options:

      a) Topic selection - the Kafka topic to read data from
      b) Partition selection - the Kafka partition to read data from
      c) Batching - the number of data instances read from Kafka in one read request
      d) Configuration options - Kafka port number, seed information, time delay between two read requests
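
      The options above could be grouped into a single validated holder. The sketch below is
      only an illustration: the class name KafkaReaderOptions, its field names, and the reading
      of "seed information" as a list of seed broker host names are assumptions, not taken from
      the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the reader options (a)-(d); not the patch's actual class. */
final class KafkaReaderOptions {
    final String topic;             // (a) Kafka topic to read from
    final int partition;            // (b) partition within that topic
    final int batchSize;            // (c) instances fetched per read request
    final int port;                 // (d) broker port number
    final List<String> seedBrokers; // (d) assumed meaning of "seed information"
    final long delayMillis;         // (d) pause between two read requests

    KafkaReaderOptions(String topic, int partition, int batchSize,
                       int port, List<String> seedBrokers, long delayMillis) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be set");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port must be in 1..65535");
        }
        if (seedBrokers == null || seedBrokers.isEmpty()) {
            throw new IllegalArgumentException("at least one seed broker is required");
        }
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delayMillis must be >= 0");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        this.port = port;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.seedBrokers = Collections.unmodifiableList(new ArrayList<>(seedBrokers));
        this.delayMillis = delayMillis;
    }
}
```

      Validating once in the constructor keeps bad values (an empty topic, a zero batch size)
      from surfacing later as obscure read failures.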

      Components:
      KafkaReader - Consists of APIs to read data from Kafka
      KafkaStream - Stream source for SAMOA that provides the data read from Kafka
      Dependencies for Kafka are added to the pom.xml of the samoa-api component.
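
      One way the reader/stream split could work is sketched below. It is not the patch's
      code: BatchReader, InMemoryReader, and BatchedStream are hypothetical names, and an
      in-memory list stands in for a Kafka partition so that the example runs without a broker
      or the Kafka client library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical reader contract: returns at most maxInstances records per request. */
interface BatchReader {
    List<String> readBatch(int maxInstances);
}

/** In-memory stand-in for a Kafka partition, used only to keep the sketch runnable. */
final class InMemoryReader implements BatchReader {
    private final List<String> records;
    private int offset = 0; // position of the next unread record
    int requests = 0;       // number of read requests served so far

    InMemoryReader(List<String> records) {
        this.records = new ArrayList<>(records);
    }

    @Override
    public List<String> readBatch(int maxInstances) {
        requests++;
        int end = Math.min(offset + maxInstances, records.size());
        List<String> batch = new ArrayList<>(records.subList(offset, end));
        offset = end;
        return batch;
    }
}

/** Hypothetical stream source: buffers one batch and hands out one instance at a time. */
final class BatchedStream {
    private final BatchReader reader;
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();

    BatchedStream(BatchReader reader, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        this.reader = reader;
        this.batchSize = batchSize;
    }

    /** Returns the next instance, or null if the reader has no data right now. */
    String nextInstance() {
        if (buffer.isEmpty()) {
            // Refill with one read request; a real reader might wait here
            // for the configured delay between requests.
            buffer.addAll(reader.readBatch(batchSize));
        }
        return buffer.poll(); // null when the refill returned nothing
    }
}
```

      With five records and a batch size of two, the stream serves all five instances using
      three read requests, which is the saving that the batching option is meant to provide.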

            People

              Assignee: Unassigned
              Reporter: Vishal Karande
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: 168h
                  Remaining: 168h
                  Logged: Not Specified