Uploaded image for project: 'Mahout'
  1. Mahout
  2. MAHOUT-71

Dataset to Matrix Reader

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Minor
    • Resolution: Later
    • None
    • None
    • None
    • None

    Description

      This component should allow the input datasets to be read as Matrix Rows.

      A Map-Reduce Algorithm should handle any dataset in a matrix format, where the collumns are the attributes (and one of them is the Label) and the rows are the datas.

      Working with Hadoop, we'll need to pass the dataset in the mapper's input, so it must be a file (or many files). We'll then need a custom InputFormat to feed the mappers with the data, and here comes the lovely-named "row-wise splitting matrix input format".

      Now we want to be able to work with any given dataset file format (including the ARFF and my custom format), and thus the InputFormat needs a decoder that converts the dataset lines into matrix rows.

      Attachments

        Activity

          People

            adeneche Abdel Hakim Deneche
            adeneche Abdel Hakim Deneche
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: