Details
-
New Feature
-
Status: Closed
-
Minor
-
Resolution: Later
-
None
-
None
-
None
-
None
Description
This component should allow the input datasets to be read as Matrix Rows.
A Map-Reduce Algorithm should handle any dataset in a matrix format, where the collumns are the attributes (and one of them is the Label) and the rows are the datas.
Working with Hadoop, we'll need to pass the dataset in the mapper's input, so it must be a file (or many files). We'll then need a custom InputFormat to feed the mappers with the data, and here comes the lovely-named "row-wise splitting matrix input format".
Now we want to be able to work with any given dataset file format (including the ARFF and my custom format), and thus the InputFormat needs a decoder that converts the dataset lines into matrix rows.