Spark / SPARK-8449

HDF5 read/write support for Spark MLlib

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: MLlib

    Description

      Add support for reading HDF5 files into an RDD[LabeledPoint] and for writing an RDD[LabeledPoint] out as HDF5. Both HDFS and the local file system must be supported. Support for other Spark formats is to be discussed.

      Interface proposal:
      /* path - directory path in any Hadoop-supported file system URI */
      MLUtils.saveAsHDF5(sc: SparkContext, path: String, data: RDD[LabeledPoint]): Unit
      /* path - file or directory path in any Hadoop-supported file system URI */
      MLUtils.loadHDF5(sc: SparkContext, path: String): RDD[LabeledPoint]
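
      HDF5 stores data as n-dimensional array datasets, so one possible on-disk layout for the proposal above is a 1-D dataset of labels plus a 2-D dataset of dense features. The sketch below only illustrates that packing step under those assumptions. It is not Spark or HDF5 code: Point is a hypothetical stand-in for LabeledPoint so the example runs without Spark, HDF5Layout and its methods are made-up names, and sparse vectors and the actual HDF5 I/O are left out.

```scala
// Hypothetical stand-in for org.apache.spark.mllib.regression.LabeledPoint,
// used so that this sketch compiles and runs without Spark on the classpath.
final case class Point(label: Double, features: Array[Double])

// Hypothetical helper (not part of MLlib): converts between a sequence of
// points and the two dense arrays that could be written as HDF5 datasets.
object HDF5Layout {

  /** Packs points into a label vector and a row-per-point feature matrix. */
  def pack(points: Seq[Point]): (Array[Double], Array[Array[Double]]) = {
    val dim = points.headOption.map(_.features.length).getOrElse(0)
    // A rectangular 2-D dataset needs every row to have the same width.
    require(points.forall(_.features.length == dim),
      "all feature vectors must have the same length")
    val labels = points.map(_.label).toArray
    val features = points.map(_.features.clone()).toArray
    (labels, features)
  }

  /** Rebuilds points from a label vector and a feature matrix. */
  def unpack(labels: Array[Double], features: Array[Array[Double]]): Seq[Point] = {
    require(labels.length == features.length,
      "labels and features must have the same number of rows")
    labels.zip(features).map { case (l, f) => Point(l, f.clone()) }.toSeq
  }
}
```

      Under this assumed layout, saveAsHDF5 would apply something like pack to each partition before writing, and loadHDF5 would apply unpack after reading. How files map to partitions is not specified in this issue.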

            People

              Assignee: Unassigned
              Reporter: Alexander Ulanov (avulanov)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 96h
                  Remaining: 96h
                  Logged: Not Specified