REEF-1765 (project REEF; sub-task of REEF-1771, Enable IDistributedDataSet in .NET for Parquet files)

Building a Parquet Reader for Potential Integrations with Other ML Frameworks


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.16
    • Fix Version/s: 0.16
    • Component/s: REEF.NET IO
    • Labels:

      Description

      The Parquet file format is widely used by well-known frameworks such as Hadoop and Spark. By enabling REEF to read Parquet files, we could potentially integrate with those frameworks. For now, we only want to support data of non-nested types with a table-like structure (every record is a flat row of primitive columns). This allows the data to be transformed into formats such as Spark RDDs.

      A draft of ParquetReader is available in this pull request: https://github.com/apache/reef/pull/1283


            People

            • Assignee: Unassigned
            • Reporter: Shouheng Yi (shouhengyi)
            • Votes: 0
            • Watchers: 3

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                • Estimated: 336h (original estimate)
                • Remaining: 336h (remaining estimate)
                • Logged: Not specified