CarbonData / CARBONDATA-307

Support executor side scan using CarbonInputFormat

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.1.0-incubating
    • Fix Version/s: None
    • Component/s: spark-integration
    • Labels: None

    Description

      Currently, there are two read paths in the carbon-spark module:
      1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
      In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor directly for the scan.

      2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
      In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.

      Because of this, there is unnecessary duplicate code, and the two paths need to be unified.
      The target approach should be:
      sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
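
      The target flow can be illustrated with a small, library-free sketch. It is not CarbonData code: Split, QueryExecutor, RecordReader, InputFormat, and ScanRDD below are hypothetical stand-ins, and the "table" is just a Python list. The only point it tries to show is that one input-format object owns both split planning (driver side) and reader creation (executor side), so the RDD never calls the query executor itself.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an input split: a row range in a toy table."""
    start: int
    end: int


class QueryExecutor:
    """Hypothetical stand-in for the scan engine; it only slices a list."""

    def __init__(self, table: List[str]) -> None:
        self.table = table

    def execute(self, split: Split) -> Iterator[str]:
        return iter(self.table[split.start:split.end])


class RecordReader:
    """Executor-side reader that wraps the query executor for one split."""

    def __init__(self, executor: QueryExecutor, split: Split) -> None:
        self._rows = executor.execute(split)

    def __iter__(self) -> Iterator[str]:
        return self._rows


class InputFormat:
    """Owns both responsibilities: planning splits and creating readers."""

    def __init__(self, table: List[str], rows_per_split: int) -> None:
        self.table = table
        self.rows_per_split = rows_per_split

    def get_splits(self) -> List[Split]:
        n = len(self.table)
        step = self.rows_per_split
        return [Split(i, min(i + step, n)) for i in range(0, n, step)]

    def create_record_reader(self, split: Split) -> RecordReader:
        return RecordReader(QueryExecutor(self.table), split)


class ScanRDD:
    """Toy RDD: partitions and scanning both go through the input format."""

    def __init__(self, input_format: InputFormat) -> None:
        self.input_format = input_format

    def partitions(self) -> List[Split]:
        # Driver side: split planning is delegated to the input format.
        return self.input_format.get_splits()

    def compute(self, split: Split) -> Iterator[str]:
        # Executor side: the scan goes through the record reader only.
        return iter(self.input_format.create_record_reader(split))


def collect(rdd: ScanRDD) -> List[str]:
    """Scan every partition in order and gather the rows."""
    return [row for split in rdd.partitions() for row in rdd.compute(split)]
```

      In this sketch, any entry point (a stand-in for sqlContext or carbonContext) would build the same ScanRDD, so the split and scan logic lives in one place rather than in two RDD implementations.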

          People

            Assignee: Unassigned
            Reporter: Jacky Li (jackylk)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 50m