Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
0.1.0-incubating
None
None
Description
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses QueryExecutor directly for the scan.
2. SQLContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and to scan.
Because of this, the two paths contain unnecessary duplicate code and need to be unified.
The target approach should be:
SQLContext/CarbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => CarbonInputFormat(CarbonRecordReader) => QueryExecutor
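To illustrate the target shape, here is a minimal, self-contained sketch in plain Java. It does not use the real Hadoop, Spark, or CarbonData classes: `Split`, `RecordReader`, `InputFormat`, `InMemoryInputFormat`, and `ScanRdd` are all hypothetical stand-ins invented for this example. The only point it tries to show is that a single scan RDD can rely on one input format for both split planning and record reading, so neither step is implemented twice. The QueryExecutor stage is not modelled; the toy reader simply reads rows from memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; none of these are the real Hadoop or CarbonData types.
final class UnifiedScanSketch {

    /** Stand-in for an input split: a half-open range of row indexes. */
    static final class Split {
        final int start;
        final int end; // exclusive
        Split(int start, int end) { this.start = start; this.end = end; }
    }

    /** Stand-in for a record reader that scans exactly one split. */
    interface RecordReader {
        boolean nextKeyValue();
        String getCurrentValue();
    }

    /** Stand-in for the input format: the single owner of split planning and scanning. */
    interface InputFormat {
        List<Split> getSplits();
        RecordReader createRecordReader(Split split);
    }

    /** Toy input format over an in-memory list of rows (assumption for the example). */
    static final class InMemoryInputFormat implements InputFormat {
        private final List<String> rows;
        private final int splitSize;

        InMemoryInputFormat(List<String> rows, int splitSize) {
            this.rows = rows;
            this.splitSize = splitSize;
        }

        @Override
        public List<Split> getSplits() {
            List<Split> splits = new ArrayList<>();
            for (int i = 0; i < rows.size(); i += splitSize) {
                splits.add(new Split(i, Math.min(i + splitSize, rows.size())));
            }
            return splits;
        }

        @Override
        public RecordReader createRecordReader(final Split split) {
            return new RecordReader() {
                private int next = split.start;
                private String current;

                @Override
                public boolean nextKeyValue() {
                    if (next >= split.end) {
                        return false;
                    }
                    // A real reader would delegate to a query executor here.
                    current = rows.get(next++);
                    return true;
                }

                @Override
                public String getCurrentValue() { return current; }
            };
        }
    }

    /** Stand-in for the unified scan RDD: partitions and compute both go through the format. */
    static final class ScanRdd {
        private final InputFormat format;

        ScanRdd(InputFormat format) { this.format = format; }

        List<Split> getPartitions() { return format.getSplits(); }

        List<String> compute(Split partition) {
            RecordReader reader = format.createRecordReader(partition);
            List<String> out = new ArrayList<>();
            while (reader.nextKeyValue()) {
                out.add(reader.getCurrentValue());
            }
            return out;
        }

        /** Sequentially scans every partition; a real RDD would do this in parallel. */
        List<String> collect() {
            List<String> all = new ArrayList<>();
            for (Split p : getPartitions()) {
                all.addAll(compute(p));
            }
            return all;
        }
    }

    public static void main(String[] args) {
        ScanRdd rdd = new ScanRdd(
            new InMemoryInputFormat(Arrays.asList("a", "b", "c", "d", "e"), 2));
        System.out.println(rdd.getPartitions().size() + " splits, rows = " + rdd.collect());
    }
}
```

In this sketch, any entry point (the equivalent of either context) would construct the same `ScanRdd`, which is the property the unification is after.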
Attachments
Sub-Tasks
1. Use CarbonInputFormat in CarbonScanRDD compute | Resolved | Jacky Li
2. Support two types of ReadSupport in CarbonRecordReader | Open | Unassigned
3. Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation | Open | Unassigned
4. Update CarbonSource to use CarbonDatasourceHadoopRelation | Open | Unassigned
5. Make CarbonContext to use standard Datasource strategy | Open | Unassigned