[SPARK-7591] FSBasedRelation interface tweaks - ASF JIRA

Renaming FSBasedRelation to HadoopFsRelation
Since it's all coupled with the Hadoop FileSystem and job APIs.
HadoopFsRelation should have a no-arg constructor
paths and partitionColumns should just be methods to be overridden, rather than constructor arguments. A no-arg constructor makes data source developers' lives easier and keeps the relation serialization friendly.
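A minimal sketch of the idea, assuming a simplified stand-in base class. SketchRelation, LogRelation, the example path, and the Seq[String] type used for partitionColumns are all hypothetical and only illustrate the "override methods instead of passing constructor arguments" shape; this is not the actual HadoopFsRelation definition.

```scala
// Hypothetical stand-in for the relation base class; not Spark's actual API.
abstract class SketchRelation extends Serializable {
  // Subclasses override these instead of passing constructor arguments.
  def paths: Array[String]
  def partitionColumns: Seq[String] = Seq.empty
}

// A data source only needs a no-arg constructor plus overrides.
class LogRelation extends SketchRelation {
  override def paths: Array[String] = Array("/data/logs")
  override def partitionColumns: Seq[String] = Seq("year", "month")
}

object NoArgRelationDemo {
  // Formats the overridden values so the effect of the overrides is visible.
  def describe(r: SketchRelation): String =
    s"paths=${r.paths.mkString(",")} partitions=${r.partitionColumns.mkString(",")}"

  def main(args: Array[String]): Unit = {
    // Reflective instantiation works because no constructor arguments are required.
    val relation = classOf[LogRelation].getDeclaredConstructor().newInstance()
    println(describe(relation))
  }
}
```

Because the subclass has no constructor parameters, it can be instantiated reflectively and carries no constructor-supplied state that would need to be serialized.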
Renaming HadoopFsRelation.prepareForWrite to HadoopFsRelation.prepareJobForWrite
The new name explicitly suggests developers should only touch the Job instance for preparation work (which is also documented in Scaladoc).
Allowing serialization while creating {{OutputWriter}}s
To be more precise, {{OutputWriter}}s are never created on the driver side and then serialized to the executor side. Instead, the factory that creates {{OutputWriter}}s should be created on the driver side and serialized.
The reason is that passing everything needed to OutputWriter instances via the Hadoop Configuration is doable, but sometimes neither intuitive nor convenient. Resorting to serialization makes data source developers' lives easier. This actually came up when I was migrating the Parquet data source and wanted to pass the final output path (instead of the temporary work path) to the output writer (see here). There I had to put a property into the Configuration object.
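The driver-side factory idea could be sketched roughly as follows. SketchWriter, SketchWriterFactory, newWriter, and the example paths are hypothetical names chosen for illustration, and the Java serialization round trip only mimics shipping the factory from driver to executor; none of this is the actual Spark interface.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-ins; names and signatures are illustrative, not Spark's API.
trait SketchWriter {
  def write(record: String): Unit
  def written: Seq[String]
}

// The factory is plain serializable data: it carries the final output path as a field,
// so nothing has to be smuggled through a Hadoop Configuration property.
class SketchWriterFactory(val finalOutputPath: String) extends Serializable {
  // Meant to be called on the executor side with a task's temporary work path.
  def newWriter(workPath: String): SketchWriter = new SketchWriter {
    private val buffer = scala.collection.mutable.ArrayBuffer.empty[String]
    def write(record: String): Unit = buffer += s"$workPath -> $finalOutputPath: $record"
    def written: Seq[String] = buffer.toList
  }
}

object FactoryDemo {
  // Round-trips a value through Java serialization, mimicking driver-to-executor shipping.
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverSideFactory = new SketchWriterFactory("/warehouse/table")
    val executorSideFactory = roundTrip(driverSideFactory)
    // The writer itself is created after deserialization and is never serialized.
    val writer = executorSideFactory.newWriter("/tmp/_temporary/task_0")
    writer.write("row-1")
    writer.written.foreach(println)
  }
}
```

Only the factory crosses the serialization boundary; the writer, which may hold non-serializable resources such as open streams, is created where it is used.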