
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels: None
    • Spark 1.5 release

    Description

      1. Renaming FSBasedRelation to HadoopFsRelation
        Since it's all coupled with the Hadoop FileSystem and job APIs.
      2. HadoopFsRelation should have a no-arg constructor
        paths and partitionColumns should just be methods to be overridden rather than constructor arguments. Having a no-arg constructor makes data source developers' lives easier and keeps the relation serialization friendly (see the first sketch after this list).
      3. Renaming HadoopFsRelation.prepareForWrite to HadoopFsRelation.prepareJobForWrite
        The new name explicitly suggests that developers should only touch the Job instance for preparation work (this is also documented in the Scaladoc).
      4. Allowing serialization while creating {{OutputWriter}}s
        To be more precise, {{OutputWriter}}s themselves are never created on the driver side and serialized to the executor side; instead, the factory that creates {{OutputWriter}}s is created on the driver side and serialized.
        The reason behind this is that passing everything an {{OutputWriter}} needs via the Hadoop Configuration is doable, but sometimes neither intuitive nor convenient; resorting to serialization makes data source developers' lives easier. This actually came up while migrating the Parquet data source: I wanted to pass the final output path (instead of the temporary work path) to the output writer (see here), and had to put a property into the Configuration object to do so. The second sketch after this list illustrates this pattern.
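      As a rough illustration of points 2 and 3, here is a minimal sketch of the intended shape of the API. The type and method names ({{HadoopFsRelation}}, {{paths}}, {{partitionColumns}}, {{prepareJobForWrite}}, {{OutputWriter}} and its factory) follow this description, but the signatures are simplified stand-ins, not the exact Spark 1.4 interfaces.

{code:scala}
import org.apache.hadoop.mapreduce.Job

// Simplified stand-ins for the real Spark SQL types; signatures are illustrative only.
abstract class OutputWriter {
  def write(row: Seq[Any]): Unit   // the real API writes Row objects
  def close(): Unit
}

// The factory is created on the driver side, serialized, and shipped to executors,
// which call newInstance() to build per-task OutputWriters (point 4).
abstract class OutputWriterFactory extends Serializable {
  def newInstance(path: String): OutputWriter
}

// No-arg constructor (point 2): configuration comes from overridable methods rather
// than constructor arguments, keeping concrete relations serialization friendly.
abstract class HadoopFsRelation {
  def paths: Array[String]
  def partitionColumns: Seq[String]   // simplified; the real API uses a schema type

  // prepareJobForWrite (point 3): only the passed-in Job should be mutated here;
  // everything executors need travels in the returned, serializable factory.
  def prepareJobForWrite(job: Job): OutputWriterFactory
}
{code}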
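      To make point 4 concrete, here is a hypothetical sketch (not taken from the actual Parquet migration) building on the simplified types above: the factory carries the final output path as a plain constructor argument, so it reaches executors through serialization rather than through an extra Configuration entry.

{code:scala}
import java.io.{File, PrintWriter}
import org.apache.hadoop.mapreduce.Job

// Hypothetical text-file relation. The final output path is captured as a field of
// the serializable factory instead of being smuggled through the Configuration.
class TextOutputWriter(path: String) extends OutputWriter {
  private val out = new PrintWriter(new File(path))
  def write(row: Seq[Any]): Unit = out.println(row.mkString(","))
  def close(): Unit = out.close()
}

class TextOutputWriterFactory(finalOutputPath: String) extends OutputWriterFactory {
  def newInstance(fileName: String): OutputWriter =
    new TextOutputWriter(s"$finalOutputPath/$fileName")
}

class TextRelation extends HadoopFsRelation {   // no-arg constructor (point 2)
  def paths: Array[String] = Array("/tmp/text-relation")
  def partitionColumns: Seq[String] = Nil

  def prepareJobForWrite(job: Job): OutputWriterFactory = {
    // Touch only the Job for Hadoop-level setup (point 3); application state such as
    // the final output path goes into the serializable factory instead.
    new TextOutputWriterFactory(paths.head)
  }
}
{code}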

People

    Assignee: Cheng Lian
    Reporter: Reynold Xin
    Votes: 0
    Watchers: 3
