SPARK-14459: SQL partitioning must match existing tables, but is not checked.


      Description

      Writing into partitioned Hive tables has unexpected results because the table's partitioning is not detected and applied during the analysis phase.

      For example, if I have two tables, source and partitioned, with the same column types:

      CREATE TABLE source (id bigint, data string, part string);
      CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part string);
      
      // copy from source to partitioned
      sqlContext.table("source").write.insertInto("partitioned")
      

      Copying from source to partitioned succeeds, but the destination table ends up with 0 rows. The insert works if I partition explicitly by adding ...write.partitionBy("part").insertInto(...), as spelled out in the sketch below. This work-around isn't obvious, and it is error-prone because the columns passed to partitionBy must match the table's partitioning, yet that match is never checked.
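      For concreteness, here is the work-around spelled out as a minimal sketch, assuming the source and partitioned tables created above and the Spark 1.6-era DataFrameWriter API used in this report:

      // Work-around: declare the partitioning explicitly so the write uses
      // dynamic partitioning. The columns passed to partitionBy must match
      // the table's PARTITIONED BY clause, and that match is not checked.
      sqlContext.table("source")
        .write
        .partitionBy("part")
        .insertInto("partitioned")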

      I think that when relations are resolved, the target table's partitioning should be checked against any user-supplied partitioning, and applied automatically if none was set.
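      As a minimal sketch of that check, assuming the analyzer can see the catalog's partition columns when the target relation is resolved (the names below are illustrative, not Spark internals):

      // Hypothetical helper: reconcile user-supplied partitionBy columns with
      // the table's partition columns at analysis time. Not Spark's actual API.
      def resolvePartitioning(
          tableParts: Seq[String],
          userParts: Option[Seq[String]]): Seq[String] = userParts match {
        // partitionBy was not called: apply the table's partitioning
        case None => tableParts
        // partitionBy matches the table's partition columns: accept it
        case Some(cols) if cols.map(_.toLowerCase) == tableParts.map(_.toLowerCase) =>
          cols
        // otherwise, fail analysis instead of silently writing 0 rows
        case Some(cols) => throw new IllegalArgumentException(
          s"partitionBy(${cols.mkString(", ")}) must match table partitioning: " +
          tableParts.mkString(", "))
      }

      With a check like this, the example above would get partitionBy("part") applied automatically, and a mismatched partitionBy would fail during analysis rather than producing an empty table.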

      People

      • Assignee: Ryan Blue (rdblue)
      • Reporter: Ryan Blue (rdblue)
      • Votes: 0
      • Watchers: 5
