Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: SQL
    • Labels: None
    • Target Version/s:

Description

When using saveAsTable in append mode, data is written to the wrong location for non-managed Datasource tables. The example below illustrates this: after the append, the query for partition A = 1 should return 2 rows, but it still returns 1 because the appended data never lands under the table's path.

It seems we somehow pass the wrong table path from DataFrameWriter to InsertIntoHadoopFsRelation. We should also probably remove the repair table call at the end of saveAsTable in DataFrameWriter; that shouldn't be needed in either the Hive or the Datasource case.

      scala> spark.sqlContext.range(100).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test")
      
      scala> sql("create table test (id long, A int, B int) USING parquet OPTIONS (path '/tmp/test') PARTITIONED BY (A, B)")
      
      scala> sql("msck repair table test")
      
      scala> sql("select * from test where A = 1").count
      res6: Long = 1
      
      scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").saveAsTable("test")
      
      scala> sql("select * from test where A = 1").count
      res8: Long = 1
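
One way to confirm where the appended rows actually landed is to compare the table's registered location against the files on disk. The following is a minimal sketch, not part of the original report: DESCRIBE FORMATTED prints the location the catalog records for the table, and the directory listing shows which partition directories under the external path /tmp/test physically received data.

      scala> // Show the table's catalog metadata, including its Location
      scala> sql("describe formatted test").collect().foreach(println)

      scala> // List the partition directories actually present under the external path
      scala> import java.io.File
      scala> new File("/tmp/test").listFiles.foreach(println)

If the bug is present, the A=1 partition written by the append should be missing from the listing above, having been written under some other location instead.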
      

People

    • Assignee: ekhliang Eric Liang
    • Reporter: ekhliang Eric Liang
    • Votes: 0
    • Watchers: 3
