Apache Hudi
HUDI-852

Add validation to check Table name when Append Mode is used in DataSource writer


    Details

      Description

      Copied from the user's description on the mailing list:

      The table name is not respected when inserting records with a different table name in Append mode.

       

      While running commands from the Hudi quick start guide, I found that the
      library does not check the table name in the request against the table
      name in the metadata available under the base path. I think it should
      throw TableAlreadyExist; in case of Save mode Overwrite it only warns.

      spark-2.4.4-bin-hadoop2.7/bin/spark-shell --packages
      org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4
      --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
      
      scala> df.write.format("hudi").
           |     options(getQuickstartWriteConfigs).
           |     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
           |     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
           |     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
           |     option(TABLE_NAME, "test_table").
           |     mode(Append).
           |     save(basePath)
      20/04/29 17:23:42 WARN DefaultSource: Snapshot view not supported yet via
      data source, for MERGE_ON_READ tables. Please query the Hive table
      registered using Spark SQL.
      
      scala>
      
      No exception is thrown if we run the same Append-mode write with a different table name:
      
      scala> df.write.format("hudi").
           |     options(getQuickstartWriteConfigs).
           |     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
           |     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
           |     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
           |     option(TABLE_NAME, "foo_table").
           |     mode(Append).
           |     save(basePath)
      20/04/29 17:24:37 WARN DefaultSource: Snapshot view not supported yet via
      data source, for MERGE_ON_READ tables. Please query the Hive table
      registered using Spark SQL.
      
      scala>

      With Save mode Overwrite, by contrast, the writer detects the existing table and warns:
      scala> df.write.format("hudi").
           |   options(getQuickstartWriteConfigs).
           |   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
           |   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
           |   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
           |   option(TABLE_NAME, tableName).
           |   mode(Overwrite).
           |   save(basePath)
      20/04/29 22:25:16 WARN HoodieSparkSqlWriter$: hoodie table at
      file:/tmp/hudi_trips_cow already exists. Deleting existing data &
      overwriting with new data.
      20/04/29 22:25:18 WARN DefaultSource: Snapshot view not supported yet via
      data source, for MERGE_ON_READ tables. Please query the Hive table
      registered using Spark SQL.
      
      scala>
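The validation requested by this issue could be sketched as below. This is a hedged illustration, not Hudi's actual implementation: it assumes the existing table's name can be read from the `hoodie.table.name` entry in `<basePath>/.hoodie/hoodie.properties`, where Hudi records table metadata, and the helper name `validateTableName` and the use of `IllegalArgumentException` are illustrative choices.

```scala
import java.nio.file.{Files, Paths}
import java.util.Properties

// Sketch of the proposed check: before an Append-mode write, compare the
// TABLE_NAME option against the name stored in the table's metadata and
// fail fast on a mismatch instead of silently writing.
def validateTableName(basePath: String, requestedName: String): Unit = {
  val propsPath = Paths.get(basePath, ".hoodie", "hoodie.properties")
  if (Files.exists(propsPath)) {
    val props = new Properties()
    val in = Files.newInputStream(propsPath)
    try props.load(in) finally in.close()
    val existing = props.getProperty("hoodie.table.name")
    if (existing != null && existing != requestedName)
      throw new IllegalArgumentException(
        s"hoodie table at $basePath already exists with name '$existing', " +
          s"but this Append-mode write requested table name '$requestedName'")
  }
}
```

Run at the start of an Append-mode write, the first snippet above ("test_table") would pass, while the second ("foo_table") would raise an error rather than append under the wrong name.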
      
      
      

       


            People

            • Assignee:
              aakashpradeep Aakash Pradeep
              Reporter:
              bhavanisudha Bhavani Sudha
