[SPARK-39203] Fix remote table location based on database location


Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, 3.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      We have HDFS and Hive on cluster A, and Spark on cluster B, which needs to read data from cluster A. The table location reported by Spark is missing the remote scheme and authority:

      spark-sql> desc formatted  default.test_table;
      fas_acct_id         	decimal(18,0)
      fas_acct_cd         	string
      cmpny_cd            	string
      entity_id           	string
      cre_date            	date
      cre_user            	string
      upd_date            	timestamp
      upd_user            	string
      
      # Detailed Table Information
      Database             default
      Table               	test_table
      Type                	EXTERNAL
      Provider            	parquet
      Statistics          	25310025737 bytes
      Location            	/user/hive/warehouse/test_table
      Serde Library       	org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
      InputFormat         	org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
      OutputFormat        	org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
      Storage Properties  	[compression=snappy]
      
      spark-sql> desc database default;
      Namespace Name	default
      Comment
      Location	viewfs://clusterA/user/hive/warehouse/
      Owner     hive_dba
      

      The correct table location should be viewfs://clusterA/user/hive/warehouse/test_table.
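      Since the database location (viewfs://clusterA/user/hive/warehouse/) carries the remote scheme and authority, the table location should be qualified against it, as the issue title suggests. Below is a minimal sketch of that idea in Scala; the helper name and structure are hypothetical, not Spark's actual code:

      import java.net.URI
      import org.apache.hadoop.fs.Path

      // Hypothetical helper: if the table location from the remote metastore
      // has no scheme/authority, borrow them from the database location while
      // keeping the table's own path.
      def qualifyTableLocation(tableLocation: URI, databaseLocation: URI): URI = {
        if (tableLocation.getScheme == null) {
          new Path(databaseLocation.getScheme, databaseLocation.getAuthority,
            tableLocation.getPath).toUri
        } else {
          tableLocation
        }
      }

      // For the table above:
      //   qualifyTableLocation(
      //     new URI("/user/hive/warehouse/test_table"),
      //     new URI("viewfs://clusterA/user/hive/warehouse/"))
      // returns viewfs://clusterA/user/hive/warehouse/test_table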

          People

            Assignee: Unassigned
            Reporter: Yuming Wang (yumwang)