SPARK-34192: Move char padding to write side


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.1
    • Component/s: SQL
    • Labels: None

    Description

      Performing the char length check and padding on the read side causes problems for the cost-based optimizer (CBO) and predicate pushdown (PPD), as well as other issues in Catalyst.
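      To make the PPD problem concrete, here is a toy model in plain Scala. It is not Spark code. It assumes a CHAR(5) column whose values were stored without padding, and it models a pushed-down filter as a plain equality check on the stored strings. The names `readSidePad`, `sparkSideFilter`, and `naivePushedDownFilter` are hypothetical. When padding happens only on read, Spark compares against the padded literal 'a    ', which no longer matches the raw stored value 'a', so the filter cannot simply be handed to the storage layer.

```scala
// Toy model (not Spark's implementation): a CHAR(5) column whose values were
// stored WITHOUT padding, with padding applied only when the data is read.
object ReadSidePaddingSketch {
  val charLength = 5

  // Read-side padding: right-pad a stored value to the declared CHAR length.
  def readSidePad(stored: String): String =
    stored + " " * math.max(0, charLength - stored.length)

  // Filter evaluated by Spark after read-side padding: both sides are padded,
  // so the comparison has CHAR semantics.
  def sparkSideFilter(rows: Seq[String], literal: String): Seq[String] =
    rows.filter(v => readSidePad(v) == readSidePad(literal))

  // Naive pushdown: the storage layer compares the raw (unpadded) stored values
  // against the padded literal, so matching rows are lost.
  def naivePushedDownFilter(rows: Seq[String], literal: String): Seq[String] =
    rows.filter(v => v == readSidePad(literal))

  def main(args: Array[String]): Unit = {
    val stored = Seq("a", "b", "a")
    println(sparkSideFilter(stored, "a"))       // List(a, a)
    println(naivePushedDownFilter(stored, "a")) // List()
  }
}
```

      Because the correct predicate is over `rpad(c, 5)` rather than over `c` itself, a source cannot evaluate it on raw column values. This is the kind of friction with PPD that the description refers to.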

      It is more reasonable to perform the check and padding on the write side, since Spark does not fully control the storage layer.
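      A minimal sketch of a write-side length check plus padding for CHAR(n), in plain Scala. The helper names `varcharWriteCheck` and `charWriteCheck` are hypothetical. The rule that trailing spaces beyond the limit are trimmed, while any other overflow is rejected, is an assumption made for illustration and is not copied from Spark's implementation.

```scala
import scala.util.Try

// Hypothetical write-side handling for VARCHAR(limit) / CHAR(limit) values.
object WriteSidePaddingSketch {
  // Length check applied before writing. Assumption: characters beyond the
  // limit are accepted only if they are all spaces, and they are then trimmed.
  def varcharWriteCheck(value: String, limit: Int): String =
    if (value.length <= limit) value
    else if (value.drop(limit).forall(_ == ' ')) value.take(limit)
    else throw new IllegalArgumentException(
      s"value '$value' exceeds the length limit $limit")

  // CHAR(limit): run the same check, then right-pad to exactly `limit` chars,
  // so every stored value has a fixed length.
  def charWriteCheck(value: String, limit: Int): String = {
    val checked = varcharWriteCheck(value, limit)
    checked + " " * (limit - checked.length)
  }

  def main(args: Array[String]): Unit = {
    // Both inputs are stored as the same fixed-length value "a    ".
    println(charWriteCheck("a ", 5) == charWriteCheck("a  ", 5)) // true
    // Excess trailing spaces are trimmed to fit VARCHAR(3).
    println(s"[${varcharWriteCheck("abc  ", 3)}]")               // [abc]
    // Non-space overflow is rejected at write time.
    println(Try(varcharWriteCheck("abcd", 3)).isFailure)         // true
  }
}
```

      Since the stored data already has its final fixed-length form, readers (including pushed-down filters and statistics used by CBO) see the same values Spark does, with no extra projection on the read path.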

      https://issues.apache.org/jira/browse/HIVE-13618

      For VARCHAR and STRING, the case below remains a known issue because of a limitation of the Hive metastore.
      For CHAR, values are now written with a fixed length, so the issue should be fixed.

        // Assumes an enclosing Spark SQL test suite that provides `sql`, `withTable`,
        // ScalaTest's `intercept`, and a `format` data source name (e.g. "parquet").
        import org.apache.spark.sql.AnalysisException

        test("SPARK-34192: Know issue of hive for tailing spaces") {
          // https://issues.apache.org/jira/browse/HIVE-13618
          // Trailing spaces in partition column values are treated differently.
          // This is because MySQL and Derby (used in tests) consider 'a' = 'a ',
          // whereas others (e.g. Postgres, Oracle) don't exhibit this problem.
          Seq("char(5)", "string", "VARCHAR(5)").foreach { typ =>
            withTable("t") {
              sql(s"CREATE TABLE t(i STRING, c $typ) USING $format PARTITIONED BY (c)")
              sql(s"INSERT INTO t VALUES ('1', 'a ')")
              // A second value differing only in trailing spaces collides in the metastore.
              val e = intercept[AnalysisException](sql(s"INSERT INTO t VALUES ('1', 'a  ')"))
              assert(e.getMessage.contains("Expecting a partition with name c=a  ,"))
            }
          }
        }
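      Why the test above trips over VARCHAR and STRING but not, in principle, over CHAR can be modeled in a few lines of plain Scala. The sketch assumes, as the test comment describes for MySQL and Derby, that the metastore's backing database ignores trailing spaces when comparing strings. `metastoreEquals`, `conflicts`, and `padChar` are hypothetical helpers, not Spark or Hive APIs.

```scala
// Toy model of the HIVE-13618 trailing-space mismatch (not Spark or Hive code).
object MetastoreTrailingSpaceSketch {
  // Strip trailing spaces only; leading and inner spaces are kept.
  private def stripTrailing(s: String): String =
    s.reverse.dropWhile(_ == ' ').reverse

  // Assumption: the metastore database compares strings ignoring trailing
  // spaces, so 'a ' and 'a  ' look like the same partition value.
  def metastoreEquals(x: String, y: String): Boolean =
    stripTrailing(x) == stripTrailing(y)

  // A conflict: Spark treats the two partition values as different,
  // but a metastore lookup treats them as the same partition.
  def conflicts(x: String, y: String): Boolean =
    x != y && metastoreEquals(x, y)

  // Hypothetical write-side CHAR(n) padding (assumes the value fits in n).
  def padChar(s: String, n: Int): String =
    s + " " * math.max(0, n - s.length)

  def main(args: Array[String]): Unit = {
    // VARCHAR/STRING: values are written as-is, so the two inserts conflict.
    println(conflicts("a ", "a  "))                         // true
    // CHAR(5): both values are padded to "a    " first, so there is no conflict.
    println(conflicts(padChar("a ", 5), padChar("a  ", 5))) // false
  }
}
```

      Fixed-length CHAR values remove the disagreement at its source: Spark and the metastore see the identical string, so there is nothing for the trailing-space comparison to collapse.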

          People

            Assignee: Kent Yao 2 (Qin Yao)
            Reporter: Kent Yao 2 (Qin Yao)
            Votes: 0
            Watchers: 2
