PHOENIX-7377

Add option not to add CF to the Spark Column name in Spark Connector

Details

    Description

      The fix for PHOENIX-4981 introduced a breaking change in the way the schema is inferred.

      In previous versions of the connector, columns in a non-default column family were mapped to "columnName" in the DataFrame. Now they are mapped to "columnFamily.columnName".

      No unit tests cover this case; all tests use tables with the default column family "0".

      The change was made in this pull request (the project has since moved to another git repo):

      • In the previous version, the code used `ColumnInfo.getDisplayName` to define the name of the column in the DataFrame.
      • In the new class SparkSchemaUtil, the method used is `ColumnInfo.getColumnName`, which returns the column name as `columnFamilyName.columnName`.
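
      The difference between the two naming strategies can be sketched as follows. This is a minimal illustration, not the connector's code: `ColumnNameSketch` is a hypothetical stand-in for Phoenix's `ColumnInfo`, and the family and column names are made up. It only models a column in a non-default column family, as described above.

```java
// Hypothetical stand-in for illustration only; this is NOT Phoenix's ColumnInfo.
final class ColumnNameSketch {
    private final String family; // a non-default column family, e.g. "cf1"
    private final String name;   // the bare column name, e.g. "amount"

    ColumnNameSketch(String family, String name) {
        this.family = family;
        this.name = name;
    }

    // Bare form: what earlier connector versions exposed in the DataFrame.
    String displayName() {
        return name;
    }

    // Qualified form: what the schema code exposes after the change.
    String qualifiedName() {
        return family + "." + name;
    }

    public static void main(String[] args) {
        ColumnNameSketch col = new ColumnNameSketch("cf1", "amount");
        System.out.println(col.displayName());   // old DataFrame column name
        System.out.println(col.qualifiedName()); // new DataFrame column name
    }
}
```

      A job written against the bare form would no longer find its column once the schema exposes only the qualified form.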

      The pull request is related to ticket PHOENIX-4981, but the change is not documented.

      This change breaks jobs that read from tables with a non-default column family.

      The spark3 connector has the same issue, since the code was duplicated from the spark2 module into the spark3 module.

      Since the V1 API was modified to use the same method to resolve the schema, it has the same behavior. It should not, because these classes are now deprecated and should not contain a breaking change.

       

      Resolution proposal:

      The best way to fix the issue is to add a property that allows both options for mapping column names in a non-default column family.

      The issue is in the Spark connector, and its resolution will not have side effects on other phoenix-connectors such as phoenix5-hive.
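
      One possible shape for such a property is sketched below. Everything here is an assumption for illustration: the option key `exampleOmitColumnFamily` is hypothetical and not an existing connector option, and the default (keep the qualified name) is just one choice that would preserve the current behavior.

```java
import java.util.Map;

// Hypothetical sketch of a switchable column-name mapping; not connector code.
final class ColumnNameMappingOption {
    // Hypothetical option key, invented for this example.
    static final String OMIT_FAMILY_KEY = "exampleOmitColumnFamily";

    // Returns the DataFrame column name for a column in a non-default family.
    static String sparkColumnName(String family, String column, Map<String, String> options) {
        // Absent or "false": keep the qualified name (current behavior).
        boolean omitFamily =
                Boolean.parseBoolean(options.getOrDefault(OMIT_FAMILY_KEY, "false"));
        return omitFamily ? column : family + "." + column;
    }

    public static void main(String[] args) {
        // Without the option, the family prefix is kept.
        System.out.println(sparkColumnName("cf1", "amount", Map.of()));
        // With the option set, the bare column name is restored.
        System.out.println(sparkColumnName("cf1", "amount", Map.of(OMIT_FAMILY_KEY, "true")));
    }
}
```

      Defaulting to the qualified name avoids a second behavior change for users who have already adapted; defaulting to the bare name would instead restore compatibility with older jobs.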

          People

            rbr216 rejeb ben rejeb