  1. Apache Drill
  2. DRILL-8439

Getting col__ prefix for columns that are not special when extractHeader is enabled


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.21.0
    • Fix Version/s: None
    • Component/s: Metadata, SQL Parser
    • Environment: extractHeader enabled in the csv config of the dfs plugin.
      No. of drillbits: Single
      OS: Windows

    Description

      As per the documentation, Drill adds the col_ prefix to column names that start with a number or a special character.

      /**
       * Prefix used to replace non-alphabetic characters at the start of
       * a column name. For example, $foo becomes col_foo. Used
       * because SQL does not allow _foo.
       */
      public static final String COLUMN_PREFIX = "col_";
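
To make the documented rule concrete, here is a minimal sketch of the renaming as I understand it from the comment above and from the examples in this report. It is not Drill's actual implementation: HeaderNameSketch and clean are hypothetical names, and keeping a leading digit after the prefix is my own assumption.

```java
// Hypothetical sketch of the documented rule; NOT Drill's real code.
final class HeaderNameSketch {
    static final String COLUMN_PREFIX = "col_";

    // Returns a cleaned-up column name for the given raw header.
    static String clean(String header) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < header.length(); i++) {
            char ch = header.charAt(i);
            boolean letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
            boolean digit = ch >= '0' && ch <= '9';
            if (i == 0 && !letter) {
                // A leading non-alphabetic character is replaced by the prefix.
                out.append(COLUMN_PREFIX);
                if (digit) {
                    out.append(ch); // assumption: a leading digit is kept after the prefix
                }
            } else if (letter || digit || ch == '_') {
                out.append(ch);
            } else {
                // Any other character inside the name becomes an underscore.
                out.append('_');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"$foo", "#UNITS", "FINANCIAL$RECORD", "PRODUCTID"};
        for (String s : samples) {
            System.out.println(s + " -> " + clean(s));
        }
    }
}
```

Under this reading of the rule, a purely alphabetic header such as PRODUCTID should pass through unchanged.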
      

      But in my case, I am getting the prefix even for column names that are entirely alphabetic.


      I have the following data in the CSV file,

      PRODUCTID  PRODUCTNAME                   SUPPLIERID  CATEGORYID  UNIT                 PRICE
      1          Chais                         1           1           10 boxes x 20 bags   18
      2          Chang                         1           1           24 - 12 oz bottles   19
      3          Aniseed Syrup                 1           2           12 - 550 ml bottles  10
      4          Chef Anton's Cajun Seasoning  2           2           48 - 6 oz jars       22
      5          Chef Anton's Gumbo Mix        2           2           36 boxes             21.35

       

      When I query the CSV file with the following query:

      SELECT * FROM dfs.`/var/lib/PRODUCT.csv`

      The output is 


      I know about the other renaming rules, for example:

      #UNITS is changed to col_UNITS

      FINANCIAL$RECORD is changed to FINANCIAL_RECORD

      But what about PRODUCTID? Why is it changed to col__PRODUCTID_? In this case, extra underscores have also been added.
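
One thing that may be worth ruling out is invisible characters in the header line, since a renamed purely alphabetic header suggests the raw name contains something that is not visible in an editor. The following is a minimal diagnostic sketch, assuming the file is UTF-8 encoded; HeaderInspector and suspicious are hypothetical names, and the check simply lists every code point in the first line that is not printable ASCII or a tab.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper; not part of Drill.
final class HeaderInspector {

    // Lists code points that are neither printable ASCII nor a tab.
    static List<String> suspicious(String headerLine) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < headerLine.length()) {
            int cp = headerLine.codePointAt(i);
            boolean printableAscii = cp >= 0x20 && cp <= 0x7E;
            if (!printableAscii && cp != '\t') {
                found.add(String.format(Locale.ROOT, "index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HeaderInspector.java <csv-file>");
            return;
        }
        // Assumption: the file is UTF-8; Java's reader does not strip a leading BOM.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                System.out.println("file is empty");
                return;
            }
            List<String> hits = suspicious(header);
            if (hits.isEmpty()) {
                System.out.println("no hidden characters found in the header line");
            } else {
                hits.forEach(System.out::println);
            }
        }
    }
}
```

If this reports anything for PRODUCT.csv, those characters would be treated like any other special character in a column name, which could explain the extra underscores.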

      Attachments

        1. bomInColData.PNG
          19 kB
          Diksha Chaturvedi
        2. bomInColDataInBeginning.PNG
          20 kB
          Diksha Chaturvedi
        3. bomInEnd.PNG
          4 kB
          Diksha Chaturvedi
        4. bomInMiddle.PNG
          6 kB
          Diksha Chaturvedi
        5. bomInsideColumnName.PNG
          4 kB
          Diksha Chaturvedi
        6. bomInsideColumnName-1.PNG
          4 kB
          Diksha Chaturvedi
        7. image-2023-06-05-18-05-25-417.png
          7 kB
          Diksha Chaturvedi
        8. image-2023-06-05-18-16-47-293.png
          11 kB
          Diksha Chaturvedi


          People

            Assignee: Unassigned
            Reporter: Diksha Chaturvedi (dikshac)
