Apache Drill / DRILL-7588

Function TABLE with option lineDelimiter => '\r\n' sometimes eats the first char of a row


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels: None

      Description

      The issue occurs with a TSV file (demo.tsv.gz, attached) generated on Windows (EOL = \r\n).
      The file contains some special characters, such as:

      http://bouzbal-fans.blogspot.com/search/label/Ã\230£Ã\230®Ã\230¨Ã\230§Ã\230± Ã\230¨Ã\231Ë\206Ã\230²Ã\230¨Ã\230§Ã\231â\200\236
      

      The following query sometimes drops the first character of a line:

      --CREATE TABLE dfs.test.`result_pqt` AS (
      SELECT 
        columns[0] as d
       ,CAST(to_timestamp(columns[0],'MM/dd/yy HH:mm:ss a') AS TIMESTAMP) 
      FROM TABLE(dfs.test.`demo.tsv` (type => 'text', extractHeader => false, fieldDelimiter => '\t', lineDelimiter => '\r\n'))
      --)
      java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: Invalid format: "/19/2015 9:33:39 AM"
      

      No line in the file starts with "/19/2015 9:33:39 AM". The month is present in this field in the TSV (here, demo.tsv contains "3/19/2015 9:33:39 AM").
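One way to confirm that the source file itself is intact is to scan the first field of every row. The following is a minimal sketch, assuming that every row starts with a date shaped like "3/19/2015 9:33:39 AM" and that rows end with '\r\n'; the function names and the regular expression are illustrative and not part of Drill.

```python
import gzip
import re

# Assumed shape of the first column, e.g. "3/19/2015 9:33:39 AM".
DATE_PREFIX = re.compile(rb"^\d{1,2}/\d{1,2}/\d{4} ")


def suspicious_lines(data: bytes, delimiter: bytes = b"\r\n") -> list:
    """Return (line_number, first_field) for rows whose first field
    does not start with a date."""
    bad = []
    for number, line in enumerate(data.split(delimiter), start=1):
        if not line:  # skip the empty piece after a trailing delimiter
            continue
        first_field = line.split(b"\t", 1)[0]
        if not DATE_PREFIX.match(first_field):
            bad.append((number, first_field))
    return bad


def suspicious_lines_in_gzip(path: str) -> list:
    """Same check for a gzip-compressed file such as the attached demo.tsv.gz.
    The path is supplied by the caller; the whole file is read into memory."""
    with gzip.open(path, "rb") as handle:
        return suspicious_lines(handle.read())
```

If `suspicious_lines_in_gzip("demo.tsv.gz")` returns an empty list, the truncated value in the error message does not come from the file.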

      If '\r\n' is replaced by '\n' with sed before running the query, the result is correct with lineDelimiter => '\r\n', with lineDelimiter => '\n', and without the TABLE function (there is no error, the date is correctly converted by to_timestamp, and column d is correct in result_pqt).
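The exact sed command is not given in this report. As an equivalent workaround, the line endings can be normalized with a short script. This is a minimal sketch; the function names are hypothetical and the file is read fully into memory, which assumes it is small enough for that.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every Windows line ending (CR LF) with a single LF.
    A lone CR that is not followed by LF is left untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(source: str, target: str) -> None:
    """Write a copy of `source` with LF line endings to `target`.
    Both paths are supplied by the caller."""
    with open(source, "rb") as src, open(target, "wb") as dst:
        dst.write(crlf_to_lf(src.read()))
```

Working on bytes rather than decoded text keeps the special characters in the file unchanged, whatever their encoding.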

      Keeping '\r\n' and moving the line that produces the error to another position in demo.tsv can prevent the error (why?).
      Keeping '\r\n' and removing or modifying one or more special characters (as in "thá»\235i trang jean") can prevent the error (why?).
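Both changes can shift the byte position at which the failing line starts (assuming the file is UTF-8, where the special characters occupy several bytes each). Recording that offset before and after each change may help narrow down whether position matters. The sketch below is only a diagnostic aid with a hypothetical function name; it does not establish the cause of the bug.

```python
def line_start_offsets(data: bytes, prefix: bytes,
                       delimiter: bytes = b"\r\n") -> list:
    """Return the byte offset of every line that starts with `prefix`."""
    offsets = []
    position = 0  # byte offset of the current line's first byte
    for line in data.split(delimiter):
        if line.startswith(prefix):
            offsets.append(position)
        position += len(line) + len(delimiter)
    return offsets
```

For example, `line_start_offsets(raw_bytes, b"3/19/2015 9:33:39 AM")` lists where the affected rows begin in the uncompressed file.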

      I did not manage to reduce demo.tsv any further while still reproducing the problem.

            People

            • Assignee: Unassigned
            • Reporter: benj641 benj
