  1. Apache Drill
  2. DRILL-7588

Function TABLE with option lineDelimiter = '\r\n' sometimes eats the first char of a row


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels: None

    Description

      The problem occurs with a TSV file (demo.tsv.gz, attached) generated on Windows (EOL = \r\n).
      The file contains some special characters, such as:

      http://bouzbal-fans.blogspot.com/search/label/Ã\230£Ã\230®Ã\230¨Ã\230§Ã\230± Ã\230¨Ã\231Ë\206Ã\230²Ã\230¨Ã\230§Ã\231â\200\236
      

      The following query sometimes eats the first character of a line:

      --CREATE TABLE dfs.test.`result_pqt` AS (
      SELECT 
        columns[0] as d
       ,CAST(to_timestamp(columns[0],'MM/dd/yy HH:mm:ss a') AS TIMESTAMP) 
      FROM TABLE(dfs.test.`demo.tsv` (type => 'text', extractHeader => false, fieldDelimiter => '\t', lineDelimiter => '\r\n'))
      --)
      java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: Invalid format: "/19/2015 9:33:39 AM"
      

      No line in the file starts with "/19/2015 9:33:39 AM". The month is always present in this field in the TSV (here the value in demo.tsv is "3/19/2015 9:33:39 AM").
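
      One way to double-check the claim above outside of Drill is to split the raw bytes on '\r\n' and flag every row whose first field does not begin with a month. This is only a sketch: the function names are hypothetical, and the regular expression assumes the first column always starts with one or two digits followed by a slash, as in "3/19/2015 9:33:39 AM".

```python
import re

# Assumption: the first column starts with a 1-2 digit month and a slash.
DATE_START = re.compile(rb"^\d{1,2}/")


def find_bad_rows(data: bytes, line_delim: bytes = b"\r\n", field_delim: bytes = b"\t"):
    """Return (row_number, first_field) for rows whose first field lacks a leading month."""
    bad = []
    for row_number, row in enumerate(data.split(line_delim), start=1):
        if not row:
            # Skip empty pieces, e.g. the one after a trailing delimiter.
            continue
        first_field = row.split(field_delim, 1)[0]
        if not DATE_START.match(first_field):
            bad.append((row_number, first_field))
    return bad


def find_bad_rows_in_file(path: str):
    """Read the (uncompressed) file as raw bytes so special characters are left untouched."""
    with open(path, "rb") as handle:
        return find_bad_rows(handle.read())
```

      An empty result from find_bad_rows_in_file("demo.tsv") would indicate that the truncated value is produced while reading the file, not stored in it.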

      If '\r\n' is replaced by '\n' with sed before running the query, the result is correct in all three cases: with lineDelimiter => '\r\n', with lineDelimiter => '\n', and without the TABLE function. There is no error, the date is correctly converted by the to_timestamp function, and column d is correct in result_pqt.
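
      The exact sed command is not given here. As a rough equivalent of that workaround, the conversion can be sketched in Python on raw bytes, so that the special characters in the file are not re-encoded. The function names are hypothetical.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every Windows line ending (CR LF) with a single LF; lone CRs are kept."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path with LF line endings."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(crlf_to_lf(data))
```

      For example, convert_file("demo.tsv", "demo_lf.tsv") would produce a copy of the file with '\n' line endings (the output file name is illustrative).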

      Keeping '\r\n' and moving the line that produces the error to another position in demo.tsv can prevent the error (why?).
      Keeping '\r\n' and removing or modifying one or more special characters (as in "thá»\235i trang jean") can also prevent the error (why?).

      I did not manage to reduce demo.tsv any further while still reproducing the problem.

      Attachments

        1. demo.tsv.gz
          2.20 MB
          benj
        2. drill_json_profile_tsv.log
          10 kB
          benj
        3. drill_tsv.log
          227 kB
          benj

          People

            Assignee: Unassigned
            Reporter: benj
