Hive / HIVE-7125

Support strings in the DELIMITED BY statement

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      Hi,
      I am working with a dataset that looks like this:
      dataset.txt:
      salut|;|les|;|amiches
      comment|;|allez|;|vous

      The field delimiter in this dataset is not a single character such as | or ; but a multi-character string, |;| in this case.

      I therefore tried to create an external table with this delimiter:
      hive> create external table ds (f1 string, f2 string, f3 string)
      row format delimited fields terminated by '|;|'
      location '/user/remy/dataset';

      But I got this error:

      MismatchedTokenException(5!=301)
      at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
      at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
      at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormatFieldIdentifier(HiveParser.java:31433)
      at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:30386)
      at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:30662)
      at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4683)
      at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2144)
      at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
      at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
      at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
      at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:373)
      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:291)
      at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:944)
      at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
      at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      FAILED: ParseException line 1:102 mismatched input '|' expecting StringLiteral near 'by' in table row format's field separator

      The workaround was to run a MapReduce job that preprocesses the data and replaces the delimiter with a single, otherwise unused character (my client uses a three-character delimiter to ensure that the sequence does not appear elsewhere in the CSV).
      However, it would be nice to be able to use such a delimiter directly in an external table definition.
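
To illustrate the preprocessing workaround described above, here is a minimal sketch in Python. It is not the original MapReduce job, only an assumed equivalent of its per-line logic. The function names `replace_delimiter` and `convert_lines` are hypothetical, and the replacement character Ctrl-A (`\x01`, Hive's default field delimiter) is an assumption that only holds if that character never occurs in the data.

```python
# Sketch of the per-line preprocessing step (hypothetical helper names).
MULTI_CHAR_DELIM = "|;|"     # delimiter used in dataset.txt
SINGLE_CHAR_DELIM = "\x01"   # assumption: Ctrl-A never appears in the data


def replace_delimiter(line, old=MULTI_CHAR_DELIM, new=SINGLE_CHAR_DELIM):
    """Return the line with every occurrence of `old` replaced by `new`."""
    if new in line:
        # Refuse to continue rather than silently create extra fields.
        raise ValueError("replacement delimiter already present in input line")
    return line.replace(old, new)


def convert_lines(lines):
    """Lazily convert an iterable of lines, one output line per input line."""
    for line in lines:
        yield replace_delimiter(line)
```

Fed the lines of dataset.txt, `convert_lines` yields rows whose fields are separated by a single character, which a table declared with a single-character `fields terminated by` clause can then read. The same per-line function could be called from a streaming mapper that reads standard input.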

          People

            Assignee: Unassigned
            Reporter: Rémy Saissy
            Votes: 1
            Watchers: 2