IMPALA-2204

Underscore in where does not work for multi-line text

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2
    • Fix Version/s: Impala 2.5.0
    • None

    Description

      I exported a table as text using create table as select, and found that I could not find records that contained text with an underscore in it.

      To reproduce, create a file underscore.txt with this content:

      First record has_underscore in the first line
      but not in the second line;Second 
      record has_underscore in the second line;And third
      record
      has_underscore
      in the third line
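
A minimal sketch for creating the sample file from a shell, assuming a POSIX shell and that writing underscore.txt to the current directory is acceptable. The quoted heredoc keeps the text literal; the trailing space after "Second" is copied from the listing above.

```shell
# Write the three-record sample file.
# ';' separates records, and every record spans more than one line.
cat > underscore.txt <<'EOF'
First record has_underscore in the first line
but not in the second line;Second 
record has_underscore in the second line;And third
record
has_underscore
in the third line
EOF
```

The file should contain two ';' separators, which yields the three records that count(*) reports.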
      

      Put it in HDFS:

      hadoop fs -mkdir /user/alanj/Underscore
      hadoop fs -put underscore.txt /user/alanj/Underscore/
      

      Create a table and query it:

      > create external table underscore(val string) row format delimited lines terminated by ';' location '/user/alanj/Underscore';
      > select count(*) from underscore;
      Query: select count(*) from underscore
      +----------+
      | count(*) |
      +----------+
      | 3        |
      +----------+
      Fetched 1 row(s) in 0.13s
      > select count(*) from underscore where val like '%_%';
      Query: select count(*) from underscore where val like '%_%'
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Fetched 1 row(s) in 0.13s
      > select count(*) from underscore where val like '%has_underscore%';
      Query: select count(*) from underscore where val like '%has_underscore%'
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Fetched 1 row(s) in 0.13s
      > select count(*) from underscore where val like '%underscore%';
      Query: select count(*) from underscore where val like '%underscore%'
      +----------+
      | count(*) |
      +----------+
      | 3        |
      +----------+
      Fetched 1 row(s) in 0.16s
      

      Notice that every query with an _ in the LIKE pattern returned 0 rows, even though all three records contain has_underscore. Based on a very short experiment, this seems to work with single-line text but not with multi-line text. I think it also works in non-text formats, but if you use create table as select you often wind up with text.
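
As a local cross-check on what the LIKE queries ought to return, here is a rough sketch that splits the file on ';' (the same terminator the table uses) and counts the records containing an underscore. It assumes awk is installed; count_underscore_records is a hypothetical helper name, not part of Impala or Hadoop.

```shell
# Count ';'-terminated records whose text contains a literal underscore.
# count_underscore_records is a hypothetical helper, named here for illustration only.
count_underscore_records() {
  # RS=";" makes awk treat each ';'-delimited chunk (newlines included) as one record.
  awk 'BEGIN { RS = ";" } /_/ { n++ } END { print n + 0 }' "$1"
}

# Run against the sample file if it is present in the current directory.
if [ -f underscore.txt ]; then
  count_underscore_records underscore.txt
fi
```

For the sample file this counts all three records. Note also that _ in a SQL LIKE pattern is a wildcard for any single character, so '%_%' would be expected to match every non-empty value, not only values with a literal underscore.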

          People

            Assignee: Michael Ho (kwho)
            Reporter: Alan Jackoway (alanj_impala_5a78)