Apache Drill / DRILL-2456

regexp_replace using hex codes fails on larger JSON data sets


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: Future
    • Component/s: Functions - Drill
    • Labels: None
    • Environment: Drill 0.7, MapR 4.0.1, CentOS

    Description

      This query works when it reads a single file:

      select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13/17/FlumeData.1426267859699.json` group by `text` order by count(id) desc limit 10;

      The same query fails when it reads the whole directory, which contains multiple files:

      select regexp_replace(`text`, '[^\x20-\xad]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;

      Query failed: Query failed: Failure while trying to start remote fragment, Encountered an illegal char on line 1, column 31: '' [ 43ff1aa4-4a71-455d-b817-ec5eb8d179bb on twitternode:31010 ]

      Writing the characters literally in the regexp_replace pattern, instead of as hex codes, does work for the same data set.
      This query runs fine on the full data set:

      select regexp_replace(`text`, '[^ -~¡-ÿ]', '°'), count(id) from dfs.twitter.`/feed/2015/03/13` group by `text` order by count(id) desc limit 10;
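To compare what the two character classes actually match, here is a minimal Java sketch run outside Drill. It assumes (this report does not confirm it) that Drill's regexp_replace passes the pattern string unchanged to java.util.regex; the class RegexCompare and its methods are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Drill.
final class RegexCompare {
    // Class from the failing query: matches anything outside U+0020..U+00AD.
    static final Pattern HEX = Pattern.compile("[^\\x20-\\xad]");
    // Class from the working query: matches anything outside U+0020..U+007E
    // and U+00A1..U+00FF (the Java escapes below become the literal characters).
    static final Pattern LITERAL = Pattern.compile("[^ -~\u00a1-\u00ff]");

    // Replace every matched character with the degree sign, as the queries do.
    static String replaceHex(String s) {
        return HEX.matcher(s).replaceAll("\u00b0");
    }

    static String replaceLiteral(String s) {
        return LITERAL.matcher(s).replaceAll("\u00b0");
    }

    public static void main(String[] args) {
        // Sample: 'A', e-acute (U+00E9), DEL (U+007F), snowman (U+2603).
        String sample = "A\u00e9\u007f\u2603";
        System.out.println(replaceHex(sample));
        System.out.println(replaceLiteral(sample));
    }
}
```

Both patterns compile and run locally, but they are not equivalent: [^\x20-\xad] keeps U+007F through U+00A0 and replaces U+00AE through U+00FF, while [^ -~¡-ÿ] does the opposite. The literal-character query is therefore a workaround with slightly different output, not a drop-in replacement for the hex-code query.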

      A drillbit.log snippet containing the error is attached.

      Attachments

        1. drillbit.log
          24 kB
          Andries Engelbrecht


          People

            Assignee: Unassigned
            Reporter: Andries Engelbrecht (aengelbrecht)