  1. Hive
  2. HIVE-6140

trim udf is very slow


Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • UDF
    • None

    Description

      Paraphrasing the report from cartershanklin:

      I used the attached Perl script to generate 500 million two-character strings, each of which contains a space. I loaded the data using:
      create table letters (l string);
      load data local inpath '/home/sandbox/data.csv' overwrite into table letters;
      Then I ran these two queries:
      select count(l) from letters where l = 'l ';
      select count(l) from letters where trim(l) = 'l';

      First query (plain comparison) = 170 seconds
      Second query (with trim) = 514 seconds, roughly 3x slower
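
      The attached temp.pl is not reproduced on this page. For anyone who wants to rebuild a comparable data set, the following is a minimal sketch of a generator, not the original script. It assumes each row is one lowercase letter followed by a single trailing space (consistent with the literal 'l ' in the first query); the function names, the seed parameter, and the output path are hypothetical.

```python
import random
import string


def generate_rows(count, seed=None):
    """Yield `count` two-character strings: a lowercase letter plus a space.

    Assumption: the space is always the second character, which matches
    the literal 'l ' used in the first query of the report.
    """
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.choice(string.ascii_lowercase) + " "


def write_rows(path, count, seed=None):
    """Write one generated string per line to `path`."""
    with open(path, "w") as out:
        for row in generate_rows(count, seed):
            out.write(row + "\n")


if __name__ == "__main__":
    # Small row count for a quick check; the report used 500 million rows.
    write_rows("data.csv", 1000, seed=42)
```

      With rows shaped this way, both queries in the report should return the same count, so any difference in run time comes from evaluating trim() on every row rather than from a different result set.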

      Attachments

        1. temp.pl
          0.1 kB
          Thejas Nair

        Activity

          People

            analog.sony Anandha L Ranganathan
            thejas Thejas Nair
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: