Pig / PIG-2514

REGEX_EXTRACT not returning the correct group with a non-greedy regex

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.11
    • Fix Version/s: 0.11
    • Component/s: internal-udfs
    • Labels: None
    • Patch Info: Patch Available
    • Hadoop Flags: Reviewed

      Description

      Hello,

      REGEX_EXTRACT uses Matcher.find() instead of Matcher.matches(), so it does not return the expected group for some non-greedy regular expressions.

      Is this the intended behavior?

      Thanks,

      Romain

      From the java.util.regex.Matcher Javadoc (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html):

      "The matches method attempts to match the entire input sequence against the pattern."

      "The find method scans the input sequence looking for the next subsequence that matches the pattern."
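      The difference between the two methods can be reproduced in plain Java without Pig. The sketch below is an illustration, not part of the report: the class AnchoredFindDemo and its two helpers are hypothetical names. It also shows that anchoring the pattern with ^ and $ makes find() span the whole input, so an anchored pattern behaves like matches() for this input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from Pig.
public class AnchoredFindDemo {
    // Group 1 of the first subsequence found by find(), or null if none.
    static String findGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Group 1 if the entire input matches, or null otherwise.
    static String matchesGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "hdfs://mygrid.com/projects/";
        // Unanchored lazy pattern: find() accepts the shortest match at index 0.
        System.out.println(findGroup("(.+?)/?", input));
        // Same pattern with matches(): the whole input must be consumed.
        System.out.println(matchesGroup("(.+?)/?", input));
        // Anchors force find() to consume the whole input as well.
        System.out.println(findGroup("^(.+?)/?$", input));
    }
}
```

      With the unanchored pattern, find() returns "h"; matches() and the anchored find() both return "hdfs://mygrid.com/projects".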

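      import java.io.IOException;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      import org.apache.pig.builtin.REGEX_EXTRACT;
      import org.apache.pig.data.Tuple;
      import org.apache.pig.data.TupleFactory;

      public class RegexExtractRepro {
          public static void main(String[] args) throws IOException {
              String a = "hdfs://mygrid.com/projects/";

              // find() accepts the first match at index 0; the lazy group (.+?)
              // stops after one character, so group(1) is "h".
              System.out.println("Pig's way with m.find()");
              Matcher m = Pattern.compile("(.+?)/?").matcher(a);
              System.out.println(m.find());
              System.out.println(m.group(1));
              System.out.println(m.start());
              System.out.println(m.end());

              // matches() must consume the whole input, so the lazy group grows
              // until only the optional trailing "/" is left.
              System.out.println("\nm.matches()");
              m = Pattern.compile("(.+?)/?").matcher(a);
              System.out.println(m.matches());
              System.out.println(m.group(1));
              System.out.println(m.start());
              System.out.println(m.end());

              // Same input and pattern through the UDF: (input, regex, group index).
              System.out.println("\nREGEX_EXTRACT m.find()");
              Tuple t = TupleFactory.getInstance().newTuple();
              t.append(a);
              t.append("(.+?)/?");
              t.append(1);
              System.out.println(new REGEX_EXTRACT().exec(t));
          }
      }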
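      The report proposes using matches() instead of find(). A minimal sketch of an extraction function whose matching mode is selectable follows, written in plain Java without Pig dependencies. The class RegexExtractSketch, the method extract and its useMatches flag are hypothetical; this is an illustration, not the code in the attached patches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Pig's REGEX_EXTRACT implementation.
public class RegexExtractSketch {
    // Returns the group at `index` (as in Matcher.group(int)), or null when
    // the input is null, nothing matches, or the index is out of range.
    // useMatches = true  -> Matcher.matches(): the entire input must match.
    // useMatches = false -> Matcher.find(): the first matching subsequence wins.
    static String extract(String input, String regex, int index, boolean useMatches) {
        if (input == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean matched = useMatches ? m.matches() : m.find();
        if (!matched || index < 0 || index > m.groupCount()) {
            return null;
        }
        // May still be null if the group did not take part in the match.
        return m.group(index);
    }

    public static void main(String[] args) {
        String a = "hdfs://mygrid.com/projects/";
        System.out.println(extract(a, "(.+?)/?", 1, false)); // find() mode
        System.out.println(extract(a, "(.+?)/?", 1, true));  // matches() mode
    }
}
```

      A design note on this sketch: switching unconditionally to matches() would also change results for patterns written for substring search. For example, (\d+) on "abc123" yields "123" with find() but no match at all with matches(), which is why the sketch keeps both modes available.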

      Attachments

      1. PIG-2514.2.patch (6 kB, Romain Rigaux)
      2. PIG-2514-doc.patch (0.6 kB, Romain Rigaux)
      3. PIG-2514.patch (3 kB, Romain Rigaux)

        Activity

        • Romain Rigaux created the issue.
        • Romain Rigaux attached PIG-2514.txt.
        • Romain Rigaux attached PIG-2514.patch and PIG-2514-doc.patch.
        • Romain Rigaux removed the attachment PIG-2514.txt.
        • Romain Rigaux attached PIG-2514.2.patch.
        • Daniel Dai changed Status from Open to Resolved, set Hadoop Flags to Reviewed, and set Resolution to Fixed.
        • Bill Graham changed Status from Resolved to Closed.

          People

          • Assignee: Romain Rigaux
          • Reporter: Romain Rigaux
          • Votes: 0
          • Watchers: 0
