Pig / PIG-2880

Current Pig releases lack a charAt UDF. This UDF returns the char value at the specified index.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: piggybank
    • Labels:
    • Hadoop Flags: Incompatible change
    • Tags: pig, CharAt, udf

      Description

      Current Pig releases lack a charAt UDF. This UDF returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on.

        Activity

        Sunitha Muralidharan added a comment -

        Can we define a new UDF as follows

        package pigudf;

        import java.io.IOException;

        import org.apache.pig.EvalFunc;
        import org.apache.pig.data.Tuple;

        public class CharAt extends EvalFunc<String> {

          @Override
          public String exec(Tuple input) throws IOException {
            String str = (String) input.get(0); // string input
            int index = (Integer) input.get(1); // index
            return str.charAt(index) + "";
          }

        }

        Dmitriy V. Ryaboy added a comment -

        Isn't this just a special case of substring?

        If you do feel strongly that this is a useful udf, I would like to suggest a couple of improvements:

        • use PrimitiveEvalFunc<String, String> instead of EvalFunc. It'll take care of null or empty inputs, etc, which the UDF you proposed will blow up on.
        • why add an empty string to a char? String.valueOf is quite a bit more efficient.
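
For illustration, the two points above (charAt as a one-character substring, and String.valueOf in place of concatenating an empty string) can be sketched in plain Java. This is only a sketch with no Pig dependency; the class and method names are hypothetical and are not taken from any patch on this issue.

```java
// Sketch only: plain Java, no Pig classes. All names here are hypothetical.
final class CharAtEquivalence {

  // One-character string built from charAt, converted with String.valueOf
  // instead of concatenating with "".
  static String viaCharAt(String s, int index) {
    return String.valueOf(s.charAt(index));
  }

  // The same result expressed as a substring of length one.
  static String viaSubstring(String s, int index) {
    return s.substring(index, index + 1);
  }

  public static void main(String[] args) {
    String s = "piggybank";
    // Both forms agree for every valid index 0..length()-1.
    for (int i = 0; i < s.length(); i++) {
      if (!viaCharAt(s, i).equals(viaSubstring(s, i))) {
        throw new AssertionError("mismatch at index " + i);
      }
    }
    System.out.println("charAt and substring agree for all indexes of " + s);
  }
}
```

Both helpers throw StringIndexOutOfBoundsException for an index outside the valid range, so neither form handles bad input by itself.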
        Sunitha Muralidharan added a comment -

        Hi,
        Thanks for your response. This UDF outputs the character at the specified index.
        It needs two inputs: the first is the string and the second is an index position. In that situation, can we still use PrimitiveEvalFunc<String, String>?

        Dmitriy V. Ryaboy added a comment -

        Ah, forgot about the second argument. In that case, just take care of nulls, the wrong number or type of arguments, etc., in your code.

        Also note you can provide the @OutputSchema annotation to tell Pig what to expect to come out of this UDF.
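
As a rough sketch of the checks described above, the argument handling might look like the following. To keep the example runnable without Pig on the classpath, a List<Object> stands in for org.apache.pig.data.Tuple; the class name, the method shape, and the choice to return null for bad input rather than throw are all assumptions, not the contents of any submitted patch.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: List<Object> is a stand-in for org.apache.pig.data.Tuple.
// In a real UDF this logic would live in exec(Tuple input).
final class SafeCharAt {

  // Returns the one-character string at the given index, or null when the
  // arguments are missing, null, of the wrong type, or out of range.
  static String exec(List<Object> input) {
    if (input == null || input.size() != 2) {
      return null; // wrong number of arguments
    }
    // Read each field once and keep it in a local variable.
    Object first = input.get(0);
    Object second = input.get(1);
    if (!(first instanceof String) || !(second instanceof Integer)) {
      return null; // null or wrongly typed argument
    }
    String str = (String) first;
    int index = (Integer) second;
    if (index < 0 || index >= str.length()) {
      return null; // index outside 0..length()-1
    }
    return String.valueOf(str.charAt(index));
  }

  public static void main(String[] args) {
    System.out.println(exec(Arrays.<Object>asList("hello", 1)));
    System.out.println(exec(Arrays.<Object>asList(null, 1)));
    System.out.println(exec(Arrays.<Object>asList("hello", 10)));
  }
}
```

Returning null keeps a single bad record from failing the whole job; whether that or an exception is the right behaviour for this UDF is a design choice the thread leaves open.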

        Archana A added a comment -

        Submitting the patch for CharAt UDF.

        Archana A added a comment -

        Attaching the patch for the CharAt

        Jonathan Coveney added a comment -

        Removing patch status pending response to the following comments:

        1. You can inline the checkIfNumber method; I don't know if there is a huge gain from making it a method. Don't really care.
        2. Please remove the auto-generated IDE comments.
        3. Two-space indents, no tabs please.
        4. Since you refer to input.get(0) and input.get(1) multiple times, it's probably easier to just save those in local variables.

        Daniel Dai added a comment -

        Sunitha Muralidharan, are you still working on it?


          People

          • Assignee: Unassigned
          • Reporter: Sabir Ayappalli
          • Votes: 0
          • Watchers: 6
