Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.8.1
    • Component/s: UDF
    • Labels:
    • Environment:

      Hadoop 0.20.1
      Java 1.6.0

    • Tags:
      UDF, urlencode, urldecode

      Description

      Current releases of Hive lack a function that percent-encodes (escapes) a URI, such as a URL or its form parameters.
      The proposed function URI_ESCAPE (uri) returns the encoded form of the URI for use in HiveQL. Encoding URLs and form parameters is always advisable: unencoded form parameters are vulnerable to cross-site attacks and SQL injection, and can cause unpredictable behavior in a web application.

      Functionality :-

      Function Name: URI_ESCAPE (uri)

      Returns the encoded form of the uri.
      Example: hive> SELECT URI_ESCAPE('http://www.example.com?a=l&t');
      -> 'http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t'
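
The encoding shown in the example can be approximated with the JDK alone. The following is a minimal sketch, not the code from the attached patch: the class name UriEscapeSketch and the NULL handling are assumptions made for illustration. It relies on java.net.URLEncoder, which produces form encoding (space becomes '+'), and rewrites '+' to %20 to match the output in this issue.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper illustrating the proposed URI_ESCAPE behavior. */
final class UriEscapeSketch {

    private UriEscapeSketch() {}

    /** Percent-encodes everything except letters, digits, '.', '-', '*' and '_'. */
    static String escape(String uri) {
        if (uri == null) {
            return null; // assumption: NULL in, NULL out
        }
        try {
            // URLEncoder writes a space as '+'. A literal '+' in the input is
            // written as %2B, so any '+' left in the output stands for a space
            // and can safely be rewritten to %20.
            return URLEncoder.encode(uri, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("http://www.example.com?a=l&t"));
        System.out.println(escape("http://google.com/resource?key=value1 & value2"));
    }
}
```

One caveat with this approach: URLEncoder leaves '*' unescaped, which may or may not match what the patch does.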

      Usage :-

      Case 1 : To get encoded uri corresponding to a particular uri

      hive> SELECT URI_ESCAPE('http://google.com/resource?key=value1 & value2');

      -> 'http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2'

      Case 2 : To query a table to get encoded form of the urls corresponding to users
      Table :- USER_URLS
      userid |url

      USR00001|http://www.example.com?a=l&t
      USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf
      USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
      USR01000|http://google.com/resource?key=value
      USR10000|http://google.com/resource?key=value1 & value2
      USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
      USR10010|gopher://gopher.voa.gov
      USR10100|http://www.apple.com/index.html
      USR11000|file:/data/letters/to_mom.txt
      USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html

      Query : select userid,url,uri_escape(url) from USER_URLS;

      Result :-
      USR00001|http://www.example.com?a=l&t|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
      USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
      USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4|http%3A%2F%2Fabc.dev.domain.com%2F0007AC%2Fads%2F800x480%2015sec%20h.264.mp4
      USR01000|http://google.com/resource?key=value|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
      USR10000|http://google.com/resource?key=value1 & value2|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
      USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
      USR10010|gopher://gopher.voa.gov|gopher%3A%2F%2Fgopher.voa.gov
      USR10100|http://www.apple.com/index.html|http%3A%2F%2Fwww.apple.com%2Findex.html
      USR11000|file:/data/letters/to_mom.txt|file%3A%2Fdata%2Fletters%2Fto_mom.txt
      USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html

      Current releases of Hive also lack a function that decodes an encoded URI.
      The proposed function URI_UNESCAPE (uri) returns the decoded form of the encoded URI for use in HiveQL. It converts the given string by replacing each escape sequence with its unescaped representation.

      Functionality :-

      Function Name: URI_UNESCAPE (uri)

      Returns the decoded form of the encoded uri.
      Example: hive> SELECT URI_UNESCAPE('http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t');
      -> 'http://www.example.com?a=l&t'
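
The decoding direction can be sketched the same way. This is a minimal illustration under stated assumptions, not the patch itself: the class name UriUnescapeSketch and the NULL handling are hypothetical. java.net.URLDecoder treats '+' as a space, so the sketch protects any literal '+' before decoding; whether the real function should do this is an open design choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical helper illustrating the proposed URI_UNESCAPE behavior. */
final class UriUnescapeSketch {

    private UriUnescapeSketch() {}

    /** Replaces each %XX escape sequence with the character it encodes. */
    static String unescape(String encoded) {
        if (encoded == null) {
            return null; // assumption: NULL in, NULL out
        }
        try {
            // URLDecoder would turn '+' into a space. Rewriting '+' to %2B
            // first keeps a literal '+' intact through decoding.
            return URLDecoder.decode(encoded.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t"));
        System.out.println(unescape("http://search.barnesandnoble.com/booksearch/first%20book.pdf"));
    }
}
```

Partially encoded input, such as the second call above, decodes without trouble because unescaped characters pass through unchanged. Malformed escape sequences make URLDecoder throw IllegalArgumentException, so a real UDF would need to decide whether to propagate that error or return NULL.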

      Usage :-

      Case 1 : To get decoded uri corresponding to a particular encoded uri

      hive> SELECT URI_UNESCAPE('http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2');
      -> 'http://google.com/resource?key=value1 & value2'

      Case 2 : To query a table to get decoded form of the encoded urls corresponding to users
      Table :- USER_URLS
      userid |encodedurl

      USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
      USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf
      USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
      USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
      USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
      USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
      USR10010|gopher%3A%2F%2Fgopher.voa.gov
      USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html
      USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt
      USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html

      Query : select userid,encodedurl,uri_unescape(encodedurl) from USER_URLS;

      Result :-
      USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t|http://www.example.com?a=l&t
      USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
      USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
      USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue|http://google.com/resource?key=value
      USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2|http://google.com/resource?key=value1 & value2
      USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
      USR10010|gopher%3A%2F%2Fgopher.voa.gov|gopher://gopher.voa.gov
      USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html|http://www.apple.com/index.html
      USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt|file:/data/letters/to_mom.txt
      USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html|http://www.cuug.ab.ca:8001/~branderr/csce.html
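
Since the two functions are meant to be inverses, a round-trip check over the sample URLs is a cheap way to catch mismatched rows in tables like the ones above. The sketch below is an assumption-laden illustration (the class UriRoundTripSketch and its methods are hypothetical and built on the JDK's URLEncoder and URLDecoder, not on the attached patch).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical round-trip check for the proposed escape/unescape pair. */
final class UriRoundTripSketch {

    private UriRoundTripSketch() {}

    static String escape(String uri) {
        try {
            // Form encoding writes a space as '+'; rewrite it to %20.
            return URLEncoder.encode(uri, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static String unescape(String encoded) {
        try {
            // Protect literal '+' so the decoder does not turn it into a space.
            return URLDecoder.decode(encoded.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True when decoding the encoded form gives back the original string. */
    static boolean roundTrips(String uri) {
        return unescape(escape(uri)).equals(uri);
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.example.com?a=l&t",
            "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4",
            "file:/data/letters/to_mom.txt",
            "http://www.cuug.ab.ca:8001/~branderr/csce.html",
        };
        for (String url : samples) {
            System.out.println(url + " -> " + escape(url) + " round-trips: " + roundTrips(url));
        }
    }
}
```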

      1. HIVE-3906.1.patch.txt
        13 kB
        Jothy Babu
      2. udf_uri_escape.q
        0.2 kB
        Jothy Babu
      3. udf_uri_escape.q.out
        2 kB
        Jothy Babu
      4. udf_uri_unescape.q
        0.3 kB
        Jothy Babu
      5. udf_uri_unescape.q.out
        2 kB
        Jothy Babu

        Activity

        Liu Zongquan created issue -
        Jothy Babu made changes -
        Field Original Value New Value
        Description [same text as the Description section above]

        Jothy Babu made changes -
        Attachment HIVE-3906.1.patch.txt [ 12567520 ]
        Attachment udf_uri_escape.q [ 12567521 ]
        Attachment udf_uri_escape.q.out [ 12567522 ]
        Attachment udf_uri_unescape.q [ 12567523 ]
        Attachment udf_uri_unescape.q.out [ 12567524 ]

          People

          • Assignee: Unassigned
          • Reporter: Liu Zongquan
          • Votes: 0
          • Watchers: 4

              Time Tracking

              Estimated: 96h
              Remaining: 96h
              Logged: Not Specified