Hive
  1. Hive
  2. HIVE-1719

Move RegexSerDe out of hive-contrib and over to hive-serde

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
      currently in hive-serde. I think we should move it over to the hive-serde module so that
      users don't have to go to the added effort of manually registering the contrib jar before
      using it.

      1. ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.1.patch
        10 kB
        Phabricator
      2. ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3051.2.patch
        10 kB
        Phabricator
      3. ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3141.1.patch
        39 kB
        Phabricator
      4. ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.1.patch
        29 kB
        Phabricator
      5. ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.2.patch
        28 kB
        Phabricator
      6. ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.3.patch
        28 kB
        Phabricator
      7. ASF.LICENSE.NOT.GRANTED--HIVE-1719.D3249.4.patch
        28 kB
        Phabricator
      8. HIVE-1719.3.patch
        27 kB
        Shreepadma Venugopalan
      9. HIVE-1719.D3249.1.patch
        29 kB
        Shreepadma Venugopalan

        Activity

        Hide
        Lars Francke added a comment -

        Shreepadma Venugopalan Could it be that you attached a wrong patch? I can't find any deprecation warning in the old RegexSerde so things are now confusing for everyone.

        Show
        Lars Francke added a comment - Shreepadma Venugopalan Could it be that you attached a wrong patch? I can't find any deprecation warning in the old RegexSerde so things are now confusing for everyone.
        Hide
        Shreepadma Venugopalan added a comment -

        It was left in contrib so that we don't break backwards compatibility for existing users.

        Show
        Shreepadma Venugopalan added a comment - It was left in contrib so that we don't break backwards compatibility for existing users.
        Hide
        efan lee added a comment -

        Why is the RegexSerDe.java still exists in contrib directory? Is'nt it necessary to remove the file from contrib?

        Show
        efan lee added a comment - Why is the RegexSerDe.java still exists in contrib directory? Is'nt it necessary to remove the file from contrib?
        Ashutosh Chauhan made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Ashutosh Chauhan added a comment -

        This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.

        Show
        Ashutosh Chauhan added a comment - This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde
        (Shreepadma Venugopalan via Carl Steinbach)

        Summary:
        Regex Serde Changes

        RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
        currently in hive-serde. I think we should move it over to the hive-serde module so that
        users don't have to go to the added effort of manually registering the contrib jar before
        using it.

        Test Plan: EMPTY

        Reviewers: JIRA, cwsteinbach

        Reviewed By: cwsteinbach

        Differential Revision: https://reviews.facebook.net/D3249 (Revision 1340256)

        Result = ABORTED
        cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1340256
        Files :

        • /hive/trunk/contrib/src/test/queries/clientnegative/serde_regex.q
        • /hive/trunk/contrib/src/test/queries/clientpositive/serde_regex.q
        • /hive/trunk/contrib/src/test/results/clientnegative/serde_regex.q.out
        • /hive/trunk/contrib/src/test/results/clientpositive/serde_regex.q.out
        • /hive/trunk/ql/src/test/queries/clientnegative/serde_regex.q
        • /hive/trunk/ql/src/test/queries/clientnegative/serde_regex2.q
        • /hive/trunk/ql/src/test/queries/clientnegative/serde_regex3.q
        • /hive/trunk/ql/src/test/queries/clientpositive/serde_regex.q
        • /hive/trunk/ql/src/test/results/clientnegative/serde_regex.q.out
        • /hive/trunk/ql/src/test/results/clientnegative/serde_regex2.q.out
        • /hive/trunk/ql/src/test/results/clientnegative/serde_regex3.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/serde_regex.q.out
        • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
        Show
        Hudson added a comment - Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/ ) HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde (Shreepadma Venugopalan via Carl Steinbach) Summary: Regex Serde Changes RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. I think we should move it over to the hive-serde module so that users don't have to go to the added effort of manually registering the contrib jar before using it. Test Plan: EMPTY Reviewers: JIRA, cwsteinbach Reviewed By: cwsteinbach Differential Revision: https://reviews.facebook.net/D3249 (Revision 1340256) Result = ABORTED cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1340256 Files : /hive/trunk/contrib/src/test/queries/clientnegative/serde_regex.q /hive/trunk/contrib/src/test/queries/clientpositive/serde_regex.q /hive/trunk/contrib/src/test/results/clientnegative/serde_regex.q.out /hive/trunk/contrib/src/test/results/clientpositive/serde_regex.q.out /hive/trunk/ql/src/test/queries/clientnegative/serde_regex.q /hive/trunk/ql/src/test/queries/clientnegative/serde_regex2.q /hive/trunk/ql/src/test/queries/clientnegative/serde_regex3.q /hive/trunk/ql/src/test/queries/clientpositive/serde_regex.q /hive/trunk/ql/src/test/results/clientnegative/serde_regex.q.out /hive/trunk/ql/src/test/results/clientnegative/serde_regex2.q.out /hive/trunk/ql/src/test/results/clientnegative/serde_regex3.q.out /hive/trunk/ql/src/test/results/clientpositive/serde_regex.q.out /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1439 (See https://builds.apache.org/job/Hive-trunk-h0.21/1439/)
        HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde
        (Shreepadma Venugopalan via Carl Steinbach)

        Summary:
        Regex Serde Changes

        RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
        currently in hive-serde. I think we should move it over to the hive-serde module so that
        users don't have to go to the added effort of manually registering the contrib jar before
        using it.

        Test Plan: EMPTY

        Reviewers: JIRA, cwsteinbach

        Reviewed By: cwsteinbach

        Differential Revision: https://reviews.facebook.net/D3249 (Revision 1340256)

        Result = FAILURE
        cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1340256
        Files :

        • /hive/trunk/contrib/src/test/queries/clientnegative/serde_regex.q
        • /hive/trunk/contrib/src/test/queries/clientpositive/serde_regex.q
        • /hive/trunk/contrib/src/test/results/clientnegative/serde_regex.q.out
        • /hive/trunk/contrib/src/test/results/clientpositive/serde_regex.q.out
        • /hive/trunk/ql/src/test/queries/clientnegative/serde_regex.q
        • /hive/trunk/ql/src/test/queries/clientnegative/serde_regex2.q
        • /hive/trunk/ql/src/test/queries/clientnegative/serde_regex3.q
        • /hive/trunk/ql/src/test/queries/clientpositive/serde_regex.q
        • /hive/trunk/ql/src/test/results/clientnegative/serde_regex.q.out
        • /hive/trunk/ql/src/test/results/clientnegative/serde_regex2.q.out
        • /hive/trunk/ql/src/test/results/clientnegative/serde_regex3.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/serde_regex.q.out
        • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
        Show
        Hudson added a comment - Integrated in Hive-trunk-h0.21 #1439 (See https://builds.apache.org/job/Hive-trunk-h0.21/1439/ ) HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde (Shreepadma Venugopalan via Carl Steinbach) Summary: Regex Serde Changes RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. I think we should move it over to the hive-serde module so that users don't have to go to the added effort of manually registering the contrib jar before using it. Test Plan: EMPTY Reviewers: JIRA, cwsteinbach Reviewed By: cwsteinbach Differential Revision: https://reviews.facebook.net/D3249 (Revision 1340256) Result = FAILURE cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1340256 Files : /hive/trunk/contrib/src/test/queries/clientnegative/serde_regex.q /hive/trunk/contrib/src/test/queries/clientpositive/serde_regex.q /hive/trunk/contrib/src/test/results/clientnegative/serde_regex.q.out /hive/trunk/contrib/src/test/results/clientpositive/serde_regex.q.out /hive/trunk/ql/src/test/queries/clientnegative/serde_regex.q /hive/trunk/ql/src/test/queries/clientnegative/serde_regex2.q /hive/trunk/ql/src/test/queries/clientnegative/serde_regex3.q /hive/trunk/ql/src/test/queries/clientpositive/serde_regex.q /hive/trunk/ql/src/test/results/clientnegative/serde_regex.q.out /hive/trunk/ql/src/test/results/clientnegative/serde_regex2.q.out /hive/trunk/ql/src/test/results/clientnegative/serde_regex3.q.out /hive/trunk/ql/src/test/results/clientpositive/serde_regex.q.out /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
        Shreepadma Venugopalan made changes -
        Attachment HIVE-1719.patch [ 12526449 ]
        Shreepadma Venugopalan made changes -
        Attachment HIVE-1719.3.patch.txt [ 12526288 ]
        Carl Steinbach made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.10.0 [ 12320745 ]
        Resolution Fixed [ 1 ]
        Hide
        Carl Steinbach added a comment -

        Committed to trunk. Thanks Shreepadma!

        Show
        Carl Steinbach added a comment - Committed to trunk. Thanks Shreepadma!
        Hide
        Phabricator added a comment -

        shreepadma has closed the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".

        Closed by cws.

        REVISION DETAIL
        https://reviews.facebook.net/D3249

        COMMIT
        https://reviews.facebook.net/rHIVE1340256

        To: JIRA, cwsteinbach, shreepadma

        Show
        Phabricator added a comment - shreepadma has closed the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Closed by cws. REVISION DETAIL https://reviews.facebook.net/D3249 COMMIT https://reviews.facebook.net/rHIVE1340256 To: JIRA, cwsteinbach, shreepadma
        Phabricator made changes -
        Attachment HIVE-1719.D3249.4.patch [ 12528140 ]
        Hide
        Phabricator added a comment -

        shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
        Reviewers: JIRA, cwsteinbach

        • fix test to workaround order by issue

        REVISION DETAIL
        https://reviews.facebook.net/D3249

        AFFECTED FILES
        contrib/src/test/queries/clientnegative/serde_regex.q
        contrib/src/test/queries/clientpositive/serde_regex.q
        contrib/src/test/results/clientnegative/serde_regex.q.out
        contrib/src/test/results/clientpositive/serde_regex.q.out
        ql/src/test/queries/clientnegative/serde_regex.q
        ql/src/test/queries/clientnegative/serde_regex2.q
        ql/src/test/queries/clientnegative/serde_regex3.q
        ql/src/test/queries/clientpositive/serde_regex.q
        ql/src/test/results/clientnegative/serde_regex.q.out
        ql/src/test/results/clientnegative/serde_regex2.q.out
        ql/src/test/results/clientnegative/serde_regex3.q.out
        ql/src/test/results/clientpositive/serde_regex.q.out
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java

        To: JIRA, cwsteinbach, shreepadma

        Show
        Phabricator added a comment - shreepadma updated the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Reviewers: JIRA, cwsteinbach fix test to workaround order by issue REVISION DETAIL https://reviews.facebook.net/D3249 AFFECTED FILES contrib/src/test/queries/clientnegative/serde_regex.q contrib/src/test/queries/clientpositive/serde_regex.q contrib/src/test/results/clientnegative/serde_regex.q.out contrib/src/test/results/clientpositive/serde_regex.q.out ql/src/test/queries/clientnegative/serde_regex.q ql/src/test/queries/clientnegative/serde_regex2.q ql/src/test/queries/clientnegative/serde_regex3.q ql/src/test/queries/clientpositive/serde_regex.q ql/src/test/results/clientnegative/serde_regex.q.out ql/src/test/results/clientnegative/serde_regex2.q.out ql/src/test/results/clientnegative/serde_regex3.q.out ql/src/test/results/clientpositive/serde_regex.q.out serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java To: JIRA, cwsteinbach, shreepadma
        Shreepadma Venugopalan made changes -
        Attachment HIVE-1719.3.patch [ 12528135 ]
        Hide
        Phabricator added a comment -

        cwsteinbach has accepted the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".

        +1. Will commit if tests pass.

        REVISION DETAIL
        https://reviews.facebook.net/D3249

        BRANCH
        HIVE-1719-trunk

        To: JIRA, cwsteinbach, shreepadma

        Show
        Phabricator added a comment - cwsteinbach has accepted the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". +1. Will commit if tests pass. REVISION DETAIL https://reviews.facebook.net/D3249 BRANCH HIVE-1719 -trunk To: JIRA, cwsteinbach, shreepadma
        Phabricator made changes -
        Attachment HIVE-1719.D3249.3.patch [ 12527651 ]
        Hide
        Phabricator added a comment -

        shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
        Reviewers: JIRA, cwsteinbach

        REVISION DETAIL
        https://reviews.facebook.net/D3249

        AFFECTED FILES
        contrib/src/test/queries/clientnegative/serde_regex.q
        contrib/src/test/queries/clientpositive/serde_regex.q
        contrib/src/test/results/clientnegative/serde_regex.q.out
        contrib/src/test/results/clientpositive/serde_regex.q.out
        ql/src/test/queries/clientnegative/serde_regex.q
        ql/src/test/queries/clientnegative/serde_regex2.q
        ql/src/test/queries/clientnegative/serde_regex3.q
        ql/src/test/queries/clientpositive/serde_regex.q
        ql/src/test/results/clientnegative/serde_regex.q.out
        ql/src/test/results/clientnegative/serde_regex2.q.out
        ql/src/test/results/clientnegative/serde_regex3.q.out
        ql/src/test/results/clientpositive/serde_regex.q.out
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java

        To: JIRA, cwsteinbach, shreepadma

        Show
        Phabricator added a comment - shreepadma updated the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Reviewers: JIRA, cwsteinbach HIVE-1719 : Changes requested by Carl REVISION DETAIL https://reviews.facebook.net/D3249 AFFECTED FILES contrib/src/test/queries/clientnegative/serde_regex.q contrib/src/test/queries/clientpositive/serde_regex.q contrib/src/test/results/clientnegative/serde_regex.q.out contrib/src/test/results/clientpositive/serde_regex.q.out ql/src/test/queries/clientnegative/serde_regex.q ql/src/test/queries/clientnegative/serde_regex2.q ql/src/test/queries/clientnegative/serde_regex3.q ql/src/test/queries/clientpositive/serde_regex.q ql/src/test/results/clientnegative/serde_regex.q.out ql/src/test/results/clientnegative/serde_regex2.q.out ql/src/test/results/clientnegative/serde_regex3.q.out ql/src/test/results/clientpositive/serde_regex.q.out serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java To: JIRA, cwsteinbach, shreepadma
        Phabricator made changes -
        Attachment HIVE-1719.D3249.2.patch [ 12527646 ]
        Hide
        Phabricator added a comment -

        shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
        Reviewers: JIRA, cwsteinbach

        Regex Serde Changes

        RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
        currently in hive-serde. I think we should move it over to the hive-serde module so that
        users don't have to go to the added effort of manually registering the contrib jar before
        using it.

        This includes the changes requested by Carl from the previous revision.

        REVISION DETAIL
        https://reviews.facebook.net/D3249

        AFFECTED FILES
        contrib/src/test/queries/clientnegative/serde_regex.q
        contrib/src/test/queries/clientpositive/serde_regex.q
        contrib/src/test/results/clientnegative/serde_regex.q.out
        contrib/src/test/results/clientpositive/serde_regex.q.out
        ql/src/test/queries/clientnegative/serde_regex.q
        ql/src/test/queries/clientnegative/serde_regex2.q
        ql/src/test/queries/clientnegative/serde_regex3.q
        ql/src/test/queries/clientpositive/serde_regex.q
        ql/src/test/results/clientnegative/serde_regex.q.out
        ql/src/test/results/clientnegative/serde_regex2.q.out
        ql/src/test/results/clientnegative/serde_regex3.q.out
        ql/src/test/results/clientpositive/serde_regex.q.out
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java

        To: JIRA, cwsteinbach, shreepadma

        Show
        Phabricator added a comment - shreepadma updated the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Reviewers: JIRA, cwsteinbach Regex Serde Changes RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. I think we should move it over to the hive-serde module so that users don't have to go to the added effort of manually registering the contrib jar before using it. This includes the changes requested by Carl from the previous revision. REVISION DETAIL https://reviews.facebook.net/D3249 AFFECTED FILES contrib/src/test/queries/clientnegative/serde_regex.q contrib/src/test/queries/clientpositive/serde_regex.q contrib/src/test/results/clientnegative/serde_regex.q.out contrib/src/test/results/clientpositive/serde_regex.q.out ql/src/test/queries/clientnegative/serde_regex.q ql/src/test/queries/clientnegative/serde_regex2.q ql/src/test/queries/clientnegative/serde_regex3.q ql/src/test/queries/clientpositive/serde_regex.q ql/src/test/results/clientnegative/serde_regex.q.out ql/src/test/results/clientnegative/serde_regex2.q.out ql/src/test/results/clientnegative/serde_regex3.q.out ql/src/test/results/clientpositive/serde_regex.q.out serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java To: JIRA, cwsteinbach, shreepadma
        Hide
        Phabricator added a comment -

        cwsteinbach has requested changes to the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".

        INLINE COMMENTS
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:45 Please remove "It can also serialize the row object using a format string."
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Please change to "RegexSerDe uses a regular expression to deserialize row data. It does not support data serialization."
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:93 Please remove the outputFormatString variable and just do:

        if (null != tbl.getProperty("output.format.string"))

        { LOG.warn(....); }

        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:102 Throw the exception here instead of in the next IF block.
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:186 if (!alreadyLoggedNoMatch)

        { ... }

        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:158 Adding "Count" to the end of this variable name would help to make it clear that this is a scalar variable as opposed to a collection type.
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:184 Does this logic do anything? It looks like the code will log the first unmatched row it finds, set alreadyLoggedNoMatch = true, and subsequently never log another warning. Is there any reason to call getNextNumberToDisplay()?
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:205 I think it's fine to log one warning for the first row that has partial matches. No need to get fancy with this exponential backoff logging strategy.

        REVISION DETAIL
        https://reviews.facebook.net/D3249

        BRANCH
        HIVE-1719-trunk

        To: JIRA, cwsteinbach, shreepadma

        Show
        Phabricator added a comment - cwsteinbach has requested changes to the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". INLINE COMMENTS serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:45 Please remove "It can also serialize the row object using a format string." serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Please change to "RegexSerDe uses a regular expression to deserialize row data. It does not support data serialization." serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:93 Please remove the outputFormatString variable and just do: if (null != tbl.getProperty("output.format.string")) { LOG.warn(....); } serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:102 Throw the exception here instead of in the next IF block. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:186 if (!alreadyLoggedNoMatch) { ... } serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:158 Adding "Count" to the end of this variable name would help to make it clear that this is a scalar variable as opposed to a collection type. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:184 Does this logic do anything? It looks like the code will log the first unmatched row it finds, set alreadyLoggedNoMatch = true, and subsequently never log another warning. Is there any reason to call getNextNumberToDisplay()? serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:205 I think it's fine to log one warning for the first row that has partial matches. No need to get fancy with this exponential backoff logging strategy. REVISION DETAIL https://reviews.facebook.net/D3249 BRANCH HIVE-1719 -trunk To: JIRA, cwsteinbach, shreepadma
        Shreepadma Venugopalan made changes -
        Attachment HIVE-1719.D3249.1.patch [ 12527537 ]
        Phabricator made changes -
        Attachment HIVE-1719.D3249.1.patch [ 12527514 ]
        Hide
        Phabricator added a comment -

        shreepadma requested code review of "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
        Reviewers: JIRA, cwsteinbach

        Regex Serde Changes

        RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
        currently in hive-serde. I think we should move it over to the hive-serde module so that
        users don't have to go to the added effort of manually registering the contrib jar before
        using it.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D3249

        AFFECTED FILES
        contrib/src/test/queries/clientnegative/serde_regex.q
        contrib/src/test/queries/clientpositive/serde_regex.q
        contrib/src/test/results/clientnegative/serde_regex.q.out
        contrib/src/test/results/clientpositive/serde_regex.q.out
        ql/src/test/queries/clientnegative/serde_regex.q
        ql/src/test/queries/clientnegative/serde_regex2.q
        ql/src/test/queries/clientnegative/serde_regex3.q
        ql/src/test/queries/clientpositive/serde_regex.q
        ql/src/test/results/clientnegative/serde_regex.q.out
        ql/src/test/results/clientnegative/serde_regex2.q.out
        ql/src/test/results/clientnegative/serde_regex3.q.out
        ql/src/test/results/clientpositive/serde_regex.q.out
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java

        MANAGE HERALD DIFFERENTIAL RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/7341/

        To: JIRA, cwsteinbach, shreepadma
        Cc: shreepadma

        Show
        Phabricator added a comment - shreepadma requested code review of " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Reviewers: JIRA, cwsteinbach Regex Serde Changes RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. I think we should move it over to the hive-serde module so that users don't have to go to the added effort of manually registering the contrib jar before using it. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D3249 AFFECTED FILES contrib/src/test/queries/clientnegative/serde_regex.q contrib/src/test/queries/clientpositive/serde_regex.q contrib/src/test/results/clientnegative/serde_regex.q.out contrib/src/test/results/clientpositive/serde_regex.q.out ql/src/test/queries/clientnegative/serde_regex.q ql/src/test/queries/clientnegative/serde_regex2.q ql/src/test/queries/clientnegative/serde_regex3.q ql/src/test/queries/clientpositive/serde_regex.q ql/src/test/results/clientnegative/serde_regex.q.out ql/src/test/results/clientnegative/serde_regex2.q.out ql/src/test/results/clientnegative/serde_regex3.q.out ql/src/test/results/clientpositive/serde_regex.q.out serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/7341/ To: JIRA, cwsteinbach, shreepadma Cc: shreepadma
        Hide
        Phabricator added a comment -

        cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".

        INLINE COMMENTS
        contrib/src/test/queries/clientnegative/serde_regex.q:24 Since the first CREATE TABLE statement is expected to fail, we should probably remove the rest of this code.
        contrib/src/test/queries/clientpositive/serde_regex.q:3 This isn't necessary. QTestUtil.clearTestSideEffects() automatically drops any tables/dbs that aren't defined in QTestUtil.srcTables. (Also, take a look at QTestUtil.createSources() ... I remember you asked about this yesterday).
        contrib/src/test/queries/clientpositive/serde_regex.q:43 Please remove.
        contrib/src/test/queries/clientnegative/serde_regex.q:41 Please refer to a system variable (set in one of the Ant build files) instead of using a relative path, e.g. "$

        {system:test.src.data.dir}

        /files/apache.access.log"
        contrib/src/test/queries/clientpositive/serde_regex.q:38 Replace relative paths.
        ql/src/test/queries/clientnegative/serde_regex.q:2 Please remove.
        ql/src/test/queries/clientnegative/serde_regex.q:23 Remove the rest of this code if we're never going to hit it.
        ql/src/test/queries/clientnegative/serde_regex2.q:2 Please remove.
        ql/src/test/queries/clientnegative/serde_regex2.q:3 Same test as clientnegative/serde_regex.q?
        ql/src/test/queries/clientnegative/serde_regex2.q:20 Remove dead code.
        ql/src/test/queries/clientnegative/serde_regex3.q:1 Remove. Replace with 'USE default;' if you're running into problems placing a comment on the first line of the file.
        ql/src/test/queries/clientpositive/serde_regex.q:1 Remove.
        ql/src/test/queries/clientpositive/serde_regex.q:4 I think we should either roll the clientpositive tests into a single file (my preference), or modify the qfile file names so that they give some hint as to what is being tested. Please take a look at clientpositive/database.q for a good example of the former approach.
        ql/src/test/queries/clientpositive/serde_regex2.q:27 Please add a newline. You might want to configure your editor to add these automatically.
        ql/src/test/queries/clientpositive/serde_regex.q:39 It's a little dangerous using 'SELECT *' in tests because it's not always possible to easily determine from the output which column contains which field. In addition to 'SELECT *', please also include a query which selects a single column in the middle in of the table, and a query which selects a subset of two or more columns from the table.
        ql/src/test/queries/clientpositive/serde_regex2.q:4 I thought about this some more and think we should probably throw a runtime exception in the deserialize() method if this condition is detected. Otherwise, we're basically signing up to support this behavior in the future, which is not what we want to do.

        Please file a JIRA for this issue (i.e. that we want to catch this condition at compile-time instead of runtime), and then move this test case over to the clientnegative directory and cite the JIRA in a comment. Thanks.

        ql/src/test/queries/clientpositive/serde_regex3.q:3 This comment implies that output.format.string actually did something in the past, which isn't true. Also, the warning message doesn't show up in the q.out file, so I think we should probably just skip referencing this property in the tests.
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Need to update this javadoc. RegexSerDe only provides deserialization capabilities.
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 Formatting: Braces always go on their own line. This project basically uses the Sun Java Coding standard (http://www.oracle.com/technetwork/java/codeconv-138413.html) with a 100 char line limit and 2 character indent.
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 "Not support yet" is inaccurate since we never plan to support this feature. Please change this to "RegexSerDe does not support the serialize() method"
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:188 It's possible that this will get triggered for every row in the input file, which will quickly overflow the logs. We should log this at most once. Same thing goes for the log message below.

        Currently SerDes don't really have a good way of logging error information like this. We should talk offline about ways of improving this.

        REVISION DETAIL
        https://reviews.facebook.net/D3141

        To: JIRA, cwsteinbach
        Cc: shreepadma

        Show
        Phabricator added a comment - cwsteinbach has commented on the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". INLINE COMMENTS contrib/src/test/queries/clientnegative/serde_regex.q:24 Since the first CREATE TABLE statement is expected to fail, we should probably remove the rest of this code. contrib/src/test/queries/clientpositive/serde_regex.q:3 This isn't necessary. QTestUtil.clearTestSideEffects() automatically drops any tables/dbs that aren't defined in QTestUtil.srcTables. (Also, take a look at QTestUtil.createSources() ... I remember you asked about this yesterday). contrib/src/test/queries/clientpositive/serde_regex.q:43 Please remove. contrib/src/test/queries/clientnegative/serde_regex.q:41 Please refer to a system variable (set in one of the Ant build files) instead of using a relative path, e.g. "$ {system:test.src.data.dir} /files/apache.access.log" contrib/src/test/queries/clientpositive/serde_regex.q:38 Replace relative paths. ql/src/test/queries/clientnegative/serde_regex.q:2 Please remove. ql/src/test/queries/clientnegative/serde_regex.q:23 Remove the rest of this code if we're never going to hit it. ql/src/test/queries/clientnegative/serde_regex2.q:2 Please remove. ql/src/test/queries/clientnegative/serde_regex2.q:3 Same test as clientnegative/serde_regex.q? ql/src/test/queries/clientnegative/serde_regex2.q:20 Remove dead code. ql/src/test/queries/clientnegative/serde_regex3.q:1 Remove. Replace with 'USE default;' if you're running into problems placing a comment on the first line of the file. ql/src/test/queries/clientpositive/serde_regex.q:1 Remove. ql/src/test/queries/clientpositive/serde_regex.q:4 I think we should either roll the clientpositive tests into a single file (my preference), or modify the qfile file names so that they give some hint as to what is being tested. Please take a look at clientpositive/database.q for a good example of the former approach. ql/src/test/queries/clientpositive/serde_regex2.q:27 Please add a newline. You might want to configure your editor to add these automatically. ql/src/test/queries/clientpositive/serde_regex.q:39 It's a little dangerous using 'SELECT *' in tests because it's not always possible to easily determine from the output which column contains which field. In addition to 'SELECT *', please also include a query which selects a single column in the middle in of the table, and a query which selects a subset of two or more columns from the table. ql/src/test/queries/clientpositive/serde_regex2.q:4 I thought about this some more and think we should probably throw a runtime exception in the deserialize() method if this condition is detected. Otherwise, we're basically signing up to support this behavior in the future, which is not what we want to do. Please file a JIRA for this issue (i.e. that we want to catch this condition at compile-time instead of runtime), and then move this test case over to the clientnegative directory and cite the JIRA in a comment. Thanks. ql/src/test/queries/clientpositive/serde_regex3.q:3 This comment implies that output.format.string actually did something in the past, which isn't true. Also, the warning message doesn't show up in the q.out file, so I think we should probably just skip referencing this property in the tests. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:42 Need to update this javadoc. RegexSerDe only provides deserialization capabilities. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 Formatting: Braces always go on their own line. This project basically uses the Sun Java Coding standard ( http://www.oracle.com/technetwork/java/codeconv-138413.html ) with a 100 char line limit and 2 character indent. serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:217 "Not support yet" is inaccurate since we never plan to support this feature. Please change this to "RegexSerDe does not support the serialize() method" serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java:188 It's possible that this will get triggered for every row in the input file, which will quickly overflow the logs. We should log this at most once. Same thing goes for the log message below. Currently SerDes don't really have a good way of logging error information like this. We should talk offline about ways of improving this. REVISION DETAIL https://reviews.facebook.net/D3141 To: JIRA, cwsteinbach Cc: shreepadma
        Hide
        Phabricator added a comment -

        cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".

        Submitted this review on behalf on Shreepadma using this patch: https://issues.apache.org/jira/secure/attachment/12526449/HIVE-1719.patch

        REVISION DETAIL
        https://reviews.facebook.net/D3141

        To: JIRA, cwsteinbach
        Cc: shreepadma

        Show
        Phabricator added a comment - cwsteinbach has commented on the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Submitted this review on behalf on Shreepadma using this patch: https://issues.apache.org/jira/secure/attachment/12526449/HIVE-1719.patch REVISION DETAIL https://reviews.facebook.net/D3141 To: JIRA, cwsteinbach Cc: shreepadma
        Phabricator made changes -
        Attachment HIVE-1719.D3141.1.patch [ 12526546 ]
        Hide
        Phabricator added a comment -

        cwsteinbach requested code review of "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
        Reviewers: JIRA

        https://issues.apache.org/jira/secure/attachment/12526449/HIVE-1719.patch

        RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
        currently in hive-serde. I think we should move it over to the hive-serde module so that
        users don't have to go to the added effort of manually registering the contrib jar before
        using it.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D3141

        AFFECTED FILES
        contrib/src/test/queries/clientnegative/serde_regex.q
        contrib/src/test/queries/clientpositive/serde_regex.q
        contrib/src/test/results/clientnegative/serde_regex.q.out
        contrib/src/test/results/clientpositive/serde_regex.q.out
        ql/src/test/queries/clientnegative/serde_regex.q
        ql/src/test/queries/clientnegative/serde_regex2.q
        ql/src/test/queries/clientnegative/serde_regex3.q
        ql/src/test/queries/clientpositive/serde_regex.q
        ql/src/test/queries/clientpositive/serde_regex2.q
        ql/src/test/queries/clientpositive/serde_regex3.q
        ql/src/test/results/clientnegative/serde_regex.q.out
        ql/src/test/results/clientnegative/serde_regex2.q.out
        ql/src/test/results/clientnegative/serde_regex3.q.out
        ql/src/test/results/clientpositive/serde_regex.q.out
        ql/src/test/results/clientpositive/serde_regex2.q.out
        ql/src/test/results/clientpositive/serde_regex3.q.out
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java

        MANAGE HERALD DIFFERENTIAL RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/7149/

        To: JIRA, cwsteinbach
        Cc: cwsteinbach

        Show
        Phabricator added a comment - cwsteinbach requested code review of " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Reviewers: JIRA https://issues.apache.org/jira/secure/attachment/12526449/HIVE-1719.patch RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. I think we should move it over to the hive-serde module so that users don't have to go to the added effort of manually registering the contrib jar before using it. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D3141 AFFECTED FILES contrib/src/test/queries/clientnegative/serde_regex.q contrib/src/test/queries/clientpositive/serde_regex.q contrib/src/test/results/clientnegative/serde_regex.q.out contrib/src/test/results/clientpositive/serde_regex.q.out ql/src/test/queries/clientnegative/serde_regex.q ql/src/test/queries/clientnegative/serde_regex2.q ql/src/test/queries/clientnegative/serde_regex3.q ql/src/test/queries/clientpositive/serde_regex.q ql/src/test/queries/clientpositive/serde_regex2.q ql/src/test/queries/clientpositive/serde_regex3.q ql/src/test/results/clientnegative/serde_regex.q.out ql/src/test/results/clientnegative/serde_regex2.q.out ql/src/test/results/clientnegative/serde_regex3.q.out ql/src/test/results/clientpositive/serde_regex.q.out ql/src/test/results/clientpositive/serde_regex2.q.out ql/src/test/results/clientpositive/serde_regex3.q.out serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/7149/ To: JIRA, cwsteinbach Cc: cwsteinbach
        Shreepadma Venugopalan made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Shreepadma Venugopalan made changes -
        Attachment HIVE-1719.patch [ 12526449 ]
        Shreepadma Venugopalan made changes -
        Attachment HIVE-1719.3.patch.txt [ 12526288 ]
        Hide
        Shreepadma Venugopalan added a comment -

        This patch makes the following changes,

        • Moves Regex SerDe to hive-serde
        • Deprecates Regex SerDe under hive-contrib. However the behavior of Regex Serde under hive-contrib is not altered in anyway.

        Additionally it alters the behavior of Regex SerDe under hive-serde in the following ways,

        • Raises an exception when the input pattern to Regex is null
        • Logs a warning when the # of matching groups don't match the # of cols
        • Deprecates the output.format.string

        This patch also improves coverage for regex serde under hive-serde by adding additional positive and negative test cases. Please note that the regex serde in the contrib module contains a deprecation warning, but its behavior is not altered in any way.

        Show
        Shreepadma Venugopalan added a comment - This patch makes the following changes, Moves Regex SerDe to hive-serde Deprecates Regex SerDe under hive-contrib. However the behavior of Regex Serde under hive-contrib is not altered in anyway. Additionally it alters the behavior of Regex SerDe under hive-serde in the following ways, Raises an exception when the input pattern to Regex is null Logs a warning when the # of matching groups don't match the # of cols Deprecates the output.format.string This patch also improves coverage for regex serde under hive-serde by adding additional positive and negative test cases. Please note that the regex serde in the contrib module contains a deprecation warning, but its behavior is not altered in any way.
        Hide
        Phabricator added a comment -

        cwsteinbach has requested changes to the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".

        Looks good, but we need to copy the regex serde testcases from contrib over to ql. I also noticed that the negative testcase isn't documented, and doesn't seem to exercise any of the error conditions in RegexSerDe.initialize(). Can you please add some additional test coverage for these cases? Thanks.

        REVISION DETAIL
        https://reviews.facebook.net/D3051

        BRANCH
        HIVE-1719

        Show
        Phabricator added a comment - cwsteinbach has requested changes to the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Looks good, but we need to copy the regex serde testcases from contrib over to ql. I also noticed that the negative testcase isn't documented, and doesn't seem to exercise any of the error conditions in RegexSerDe.initialize(). Can you please add some additional test coverage for these cases? Thanks. REVISION DETAIL https://reviews.facebook.net/D3051 BRANCH HIVE-1719
        Phabricator made changes -
        Attachment HIVE-1719.D3051.2.patch [ 12525707 ]
        Hide
        Phabricator added a comment -

        shreepadma updated the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
        Reviewers: JIRA, cwsteinbach

        • This patch includes the deprecation warnings on top of the previous patch which moved the RegexSerDe to the SerDe directory

        REVISION DETAIL
        https://reviews.facebook.net/D3051

        AFFECTED FILES
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java

        Show
        Phabricator added a comment - shreepadma updated the revision " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Reviewers: JIRA, cwsteinbach HIVE-1719 Deprecate Regex Serde in contrib HIVE-1719 Deprecate Regex Serde in contrib This patch includes the deprecation warnings on top of the previous patch which moved the RegexSerDe to the SerDe directory REVISION DETAIL https://reviews.facebook.net/D3051 AFFECTED FILES serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
        Phabricator made changes -
        Attachment HIVE-1719.D3051.1.patch [ 12525705 ]
        Hide
        Phabricator added a comment -

        shreepadma requested code review of "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde".
        Reviewers: JIRA, cwsteinbach

        HIVE-1719.Move Regex Serde

        RegexSerDe is as much a part of the standard Hive distribution as the other
        SerDes
        currently in hive-serde. I think we should move it over to the hive-serde module
        so that
        users don't have to go to the added effort of manually registering the contrib
        jar before
        using it.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D3051

        AFFECTED FILES
        serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java

        Show
        Phabricator added a comment - shreepadma requested code review of " HIVE-1719 [jira] Move RegexSerDe out of hive-contrib and over to hive-serde". Reviewers: JIRA, cwsteinbach HIVE-1719 .Move Regex Serde RegexSerDe is as much a part of the standard Hive distribution as the other SerDes currently in hive-serde. I think we should move it over to the hive-serde module so that users don't have to go to the added effort of manually registering the contrib jar before using it. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D3051 AFFECTED FILES serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
        Hide
        Carl Steinbach added a comment -

        Note: we should probably leave a copy of the regexserde in the contrib module with a deprecation warning so that users have an easier migration route.

        Show
        Carl Steinbach added a comment - Note: we should probably leave a copy of the regexserde in the contrib module with a deprecation warning so that users have an easier migration route.
        Carl Steinbach made changes -
        Assignee Shreepadma Venugopalan [ shreepadma ]
        Carl Steinbach made changes -
        Fix Version/s 0.7.0 [ 12315150 ]
        Carl Steinbach made changes -
        Field Original Value New Value
        Fix Version/s 0.7.0 [ 12315150 ]
        Carl Steinbach created issue -

          People

          • Assignee:
            Shreepadma Venugopalan
            Reporter:
            Carl Steinbach
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development