Hive / HIVE-2998 Making Hive run on Windows Server and Windows Azure environment / HIVE-3146

Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV)

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Windows
    • Labels:

      Description

      Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV)

      Attachments

      1. HIVE-3146.3.patch.txt (2 kB, Kanna Karanam)
      2. HIVE-3146.2.patch.txt (2 kB, Kanna Karanam)
      3. HIVE-3146.1.patch.txt (0.6 kB, Kanna Karanam)

        Activity

        Kanna Karanam added a comment -

        Attached the patch

        Edward Capriolo added a comment -

        Can you please look into what calls this function and why? Maybe it can be removed entirely or moved into a configuration setting. We do not want to change the code each time a new file system is introduced.

        Ashutosh Chauhan added a comment -

        @Ed, according to https://issues.apache.org/jira/browse/HIVE-1624?focusedCommentId=12914176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12914176, the schemes were picked rather defensively. Adding ASV there should be fine.
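A minimal sketch of the allow-list approach under discussion, assuming the check builds an alternation regex from a fixed array of scheme names in the spirit of getMatchingSchemaAsRegex(). The class name, method names, the scheme list, and the "myfs" scheme are all hypothetical; the thread does not show Hive's actual list.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SchemeAllowList {
  // Hypothetical allow-list; Hive's real list may differ.
  // "asv" is the scheme this issue proposes to add.
  static final String[] SCHEMES = {"s3", "s3n", "asv"};

  // Builds an alternation such as "s3|s3n|asv", quoting each scheme
  // so that any regex metacharacters in a scheme name are taken literally.
  static String matchingSchemesAsRegex(String[] schemes) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < schemes.length; i++) {
      if (i > 0) {
        sb.append('|');
      }
      sb.append(Pattern.quote(schemes[i]));
    }
    return sb.toString();
  }

  // True only when the whole value is "<allowed scheme>://<anything>".
  static boolean isAllowedRemoteResource(String value) {
    return value.matches("(" + matchingSchemesAsRegex(SCHEMES) + ")://.*");
  }

  public static void main(String[] args) {
    for (String v : Arrays.asList(
        "asv://container@account/udf.jar",
        "s3n://bucket/udf.jar",
        "myfs://host/udf.jar", // hypothetical new file system, not in the list
        "/tmp/udf.jar")) {
      System.out.println(v + " -> " + isAllowedRemoteResource(v));
    }
  }
}
```

The "myfs" line illustrates Edward's concern: with a hard-coded list, every new file system scheme means another code change.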
        JQ Hadoop added a comment -

        I think making it a configuration setting would be more flexible.

        Carl Steinbach added a comment -

        @Kanna: please create a review request on reviews.apache.org. Thanks.

        Kanna Karanam added a comment -

        Please find the code review request at https://reviews.apache.org/r/5530/

        Kanna Karanam added a comment -

        @Edward, @JQ: If I understand correctly, the suggestion is:
        1) Create a new setting in hive-conf.xml.
        2) Set its default value to the existing values in HiveConf.java and the hive-conf.xml template.
        3) Customers can override this setting if they want to work with a different storage system.

        I personally don't expect this list to change much, but if you strongly feel that it has to be configurable, please let me know and I will update the patch and send it for review.
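The three steps above could look roughly like the following sketch, which uses java.util.Properties as a stand-in for HiveConf. The property key, the default scheme list, and the class are hypothetical and are not real Hive settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

public class ConfigurableSchemes {
  // Hypothetical key (step 1); not an actual Hive configuration property.
  static final String SCHEMES_KEY = "hive.resource.download.schemes";
  // Hypothetical default (step 2), standing in for "the existing values".
  static final String DEFAULT_SCHEMES = "s3,s3n";

  // Reads the comma-separated scheme list, falling back to the default
  // when the user has not overridden it (step 3).
  static List<String> schemes(Properties conf) {
    List<String> result = new ArrayList<>();
    for (String s : conf.getProperty(SCHEMES_KEY, DEFAULT_SCHEMES).split(",")) {
      String trimmed = s.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  // True when the value is "<configured scheme>://<anything>".
  static boolean canDownload(Properties conf, String value) {
    List<String> quoted = new ArrayList<>();
    for (String s : schemes(conf)) {
      quoted.add(Pattern.quote(s));
    }
    if (quoted.isEmpty()) {
      return false;
    }
    return value.matches("(" + String.join("|", quoted) + ")://.*");
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(canDownload(conf, "asv://c@a/udf.jar")); // false: default list has no asv
    conf.setProperty(SCHEMES_KEY, "s3,s3n,asv");                // user override
    System.out.println(canDownload(conf, "asv://c@a/udf.jar")); // true
  }
}
```

This keeps the hard-coded list out of the code, but users still have to know to extend the setting whenever they plug in a new file system.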

        Edward Capriolo added a comment -

        Kanna, I actually do not believe we need this check at all.

          private String downloadResource(String value, boolean convertToUnix) {
            if (value.matches("("+ getMatchingSchemaAsRegex() +")://.*")) {
              try {
                FileSystem fs = FileSystem.get(new URI(value), conf);
        

        I just dealt with one of these.

        https://issues.apache.org/jira/browse/HIVE-1444?attachmentSortBy=dateTime

        Hadoop and Hive are supposed to support pluggable DFSs; we should not have a 'list of approved FS' anywhere. It just makes more work and causes more incompatibility problems.

        If I understand this method correctly, all we probably need is a check for:

        if (!fs.contains("file:///")){ }
        

        Am I right?

        Kanna Karanam added a comment -

        @Edward, @Ashutosh - I think we need the following check here. I tried the if (!fs.contains("file:///")) check and noticed that around 70 unit tests fail.

        As per HIVE-1624, the intent is to download resources from S3 (an external FS) to the local file system and then move them to the cluster.

          public static boolean canDownloadResource(String value) {
            // Allow downloading resources from any external FileSystem.
            // No need to download if the resource already exists on the local file system.
            return value.matches("\\w+://.*") && !value.toLowerCase().contains("file://");
          }

        I am running unit tests with this change. If they all pass, I will upload the patch.

        Thanks
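A small harness for comparing the canDownloadResource check from Kanna's comment with Edward's simpler proposal. Edward's snippet calls contains on fs; here it is read as a check on the resource string, which is an interpretation rather than code from the thread. The class name and sample paths are illustrative only.

```java
public class DownloadCheck {
  // The generic check from the comment above: any "scheme://" URI
  // except local file:// URIs is a candidate for download.
  public static boolean canDownloadResource(String value) {
    // Allow downloading resources from any external FileSystem.
    // No need to download if the resource already exists on the local file system.
    return value.matches("\\w+://.*") && !value.toLowerCase().contains("file://");
  }

  // Edward's proposal, interpreted as a test on the resource string:
  // anything that is not a file:/// URI.
  static boolean notLocalFileUri(String value) {
    return !value.contains("file:///");
  }

  public static void main(String[] args) {
    String[] samples = {
      "asv://container@account/udf.jar",
      "s3n://bucket/udf.jar",
      "hdfs://namenode:8020/udf.jar",
      "file:///tmp/udf.jar",
      "/tmp/udf.jar"
    };
    for (String v : samples) {
      System.out.println(v + "  generic=" + canDownloadResource(v)
          + "  notFileUri=" + notLocalFileUri(v));
    }
  }
}
```

The two predicates agree on every sample except the scheme-less local path "/tmp/udf.jar", which passes the contains("file:///") test but fails the scheme-based one. The thread does not establish whether that difference accounts for the failing tests.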

        Kanna Karanam added a comment -

        Attached the patch. Code review request for this patch is https://reviews.apache.org/r/5687/

        Kanna Karanam added a comment -

        Updated the patch.

        Addressed Ashutosh's code review comments.

        Ashutosh Chauhan added a comment -

        +1 will commit if tests pass.

        Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Kanna!

        Kanna Karanam added a comment -

        Thanks Ashutosh.

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1526 (See https://builds.apache.org/job/Hive-trunk-h0.21/1526/)
        HIVE-3146 : Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV) (Kanna Karanam via Ashutosh Chauhan) (Revision 1356524)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356524
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-3146 : Support external hive tables whose data are stored in Azure blob store/Azure Storage Volumes (ASV) (Kanna Karanam via Ashutosh Chauhan) (Revision 1356524)

        Result = ABORTED
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356524
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
        Ashutosh Chauhan added a comment -

        This issue is fixed and released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.


          People

          • Assignee: Kanna Karanam
          • Reporter: Kanna Karanam
          • Votes: 0
          • Watchers: 6

            Dates

            • Created:
              Updated:
              Resolved:
