Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      In the last 18 months, PigLatin has gained significant popularity within the open source community. Many users like its data flow model, its rich type system, and its ability to work with any data available on HDFS or outside it. We have also heard from many users that having Pig speak SQL would bring in many more users. Having a single system that exports multiple interfaces is a big advantage: it guarantees consistent semantics, allows custom code reuse, and reduces the amount of maintenance. This is especially relevant for projects that use both interfaces for different parts of the system. For instance, a data warehousing system would have an ETL component that brings data into the warehouse and a component that analyzes the data and produces reports. PigLatin is uniquely suited for ETL processing, while SQL might be a better fit for report generation.

      To start, it would make sense to implement a subset of the SQL92 standard and to be as standard-compliant as possible. This would include all the standard constructs: select, from, where, group by + having, order by, limit, and join (inner + outer). Several extensions would also be helpful, such as support for Pig's UDFs and possibly streaming, multiquery, and Pig's complex types.
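To make the list of constructs concrete, here is a minimal sketch of one query that uses each of them (select, from, where, join, group by + having, order by, limit). It runs against Python's built-in sqlite3 module purely as a stand-in SQL engine; it is not Pig SQL and says nothing about the syntax the patch accepts. The table names echo the attached test files, but the columns and rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema and rows, invented for this illustration only.
SCHEMA = """
CREATE TABLE students (name TEXT, age INTEGER, gpa REAL);
CREATE TABLE students_attr (name TEXT, attr TEXT);
"""
STUDENTS = [
    ("alice", 20, 3.5), ("bob", 20, 2.5), ("carol", 21, 3.9),
    ("dave", 21, 1.5), ("erin", 22, 3.0),
]
ATTRS = [
    ("alice", "honors"), ("bob", "transfer"), ("carol", "honors"),
    ("carol", "athlete"), ("dave", "transfer"), ("erin", "honors"),
]

# One statement touching every construct named in the description.
QUERY = """
SELECT s.age, COUNT(*) AS n, AVG(s.gpa) AS avg_gpa
FROM students s
JOIN students_attr a ON s.name = a.name   -- inner join
WHERE s.gpa >= 2.0
GROUP BY s.age
HAVING COUNT(*) >= 2
ORDER BY avg_gpa DESC
LIMIT 2
"""


def build_demo_db():
    """Create an in-memory database holding the hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", STUDENTS)
    conn.executemany("INSERT INTO students_attr VALUES (?, ?)", ATTRS)
    return conn


def run_report(conn):
    """Run the report query and return its rows as a list of tuples."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in run_report(build_demo_db()):
        print(row)
```

An outer join would follow the same shape with LEFT OUTER JOIN in place of JOIN.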

      This work is dependent on metadata support outlined in https://issues.apache.org/jira/browse/PIG-823

      1. students2.bin
        0.2 kB
        Thejas M Nair
      2. students_attr.bin
        0.6 kB
        Thejas M Nair
      3. SQL_IN_PIG.html
        3 kB
        Thejas M Nair
      4. pigsql.patch
        1.42 MB
        Thejas M Nair
      5. pigsql_tutorial.txt
        5 kB
        Thejas M Nair
      6. PIG-824.binfiles.tar.gz
        2.30 MB
        Thejas M Nair
      7. PIG-824.1.patch
        664 kB
        Thejas M Nair
      8. pig_sql_beta.pdf
        86 kB
        Thejas M Nair
      9. java-cup-11a-runtime.jar
        13 kB
        Thejas M Nair
      10. java-cup-11a.jar
        94 kB
        Thejas M Nair

        Activity

        Jeff Hammerbacher added a comment -

        Sigh. Really? Why build another SQL interface to Hadoop when we have two already (CloudBase, Hive)? Extending Pig to share Hive's metadata repository seems to be a much, much shorter path to a solution.

        eric baldeschwieler added a comment -

        Hi Jeff,

        Reasonable parties can clearly disagree on approach. We've been planning this approach since before Hive's inception, as we discussed at the time. Your team chose to explore an alternative approach rather than implement a SQL parser for Pig.

        The Hadoop community is richer for that.

        Having looked at the cost benefit for our organization, we've concluded that we still believe that having a single set of tools that supports Pig and SQL syntax will reduce the overall cost of running the diverse workloads we support and we are willing to invest to get to that saving.

        I believe the Hadoop community will be richer for having that alternative too.

        Let's keep talking!

        E14

        Thejas M Nair added a comment -

        PIG-824.binfiles.tar.gz - contains libs that it depends on
        PIG-824.1.patch - patch
        SQL_IN_PIG.html - (brief) document

        JFlex.jar has not been included because it is covered by the GPL. It will have to be downloaded to the lib dir for building with the patch. In the future, Ivy will be set up to download it.

        Thejas M Nair added a comment -

        JFlex.jar (required to build this patch) can be downloaded from http://www.jflex.de/download.html

        Thejas M Nair added a comment -

        SQL patch (pigsql.patch) based on the version of Owl in svn, plus documentation (pig_sql_beta.pdf). The patch is against trunk revision 941018.

        Thejas M Nair added a comment -

        Copy the attached jar files to the lib/ dir to build the patch.

        Copy the bin storage format test files to the following dirs:
        students2.bin -> test/org/apache/pig/test/data/SQL/students2.bin and contrib/owl/contrib/pig/test/java/org/apache/hadoop/owl/pig/data/SQL/students2.bin
        students_attr.bin -> test/org/apache/pig/test/data/SQL/students_attr.bin and contrib/owl/contrib/pig/test/java/org/apache/hadoop/owl/pig/data/SQL/students_attr.bin
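The copy steps above could be scripted roughly as follows. This is only a sketch: the destination paths are the ones quoted in the comment, and the jar names are taken from the attachment list, but the function name `stage_attachments` and the idea of a single download directory holding all attachments are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Jar names as they appear in the issue's attachment list.
JAR_FILES = ["java-cup-11a-runtime.jar", "java-cup-11a.jar"]
# Bin storage format test files and the two directories named in the comment.
BIN_FILES = ["students2.bin", "students_attr.bin"]
BIN_DIRS = [
    "test/org/apache/pig/test/data/SQL",
    "contrib/owl/contrib/pig/test/java/org/apache/hadoop/owl/pig/data/SQL",
]


def stage_attachments(download_dir, pig_root):
    """Copy downloaded attachments into a Pig source tree.

    download_dir: directory holding the downloaded attachments (assumed layout).
    pig_root: root of the Pig checkout the patch is applied to.
    Returns the list of destination paths that were written.
    """
    download_dir = Path(download_dir)
    pig_root = Path(pig_root)

    # Build (source, destination) pairs: jars go to lib/, bin files to both test dirs.
    pairs = [(download_dir / jar, pig_root / "lib" / jar) for jar in JAR_FILES]
    for name in BIN_FILES:
        for rel_dir in BIN_DIRS:
            pairs.append((download_dir / name, pig_root / rel_dir / name))

    written = []
    for src, dest in pairs:
        dest.parent.mkdir(parents=True, exist_ok=True)  # create missing dirs
        shutil.copy2(src, dest)
        written.append(dest)
    return written
```

Calling `stage_attachments("downloads", "pig-trunk")` (both paths hypothetical) would place two jars under lib/ and each .bin file in both test data directories.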

        Thejas M Nair added a comment -

        Attaching SQL tutorial (pigsql_tutorial.txt) -
        This Pig SQL tutorial shows you how to run SQL scripts in local mode and mapreduce mode.
        The metadata is stored using Owl. In this tutorial a jetty/derby based owl setup is used so that only minimal setup needs to be done to get started.

        Min Zhou added a comment -

        Any further progress on this issue?

        Jeff Hammerbacher added a comment -

        Hey Min,

        To the best of my knowledge, the development on this issue has stopped in favor of adapting Owl (Pig's metastore) to work with Hive. To follow the Howl project, first check out the overview at http://wiki.apache.org/pig/Howl and join the mailing list at http://tech.groups.yahoo.com/group/howldev/.

        Regards,
        Jeff

        eric baldeschwieler added a comment -

        I'm on vacation til wednesday 7/28.

        I'm in Hershey, PA, cell should work if needed.

        Amr Awadallah added a comment -

        I am out of office on vacation and will be slower than usual in
        responding to emails. If this is urgent then please call my cell phone
        (or send an sms), otherwise I will reply to your email when I get
        back.

        Thanks for your patience,

        – amr

        Olga Natkovich added a comment -

        Jeff is correct. We are not actively developing Pig SQL or Owl.

        Russell Jurney added a comment -

        An update on this, regarding HIVE/Pig interoperability, would be appreciated. I want to be throwing SQL at my Pig relations to verify them like crazy.

        Thejas M Nair added a comment -

        Russell,
        The hcatalog project is going to enable pig/hive/mr to work with each other. But that is at the storage format/metadata level.

        "I want to be throwing SQL at my Pig relations to verify them like crazy."

        This sounds like a new use case. Do you mean running SQL queries on Pig relations, treating a Pig relation like a table? That is, SQL statements within a pig-latin query?

        Russell Jurney added a comment -

        I do mean that, yes. It would be great to call HIVE from grunt. Maybe I should make a feature request?

        Thejas M Nair added a comment -

        I think what you are asking for is more than "calling hive from grunt" (which can be done through an "sh /usr/bin/hive ..."); what you are asking for is "support sql statements in pig-latin". Yes, I think you should make that a new feature request.

        By the way, you can do the same thing in a less elegant way: store the pig relation using hcat, then run a hive query using 'sh /usr/bin/hive ..'.

        Russell Jurney added a comment -

        I like the idea of using HCat to get at the data via HIVE. Is there a simple guide to achieving this some place?

        Ashutosh Chauhan added a comment -

        Hey Russell,

        Take a look at the Data Flow Example at http://incubator.apache.org/hcatalog/docs/r0.2.0/index.html#Data+Flow+Example for some idea of how to accomplish that.


          People

          • Assignee:
            Thejas M Nair
            Reporter:
            Olga Natkovich
          • Votes:
            0
            Watchers:
            30
