Pig / PIG-6

Addition of HBase Storage Option in Load/Store Statement

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      all environments

    • Hadoop Flags:
      Reviewed

      Description

      It needs to be able to load a full table from HBase. (maybe ... difficult? I'm not sure yet.)
      Also, as described below,
      it needs to compose an abstract 2D table containing only certain data filtered out of HBase's array structure, using an arbitrary query delimited by attributes and timestamp.

      A = LOAD table('hbase_table');
      or
      B = LOAD table('hbase_table') Using HbaseQuery('Query-delimited by attributes & timestamp') as (f1, f2[, f3]);
      

      Once testing is done on my local machine,
      I will clarify the grammar and give more examples to help explain more storage options.

      Any advice welcome.

      Attachments

      1. PIG-6.patch
        21 kB
        Sijie Guo
      2. PIG-6_V01.patch
        21 kB
        Sijie Guo
      3. m34813f5.txt
        8 kB
        Alex Newman
      4. hbase-0.18.1-test.jar
        1.14 MB
        Sijie Guo
      5. hbase-0.18.1.jar
        946 kB
        Sijie Guo


          Activity

          Alan Gates added a comment -

          The outstanding patch that has not been applied (m34813f5.txt) connects Pig with Hbase 0.19. Since Pig does not yet support Hadoop 0.19 as a released version (there is a patch you can apply and build yourself to make it work) we haven't incorporated this patch yet either.

          Amr Awadallah added a comment -

          Any progress on this?

          Alex Newman added a comment -

          Any progress on this?

          Alex Newman added a comment - edited

          I have added the patch as per the normal process. Someone should also probably attach the hbase-0.19 jar somewhere. Also, this allows for a start row and stop row filter for a table. It needs unit testing.

          Alan Gates added a comment -

          Are you submitting this patch for us to include to make hbase 19 work with pig? If so can you post the patch to this JIRA and mark the box that says we have permission to include it? That way we have the patch in our system and legal clearance to use it. Thanks.

          Alex Newman added a comment - edited

          OK, I have this "working" with hbase-0.19; however, you need to put your hbase-conf files in the Hadoop conf directory. It is also poorly tested.
          http://pastebin.com/m34813f5
          You also need the HBase and Hadoop 0.19 jars in the right place. I also added an optional start row and stop row when you create the HBase query.

          Sijie Guo added a comment -

          @Oliver

          The patch is based on the types branch of Pig.
          You need to check out the source from http://svn.apache.org/repos/asf/hadoop/pig/branches/types and try again.

          Oliver Po added a comment -

          Hi,
          I am trying to test this patch but had some unresolved import errors with the following statements (HBaseSlice and HBaseStorage). I tried different versions of Pig but cannot find those classes. Could someone tell me where to look for them? Thanks!

          import org.apache.pig.ExecType;
          import org.apache.pig.builtin.Utf8StorageConverter;
          import org.apache.pig.data.DataByteArray;
          import org.apache.pig.data.TupleFactory;

          Alan Gates added a comment -

          V01 patch checked in. Thanks Sam for stepping up and taking on this issue that many people had requested.

          Sijie Guo added a comment -

          Attach a new patch to fix the test mistake.

          Sijie Guo added a comment -

          @Ben

          I have a question about your second suggestion.

          If I don't implement LoadFunc, I can't cast the UDF(HBaseStorage) to LoadFunc. Then I will get the exception below.

          org.apache.pig.backend.hadoop.hbase.HBaseStorage cannot be cast to org.apache.pig.LoadFunc
          java.io.IOException: org.apache.pig.backend.hadoop.hbase.HBaseStorage cannot be cast to org.apache.pig.LoadFunc
          at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
          at org.apache.pig.PigServer.registerQuery(PigServer.java:263)
          at org.apache.pig.PigServer.registerQuery(PigServer.java:349)
          at org.apache.pig.test.TestHBaseStorage.testLoadFromHBase(TestHBaseStorage.java:125)
          Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: org.apache.pig.backend.hadoop.hbase.HBaseStorage cannot be cast to org.apache.pig.LoadFunc
          at org.apache.pig.impl.logicalLayer.parser.QueryParser.NonEvalFuncSpec(QueryParser.java:4271)
          at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1063)
          at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:883)
          at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:742)
          at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:537)
          at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
          at org.apache.pig.PigServer.parseQuery(PigServer.java:295)

          It seems that we only accept four types of UDFs in Pig Latin: EVALFUNC, COMPARISONFUNC, LOADFUNC, STOREFUNC. Am I right?

          Can you explain more about your second suggestion? Thanks in advance.

          Sijie Guo added a comment -

          @Ben

          > 1) you don't fill in the schema. hbase has schema right? it would be nice to fill it in.

          All the values in HBase are interpreted as byte arrays. I am not very sure how to fill in the schema. I will check the code again.

          > 2) i don't think you need to implement LoadFunc if you implement Slicer.

          hmm, I will remove the LoadFunc's null implementations.

          Thanks, Ben

          Sijie Guo added a comment -

          @Alan :

          Oh, I am sorry. It is my fault. HConnectionManager.deleteConnectionInfo(conf) is a method in HBase trunk. In stable HBase 0.18.1, HConnectionManager.deleteConnectionInfo(conf, true) is the right call.
          I will attach a new patch that makes the test run correctly after I make the changes according to Ben's suggestions.
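
          For reference, a minimal sketch of the version difference described above. The wrapper class and method are assumed for illustration; only the two deleteConnectionInfo calls come from this issue:

          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.HConnectionManager;

          // Illustrative sketch of the test teardown call discussed above (not code from the patch).
          public class ConnectionCleanupSketch {
              public static void cleanup() {
                  HBaseConfiguration conf = new HBaseConfiguration();
                  // HBase trunk signature (does not compile against stable 0.18.1):
                  //   HConnectionManager.deleteConnectionInfo(conf);
                  // Stable HBase 0.18.1 signature, as noted above:
                  HConnectionManager.deleteConnectionInfo(conf, true);
              }
          }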

          Benjamin Reed added a comment -

          +1 for the use of Slicers. everything looks good from a code inspection point of view. (i didn't test it.) i do have two suggestions: 1) you don't fill in the schema. hbase has schema right? it would be nice to fill it in. 2) i don't think you need to implement LoadFunc if you implement Slicer. (that way you don't have all those null methods.)
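
          A minimal sketch of what this second point could look like, with all HBase access living in the slice rather than in a LoadFunc full of empty methods. Class, field, and method names here are illustrative only (and, as Sijie notes elsewhere in this thread, the parser currently casts the load UDF to LoadFunc, which is why the committed patch keeps the LoadFunc implementation):

          import java.io.IOException;
          import java.io.Serializable;
          import org.apache.pig.data.Tuple;

          // Illustrative only: one slice per table region, responsible for producing tuples itself.
          public class HBaseSliceIdea implements Serializable {
              private byte[] tableName;   // hypothetical fields: table and row range backing this slice
              private byte[] startRow;
              private byte[] endRow;

              // open an HBase scanner over [startRow, endRow) for the requested columns
              public void init() throws IOException {
                  // HBase client calls omitted; they differ between HBase 0.18 and 0.19
              }

              // fill the next row into a Pig tuple; return false when the region is exhausted
              public boolean next(Tuple value) throws IOException {
                  return false;
              }
          }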

          Alan Gates added a comment -

          Thanks for including some hbase specific unit tests.

          I'd like to ask Ben to take a look at the patch too since it involves the slicer, and he's the resident expert in that area.

          When I try to compile the tests I get the following error:

          compile-sources:
          [javac] Compiling 92 source files to /home/gates/src/pig/branches/hbase/types/build/test/classes
          [javac] /home/gates/src/pig/branches/hbase/types/test/org/apache/pig/test/TestHBaseStorage.java:103: deleteConnectionInfo(org.apache.hadoop.hbase.HBaseConfiguration,boolean) in org.apache.hadoop.hbase.client.HConnectionManager cannot be applied to (org.apache.hadoop.hbase.HBaseConfiguration)
          [javac] HConnectionManager.deleteConnectionInfo(conf);
          [javac] ^

          Alan Gates added a comment -

          I've downloaded the patch and the jars. I'll take a look at it in the next day or two and give some feedback.

          Sijie Guo added a comment -

          Attaching the hbase jar,
          and also the hbase-test jar (the TestHBaseStorage test case needs the hbase-test jar).

          Sijie Guo added a comment -

          Attach my patch.

          It now implements loading a table from HBase into Pig for data processing.

          Usage:
          > register ./hbase-0.18.1.jar
          > raw = load 'tablename' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('f1:c1 f2:c2 f3:c3') as (field1, field2, field3);

          The parameter of HBaseStorage is a space-delimited column list.

          Comments on my patch are welcome, so I know how to improve it.
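
          A small illustrative sketch of how that space-delimited column list could be handled; this is not the code from the attached patch, just the shape of the constructor argument:

          import java.util.Arrays;
          import java.util.List;

          // Illustrative only: parse HBaseStorage's constructor argument, e.g. "f1:c1 f2:c2 f3:c3".
          // Each entry names an HBase column as family:qualifier, and each becomes one Pig field,
          // so 'as (field1, field2, field3)' in the script lines up with the list positionally.
          public class ColumnListSketch {
              public static List<String> parse(String columnList) {
                  return Arrays.asList(columnList.trim().split("\\s+"));
              }

              public static void main(String[] args) {
                  System.out.println(parse("f1:c1 f2:c2 f3:c3")); // prints [f1:c1, f2:c2, f3:c3]
              }
          }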

          Sijie Guo added a comment -

          Thanks, Alan.

          hmm, it is better to keep the load/store syntax. What Alan said makes it clear to me.

          I suggest that a syntax like "A = load 'tablename' Using HbaseLoader("Hbase connection info, ..., query info");" may be better. The 'tablename' in the statement names the target, like a filename, and we let the LoadFunc do the HBase-related things, including HBase's query.

          And we just need a new slicer and a new loader. It is simple. Thanks Alan again.

          Alan Gates added a comment -

          A couple of rudimentary comments. My knowledge of hbase is limited, so please feel free to correct assumptions I have about hbase or point me to appropriate documentation.

          1. I'd like to avoid specialized syntax for hbase type queries. Why do we need a special load and store syntax? Is it not possible to fit the necessary information into a combination of the loader constructor arguments and the filename string? Roughly like: A = load 'hbase query' using HBaseLoader("Hbase connection info");

          2. I'd like to avoid adding new operators to the logical plans, adding new DataStorage implementations, etc. I agree that a new slicer and loader will be needed. I was thinking that the loader and slicer could handle turning the results of the hbase query into records that could be passed to the rest of the pig pipeline as is, and the inverse for storage functions. Past that, why does anything else in pig need to understand hbase? Am I glossing over important details?

          Sijie Guo added a comment -

          My ideas about this issue.

            • Load from / Store into Table
          • Target
            Let Pig have the ability to load from / store into tables in bigtable-like systems (such as HBase, Hypertable, and in the future maybe Cassandra).
          • Grammar

          <tableloadclause> := <LOAD> "TABLE" <tablepath> "PROJECTION" <projections_list> AS <schema>
          <tablestoreclause> := <STORE> <IDENTIFIER> "PROJECTION" <INTO> "TABLE" <tablepath> <projections_list>
          <projections_list> := <projection> ["," <projections_list>]
          <projection> := "'"<string>:<string>:<string>"'"
          <tablepath> := "'"<string>:<string>"'"

          <tablepath> is formed by two parts: "schema" and "tablename". "schema" identifies the system the table lives in; it may be "hbase", "hypertable", or another system.
          <projection> is formed by three parts: "column_family_name", "column_name", "timestamp".

          • Examples

          An example is below:

          – load the table 'table1' from 'hbase'
          – project the content of "family1:column1" at timestamp1 to field1
          – project all contents of "family2:" at timestamp2 to field2
          – project the latest content of "family3:" to field3
          A = Load table 'hbase:table1' projection 'family1:column1:timestamp1', 'family2::timestamp2', 'family3::' as (field1: chararray, field2: tuple, field3:tuple);

          – do some operation over A
          B = ...A;

          – store B into 'hbase' as table 'table2'
          – project B.$1 to 'family1:column1' with the system's current timestamp
          – project B.$2 to 'family2:column2' with timestamp v2
          Store B projection into table 'hbase:table2' 'family1:column1:', 'family2:column2:v2';

          • Data I/O over Table

          First, we need a custom DataStorage to do the table data I/O, something like:

          public interface TableDataStorage extends DataStorage {
          }

          The TableDataStorage will abstract all the bigtable-like systems.

          And,

          For HBase, we can construct the HBase DataStorage like:
          public class HbaseDataStorage implements TableDataStorage {
          }

          For Hypertable, we may have a different DataStorage like:
          public class HypertableDataStorage implements TableDataStorage {
          }

          • MapReduce Stuff

          Because a table is different from a file, we may need a different slice interface, something like:

          public interface TableSlice extends Serializable {
              // get slice locations
              String[] getLocations();
              // init the data storage
              void init(TableDataStorage store) throws IOException;
              // get the table's name
              byte[] getTableName();
              // get the start row of this table slice
              byte[] getStartRow();
              // get the end row of this table slice
              byte[] getEndRow();
              // get the current row of this table slice
              byte[] getCurRow();
              // get the progress
              float getProgress() throws IOException;
              // get the next tuple
              boolean next(Tuple tuple) throws IOException;
          }

          And we need a related table slicer:

          public interface TableSlicer {
              void validate(TableDataStorage store, String location) throws IOException;
              TableSlice[] slice(TableDataStorage store, String location) throws IOException;
          }

          Finally, the inputformat, outputformat, recordreader for map/reduce over table.
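
          To make the proposal more concrete, a hypothetical HBase-backed implementation of the TableSlice interface sketched above might look like the following. Everything here is illustrative and depends only on the proposed interfaces, not on any committed code:

          import java.io.IOException;
          import org.apache.pig.data.Tuple;

          // Hypothetical sketch built against the proposed TableSlice/TableDataStorage interfaces above.
          public class HBaseTableSliceSketch implements TableSlice {
              private final byte[] tableName;
              private final byte[] startRow;
              private final byte[] endRow;
              private byte[] curRow;

              public HBaseTableSliceSketch(byte[] tableName, byte[] startRow, byte[] endRow) {
                  this.tableName = tableName;
                  this.startRow = startRow;
                  this.endRow = endRow;
                  this.curRow = startRow;
              }

              public String[] getLocations() {
                  // ideally the host of the region server serving [startRow, endRow)
                  return new String[0];
              }

              public void init(TableDataStorage store) throws IOException {
                  // open an HBase scanner here; the client calls are version-specific and omitted
              }

              public byte[] getTableName() { return tableName; }
              public byte[] getStartRow()  { return startRow; }
              public byte[] getEndRow()    { return endRow; }
              public byte[] getCurRow()    { return curRow; }

              public float getProgress() throws IOException {
                  // rows are byte-ordered, so progress could be estimated from curRow's position in [startRow, endRow)
                  return 0f;
              }

              public boolean next(Tuple tuple) throws IOException {
                  // advance the scanner, copy the requested columns into the tuple, and update curRow
                  return false;
              }
          }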

          • Pig Translation
            Now, Pig's translation can be divided into 3 steps.
            First: parser -> logical plan;
            Second: logical plan -> physical plan;
            Last: physical plan -> map/reduce plan;

          In the first two steps, we just need to add operators similar to the existing file load/store ones.
          Like:
          LOLoad -> LOTableLoad
          POLoad -> POTableLoad

          LOStore -> LOTableStore
          POStore -> POTableStore

          The difference is in the last step.
          When we are constructing a map/reduce job with a table load/store operation, we should use the table's map/reduce related stuff (such as inputformat, outputformat and so on) to construct the job. The load/store between jobs still uses temp files.

          so a pig script using table load/store may be like:

          source-table --> Job1(using table inputformat) -----> tempfiles(piginputformat/pigoutputformat) -----> job2 -----> .... -----> jobN ------> target-table(using table outputformat)

          • Other Problems
            There may be other optimization problems when using tables for data processing. Those problems are not considered in this solution, to keep it clear.

          Comments welcome.

          Sijie Guo added a comment -

          hmm, thanks Alan.

          I am looking at the types branch, and I will post an outline of my work on it if I start to work on this.

          Alan Gates added a comment -

          Most certainly. Contributions are always welcomed.

          A couple of pointers. One, most new work is going into the types branch (which will soon be merged into trunk and be released as 0.2.0, see http://wiki.apache.org/pig/TrunkToTypesChanges for more info). So it would be much better if the work could be done on that branch.

          Two, since this is a non-trivial addition to pig, it would be great if you could start with an outline of the changes you intend to make, so others can review it and contribute their ideas. This outline can be given in this JIRA, or you can write a separate wiki page and post it on the pig wiki and reference it in this JIRA.

          Olga Natkovich added a comment -

          Of course! We welcome all contributions!

          Sijie Guo added a comment -

          I would like to try working on it myself. Could I?

          Olga Natkovich added a comment -

          I don't think anybody is actively working on it though several people showed interest in using this feature.

          Sijie Guo added a comment -

          Is this in progress?

          Edward J. Yoon added a comment -

          Come to think of it, I think "query-delimited" is a strange term.
          B = LOAD HbaseQuery('Hbase Query');...?

          I will clarify the grammars.

          Edward J. Yoon added a comment -

          Oh! I see it. Thanks, Dr.Ted.

          Ted Dunning added a comment -

          On both statements, I would think that the following syntax would be simpler:

          B = LOAD HbaseQuery('Query-delimited by attributes & timestamp') as (f1, f2[, f3]);

          There is really no need for separate syntax for tables and queries since the entire contents of a table can be had with a simple query.

          There should also be a symmetrical syntax for STORE.

          Edward J. Yoon added a comment -

          Changed the summary.


            People

            • Assignee: Sijie Guo
            • Reporter: Edward J. Yoon
            • Votes: 0
            • Watchers: 11
