Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: 0.9
    • Fix Version/s: 1.3.0
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      Add a HBaseTableSource to read data from a HBase table. The HBaseTableSource should implement the ProjectableTableSource (FLINK-3848) and FilterableTableSource (FLINK-3849) interfaces.

      The implementation can be based on Flink's TableInputFormat.
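
       For orientation, a rough sketch of what such a table source could look like is given below. The class name, constructor, and method signatures are assumptions for this sketch; the TableSource traits were still changing at the time, so this is not a final API.

       import org.apache.flink.api.common.typeinfo.TypeInformation;
       import org.apache.flink.api.java.DataSet;
       import org.apache.flink.api.java.ExecutionEnvironment;
       import org.apache.flink.api.java.typeutils.RowTypeInfo;
       import org.apache.flink.table.sources.BatchTableSource;
       import org.apache.flink.types.Row;

       // Sketch of a Row-based HBase table source; projection and filter push-down
       // (ProjectableTableSource / FilterableTableSource) would be layered on top.
       public class HBaseTableSource implements BatchTableSource<Row> {

           private final String tableName;                 // HBase table to scan
           private final String[] fieldNames;              // unique relational field names
           private final TypeInformation<?>[] fieldTypes;  // types the byte[] cells are deserialized to

           public HBaseTableSource(String tableName, String[] fieldNames, TypeInformation<?>[] fieldTypes) {
               this.tableName = tableName;
               this.fieldNames = fieldNames;
               this.fieldTypes = fieldTypes;
           }

           @Override
           public DataSet<Row> getDataSet(ExecutionEnvironment execEnv) {
               // would create an HBase-backed input format (e.g. a Row-returning
               // variant of Flink's TableInputFormat) and call execEnv.createInput(...)
               throw new UnsupportedOperationException("sketch only");
           }

           @Override
           public TypeInformation<Row> getReturnType() {
               return new RowTypeInfo(fieldTypes);
           }
       }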

        Issue Links

          Activity

          wilmer Wilmer DAZA added a comment - - edited

          Hello Fabian Hueske, I saw this as a starter issue, and I want to start contributing to Apache Flink. I would love to take this issue and try to solve it. Thank you

          fhueske Fabian Hueske added a comment -

          Hi, glad to hear that you would like to contribute to Flink!
          I assigned the issue to you.

          Please let us know if you have any questions.
          Looking forward to your contribution, Fabian

          aljoscha Aljoscha Krettek added a comment - - edited

          There is currently an open PR (https://github.com/apache/flink/pull/1127) by Timo Walther that adds fromHCat to TableEnvironment along with all the required plumbing. So maybe you should wait for him, or at least coordinate with him.

          wilmer Wilmer DAZA added a comment -

          Aljoscha Krettek thanks so much for the info ... I'd better do that.

          rikhuprasad Rikhu Prasad added a comment - - edited

          Hi,

          Can I work on this too (As I see fromHCat is complete by now)? Possibly coordinate.

          -Rik

          ram_krish ramkrishna.s.vasudevan added a comment -

          Is there anyone working on this? If not, I will be happy to take this up.

          fhueske Fabian Hueske added a comment -

          The internals of the Table API are currently being heavily refactored. At the moment, it does not make much sense to work on this issue.
          However, adding external data sources to the Table API is still on the roadmap.

          Please see the design document for SQL on Flink and the issue for porting the Table API on top of Apache Calcite FLINK-3221.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Fabian Hueske
          Thanks for the update.

          fhueske Fabian Hueske added a comment -

          Updated the issue description to reflect the new TableSource interface.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Can I take this issue now? I will come with a PR, maybe next week, Fabian Hueske.

          fhueske Fabian Hueske added a comment -

          Hi ramkrishna.s.vasudevan,

          this issue depends on FLINK-3848 and FLINK-3849.
          I'll open PRs for those in the next few days. Then you should be ready to go.

          Cheers, Fabian

          ram_krish ramkrishna.s.vasudevan added a comment -

          Fabian Hueske
          Can I work on this now? I can see that FLINK-3848 is done.

          fhueske Fabian Hueske added a comment -

          Hi ramkrishna.s.vasudevan, yes. That would be great.
          We should wait for FLINK-5280 as well, but this issue is very close to being resolved.

          I'll assign the issue to you.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Thanks Fabian Hueske.
          I was going through the related JIRAs. I was just following how CSVTableSource works here.
          Initial questions:

          • Should we see the HBase tables as NoSQL tables or as normal tables with a proper schema that defines the set of columns per row?
          • In HBase, columns can have the same name but come under different column families. So how do we present that abstracted view?
          • Next, we do a scan of an HBase table and the result we get is in the form of bytes. Where can we do the type conversion to String, double, long, etc.? Or maybe that is not needed for now? Or maybe Calcite is of help here?

          Sorry if my questions are naive here - after some discussion I think we can discuss the design part.

          fhueske Fabian Hueske added a comment -

          Hi ramkrishna.s.vasudevan,

          • Calcite requires a relational schema. So we cannot support a flexible schema. Nested data is supported, i.e., a field can be of a complex type such as a POJO, and Calcite/Flink can access the fields of the POJO.
          • We need unique field names. The user could explicitly specify how fields are named. We can also use HBase's column names by default and throw an exception if we observe a name collision and request explicit names.
          • The HBase table source should return proper types (primitives or objects), otherwise it won't be usable. The user should tell the table source which types are stored in the columns and how they can be deserialized.
          ram_krish ramkrishna.s.vasudevan added a comment -

          The timing was perfect. I was reading through some more Table API examples and was about to ping here.

          Calcite requires a relational schema. So we cannot support a flexible schema.

          Ok. For now we can assume the HBase table has a fixed schema - say, a table that was loaded from a flat file (like a CSV).

          We need unique field names. The user could explicitly specify how fields are named.

          Ok. Suppose we say

          select * from table where 'col' = 10
          

          So we need to implement the way this SQL is going to be executed. Something like what the Apache Phoenix-Calcite integration does? Calcite can take care of the grammar parsing, but we need to convert what the Calcite model gives us into actual queries on HBase, and in this case include some filters etc. So will this JIRA be doing that big piece of work as well?

          And coming to the data types - say the user wants the column 'col' to be an integer, so when the data is written to this column we would convert the int to a byte[] and store it. So on retrieval we should use the exact serde format that Flink is aware of, right?

          fhueske Fabian Hueske added a comment -

          I think we should limit this issue to implementing a projectable BatchTableSource for HBase.
          As such, all we need to do is return a DataSet of a specific type. There is no need to do any query parsing or query push-down at the moment.

          A user needs to configure the table source with the following information:

          • the name of the HBase table to scan
          • a list of columns to scan, with types and deserializers to convert the byte[] values into the column types

          Given this information, the table source needs to connect to HBase, fetch the columns, deserialize the fields, and create the output records.
          A projectable table source also needs to be able to restrict the columns it reads to the set of columns it is configured with. Which columns those are is figured out automatically by Calcite.
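
          For illustration, a hedged sketch of what this configuration could look like from the user's side (the constructor, the addColumn method, and the tableEnv variable are assumed names for this sketch, not an agreed API at this point in the discussion):

          // Illustrative only: configure which HBase table and columns to scan and
          // which Java types the byte[] cells should be deserialized to.
          org.apache.hadoop.conf.Configuration conf = org.apache.hadoop.hbase.HBaseConfiguration.create();

          HBaseTableSource hSrc = new HBaseTableSource(conf, "hTable");  // name of the HBase table to scan
          hSrc.addColumn("f1", "q1", Integer.class);                     // column f1:q1, deserialized as int
          hSrc.addColumn("f1", "q2", String.class);                      // column f1:q2, deserialized as String
          hSrc.addColumn("f1", "q3", Long.class);                        // column f1:q3, deserialized as long

          tableEnv.registerTableSource("hTable", hSrc);                  // tableEnv: a BatchTableEnvironment
          // Which of the configured columns are actually fetched for a given query
          // is decided by Calcite via projection push-down.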

          ram_krish ramkrishna.s.vasudevan added a comment -

          Thanks for the info. I am reading this and will get back here, Fabian Hueske.

          ram_krish ramkrishna.s.vasudevan added a comment -

          One suggestion: though we are going to issue a scan (batch read) or a get (random read), it is better if we specify the family and qualifier to be used in that read. Otherwise we end up returning more results and then have to filter the result down to the fieldNames that were passed. This assumes that the fieldNames are unique and that the same column names do not appear in two different families.

          For creating the DataSet, we need to create some input format that implements ResultTypeQueryable, am I right? And that is where the conversion of the byte[] result to the specified TypeInformation happens, I believe.
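
          (For reference, the ResultTypeQueryable part mentioned above would roughly boil down to the following. This is a sketch only; it assumes the input format keeps the configured column types in a fieldTypes array, and the actual scanning methods are omitted.)

          import org.apache.flink.api.common.typeinfo.TypeInformation;
          import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;
          import org.apache.flink.types.Row;

          // Sketch: the type-advertising side of an HBase input format that emits Row.
          public abstract class RowEmittingHBaseInputFormatSketch implements ResultTypeQueryable<Row> {

              protected TypeInformation<?>[] fieldTypes;  // the configured column types

              @Override
              public TypeInformation<Row> getProducedType() {
                  // declares to Flink that each record is a Row with the configured field types
                  return new RowTypeInfo(fieldTypes);
              }
          }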

          fhueske Fabian Hueske added a comment -

          For now we should only support scan and not get.
          Specifying which columns to read is exactly the functionality that the `ProjectableTableSource` provides.
          When the table source is defined, it maps some columns of an HBase table to a relational schema with unique names. During optimization, the table source is configured with the actual columns to fetch.

          We definitely need some kind of InputFormat to talk to HBase. Flink features a TableInputFormat which could serve as a basis for the HBase table source. The deserialization of byte arrays can either happen in the input format or in a subsequent Map function.
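
          As an example of the "subsequent Map function" option, a minimal sketch that deserializes raw cells into a Row is shown below. It assumes a hypothetical upstream format that emits one byte[][] per HBase row, ordered like the configured columns (here: int, String, long).

          import org.apache.flink.api.common.functions.MapFunction;
          import org.apache.flink.types.Row;
          import org.apache.hadoop.hbase.util.Bytes;

          // Turns the raw HBase cell values of one row into a typed Row: [int, String, long].
          public class BytesToRowMapper implements MapFunction<byte[][], Row> {

              @Override
              public Row map(byte[][] cells) {
                  Row row = new Row(3);
                  row.setField(0, Bytes.toInt(cells[0]));     // first column as int
                  row.setField(1, Bytes.toString(cells[1]));  // second column as String
                  row.setField(2, Bytes.toLong(cells[2]));    // third column as long
                  return row;
              }
          }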

          ram_krish ramkrishna.s.vasudevan added a comment -

          A few questions on the package structure:
          Where should HBaseTableSource.scala be added? Should that be inside flink-libraries/flink-table?
          And where should the HBaseTableInputFormat be added? If I add it in the flink-java module like the other input formats, then we don't have an HBase reference in the pom. So can we modify the pom.xml of flink-java to include the HBase library?
          If I move the HBaseTableInputFormat to flink-connectors/flink-hbase, then HBaseTableSource.scala cannot access the HBaseTableInputFormat.
          Any suggestions here, Fabian Hueske?

          fhueske Fabian Hueske added a comment -

          TableSources should go into the respective connector module, i.e., the HBaseTableSource should go into flink-hbase.
          Why do you think you could not access the TableInputFormat which is located in the same module? However, I think we would need a custom TableInputFormat anyway, one which returns Row instead of Tuple, since Tuple is restricted to 25 fields and does not support null.
          Since the code in flink-hbase is implemented in Java, the new HBaseTableSource should do the same.
          Thanks, Fabian

          ram_krish ramkrishna.s.vasudevan added a comment -

          However, I think we would need a custom TableInputFormat anyway, one which returns Row instead of Tuple, since Tuple is restricted to 25 fields and does not support null.

          This was the reason. Actually, I did not extend the TableInputFormat but rather extended RichInputFormat.
          I created HBaseTableSource in flink-table and I wrote it in Scala. I can change it to Java. Thanks Fabian Hueske.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Trying your suggestions

          Since the code in flink-hbase is implemented in Java, the new HBaseTableSource should do the same.

          If I try writing HBaseTableSource in Java, I don't have Java versions of BatchTableSource or ProjectableTableSource - they only exist as Scala traits - so do I need to create them first? And then make flink-hbase under flink-connectors depend on flink-table?

          fhueske Fabian Hueske added a comment -

          There is no need to create Java interfaces for table sources. Scala and Java work (mostly) well together.
          The TableSource traits are implemented in a way that they can be used as regular Java interfaces. See KafkaTableSource as an example.

          githubbot ASF GitHub Bot added a comment -

          GitHub user ramkrish86 opened a pull request:

          https://github.com/apache/flink/pull/3149

          FLINK-2168 Add HBaseTableSource

          Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
          If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
          In addition to going through the list, please provide a meaningful description of your changes.

          • [ ] General
          • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
          • The pull request addresses only one issue
          • Each commit in the PR has a meaningful commit message (including the JIRA id)
          • [ ] Documentation
          • Documentation has been added for new functionality
          • Old documentation affected by the pull request has been updated
          • JavaDoc for public methods has been added
          • [ ] Tests & Build
          • Functionality added by the pull request is covered by tests
          • `mvn clean verify` has been executed successfully locally or a Travis build has passed

          @fhueske
          Trying to create the first version of this PR. I have made the necessary changes to support HBaseTableSource by creating an HBaseTableInputFormat, but a lot of code is duplicated with TableInputFormat. I have not unified them for now.
          I tried compiling this code on my Linux box, but the @Override annotations I added in HBaseTableSource for the BatchTableSource API show up as compilation issues, while my IntelliJ IDE is fine and does not complain.
          Please provide your feedback so that I can rebase the next PR with suitable fixes.
          Thanks.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/ramkrish86/flink FLINK-2168

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3149.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3149


          commit 509c294fe64a3690d93e011aa54a9dd25302a122
          Author: Ramkrishna <ramkrishna.s.vasudevan@intel.com>
          Date: 2017-01-18T06:57:23Z

          FLINK-2168 Add HBaseTableSource


          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          And one more thing: other than BasicTypeInfo, what other types should we support? I was not sure about that, so I added a TODO there.

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on the issue:

          https://github.com/apache/flink/pull/3149

          You need to recompile the `TableSource` trait manually and implement `DefinedFieldNames` in `HBaseTableSource`.
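
          (For reference, implementing DefinedFieldNames would look roughly like the sketch below, assuming the trait exposes the field names and their indices; the exact signatures should be taken from the current master. fieldNames is the array of unique column names the source was configured with.)

          @Override
          public String[] getFieldNames() {
              return fieldNames;
          }

          @Override
          public int[] getFieldIndices() {
              int[] indices = new int[fieldNames.length];
              for (int i = 0; i < indices.length; i++) {
                  indices[i] = i;
              }
              return indices;
          }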

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3149

          @tonycox is right. Please rebase your PR to the current master branch. The `TableSource` interface was recently modified.

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          I see. I am not sure how I missed that, because I thought my IDE was already updated with the latest code. Will check it.

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on the issue:

          https://github.com/apache/flink/pull/3149

          @ramkrish86 @fhueske what do you think about throwing `Tuple` (the `T extends Tuple` bound) out of `org.apache.flink.addons.hbase.TableInputFormat` and extending that abstract class in your `HBaseTableSourceInputFormat`?
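
          (What this suggestion would roughly mean, as a sketch of the idea only - the relaxed type parameter and the abstract conversion method name below are illustrative, not the actual class:)

          import org.apache.flink.api.common.io.RichInputFormat;
          import org.apache.hadoop.hbase.client.Result;

          // Before (simplified): only tuple types can be produced.
          //   public abstract class TableInputFormat<T extends Tuple> extends RichInputFormat<T, TableInputSplit> { ... }

          // After (sketch): any output type T, so an HBase table source can emit Row.
          // TableInputSplit is assumed to live in the same org.apache.flink.addons.hbase package.
          public abstract class TableInputFormat<T> extends RichInputFormat<T, TableInputSplit> {

              // converts one HBase Result into an output record, e.g. a Row
              protected abstract T mapResultToOutType(Result r);
          }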

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          Thanks @tonycox. Yes, I am fine with it. I can try. Anyway, I think there is more to do on my side. Right now I am not sure how to register the table with the valid family name and column name. It is only registering the table, and it is not resolving '.select("f1:q1, f1:q2, f1:q3");'.
          I am only now able to run the test case, which is why I am finding these issues. I will wait for comments and then go on with updating the PR.

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on the issue:

          https://github.com/apache/flink/pull/3149

          And could you extend it with `StreamTableSource` also?

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          So I am not sure if we can add an HBase-specific API in table.api. Is that allowed?

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96665689

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,117 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.Test;
          +import scala.tools.cmd.gen.AnyVals;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @Test
          + public void testHBaseTableSource() throws Exception {
          + // create a table with single region
          + TableName tableName = TableName.valueOf("test");
          + createTable(tableName, F_1, new byte[1][]);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19991l));
          — End diff –

          I think there should be `Q_3`

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96665731

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,117 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.Test;
          +import scala.tools.cmd.gen.AnyVals;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @Test
          + public void testHBaseTableSource() throws Exception {
          + // create a table with single region
          + TableName tableName = TableName.valueOf("test");
          + createTable(tableName, F_1, new byte[1][]);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19992l));
          — End diff –

          same as above

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96665755

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,117 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.Test;
          +import scala.tools.cmd.gen.AnyVals;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @Test
          + public void testHBaseTableSource() throws Exception {
          + // create a table with single region
          + TableName tableName = TableName.valueOf("test");
          + createTable(tableName, F_1, new byte[1][]);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19993l));
          — End diff –

          same as above

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96668950

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
          +/**
          + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
          + */
          +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private TypeInformation[] fieldTypeInfos;
          + private String[] fieldNames;
          + private transient Table table;
          + private transient Scan scan;
          + private transient Connection conn;
          + private ResultScanner resultScanner = null;
          +
          + private byte[] lastRow;
          + private int scannedRows;
          + private boolean endReached = false;
          + private org.apache.hadoop.conf.Configuration conf;
          + private static final String COLON = ":";
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos)

          { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + }

          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null)

          { + scan = createScanner(); + }

          + }
          +
          + private Scan createScanner() {
          + Scan scan = new Scan();
          + for(String field : fieldNames)

          { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + }

          + return scan;
          + }
          +
          + private void connectToTable() {
          + //use files found in the classpath
          + if(this.conf == null)

          { + this.conf = HBaseConfiguration.create(); + }

          + try

          { + conn = ConnectionFactory.createConnection(this.conf); + }

          catch(IOException ioe)

          { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + }

          + try

          { + table = conn.getTable(TableName.valueOf(tableName)); + }

          catch(TableNotFoundException tnfe)

          { + LOG.error("The table " + tableName + " not found ", tnfe); + }

          catch(IOException ioe)

          { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + }

          + }
          +
          + @Override
          + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException

          { + return null; + }

          +
          + @Override
          + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
          + if (table == null)

          { + throw new IOException("The HBase table has not been opened!"); + }
          + if (scan == null) { + throw new IOException("getScanner returned null"); + }
          +
          + //Gets the starting and ending row keys for every region in the currently open table
          + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
          + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
          + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + }
          + final byte[] startRow = scan.getStartRow();
          + final byte[] stopRow = scan.getStopRow();
          + final boolean scanWithNoLowerBound = startRow.length == 0;
          + final boolean scanWithNoUpperBound = stopRow.length == 0;
          +
          + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
          + for (int i = 0; i < keys.getFirst().length; i++) {
          + final byte[] startKey = keys.getFirst()[i];
          + final byte[] endKey = keys.getSecond()[i];
          + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
          + //Test if the given region is to be included in the InputSplit while splitting the regions of a table
          + if (!includeRegionInSplit(startKey, endKey)) { + continue; + }
          + //Finds the region on which the given row is being served
          + final String[] hosts = new String[]{regionLocation};
          +
          + // determine if regions contains keys used by the scan
          + boolean isLastRegion = endKey.length == 0;
          + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
          + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + }
          + }
          + LOG.info("Created " + splits.size() + " splits");
          + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + }
          + return splits.toArray(new TableInputSplit[0]);
          + }
          +
          + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + }
          +
          + @Override
          + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + }
          +
          + @Override
          + public void open(TableInputSplit split) throws IOException {
          + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + }

          + if (scan == null)

          { + throw new IOException("getScanner returned null"); + }

          + if (split == null)

          { + throw new IOException("Input split is null!"); + }

          +
          + logSplitInfo("opening", split);
          + // set the start row and stop row from the splits
          + scan.setStartRow(split.getStartRow());
          + lastRow = split.getEndRow();
          + scan.setStopRow(lastRow);
          +
          + resultScanner = table.getScanner(scan);
          + endReached = false;
          + scannedRows = 0;
          + }
          +
          + private void logSplitInfo(String action, TableInputSplit split) {
          + int splitId = split.getSplitNumber();
          + String splitStart = Bytes.toString(split.getStartRow());
          + String splitEnd = Bytes.toString(split.getEndRow());
          + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
          + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
          + String[] hostnames = split.getHostnames();
          + LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
          + }
          +
          + @Override
          + public boolean reachedEnd() throws IOException

          { + return endReached; + }

          +
          + @Override
          + public Row nextRecord(Row reuse) throws IOException {
          + if (resultScanner == null)

          { + throw new IOException("No table result scanner provided!"); + }

          + try {
          + Result res = resultScanner.next();
          + if (res != null)

          { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }
          + } catch (Exception e) {
          + resultScanner.close();
          + //workaround for timeout on scan
          + LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
          + scan.setStartRow(lastRow);
          + resultScanner = table.getScanner(scan);
          + Result res = resultScanner.next();
          + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }

          + }
          + endReached = true;
          + return null;
          + }
          +
          + private Row mapResultToRow(Result res) {
          + Object[] values = new Object[fieldNames.length];
          + int i = 0;
          + for(String field : fieldNames) {
          + String[] famCol = field.split(COLON);
          + byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          + TypeInformation typeInfo = fieldTypeInfos[i];
          + if(typeInfo.isBasicType()) {
          + if(typeInfo.getTypeClass() == Integer.class)

          { + values[i] = Bytes.toInt(value); + }

          else if(typeInfo.getTypeClass() == Short.class)

          { + values[i] = Bytes.toShort(value); + }

          else if(typeInfo.getTypeClass() == Float.class)

          { + values[i] = Bytes.toFloat(value); + }

          else if(typeInfo.getTypeClass() == Long.class)

          { + values[i] = Bytes.toLong(value); + }

          else if(typeInfo.getTypeClass() == String.class)

          { + values[i] = Bytes.toString(value); + }

          else if(typeInfo.getTypeClass() == Byte.class)

          { + values[i] = value[0]; + }

          else if(typeInfo.getTypeClass() == Boolean.class)

          { + values[i] = Bytes.toBoolean(value); + }

          else if(typeInfo.getTypeClass() == Double.class)

          { + values[i] = Bytes.toDouble(value); + }

          else if(typeInfo.getTypeClass() == BigInteger.class)

          { + values[i] = new BigInteger(value); + }

          else if(typeInfo.getTypeClass() == BigDecimal.class)

          { + values[i] = Bytes.toBigDecimal(value); + }

          else if(typeInfo.getTypeClass() == Date.class)

          { + values[i] = new Date(Bytes.toLong(value)); + }

          + } else {
          + // TODO for other types??
          — End diff –

          Arrays, Java SQL Date/Time/Timestamp, CompositeType
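          For illustration only, a minimal sketch of how the non-basic branch could cover a few of the types mentioned above (java.sql.Date/Time/Timestamp and raw byte arrays), assuming the same org.apache.hadoop.hbase.util.Bytes conversions used for the basic types; the helper name below is hypothetical and not part of this PR:

          // Hypothetical helper for mapResultToRow's else-branch (sketch, not the PR's implementation).
          // 'value' is the raw byte[] returned by Result#getValue for the requested family:qualifier.
          private Object deserializeNonBasicType(byte[] value, TypeInformation<?> typeInfo) {
              if (typeInfo.getTypeClass() == java.sql.Date.class) {
                  // assumes epoch millis were written with Bytes.toBytes(long)
                  return new java.sql.Date(Bytes.toLong(value));
              } else if (typeInfo.getTypeClass() == java.sql.Time.class) {
                  return new java.sql.Time(Bytes.toLong(value));
              } else if (typeInfo.getTypeClass() == java.sql.Timestamp.class) {
                  return new java.sql.Timestamp(Bytes.toLong(value));
              } else if (typeInfo.getTypeClass() == byte[].class) {
                  // byte arrays can be passed through unchanged
                  return value;
              } else {
                  throw new IllegalArgumentException("Unsupported field type: " + typeInfo);
              }
          }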

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96666157

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,117 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.Test;
          +import scala.tools.cmd.gen.AnyVals;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @Test
          + public void testHBaseTableSource() throws Exception {
          + // create a table with single region
          + TableName tableName = TableName.valueOf("test");
          + createTable(tableName, F_1, new byte[1][]);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19993l));
          + puts.add(put);
          + // add the mutations to the table
          + table.put(puts);
          + table.close();
          + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
          + BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env, new TableConfig());
          + String[] colNames = new String[3];
          + colNames[0] = Bytes.toString(F_1)+":"+Bytes.toString(Q_1);
          — End diff –

          could you execute `mvn clean verify -DskipTests` to see syntax mistakes?

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on the issue:

          https://github.com/apache/flink/pull/3149

          We need to discuss that

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96783534

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,117 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.Test;
          +import scala.tools.cmd.gen.AnyVals;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @Test
          + public void testHBaseTableSource() throws Exception {
          + // create a table with single region
          + TableName tableName = TableName.valueOf("test");
          + createTable(tableName, F_1, new byte[1][]);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_2, Bytes.toBytes(19993l));
          — End diff –

          Ya. Good catch. Will change it. I was more focused on the impl because, as you can see, the test case itself was not asserting anything. I need to add all of that.
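          A minimal sketch of the change being discussed, assuming the third (long) qualifier is meant to be written under Q_3 rather than reusing Q_2, together with the kind of assertion the test could add; it reuses the test's tableEnv and assumes the source has been registered under the name "hbaseTable", which is not the PR's final test:

          // corrected qualifiers for one row (sketch)
          Put put = new Put(ROW_1);
          put.addColumn(F_1, Q_1, Bytes.toBytes(100));          // int qualifier
          put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));   // String qualifier
          put.addColumn(F_1, Q_3, Bytes.toBytes(19991L));       // long qualifier, now under Q_3

          // illustrative assertion once the HBaseTableSource is registered as "hbaseTable"
          Table result = tableEnv.scan("hbaseTable");
          DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);
          List<Row> rows = resultSet.collect();
          Assert.assertEquals(3, rows.size());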

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96790136

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
          +/**
          + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
          + */
          +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private TypeInformation[] fieldTypeInfos;
          + private String[] fieldNames;
          + private transient Table table;
          + private transient Scan scan;
          + private transient Connection conn;
          + private ResultScanner resultScanner = null;
          +
          + private byte[] lastRow;
          + private int scannedRows;
          + private boolean endReached = false;
          + private org.apache.hadoop.conf.Configuration conf;
          + private static final String COLON = ":";
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos)

          { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + }

          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null)

          { + scan = createScanner(); + }

          + }
          +
          + private Scan createScanner() {
          + Scan scan = new Scan();
          + for(String field : fieldNames)

          { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + }

          + return scan;
          + }
          +
          + private void connectToTable() {
          + //use files found in the classpath
          + if(this.conf == null)

          { + this.conf = HBaseConfiguration.create(); + }

          + try

          { + conn = ConnectionFactory.createConnection(this.conf); + }

          catch(IOException ioe)

          { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + }

          + try

          { + table = conn.getTable(TableName.valueOf(tableName)); + }

          catch(TableNotFoundException tnfe)

          { + LOG.error("The table " + tableName + " not found ", tnfe); + }

          catch(IOException ioe)

          { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + }

          + }
          +
          + @Override
          + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException

          { + return null; + }

          +
          + @Override
          + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
          + if (table == null)

          { + throw new IOException("The HBase table has not been opened!"); + }
          + if (scan == null) { + throw new IOException("getScanner returned null"); + }
          +
          + //Gets the starting and ending row keys for every region in the currently open table
          + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
          + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
          + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + }
          + final byte[] startRow = scan.getStartRow();
          + final byte[] stopRow = scan.getStopRow();
          + final boolean scanWithNoLowerBound = startRow.length == 0;
          + final boolean scanWithNoUpperBound = stopRow.length == 0;
          +
          + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
          + for (int i = 0; i < keys.getFirst().length; i++) {
          + final byte[] startKey = keys.getFirst()[i];
          + final byte[] endKey = keys.getSecond()[i];
          + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
          + //Test if the given region is to be included in the InputSplit while splitting the regions of a table
          + if (!includeRegionInSplit(startKey, endKey)) { + continue; + }
          + //Finds the region on which the given row is being served
          + final String[] hosts = new String[]{regionLocation};
          +
          + // determine if regions contains keys used by the scan
          + boolean isLastRegion = endKey.length == 0;
          + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
          + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + }
          + }
          + LOG.info("Created " + splits.size() + " splits");
          + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + }
          + return splits.toArray(new TableInputSplit[0]);
          + }
          +
          + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + }
          +
          + @Override
          + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + }
          +
          + @Override
          + public void open(TableInputSplit split) throws IOException {
          + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + }

          + if (scan == null)

          { + throw new IOException("getScanner returned null"); + }

          + if (split == null)

          { + throw new IOException("Input split is null!"); + }

          +
          + logSplitInfo("opening", split);
          + // set the start row and stop row from the splits
          + scan.setStartRow(split.getStartRow());
          + lastRow = split.getEndRow();
          + scan.setStopRow(lastRow);
          +
          + resultScanner = table.getScanner(scan);
          + endReached = false;
          + scannedRows = 0;
          + }
          +
          + private void logSplitInfo(String action, TableInputSplit split) {
          + int splitId = split.getSplitNumber();
          + String splitStart = Bytes.toString(split.getStartRow());
          + String splitEnd = Bytes.toString(split.getEndRow());
          + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
          + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
          + String[] hostnames = split.getHostnames();
          + LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
          + }
          +
          + @Override
          + public boolean reachedEnd() throws IOException

          { + return endReached; + }

          +
          + @Override
          + public Row nextRecord(Row reuse) throws IOException {
          + if (resultScanner == null)

          { + throw new IOException("No table result scanner provided!"); + }

          + try {
          + Result res = resultScanner.next();
          + if (res != null)

          { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }
          + } catch (Exception e) {
          + resultScanner.close();
          + //workaround for timeout on scan
          + LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
          + scan.setStartRow(lastRow);
          + resultScanner = table.getScanner(scan);
          + Result res = resultScanner.next();
          + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }

          + }
          + endReached = true;
          + return null;
          + }
          +
          + private Row mapResultToRow(Result res) {
          + Object[] values = new Object[fieldNames.length];
          + int i = 0;
          + for(String field : fieldNames) {
          + String[] famCol = field.split(COLON);
          + byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          + TypeInformation typeInfo = fieldTypeInfos[i];
          + if(typeInfo.isBasicType()) {
          + if(typeInfo.getTypeClass() == Integer.class)

          { + values[i] = Bytes.toInt(value); + }

          else if(typeInfo.getTypeClass() == Short.class)

          { + values[i] = Bytes.toShort(value); + }

          else if(typeInfo.getTypeClass() == Float.class)

          { + values[i] = Bytes.toFloat(value); + }

          else if(typeInfo.getTypeClass() == Long.class)

          { + values[i] = Bytes.toLong(value); + }

          else if(typeInfo.getTypeClass() == String.class)

          { + values[i] = Bytes.toString(value); + }

          else if(typeInfo.getTypeClass() == Byte.class)

          { + values[i] = value[0]; + }

          else if(typeInfo.getTypeClass() == Boolean.class)

          { + values[i] = Bytes.toBoolean(value); + }

          else if(typeInfo.getTypeClass() == Double.class)

          { + values[i] = Bytes.toDouble(value); + }

          else if(typeInfo.getTypeClass() == BigInteger.class)

          { + values[i] = new BigInteger(value); + }

          else if(typeInfo.getTypeClass() == BigDecimal.class)

          { + values[i] = Bytes.toBigDecimal(value); + }

          else if(typeInfo.getTypeClass() == Date.class)

          { + values[i] = new Date(Bytes.toLong(value)); + }

          + } else {
          + // TODO for other types??
          — End diff –

          I can see that in the TableSourceITCase that uses CsvInputFormat, the metadata is also passed as input and registered with the table. Now, in cases like HBase, how should that be done?
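          For comparison, a rough sketch of how the metadata could be supplied by the caller and registered, mirroring the CsvTableSource-based tests; the HBaseTableSource constructor shown here is an assumption modelled on the input format's signature, since its final shape is exactly what is under discussion:

          // Hypothetical registration flow (sketch); column names encode "family:qualifier".
          String[] colNames = {"f1:q1", "f1:q2", "f1:q3"};
          TypeInformation<?>[] colTypes = {
              BasicTypeInfo.INT_TYPE_INFO,
              BasicTypeInfo.STRING_TYPE_INFO,
              BasicTypeInfo.LONG_TYPE_INFO
          };
          BatchTableSource<Row> hbaseSource = new HBaseTableSource(conf, "test", colNames, colTypes);
          tableEnv.registerTableSource("hbaseTable", hbaseSource);
          Table result = tableEnv.scan("hbaseTable");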

of a table + if (!includeRegionInSplit(startKey, endKey)) { + continue; + } + //Finds the region on which the given row is being served + final String[] hosts = new String[]{regionLocation}; + + // determine if regions contains keys used by the scan + boolean isLastRegion = endKey.length == 0; + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) && + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + } + } + LOG.info("Created " + splits.size() + " splits"); + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + } + return splits.toArray(new TableInputSplit [0] ); + } + + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + } + + @Override + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + } + + @Override + public void open(TableInputSplit split) throws IOException { + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + } + if (scan == null) { + throw new IOException("getScanner returned null"); + } + if (split == null) { + throw new IOException("Input split is null!"); + } + + logSplitInfo("opening", split); + // set the start row and stop row from the splits + scan.setStartRow(split.getStartRow()); + lastRow = split.getEndRow(); + scan.setStopRow(lastRow); + + resultScanner = table.getScanner(scan); + endReached = false; + scannedRows = 0; + } + + private void logSplitInfo(String action, TableInputSplit split) { + int splitId = split.getSplitNumber(); + String splitStart = Bytes.toString(split.getStartRow()); + String splitEnd = Bytes.toString(split.getEndRow()); + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart; + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd; + String[] hostnames = split.getHostnames(); + LOG.info("{} split (this={}) [{}|{}|{}|{}] ", action, this, splitId, hostnames, splitStartKey, splitStopKey); + } + + @Override + public boolean reachedEnd() throws IOException { + return endReached; + } + + @Override + public Row nextRecord(Row reuse) throws IOException { + if (resultScanner == null) { + throw new IOException("No table result scanner provided!"); + } + try { + Result res = resultScanner.next(); + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + } + } catch (Exception e) { + resultScanner.close(); + //workaround for timeout on scan + LOG.warn("Error after scan of " + scannedRows + " rows. 
Retry with a new scanner...", e); + scan.setStartRow(lastRow); + resultScanner = table.getScanner(scan); + Result res = resultScanner.next(); + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + } + } + endReached = true; + return null; + } + + private Row mapResultToRow(Result res) { + Object[] values = new Object [fieldNames.length] ; + int i = 0; + for(String field : fieldNames) { + String[] famCol = field.split(COLON); + byte[] value = res.getValue(Bytes.toBytes(famCol [0] ), Bytes.toBytes(famCol [1] )); + TypeInformation typeInfo = fieldTypeInfos [i] ; + if(typeInfo.isBasicType()) { + if(typeInfo.getTypeClass() == Integer.class) { + values[i] = Bytes.toInt(value); + } else if(typeInfo.getTypeClass() == Short.class) { + values[i] = Bytes.toShort(value); + } else if(typeInfo.getTypeClass() == Float.class) { + values[i] = Bytes.toFloat(value); + } else if(typeInfo.getTypeClass() == Long.class) { + values[i] = Bytes.toLong(value); + } else if(typeInfo.getTypeClass() == String.class) { + values[i] = Bytes.toString(value); + } else if(typeInfo.getTypeClass() == Byte.class) { + values[i] = value[0]; + } else if(typeInfo.getTypeClass() == Boolean.class) { + values[i] = Bytes.toBoolean(value); + } else if(typeInfo.getTypeClass() == Double.class) { + values[i] = Bytes.toDouble(value); + } else if(typeInfo.getTypeClass() == BigInteger.class) { + values[i] = new BigInteger(value); + } else if(typeInfo.getTypeClass() == BigDecimal.class) { + values[i] = Bytes.toBigDecimal(value); + } else if(typeInfo.getTypeClass() == Date.class) { + values[i] = new Date(Bytes.toLong(value)); + } + } else { + // TODO for other types?? — End diff – I can see that in the TableSourceITCase that uses CsvInputFormat - the metadata is also passed as the input and the meta data is registered with the table. Now in cases like hbase - how should that be done?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96790253

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
          +/**
          + *

          {@link InputFormat}

          subclass that wraps the access for HTables. Returns the result as

          {@link Row}

          + */
          +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private TypeInformation[] fieldTypeInfos;
          + private String[] fieldNames;
          + private transient Table table;
          + private transient Scan scan;
          + private transient Connection conn;
          + private ResultScanner resultScanner = null;
          +
          + private byte[] lastRow;
          + private int scannedRows;
          + private boolean endReached = false;
          + private org.apache.hadoop.conf.Configuration conf;
          + private static final String COLON = ":";
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos)

          { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + }

          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null)

          { + scan = createScanner(); + }

          + }
          +
          + private Scan createScanner() {
          + Scan scan = new Scan();
          + for(String field : fieldNames)

          { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + }

          + return scan;
          + }
          +
          + private void connectToTable() {
          + //use files found in the classpath
          + if(this.conf == null)

          { + this.conf = HBaseConfiguration.create(); + }

          + try

          { + conn = ConnectionFactory.createConnection(this.conf); + }

          catch(IOException ioe)

          { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + }

          + try

          { + table = conn.getTable(TableName.valueOf(tableName)); + }

          catch(TableNotFoundException tnfe)

          { + LOG.error("The table " + tableName + " not found ", tnfe); + }

          catch(IOException ioe)

          { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + }

          + }
          +
          + @Override
          + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException

          { + return null; + }

          +
          + @Override
          + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
          + if (table == null)

          { + throw new IOException("The HBase table has not been opened!"); + }
          + if (scan == null) { + throw new IOException("getScanner returned null"); + }
          +
          + //Gets the starting and ending row keys for every region in the currently open table
          + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
          + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
          + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + }
          + final byte[] startRow = scan.getStartRow();
          + final byte[] stopRow = scan.getStopRow();
          + final boolean scanWithNoLowerBound = startRow.length == 0;
          + final boolean scanWithNoUpperBound = stopRow.length == 0;
          +
          + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
          + for (int i = 0; i < keys.getFirst().length; i++) {
          + final byte[] startKey = keys.getFirst()[i];
          + final byte[] endKey = keys.getSecond()[i];
          + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
          + //Test if the given region is to be included in the InputSplit while splitting the regions of a table
          + if (!includeRegionInSplit(startKey, endKey)) { + continue; + }
          + //Finds the region on which the given row is being served
          + final String[] hosts = new String[]{regionLocation};
          +
          + // determine if regions contains keys used by the scan
          + boolean isLastRegion = endKey.length == 0;
          + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
          + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + }
          + }
          + LOG.info("Created " + splits.size() + " splits");
          + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + }
          + return splits.toArray(new TableInputSplit[0]);
          + }
          +
          + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + }
          +
          + @Override
          + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + }
          +
          + @Override
          + public void open(TableInputSplit split) throws IOException {
          + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + }

          + if (scan == null)

          { + throw new IOException("getScanner returned null"); + }

          + if (split == null)

          { + throw new IOException("Input split is null!"); + }

          +
          + logSplitInfo("opening", split);
          + // set the start row and stop row from the splits
          + scan.setStartRow(split.getStartRow());
          + lastRow = split.getEndRow();
          + scan.setStopRow(lastRow);
          +
          + resultScanner = table.getScanner(scan);
          + endReached = false;
          + scannedRows = 0;
          + }
          +
          + private void logSplitInfo(String action, TableInputSplit split) {
          + int splitId = split.getSplitNumber();
          + String splitStart = Bytes.toString(split.getStartRow());
          + String splitEnd = Bytes.toString(split.getEndRow());
          + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
          + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
          + String[] hostnames = split.getHostnames();
          + LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
          + }
          +
          + @Override
          + public boolean reachedEnd() throws IOException

          { + return endReached; + }

          +
          + @Override
          + public Row nextRecord(Row reuse) throws IOException {
          + if (resultScanner == null)

          { + throw new IOException("No table result scanner provided!"); + }

          + try {
          + Result res = resultScanner.next();
          + if (res != null)

          { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }
          + } catch (Exception e) {
          + resultScanner.close();
          + //workaround for timeout on scan
          + LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
          + scan.setStartRow(lastRow);
          + resultScanner = table.getScanner(scan);
          + Result res = resultScanner.next();
          + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }

          + }
          + endReached = true;
          + return null;
          + }
          +
          + private Row mapResultToRow(Result res) {
          + Object[] values = new Object[fieldNames.length];
          + int i = 0;
          + for(String field : fieldNames) {
          + String[] famCol = field.split(COLON);
          + byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          + TypeInformation typeInfo = fieldTypeInfos[i];
          + if(typeInfo.isBasicType()) {
          + if(typeInfo.getTypeClass() == Integer.class)

          { + values[i] = Bytes.toInt(value); + }

          else if(typeInfo.getTypeClass() == Short.class)

          { + values[i] = Bytes.toShort(value); + }

          else if(typeInfo.getTypeClass() == Float.class)

          { + values[i] = Bytes.toFloat(value); + }

          else if(typeInfo.getTypeClass() == Long.class)

          { + values[i] = Bytes.toLong(value); + }

          else if(typeInfo.getTypeClass() == String.class)

          { + values[i] = Bytes.toString(value); + }

          else if(typeInfo.getTypeClass() == Byte.class)

          { + values[i] = value[0]; + }

          else if(typeInfo.getTypeClass() == Boolean.class)

          { + values[i] = Bytes.toBoolean(value); + }

          else if(typeInfo.getTypeClass() == Double.class)

          { + values[i] = Bytes.toDouble(value); + }

          else if(typeInfo.getTypeClass() == BigInteger.class)

          { + values[i] = new BigInteger(value); + }

          else if(typeInfo.getTypeClass() == BigDecimal.class)

          { + values[i] = Bytes.toBigDecimal(value); + }

          else if(typeInfo.getTypeClass() == Date.class)

          { + values[i] = new Date(Bytes.toLong(value)); + }

          + } else {
          + // TODO for other types??
          — End diff –

              testHBaseTableSource(org.apache.flink.addons.hbase.example.HBaseTableSourceITCase)  Time elapsed: 2.297 sec  <<< ERROR!
              org.apache.flink.table.api.ValidationException: Cannot resolve [q1] given input [f0, f1, f2].
                      at org.apache.flink.addons.hbase.example.HBaseTableSourceITCase.testHBaseTableSource(HBaseTableSourceITCase.java:113)
              

          The error says field input [f0, f1, f2]. Not sure how it got picked up.

          Show
          githubbot ASF GitHub Bot added a comment - Github user ramkrish86 commented on a diff in the pull request: https://github.com/apache/flink/pull/3149#discussion_r96790253 — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java — @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.addons.hbase; + +import org.apache.flink.api.common.io.InputFormat; +import org.apache.flink.api.common.io.LocatableInputSplitAssigner; +import org.apache.flink.api.common.io.RichInputFormat; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.io.InputSplitAssigner; +import org.apache.flink.types.Row; +import org.apache.hadoop.hbase.HBaseConfiguration; +import org.apache.hadoop.hbase.TableName; +import org.apache.hadoop.hbase.TableNotFoundException; +import org.apache.hadoop.hbase.client.Scan; +import org.apache.hadoop.hbase.client.Table; +import org.apache.hadoop.hbase.client.ClusterConnection; +import org.apache.hadoop.hbase.client.Result; +import org.apache.hadoop.hbase.client.ResultScanner; +import org.apache.hadoop.hbase.client.ConnectionFactory; +import org.apache.hadoop.hbase.client.Connection; +import org.apache.hadoop.hbase.client.HRegionLocator; +import org.apache.hadoop.hbase.util.Bytes; +import org.apache.hadoop.hbase.util.Pair; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.math.BigDecimal; +import java.math.BigInteger; +import java.util.ArrayList; +import java.util.Date; +import java.util.List; + +/** + * {@link InputFormat} subclass that wraps the access for HTables. 
Returns the result as {@link Row} + */ +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> { + + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class); + private String tableName; + private TypeInformation[] fieldTypeInfos; + private String[] fieldNames; + private transient Table table; + private transient Scan scan; + private transient Connection conn; + private ResultScanner resultScanner = null; + + private byte[] lastRow; + private int scannedRows; + private boolean endReached = false; + private org.apache.hadoop.conf.Configuration conf; + private static final String COLON = ":"; + + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos) { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + } + + @Override + public void configure(Configuration parameters) { + LOG.info("Initializing HBaseConfiguration"); + connectToTable(); + if(table != null) { + scan = createScanner(); + } + } + + private Scan createScanner() { + Scan scan = new Scan(); + for(String field : fieldNames) { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + } + return scan; + } + + private void connectToTable() { + //use files found in the classpath + if(this.conf == null) { + this.conf = HBaseConfiguration.create(); + } + try { + conn = ConnectionFactory.createConnection(this.conf); + } catch(IOException ioe) { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + } + try { + table = conn.getTable(TableName.valueOf(tableName)); + } catch(TableNotFoundException tnfe) { + LOG.error("The table " + tableName + " not found ", tnfe); + } catch(IOException ioe) { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + } + } + + @Override + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException { + return null; + } + + @Override + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException { + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + } + if (scan == null) { + throw new IOException("getScanner returned null"); + } + + //Gets the starting and ending row keys for every region in the currently open table + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn); + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys(); + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + } + final byte[] startRow = scan.getStartRow(); + final byte[] stopRow = scan.getStopRow(); + final boolean scanWithNoLowerBound = startRow.length == 0; + final boolean scanWithNoUpperBound = stopRow.length == 0; + + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits); + for (int i = 0; i < keys.getFirst().length; i++) { + final byte[] startKey = keys.getFirst() [i] ; + final byte[] endKey = keys.getSecond() [i] ; + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort(); + //Test if the given region is to be included in the InputSplit while splitting the regions 
of a table + if (!includeRegionInSplit(startKey, endKey)) { + continue; + } + //Finds the region on which the given row is being served + final String[] hosts = new String[]{regionLocation}; + + // determine if regions contains keys used by the scan + boolean isLastRegion = endKey.length == 0; + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) && + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + } + } + LOG.info("Created " + splits.size() + " splits"); + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + } + return splits.toArray(new TableInputSplit [0] ); + } + + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + } + + @Override + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + } + + @Override + public void open(TableInputSplit split) throws IOException { + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + } + if (scan == null) { + throw new IOException("getScanner returned null"); + } + if (split == null) { + throw new IOException("Input split is null!"); + } + + logSplitInfo("opening", split); + // set the start row and stop row from the splits + scan.setStartRow(split.getStartRow()); + lastRow = split.getEndRow(); + scan.setStopRow(lastRow); + + resultScanner = table.getScanner(scan); + endReached = false; + scannedRows = 0; + } + + private void logSplitInfo(String action, TableInputSplit split) { + int splitId = split.getSplitNumber(); + String splitStart = Bytes.toString(split.getStartRow()); + String splitEnd = Bytes.toString(split.getEndRow()); + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart; + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd; + String[] hostnames = split.getHostnames(); + LOG.info("{} split (this={}) [{}|{}|{}|{}] ", action, this, splitId, hostnames, splitStartKey, splitStopKey); + } + + @Override + public boolean reachedEnd() throws IOException { + return endReached; + } + + @Override + public Row nextRecord(Row reuse) throws IOException { + if (resultScanner == null) { + throw new IOException("No table result scanner provided!"); + } + try { + Result res = resultScanner.next(); + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + } + } catch (Exception e) { + resultScanner.close(); + //workaround for timeout on scan + LOG.warn("Error after scan of " + scannedRows + " rows. 
Retry with a new scanner...", e); + scan.setStartRow(lastRow); + resultScanner = table.getScanner(scan); + Result res = resultScanner.next(); + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + } + } + endReached = true; + return null; + } + + private Row mapResultToRow(Result res) { + Object[] values = new Object [fieldNames.length] ; + int i = 0; + for(String field : fieldNames) { + String[] famCol = field.split(COLON); + byte[] value = res.getValue(Bytes.toBytes(famCol [0] ), Bytes.toBytes(famCol [1] )); + TypeInformation typeInfo = fieldTypeInfos [i] ; + if(typeInfo.isBasicType()) { + if(typeInfo.getTypeClass() == Integer.class) { + values[i] = Bytes.toInt(value); + } else if(typeInfo.getTypeClass() == Short.class) { + values[i] = Bytes.toShort(value); + } else if(typeInfo.getTypeClass() == Float.class) { + values[i] = Bytes.toFloat(value); + } else if(typeInfo.getTypeClass() == Long.class) { + values[i] = Bytes.toLong(value); + } else if(typeInfo.getTypeClass() == String.class) { + values[i] = Bytes.toString(value); + } else if(typeInfo.getTypeClass() == Byte.class) { + values[i] = value[0]; + } else if(typeInfo.getTypeClass() == Boolean.class) { + values[i] = Bytes.toBoolean(value); + } else if(typeInfo.getTypeClass() == Double.class) { + values[i] = Bytes.toDouble(value); + } else if(typeInfo.getTypeClass() == BigInteger.class) { + values[i] = new BigInteger(value); + } else if(typeInfo.getTypeClass() == BigDecimal.class) { + values[i] = Bytes.toBigDecimal(value); + } else if(typeInfo.getTypeClass() == Date.class) { + values[i] = new Date(Bytes.toLong(value)); + } + } else { + // TODO for other types?? — End diff – testHBaseTableSource(org.apache.flink.addons.hbase.example.HBaseTableSourceITCase) Time elapsed: 2.297 sec <<< ERROR! org.apache.flink.table.api.ValidationException: Cannot resolve [q1] given input [f0, f1, f2]. at org.apache.flink.addons.hbase.example.HBaseTableSourceITCase.testHBaseTableSource(HBaseTableSourceITCase.java:113) The error says field input [f0, f1, f2] . Not sure how it got picked up.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96802576

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
          +/**
          + *

          {@link InputFormat}

          subclass that wraps the access for HTables. Returns the result as

          {@link Row}

          + */
          +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private TypeInformation[] fieldTypeInfos;
          + private String[] fieldNames;
          + private transient Table table;
          + private transient Scan scan;
          + private transient Connection conn;
          + private ResultScanner resultScanner = null;
          +
          + private byte[] lastRow;
          + private int scannedRows;
          + private boolean endReached = false;
          + private org.apache.hadoop.conf.Configuration conf;
          + private static final String COLON = ":";
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos)

          { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + }

          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null)

          { + scan = createScanner(); + }

          + }
          +
          + private Scan createScanner() {
          + Scan scan = new Scan();
          + for(String field : fieldNames)

          { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + }

          + return scan;
          + }
          +
          + private void connectToTable() {
          + //use files found in the classpath
          + if(this.conf == null)

          { + this.conf = HBaseConfiguration.create(); + }

          + try

          { + conn = ConnectionFactory.createConnection(this.conf); + }

          catch(IOException ioe)

          { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + }

          + try

          { + table = conn.getTable(TableName.valueOf(tableName)); + }

          catch(TableNotFoundException tnfe)

          { + LOG.error("The table " + tableName + " not found ", tnfe); + }

          catch(IOException ioe)

          { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + }

          + }
          +
          + @Override
          + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException

          { + return null; + }

          +
          + @Override
          + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
          + if (table == null)

          { + throw new IOException("The HBase table has not been opened!"); + }
          + if (scan == null) { + throw new IOException("getScanner returned null"); + }
          +
          + //Gets the starting and ending row keys for every region in the currently open table
          + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
          + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
          + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + }
          + final byte[] startRow = scan.getStartRow();
          + final byte[] stopRow = scan.getStopRow();
          + final boolean scanWithNoLowerBound = startRow.length == 0;
          + final boolean scanWithNoUpperBound = stopRow.length == 0;
          +
          + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
          + for (int i = 0; i < keys.getFirst().length; i++) {
          + final byte[] startKey = keys.getFirst()[i];
          + final byte[] endKey = keys.getSecond()[i];
          + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
          + //Test if the given region is to be included in the InputSplit while splitting the regions of a table
          + if (!includeRegionInSplit(startKey, endKey)) { + continue; + }
          + //Finds the region on which the given row is being served
          + final String[] hosts = new String[]{regionLocation};
          +
          + // determine if regions contains keys used by the scan
          + boolean isLastRegion = endKey.length == 0;
          + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
          + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + }
          + }
          + LOG.info("Created " + splits.size() + " splits");
          + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + }
          + return splits.toArray(new TableInputSplit[0]);
          + }
          +
          + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + }
          +
          + @Override
          + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + }
          +
          + @Override
          + public void open(TableInputSplit split) throws IOException {
          + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + }

          + if (scan == null)

          { + throw new IOException("getScanner returned null"); + }

          + if (split == null)

          { + throw new IOException("Input split is null!"); + }

          +
          + logSplitInfo("opening", split);
          + // set the start row and stop row from the splits
          + scan.setStartRow(split.getStartRow());
          + lastRow = split.getEndRow();
          + scan.setStopRow(lastRow);
          +
          + resultScanner = table.getScanner(scan);
          + endReached = false;
          + scannedRows = 0;
          + }
          +
          + private void logSplitInfo(String action, TableInputSplit split) {
          + int splitId = split.getSplitNumber();
          + String splitStart = Bytes.toString(split.getStartRow());
          + String splitEnd = Bytes.toString(split.getEndRow());
          + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
          + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
          + String[] hostnames = split.getHostnames();
          + LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
          + }
          +
          + @Override
          + public boolean reachedEnd() throws IOException

          { + return endReached; + }

          +
          + @Override
          + public Row nextRecord(Row reuse) throws IOException {
          + if (resultScanner == null)

          { + throw new IOException("No table result scanner provided!"); + }

          + try {
          + Result res = resultScanner.next();
          + if (res != null)

          { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }
          + } catch (Exception e) {
          + resultScanner.close();
          + //workaround for timeout on scan
          + LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
          + scan.setStartRow(lastRow);
          + resultScanner = table.getScanner(scan);
          + Result res = resultScanner.next();
          + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }

          + }
          + endReached = true;
          + return null;
          + }
          +
          + private Row mapResultToRow(Result res) {
          + Object[] values = new Object[fieldNames.length];
          + int i = 0;
          + for(String field : fieldNames) {
          + String[] famCol = field.split(COLON);
          + byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          + TypeInformation typeInfo = fieldTypeInfos[i];
          + if(typeInfo.isBasicType()) {
          + if(typeInfo.getTypeClass() == Integer.class)

          { + values[i] = Bytes.toInt(value); + }

          else if(typeInfo.getTypeClass() == Short.class)

          { + values[i] = Bytes.toShort(value); + }

          else if(typeInfo.getTypeClass() == Float.class)

          { + values[i] = Bytes.toFloat(value); + }

          else if(typeInfo.getTypeClass() == Long.class)

          { + values[i] = Bytes.toLong(value); + }

          else if(typeInfo.getTypeClass() == String.class)

          { + values[i] = Bytes.toString(value); + }

          else if(typeInfo.getTypeClass() == Byte.class)

          { + values[i] = value[0]; + }

          else if(typeInfo.getTypeClass() == Boolean.class)

          { + values[i] = Bytes.toBoolean(value); + }

          else if(typeInfo.getTypeClass() == Double.class)

          { + values[i] = Bytes.toDouble(value); + }

          else if(typeInfo.getTypeClass() == BigInteger.class)

          { + values[i] = new BigInteger(value); + }

          else if(typeInfo.getTypeClass() == BigDecimal.class)

          { + values[i] = Bytes.toBigDecimal(value); + }

          else if(typeInfo.getTypeClass() == Date.class)

          { + values[i] = new Date(Bytes.toLong(value)); + }

          + } else {
          + // TODO for other types??
          — End diff –

          I think you need to implement `DefinedFieldNames` trait in `HBaseTableSource`

          Show
          githubbot ASF GitHub Bot added a comment - Github user tonycox commented on a diff in the pull request: https://github.com/apache/flink/pull/3149#discussion_r96802576 — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java — @@ -0,0 +1,322 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.addons.hbase; + +import org.apache.flink.api.common.io.InputFormat; +import org.apache.flink.api.common.io.LocatableInputSplitAssigner; +import org.apache.flink.api.common.io.RichInputFormat; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.core.io.InputSplitAssigner; +import org.apache.flink.types.Row; +import org.apache.hadoop.hbase.HBaseConfiguration; +import org.apache.hadoop.hbase.TableName; +import org.apache.hadoop.hbase.TableNotFoundException; +import org.apache.hadoop.hbase.client.Scan; +import org.apache.hadoop.hbase.client.Table; +import org.apache.hadoop.hbase.client.ClusterConnection; +import org.apache.hadoop.hbase.client.Result; +import org.apache.hadoop.hbase.client.ResultScanner; +import org.apache.hadoop.hbase.client.ConnectionFactory; +import org.apache.hadoop.hbase.client.Connection; +import org.apache.hadoop.hbase.client.HRegionLocator; +import org.apache.hadoop.hbase.util.Bytes; +import org.apache.hadoop.hbase.util.Pair; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.math.BigDecimal; +import java.math.BigInteger; +import java.util.ArrayList; +import java.util.Date; +import java.util.List; + +/** + * {@link InputFormat} subclass that wraps the access for HTables. 
Returns the result as {@link Row} + */ +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> { + + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class); + private String tableName; + private TypeInformation[] fieldTypeInfos; + private String[] fieldNames; + private transient Table table; + private transient Scan scan; + private transient Connection conn; + private ResultScanner resultScanner = null; + + private byte[] lastRow; + private int scannedRows; + private boolean endReached = false; + private org.apache.hadoop.conf.Configuration conf; + private static final String COLON = ":"; + + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos) { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + } + + @Override + public void configure(Configuration parameters) { + LOG.info("Initializing HBaseConfiguration"); + connectToTable(); + if(table != null) { + scan = createScanner(); + } + } + + private Scan createScanner() { + Scan scan = new Scan(); + for(String field : fieldNames) { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + } + return scan; + } + + private void connectToTable() { + //use files found in the classpath + if(this.conf == null) { + this.conf = HBaseConfiguration.create(); + } + try { + conn = ConnectionFactory.createConnection(this.conf); + } catch(IOException ioe) { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + } + try { + table = conn.getTable(TableName.valueOf(tableName)); + } catch(TableNotFoundException tnfe) { + LOG.error("The table " + tableName + " not found ", tnfe); + } catch(IOException ioe) { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + } + } + + @Override + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException { + return null; + } + + @Override + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException { + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + } + if (scan == null) { + throw new IOException("getScanner returned null"); + } + + //Gets the starting and ending row keys for every region in the currently open table + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn); + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys(); + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + } + final byte[] startRow = scan.getStartRow(); + final byte[] stopRow = scan.getStopRow(); + final boolean scanWithNoLowerBound = startRow.length == 0; + final boolean scanWithNoUpperBound = stopRow.length == 0; + + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits); + for (int i = 0; i < keys.getFirst().length; i++) { + final byte[] startKey = keys.getFirst() [i] ; + final byte[] endKey = keys.getSecond() [i] ; + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort(); + //Test if the given region is to be included in the InputSplit while splitting the regions 
of a table + if (!includeRegionInSplit(startKey, endKey)) { + continue; + } + //Finds the region on which the given row is being served + final String[] hosts = new String[]{regionLocation}; + + // determine if regions contains keys used by the scan + boolean isLastRegion = endKey.length == 0; + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) && + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + } + } + LOG.info("Created " + splits.size() + " splits"); + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + } + return splits.toArray(new TableInputSplit [0] ); + } + + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + } + + @Override + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + } + + @Override + public void open(TableInputSplit split) throws IOException { + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + } + if (scan == null) { + throw new IOException("getScanner returned null"); + } + if (split == null) { + throw new IOException("Input split is null!"); + } + + logSplitInfo("opening", split); + // set the start row and stop row from the splits + scan.setStartRow(split.getStartRow()); + lastRow = split.getEndRow(); + scan.setStopRow(lastRow); + + resultScanner = table.getScanner(scan); + endReached = false; + scannedRows = 0; + } + + private void logSplitInfo(String action, TableInputSplit split) { + int splitId = split.getSplitNumber(); + String splitStart = Bytes.toString(split.getStartRow()); + String splitEnd = Bytes.toString(split.getEndRow()); + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart; + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd; + String[] hostnames = split.getHostnames(); + LOG.info("{} split (this={}) [{}|{}|{}|{}] ", action, this, splitId, hostnames, splitStartKey, splitStopKey); + } + + @Override + public boolean reachedEnd() throws IOException { + return endReached; + } + + @Override + public Row nextRecord(Row reuse) throws IOException { + if (resultScanner == null) { + throw new IOException("No table result scanner provided!"); + } + try { + Result res = resultScanner.next(); + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + } + } catch (Exception e) { + resultScanner.close(); + //workaround for timeout on scan + LOG.warn("Error after scan of " + scannedRows + " rows. 
Retry with a new scanner...", e); + scan.setStartRow(lastRow); + resultScanner = table.getScanner(scan); + Result res = resultScanner.next(); + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + } + } + endReached = true; + return null; + } + + private Row mapResultToRow(Result res) { + Object[] values = new Object [fieldNames.length] ; + int i = 0; + for(String field : fieldNames) { + String[] famCol = field.split(COLON); + byte[] value = res.getValue(Bytes.toBytes(famCol [0] ), Bytes.toBytes(famCol [1] )); + TypeInformation typeInfo = fieldTypeInfos [i] ; + if(typeInfo.isBasicType()) { + if(typeInfo.getTypeClass() == Integer.class) { + values[i] = Bytes.toInt(value); + } else if(typeInfo.getTypeClass() == Short.class) { + values[i] = Bytes.toShort(value); + } else if(typeInfo.getTypeClass() == Float.class) { + values[i] = Bytes.toFloat(value); + } else if(typeInfo.getTypeClass() == Long.class) { + values[i] = Bytes.toLong(value); + } else if(typeInfo.getTypeClass() == String.class) { + values[i] = Bytes.toString(value); + } else if(typeInfo.getTypeClass() == Byte.class) { + values[i] = value[0]; + } else if(typeInfo.getTypeClass() == Boolean.class) { + values[i] = Bytes.toBoolean(value); + } else if(typeInfo.getTypeClass() == Double.class) { + values[i] = Bytes.toDouble(value); + } else if(typeInfo.getTypeClass() == BigInteger.class) { + values[i] = new BigInteger(value); + } else if(typeInfo.getTypeClass() == BigDecimal.class) { + values[i] = Bytes.toBigDecimal(value); + } else if(typeInfo.getTypeClass() == Date.class) { + values[i] = new Date(Bytes.toLong(value)); + } + } else { + // TODO for other types?? — End diff – I think you need to implement `DefinedFieldNames` trait in `HBaseTableSource`
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on the issue:

          https://github.com/apache/flink/pull/3149

          As Jark Wu said in [jira](https://issues.apache.org/jira/browse/FLINK-5554)

          > I think the HBaseTableSource should return a composite type (with column family and qualifier), and we can get columns by composite type accessing.
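
          For illustration, a minimal sketch of what such a composite result type could look like. The family/qualifier names and the nested RowTypeInfo layout below are assumptions made for this example, not code from the PR:

           // one nested Row per column family, one field per qualifier
           RowTypeInfo cf1 = new RowTypeInfo(
               new TypeInformation<?>[] { BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO },
               new String[] { "q1", "q2" });

           // top-level row keyed by column family name
           RowTypeInfo produced = new RowTypeInfo(
               new TypeInformation<?>[] { cf1 },
               new String[] { "cf1" });

           // columns would then be reachable through composite-type access in the
           // Table API, e.g. something along the lines of: select("cf1.q1, cf1.q2")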

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          Thanks for the ping here @tonycox .

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96643785

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
           + * Creates a table source that helps to scan data from an HBase table.
           + *
           + * Note: the colNames are specified along with a familyName and they are separated by a ':'.
           + * For example, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name.
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

          What is the rowKey used for ? I think we can remove it.
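
          For context, a minimal sketch of how the constructor quoted above would be called as it currently stands. The table name, column names and types are purely illustrative, and per this comment the rowKey argument may well be dropped:

           org.apache.hadoop.conf.Configuration conf = HBaseConfiguration.create();

           HBaseTableSource source = new HBaseTableSource(
               conf,
               "testTable",                              // illustrative table name
               Bytes.toBytes("rowKey"),                  // the parameter questioned above
               new String[] { "cf1:q1", "cf1:q2" },      // "family:qualifier" names, as the class Javadoc describes
               new TypeInformation<?>[] { BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO });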

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96824021

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
           +/**
           + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}.
           + */
          +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private TypeInformation[] fieldTypeInfos;
          + private String[] fieldNames;
          + private transient Table table;
          + private transient Scan scan;
          + private transient Connection conn;
          + private ResultScanner resultScanner = null;
          +
          + private byte[] lastRow;
          + private int scannedRows;
          + private boolean endReached = false;
          + private org.apache.hadoop.conf.Configuration conf;
          + private static final String COLON = ":";
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos)

          { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + }

          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null)

          { + scan = createScanner(); + }

          + }
          +
          + private Scan createScanner() {
          + Scan scan = new Scan();
          + for(String field : fieldNames)

          { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + }

          + return scan;
          + }
          +
          + private void connectToTable() {
          + //use files found in the classpath
          + if(this.conf == null)

          { + this.conf = HBaseConfiguration.create(); + }

          + try

          { + conn = ConnectionFactory.createConnection(this.conf); + }

          catch(IOException ioe)

          { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + }

          + try

          { + table = conn.getTable(TableName.valueOf(tableName)); + }

          catch(TableNotFoundException tnfe)

          { + LOG.error("The table " + tableName + " not found ", tnfe); + }

          catch(IOException ioe)

          { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + }

          + }
          +
          + @Override
          + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException

          { + return null; + }

          +
          + @Override
          + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
          + if (table == null)

          { + throw new IOException("The HBase table has not been opened!"); + }
          + if (scan == null) { + throw new IOException("getScanner returned null"); + }
          +
          + //Gets the starting and ending row keys for every region in the currently open table
          + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
          + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
          + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + }
          + final byte[] startRow = scan.getStartRow();
          + final byte[] stopRow = scan.getStopRow();
          + final boolean scanWithNoLowerBound = startRow.length == 0;
          + final boolean scanWithNoUpperBound = stopRow.length == 0;
          +
          + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
          + for (int i = 0; i < keys.getFirst().length; i++) {
          + final byte[] startKey = keys.getFirst()[i];
          + final byte[] endKey = keys.getSecond()[i];
          + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
          + //Test if the given region is to be included in the InputSplit while splitting the regions of a table
          + if (!includeRegionInSplit(startKey, endKey)) { + continue; + }
          + //Finds the region on which the given row is being served
          + final String[] hosts = new String[]{regionLocation};
          +
          + // determine if regions contains keys used by the scan
          + boolean isLastRegion = endKey.length == 0;
          + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
          + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + }
          + }
          + LOG.info("Created " + splits.size() + " splits");
          + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + }
          + return splits.toArray(new TableInputSplit[0]);
          + }
          +
          + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + }
          +
          + @Override
          + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + }
          +
          + @Override
          + public void open(TableInputSplit split) throws IOException {
          + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + }

          + if (scan == null)

          { + throw new IOException("getScanner returned null"); + }

          + if (split == null)

          { + throw new IOException("Input split is null!"); + }

          +
          + logSplitInfo("opening", split);
          + // set the start row and stop row from the splits
          + scan.setStartRow(split.getStartRow());
          + lastRow = split.getEndRow();
          + scan.setStopRow(lastRow);
          +
          + resultScanner = table.getScanner(scan);
          + endReached = false;
          + scannedRows = 0;
          + }
          +
          + private void logSplitInfo(String action, TableInputSplit split) {
          + int splitId = split.getSplitNumber();
          + String splitStart = Bytes.toString(split.getStartRow());
          + String splitEnd = Bytes.toString(split.getEndRow());
          + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
          + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
          + String[] hostnames = split.getHostnames();
          + LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
          + }
          +
          + @Override
          + public boolean reachedEnd() throws IOException

          { + return endReached; + }

          +
          + @Override
          + public Row nextRecord(Row reuse) throws IOException {
          + if (resultScanner == null)

          { + throw new IOException("No table result scanner provided!"); + }

          + try {
          + Result res = resultScanner.next();
          + if (res != null)

          { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }
          + } catch (Exception e) {
          + resultScanner.close();
          + //workaround for timeout on scan
          + LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
          + scan.setStartRow(lastRow);
          + resultScanner = table.getScanner(scan);
          + Result res = resultScanner.next();
          + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }

          + }
          + endReached = true;
          + return null;
          + }
          +
          + private Row mapResultToRow(Result res) {
          + Object[] values = new Object[fieldNames.length];
          + int i = 0;
          + for(String field : fieldNames) {
          + String[] famCol = field.split(COLON);
          + byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          + TypeInformation typeInfo = fieldTypeInfos[i];
          + if(typeInfo.isBasicType()) {
          + if(typeInfo.getTypeClass() == Integer.class)

          { + values[i] = Bytes.toInt(value); + }

          else if(typeInfo.getTypeClass() == Short.class)

          { + values[i] = Bytes.toShort(value); + }

          else if(typeInfo.getTypeClass() == Float.class)

          { + values[i] = Bytes.toFloat(value); + }

          else if(typeInfo.getTypeClass() == Long.class)

          { + values[i] = Bytes.toLong(value); + }

          else if(typeInfo.getTypeClass() == String.class)

          { + values[i] = Bytes.toString(value); + }

          else if(typeInfo.getTypeClass() == Byte.class)

          { + values[i] = value[0]; + }

          else if(typeInfo.getTypeClass() == Boolean.class)

          { + values[i] = Bytes.toBoolean(value); + }

          else if(typeInfo.getTypeClass() == Double.class)

          { + values[i] = Bytes.toDouble(value); + }

          else if(typeInfo.getTypeClass() == BigInteger.class)

          { + values[i] = new BigInteger(value); + }

          else if(typeInfo.getTypeClass() == BigDecimal.class)

          { + values[i] = Bytes.toBigDecimal(value); + }

          else if(typeInfo.getTypeClass() == Date.class)

          { + values[i] = new Date(Bytes.toLong(value)); + }

          + } else

          { + // TODO for other types?? + }

          + i++;
          + }
          + return Row.of(values);
          + }
          +
          + @Override
          + public void close() throws IOException {
          + LOG.info("Closing split (scanned {} rows)", scannedRows);
          + lastRow = null;
          + try {
          + if (resultScanner != null)

          { + resultScanner.close(); + }

          + } finally

          { + resultScanner = null; + }

          + }
          +
          + @Override
          + public void closeInputFormat() throws IOException {
          + try {
          + if (table != null)

          { + table.close(); + }

          + } finally

          { + table = null; + }

          + }
          +
          + @Override
          + public TypeInformation<Row> getProducedType() {
          + return new RowTypeInfo(this.fieldTypeInfos);
          — End diff –

          Using `RowTypeInfo(TypeInformation<?>[] types, String[] fieldNames)` constructor to set the custom field names.
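
          A minimal sketch of that suggestion, assuming the existing fieldNames and fieldTypeInfos members already hold the requested columns:

           @Override
           public TypeInformation<Row> getProducedType() {
               // expose the "family:qualifier" strings as field names instead of the
               // default f0, f1, ... generated by the single-argument constructor
               return new RowTypeInfo(fieldTypeInfos, fieldNames);
           }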

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r96823925

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
           +/**
           + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}.
           + */
          +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private TypeInformation[] fieldTypeInfos;
          + private String[] fieldNames;
          + private transient Table table;
          + private transient Scan scan;
          + private transient Connection conn;
          + private ResultScanner resultScanner = null;
          +
          + private byte[] lastRow;
          + private int scannedRows;
          + private boolean endReached = false;
          + private org.apache.hadoop.conf.Configuration conf;
          + private static final String COLON = ":";
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos)

          { + this.conf = conf; + this.tableName = tableName; + this.fieldNames = fieldNames; + this.fieldTypeInfos = fieldTypeInfos; + }

          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null)

          { + scan = createScanner(); + }

          + }
          +
          + private Scan createScanner() {
          + Scan scan = new Scan();
          + for(String field : fieldNames)

          { + // select only the fields in the 'selectedFields' + String[] famCol = field.split(COLON); + scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1])); + }

          + return scan;
          + }
          +
          + private void connectToTable() {
          + //use files found in the classpath
          + if(this.conf == null)

          { + this.conf = HBaseConfiguration.create(); + }

          + try

          { + conn = ConnectionFactory.createConnection(this.conf); + }

          catch(IOException ioe)

          { + LOG.error("Exception while creating connection to hbase cluster", ioe); + return; + }

          + try

          { + table = conn.getTable(TableName.valueOf(tableName)); + }

          catch(TableNotFoundException tnfe)

          { + LOG.error("The table " + tableName + " not found ", tnfe); + }

          catch(IOException ioe)

          { + LOG.error("Exception while connecting to the table "+tableName+ " ", ioe); + }

          + }
          +
          + @Override
          + public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException

          { + return null; + }

          +
          + @Override
          + public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
          + if (table == null)

          { + throw new IOException("The HBase table has not been opened!"); + }
          + if (scan == null) { + throw new IOException("getScanner returned null"); + }
          +
          + //Gets the starting and ending row keys for every region in the currently open table
          + HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
          + final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
          + if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) { + throw new IOException("Expecting at least one region."); + }
          + final byte[] startRow = scan.getStartRow();
          + final byte[] stopRow = scan.getStopRow();
          + final boolean scanWithNoLowerBound = startRow.length == 0;
          + final boolean scanWithNoUpperBound = stopRow.length == 0;
          +
          + final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
          + for (int i = 0; i < keys.getFirst().length; i++) {
          + final byte[] startKey = keys.getFirst()[i];
          + final byte[] endKey = keys.getSecond()[i];
          + final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
          + //Test if the given region is to be included in the InputSplit while splitting the regions of a table
          + if (!includeRegionInSplit(startKey, endKey)) { + continue; + }
          + //Finds the region on which the given row is being served
          + final String[] hosts = new String[]{regionLocation};
          +
          + // determine if regions contains keys used by the scan
          + boolean isLastRegion = endKey.length == 0;
          + if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
          + (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) { + + final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow; + final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0) + && !isLastRegion ? endKey : stopRow; + int id = splits.size(); + final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop); + splits.add(split); + }
          + }
          + LOG.info("Created " + splits.size() + " splits");
          + for (TableInputSplit split : splits) { + logSplitInfo("created", split); + }
          + return splits.toArray(new TableInputSplit[0]);
          + }
          +
          + protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) { + return true; + }
          +
          + @Override
          + public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) { + return new LocatableInputSplitAssigner(inputSplits); + }
          +
          + @Override
          + public void open(TableInputSplit split) throws IOException {
          + if (table == null) { + throw new IOException("The HBase table has not been opened!"); + }

          + if (scan == null)

          { + throw new IOException("getScanner returned null"); + }

          + if (split == null)

          { + throw new IOException("Input split is null!"); + }

          +
          + logSplitInfo("opening", split);
          + // set the start row and stop row from the splits
          + scan.setStartRow(split.getStartRow());
          + lastRow = split.getEndRow();
          + scan.setStopRow(lastRow);
          +
          + resultScanner = table.getScanner(scan);
          + endReached = false;
          + scannedRows = 0;
          + }
          +
          + private void logSplitInfo(String action, TableInputSplit split) {
          + int splitId = split.getSplitNumber();
          + String splitStart = Bytes.toString(split.getStartRow());
          + String splitEnd = Bytes.toString(split.getEndRow());
          + String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
          + String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
          + String[] hostnames = split.getHostnames();
          + LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
          + }
          +
          + @Override
          + public boolean reachedEnd() throws IOException

          { + return endReached; + }

          +
          + @Override
          + public Row nextRecord(Row reuse) throws IOException {
          + if (resultScanner == null)

          { + throw new IOException("No table result scanner provided!"); + }

          + try {
          + Result res = resultScanner.next();
          + if (res != null)

          { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }
          + } catch (Exception e) {
          + resultScanner.close();
          + //workaround for timeout on scan
          + LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
          + scan.setStartRow(lastRow);
          + resultScanner = table.getScanner(scan);
          + Result res = resultScanner.next();
          + if (res != null) { + scannedRows++; + lastRow = res.getRow(); + return mapResultToRow(res); + }

          + }
          + endReached = true;
          + return null;
          + }
          +
          + private Row mapResultToRow(Result res) {
          + Object[] values = new Object[fieldNames.length];
          + int i = 0;
          + for(String field : fieldNames) {
          + String[] famCol = field.split(COLON);
          + byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          + TypeInformation typeInfo = fieldTypeInfos[i];
          + if(typeInfo.isBasicType()) {
          + if(typeInfo.getTypeClass() == Integer.class)

          { + values[i] = Bytes.toInt(value); + }

          else if(typeInfo.getTypeClass() == Short.class)

          { + values[i] = Bytes.toShort(value); + }

          else if(typeInfo.getTypeClass() == Float.class)

          { + values[i] = Bytes.toFloat(value); + }

          else if(typeInfo.getTypeClass() == Long.class)

          { + values[i] = Bytes.toLong(value); + }

          else if(typeInfo.getTypeClass() == String.class)

          { + values[i] = Bytes.toString(value); + }

          else if(typeInfo.getTypeClass() == Byte.class)

          { + values[i] = value[0]; + }

          else if(typeInfo.getTypeClass() == Boolean.class)

          { + values[i] = Bytes.toBoolean(value); + }

          else if(typeInfo.getTypeClass() == Double.class)

          { + values[i] = Bytes.toDouble(value); + }

          else if(typeInfo.getTypeClass() == BigInteger.class)

          { + values[i] = new BigInteger(value); + }

          else if(typeInfo.getTypeClass() == BigDecimal.class)

          { + values[i] = Bytes.toBigDecimal(value); + }

          else if(typeInfo.getTypeClass() == Date.class)

          { + values[i] = new Date(Bytes.toLong(value)); + }

          + } else {
          + // TODO for other types??
          — End diff –

          No need to implement `DefinedFieldNames`. The [f0, f1, f2] is the default field names of RowTypeInfo, you should set the custom field names in RowTypeInfo. Using `RowTypeInfo(TypeInformation<?>[] types, String[] fieldNames)` constructor.
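
          To illustrate the point about default names, a small standalone sketch (the types and names are chosen only for the example):

           RowTypeInfo defaults = new RowTypeInfo(
               BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
           // defaults.getFieldNames() returns ["f0", "f1"]

           RowTypeInfo named = new RowTypeInfo(
               new TypeInformation<?>[] { BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO },
               new String[] { "id", "name" });
           // named.getFieldNames() returns ["id", "name"]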

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97015267

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
+/**
+ * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
+ */
+public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
+
+	private static final long serialVersionUID = 1L;
+
+	private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
+	private String tableName;
+	private TypeInformation[] fieldTypeInfos;
+	private String[] fieldNames;
+	private transient Table table;
+	private transient Scan scan;
+	private transient Connection conn;
+	private ResultScanner resultScanner = null;
+
+	private byte[] lastRow;
+	private int scannedRows;
+	private boolean endReached = false;
+	private org.apache.hadoop.conf.Configuration conf;
+	private static final String COLON = ":";
+
+	public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos) {
+		this.conf = conf;
+		this.tableName = tableName;
+		this.fieldNames = fieldNames;
+		this.fieldTypeInfos = fieldTypeInfos;
+	}
+
+	@Override
+	public void configure(Configuration parameters) {
+		LOG.info("Initializing HBaseConfiguration");
+		connectToTable();
+		if(table != null) {
+			scan = createScanner();
+		}
+	}
+
+	private Scan createScanner() {
+		Scan scan = new Scan();
+		for(String field : fieldNames) {
+			// select only the fields in the 'selectedFields'
+			String[] famCol = field.split(COLON);
+			scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
+		}
+		return scan;
+	}
+
+	private void connectToTable() {
+		//use files found in the classpath
+		if(this.conf == null) {
+			this.conf = HBaseConfiguration.create();
+		}
+		try {
+			conn = ConnectionFactory.createConnection(this.conf);
+		} catch(IOException ioe) {
+			LOG.error("Exception while creating connection to hbase cluster", ioe);
+			return;
+		}
+		try {
+			table = conn.getTable(TableName.valueOf(tableName));
+		} catch(TableNotFoundException tnfe) {
+			LOG.error("The table " + tableName + " not found ", tnfe);
+		} catch(IOException ioe) {
+			LOG.error("Exception while connecting to the table " + tableName + " ", ioe);
+		}
+	}
+
+	@Override
+	public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException {
+		return null;
+	}
+
+	@Override
+	public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
+		if (table == null) {
+			throw new IOException("The HBase table has not been opened!");
+		}
+		if (scan == null) {
+			throw new IOException("getScanner returned null");
+		}
+
+		//Gets the starting and ending row keys for every region in the currently open table
+		HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
+		final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
+		if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) {
+			throw new IOException("Expecting at least one region.");
+		}
+		final byte[] startRow = scan.getStartRow();
+		final byte[] stopRow = scan.getStopRow();
+		final boolean scanWithNoLowerBound = startRow.length == 0;
+		final boolean scanWithNoUpperBound = stopRow.length == 0;
+
+		final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
+		for (int i = 0; i < keys.getFirst().length; i++) {
+			final byte[] startKey = keys.getFirst()[i];
+			final byte[] endKey = keys.getSecond()[i];
+			final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
+			//Test if the given region is to be included in the InputSplit while splitting the regions of a table
+			if (!includeRegionInSplit(startKey, endKey)) {
+				continue;
+			}
+			//Finds the region on which the given row is being served
+			final String[] hosts = new String[]{regionLocation};
+
+			// determine if regions contains keys used by the scan
+			boolean isLastRegion = endKey.length == 0;
+			if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
+				(scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) {
+
+				final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow;
+				final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0)
+					&& !isLastRegion ? endKey : stopRow;
+				int id = splits.size();
+				final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop);
+				splits.add(split);
+			}
+		}
+		LOG.info("Created " + splits.size() + " splits");
+		for (TableInputSplit split : splits) {
+			logSplitInfo("created", split);
+		}
+		return splits.toArray(new TableInputSplit[0]);
+	}
+
+	protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) {
+		return true;
+	}
+
+	@Override
+	public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) {
+		return new LocatableInputSplitAssigner(inputSplits);
+	}
+
+	@Override
+	public void open(TableInputSplit split) throws IOException {
+		if (table == null) {
+			throw new IOException("The HBase table has not been opened!");
+		}
+		if (scan == null) {
+			throw new IOException("getScanner returned null");
+		}
+		if (split == null) {
+			throw new IOException("Input split is null!");
+		}
+
+		logSplitInfo("opening", split);
+		// set the start row and stop row from the splits
+		scan.setStartRow(split.getStartRow());
+		lastRow = split.getEndRow();
+		scan.setStopRow(lastRow);
+
+		resultScanner = table.getScanner(scan);
+		endReached = false;
+		scannedRows = 0;
+	}
+
+	private void logSplitInfo(String action, TableInputSplit split) {
+		int splitId = split.getSplitNumber();
+		String splitStart = Bytes.toString(split.getStartRow());
+		String splitEnd = Bytes.toString(split.getEndRow());
+		String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
+		String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
+		String[] hostnames = split.getHostnames();
+		LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
+	}
+
+	@Override
+	public boolean reachedEnd() throws IOException {
+		return endReached;
+	}
+
+	@Override
+	public Row nextRecord(Row reuse) throws IOException {
+		if (resultScanner == null) {
+			throw new IOException("No table result scanner provided!");
+		}
+		try {
+			Result res = resultScanner.next();
+			if (res != null) {
+				scannedRows++;
+				lastRow = res.getRow();
+				return mapResultToRow(res);
+			}
+		} catch (Exception e) {
+			resultScanner.close();
+			//workaround for timeout on scan
+			LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
+			scan.setStartRow(lastRow);
+			resultScanner = table.getScanner(scan);
+			Result res = resultScanner.next();
+			if (res != null) {
+				scannedRows++;
+				lastRow = res.getRow();
+				return mapResultToRow(res);
+			}
+		}
+		endReached = true;
+		return null;
+	}
+
+	private Row mapResultToRow(Result res) {
+		Object[] values = new Object[fieldNames.length];
+		int i = 0;
+		for(String field : fieldNames) {
+			String[] famCol = field.split(COLON);
+			byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
+			TypeInformation typeInfo = fieldTypeInfos[i];
+			if(typeInfo.isBasicType()) {
+				if(typeInfo.getTypeClass() == Integer.class) {
+					values[i] = Bytes.toInt(value);
+				} else if(typeInfo.getTypeClass() == Short.class) {
+					values[i] = Bytes.toShort(value);
+				} else if(typeInfo.getTypeClass() == Float.class) {
+					values[i] = Bytes.toFloat(value);
+				} else if(typeInfo.getTypeClass() == Long.class) {
+					values[i] = Bytes.toLong(value);
+				} else if(typeInfo.getTypeClass() == String.class) {
+					values[i] = Bytes.toString(value);
+				} else if(typeInfo.getTypeClass() == Byte.class) {
+					values[i] = value[0];
+				} else if(typeInfo.getTypeClass() == Boolean.class) {
+					values[i] = Bytes.toBoolean(value);
+				} else if(typeInfo.getTypeClass() == Double.class) {
+					values[i] = Bytes.toDouble(value);
+				} else if(typeInfo.getTypeClass() == BigInteger.class) {
+					values[i] = new BigInteger(value);
+				} else if(typeInfo.getTypeClass() == BigDecimal.class) {
+					values[i] = Bytes.toBigDecimal(value);
+				} else if(typeInfo.getTypeClass() == Date.class) {
+					values[i] = new Date(Bytes.toLong(value));
+				}
+			} else {
+				// TODO for other types??
+			}
+			i++;
+		}
+		return Row.of(values);
+	}
+
+	@Override
+	public void close() throws IOException {
+		LOG.info("Closing split (scanned {} rows)", scannedRows);
+		lastRow = null;
+		try {
+			if (resultScanner != null) {
+				resultScanner.close();
+			}
+		} finally {
+			resultScanner = null;
+		}
+	}
+
+	@Override
+	public void closeInputFormat() throws IOException {
+		try {
+			if (table != null) {
+				table.close();
+			}
+		} finally {
+			table = null;
+		}
+	}
+
+	@Override
+	public TypeInformation<Row> getProducedType() {
+		return new RowTypeInfo(this.fieldTypeInfos);
— End diff –

          Ok. Got it.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97015321

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an hbase table
          + *
          + * Note : the colNames are specified along with a familyName and they are seperated by a ':'
          + * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

Yes, that is true, but do we always want a full table scan? In HBase it is better to specify a start and an end key, so how do we specify that? I have not used this rowKey yet, but I thought it would be better to use it.
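For illustration, a key-bounded scan (rather than a full table scan) can be expressed with the plain HBase client API roughly like this; the column and key values are placeholders:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BoundedScanSketch {
	// builds a scan restricted to [startKey, stopKey) instead of scanning the whole table
	public static Scan boundedScan(byte[] startKey, byte[] stopKey) {
		Scan scan = new Scan();
		scan.setStartRow(startKey); // inclusive lower bound
		scan.setStopRow(stopKey);   // exclusive upper bound
		scan.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q1")); // placeholder column
		return scan;
	}
}

How such bounds would be passed through HBaseTableSource is exactly the open question of this thread.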

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97015379

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
+/**
+ * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
+ */
+public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
+
+	private static final long serialVersionUID = 1L;
+
+	private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
+	private String tableName;
+	private TypeInformation[] fieldTypeInfos;
+	private String[] fieldNames;
+	private transient Table table;
+	private transient Scan scan;
+	private transient Connection conn;
+	private ResultScanner resultScanner = null;
+
+	private byte[] lastRow;
+	private int scannedRows;
+	private boolean endReached = false;
+	private org.apache.hadoop.conf.Configuration conf;
+	private static final String COLON = ":";
+
+	public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos) {
+		this.conf = conf;
+		this.tableName = tableName;
+		this.fieldNames = fieldNames;
+		this.fieldTypeInfos = fieldTypeInfos;
+	}
+
+	@Override
+	public void configure(Configuration parameters) {
+		LOG.info("Initializing HBaseConfiguration");
+		connectToTable();
+		if(table != null) {
+			scan = createScanner();
+		}
+	}
+
+	private Scan createScanner() {
+		Scan scan = new Scan();
+		for(String field : fieldNames) {
+			// select only the fields in the 'selectedFields'
+			String[] famCol = field.split(COLON);
+			scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
+		}
+		return scan;
+	}
+
+	private void connectToTable() {
+		//use files found in the classpath
+		if(this.conf == null) {
+			this.conf = HBaseConfiguration.create();
+		}
+		try {
+			conn = ConnectionFactory.createConnection(this.conf);
+		} catch(IOException ioe) {
+			LOG.error("Exception while creating connection to hbase cluster", ioe);
+			return;
+		}
+		try {
+			table = conn.getTable(TableName.valueOf(tableName));
+		} catch(TableNotFoundException tnfe) {
+			LOG.error("The table " + tableName + " not found ", tnfe);
+		} catch(IOException ioe) {
+			LOG.error("Exception while connecting to the table " + tableName + " ", ioe);
+		}
+	}
+
+	@Override
+	public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException {
+		return null;
+	}
+
+	@Override
+	public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
+		if (table == null) {
+			throw new IOException("The HBase table has not been opened!");
+		}
+		if (scan == null) {
+			throw new IOException("getScanner returned null");
+		}
+
+		//Gets the starting and ending row keys for every region in the currently open table
+		HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
+		final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
+		if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) {
+			throw new IOException("Expecting at least one region.");
+		}
+		final byte[] startRow = scan.getStartRow();
+		final byte[] stopRow = scan.getStopRow();
+		final boolean scanWithNoLowerBound = startRow.length == 0;
+		final boolean scanWithNoUpperBound = stopRow.length == 0;
+
+		final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
+		for (int i = 0; i < keys.getFirst().length; i++) {
+			final byte[] startKey = keys.getFirst()[i];
+			final byte[] endKey = keys.getSecond()[i];
+			final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
+			//Test if the given region is to be included in the InputSplit while splitting the regions of a table
+			if (!includeRegionInSplit(startKey, endKey)) {
+				continue;
+			}
+			//Finds the region on which the given row is being served
+			final String[] hosts = new String[]{regionLocation};
+
+			// determine if regions contains keys used by the scan
+			boolean isLastRegion = endKey.length == 0;
+			if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
+				(scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) {
+
+				final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow;
+				final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0)
+					&& !isLastRegion ? endKey : stopRow;
+				int id = splits.size();
+				final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop);
+				splits.add(split);
+			}
+		}
+		LOG.info("Created " + splits.size() + " splits");
+		for (TableInputSplit split : splits) {
+			logSplitInfo("created", split);
+		}
+		return splits.toArray(new TableInputSplit[0]);
+	}
+
+	protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) {
+		return true;
+	}
+
+	@Override
+	public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) {
+		return new LocatableInputSplitAssigner(inputSplits);
+	}
+
+	@Override
+	public void open(TableInputSplit split) throws IOException {
+		if (table == null) {
+			throw new IOException("The HBase table has not been opened!");
+		}
+		if (scan == null) {
+			throw new IOException("getScanner returned null");
+		}
+		if (split == null) {
+			throw new IOException("Input split is null!");
+		}
+
+		logSplitInfo("opening", split);
+		// set the start row and stop row from the splits
+		scan.setStartRow(split.getStartRow());
+		lastRow = split.getEndRow();
+		scan.setStopRow(lastRow);
+
+		resultScanner = table.getScanner(scan);
+		endReached = false;
+		scannedRows = 0;
+	}
+
+	private void logSplitInfo(String action, TableInputSplit split) {
+		int splitId = split.getSplitNumber();
+		String splitStart = Bytes.toString(split.getStartRow());
+		String splitEnd = Bytes.toString(split.getEndRow());
+		String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
+		String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
+		String[] hostnames = split.getHostnames();
+		LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
+	}
+
+	@Override
+	public boolean reachedEnd() throws IOException {
+		return endReached;
+	}
+
+	@Override
+	public Row nextRecord(Row reuse) throws IOException {
+		if (resultScanner == null) {
+			throw new IOException("No table result scanner provided!");
+		}
+		try {
+			Result res = resultScanner.next();
+			if (res != null) {
+				scannedRows++;
+				lastRow = res.getRow();
+				return mapResultToRow(res);
+			}
+		} catch (Exception e) {
+			resultScanner.close();
+			//workaround for timeout on scan
+			LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
+			scan.setStartRow(lastRow);
+			resultScanner = table.getScanner(scan);
+			Result res = resultScanner.next();
+			if (res != null) {
+				scannedRows++;
+				lastRow = res.getRow();
+				return mapResultToRow(res);
+			}
+		}
+		endReached = true;
+		return null;
+	}
+
+	private Row mapResultToRow(Result res) {
+		Object[] values = new Object[fieldNames.length];
+		int i = 0;
+		for(String field : fieldNames) {
+			String[] famCol = field.split(COLON);
+			byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
+			TypeInformation typeInfo = fieldTypeInfos[i];
+			if(typeInfo.isBasicType()) {
+				if(typeInfo.getTypeClass() == Integer.class) {
+					values[i] = Bytes.toInt(value);
+				} else if(typeInfo.getTypeClass() == Short.class) {
+					values[i] = Bytes.toShort(value);
+				} else if(typeInfo.getTypeClass() == Float.class) {
+					values[i] = Bytes.toFloat(value);
+				} else if(typeInfo.getTypeClass() == Long.class) {
+					values[i] = Bytes.toLong(value);
+				} else if(typeInfo.getTypeClass() == String.class) {
+					values[i] = Bytes.toString(value);
+				} else if(typeInfo.getTypeClass() == Byte.class) {
+					values[i] = value[0];
+				} else if(typeInfo.getTypeClass() == Boolean.class) {
+					values[i] = Bytes.toBoolean(value);
+				} else if(typeInfo.getTypeClass() == Double.class) {
+					values[i] = Bytes.toDouble(value);
+				} else if(typeInfo.getTypeClass() == BigInteger.class) {
+					values[i] = new BigInteger(value);
+				} else if(typeInfo.getTypeClass() == BigDecimal.class) {
+					values[i] = Bytes.toBigDecimal(value);
+				} else if(typeInfo.getTypeClass() == Date.class) {
+					values[i] = new Date(Bytes.toLong(value));
+				}
+			} else {
+				// TODO for other types??
— End diff –

          HBaseTableSchema - So can we add the startKey and endKey also into this then?
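As a purely hypothetical sketch of that idea (the actual HBaseTableSchema from this PR is not quoted here, so the class name, fields and methods below are assumptions), a schema holder could carry optional scan bounds next to the column definitions:

import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; not the HBaseTableSchema class added by the PR.
public class HBaseTableSchemaSketch implements Serializable {

	// family:qualifier -> value class, kept in insertion order
	private final Map<String, Class<?>> columns = new LinkedHashMap<>();
	private byte[] startKey; // optional scan lower bound
	private byte[] endKey;   // optional scan upper bound

	public void addColumn(String family, String qualifier, Class<?> clazz) {
		columns.put(family + ":" + qualifier, clazz);
	}

	public void setKeyRange(byte[] startKey, byte[] endKey) {
		this.startKey = startKey;
		this.endKey = endKey;
	}

	public Map<String, Class<?>> getColumns() {
		return columns;
	}

	public byte[] getStartKey() {
		return startKey;
	}

	public byte[] getEndKey() {
		return endKey;
	}
}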

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97030686

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an HBase table.
          + *
          + * Note: the colNames are specified along with a familyName and they are separated by a ':'.
          + * For example, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name.
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

          Makes sense. We can support the start and end row keys.
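
          A minimal sketch (not part of this PR) of how a configured start and end row key could be applied to the scan before splits are created; the startRowKey/stopRowKey fields are hypothetical and only illustrate the idea:

          private byte[] startRowKey;  // assumed to be handed over by HBaseTableSource
          private byte[] stopRowKey;

          private Scan createScanner() {
              Scan scan = new Scan();
              if (startRowKey != null) {
                  scan.setStartRow(startRowKey);  // inclusive lower bound
              }
              if (stopRowKey != null) {
                  scan.setStopRow(stopRowKey);    // exclusive upper bound
              }
              for (String field : fieldNames) {
                  String[] famCol = field.split(COLON);
                  scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
              }
              return scan;
          }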

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97031401

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,322 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          +import org.apache.flink.api.common.io.RichInputFormat;
          +import org.apache.flink.api.common.io.statistics.BaseStatistics;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.core.io.InputSplitAssigner;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.client.Table;
          +import org.apache.hadoop.hbase.client.ClusterConnection;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.ResultScanner;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.ArrayList;
          +import java.util.Date;
          +import java.util.List;
          +
          +/**
          + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}.
          + */
          +public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +
          +    private static final long serialVersionUID = 1L;
          +
          +    private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          +    private String tableName;
          +    private TypeInformation[] fieldTypeInfos;
          +    private String[] fieldNames;
          +    private transient Table table;
          +    private transient Scan scan;
          +    private transient Connection conn;
          +    private ResultScanner resultScanner = null;
          +
          +    private byte[] lastRow;
          +    private int scannedRows;
          +    private boolean endReached = false;
          +    private org.apache.hadoop.conf.Configuration conf;
          +    private static final String COLON = ":";
          +
          +    public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos) {
          +        this.conf = conf;
          +        this.tableName = tableName;
          +        this.fieldNames = fieldNames;
          +        this.fieldTypeInfos = fieldTypeInfos;
          +    }
          +
          +    @Override
          +    public void configure(Configuration parameters) {
          +        LOG.info("Initializing HBaseConfiguration");
          +        connectToTable();
          +        if(table != null) {
          +            scan = createScanner();
          +        }
          +    }
          +
          +    private Scan createScanner() {
          +        Scan scan = new Scan();
          +        for(String field : fieldNames) {
          +            // select only the fields in the 'selectedFields'
          +            String[] famCol = field.split(COLON);
          +            scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          +        }
          +        return scan;
          +    }
          +
          +    private void connectToTable() {
          +        //use files found in the classpath
          +        if(this.conf == null) {
          +            this.conf = HBaseConfiguration.create();
          +        }
          +        try {
          +            conn = ConnectionFactory.createConnection(this.conf);
          +        } catch(IOException ioe) {
          +            LOG.error("Exception while creating connection to hbase cluster", ioe);
          +            return;
          +        }
          +        try {
          +            table = conn.getTable(TableName.valueOf(tableName));
          +        } catch(TableNotFoundException tnfe) {
          +            LOG.error("The table " + tableName + " not found ", tnfe);
          +        } catch(IOException ioe) {
          +            LOG.error("Exception while connecting to the table " + tableName + " ", ioe);
          +        }
          +    }
          +
          +    @Override
          +    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException {
          +        return null;
          +    }
          +
          +    @Override
          +    public TableInputSplit[] createInputSplits(final int minNumSplits) throws IOException {
          +        if (table == null) {
          +            throw new IOException("The HBase table has not been opened!");
          +        }
          +        if (scan == null) {
          +            throw new IOException("getScanner returned null");
          +        }
          +
          +        //Gets the starting and ending row keys for every region in the currently open table
          +        HRegionLocator regionLocator = new HRegionLocator(table.getName(), (ClusterConnection) conn);
          +        final Pair<byte[][], byte[][]> keys = regionLocator.getStartEndKeys();
          +        if (keys == null || keys.getFirst() == null || keys.getFirst().length == 0) {
          +            throw new IOException("Expecting at least one region.");
          +        }
          +        final byte[] startRow = scan.getStartRow();
          +        final byte[] stopRow = scan.getStopRow();
          +        final boolean scanWithNoLowerBound = startRow.length == 0;
          +        final boolean scanWithNoUpperBound = stopRow.length == 0;
          +
          +        final List<TableInputSplit> splits = new ArrayList<TableInputSplit>(minNumSplits);
          +        for (int i = 0; i < keys.getFirst().length; i++) {
          +            final byte[] startKey = keys.getFirst()[i];
          +            final byte[] endKey = keys.getSecond()[i];
          +            final String regionLocation = regionLocator.getRegionLocation(startKey, false).getHostnamePort();
          +            //Test if the given region is to be included in the InputSplit while splitting the regions of a table
          +            if (!includeRegionInSplit(startKey, endKey)) {
          +                continue;
          +            }
          +            //Finds the region on which the given row is being served
          +            final String[] hosts = new String[]{regionLocation};
          +
          +            // determine if regions contains keys used by the scan
          +            boolean isLastRegion = endKey.length == 0;
          +            if ((scanWithNoLowerBound || isLastRegion || Bytes.compareTo(startRow, endKey) < 0) &&
          +                (scanWithNoUpperBound || Bytes.compareTo(stopRow, startKey) > 0)) {
          +
          +                final byte[] splitStart = scanWithNoLowerBound || Bytes.compareTo(startKey, startRow) >= 0 ? startKey : startRow;
          +                final byte[] splitStop = (scanWithNoUpperBound || Bytes.compareTo(endKey, stopRow) <= 0)
          +                    && !isLastRegion ? endKey : stopRow;
          +                int id = splits.size();
          +                final TableInputSplit split = new TableInputSplit(id, hosts, table.getName().getName(), splitStart, splitStop);
          +                splits.add(split);
          +            }
          +        }
          +        LOG.info("Created " + splits.size() + " splits");
          +        for (TableInputSplit split : splits) {
          +            logSplitInfo("created", split);
          +        }
          +        return splits.toArray(new TableInputSplit[0]);
          +    }
          +
          +    protected boolean includeRegionInSplit(final byte[] startKey, final byte[] endKey) {
          +        return true;
          +    }
          +
          +    @Override
          +    public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits) {
          +        return new LocatableInputSplitAssigner(inputSplits);
          +    }
          +
          +    @Override
          +    public void open(TableInputSplit split) throws IOException {
          +        if (table == null) {
          +            throw new IOException("The HBase table has not been opened!");
          +        }
          +        if (scan == null) {
          +            throw new IOException("getScanner returned null");
          +        }
          +        if (split == null) {
          +            throw new IOException("Input split is null!");
          +        }
          +
          +        logSplitInfo("opening", split);
          +        // set the start row and stop row from the splits
          +        scan.setStartRow(split.getStartRow());
          +        lastRow = split.getEndRow();
          +        scan.setStopRow(lastRow);
          +
          +        resultScanner = table.getScanner(scan);
          +        endReached = false;
          +        scannedRows = 0;
          +    }
          +
          +    private void logSplitInfo(String action, TableInputSplit split) {
          +        int splitId = split.getSplitNumber();
          +        String splitStart = Bytes.toString(split.getStartRow());
          +        String splitEnd = Bytes.toString(split.getEndRow());
          +        String splitStartKey = splitStart.isEmpty() ? "-" : splitStart;
          +        String splitStopKey = splitEnd.isEmpty() ? "-" : splitEnd;
          +        String[] hostnames = split.getHostnames();
          +        LOG.info("{} split (this={})[{}|{}|{}|{}]", action, this, splitId, hostnames, splitStartKey, splitStopKey);
          +    }
          +
          +    @Override
          +    public boolean reachedEnd() throws IOException {
          +        return endReached;
          +    }
          +
          +    @Override
          +    public Row nextRecord(Row reuse) throws IOException {
          +        if (resultScanner == null) {
          +            throw new IOException("No table result scanner provided!");
          +        }
          +        try {
          +            Result res = resultScanner.next();
          +            if (res != null) {
          +                scannedRows++;
          +                lastRow = res.getRow();
          +                return mapResultToRow(res);
          +            }
          +        } catch (Exception e) {
          +            resultScanner.close();
          +            //workaround for timeout on scan
          +            LOG.warn("Error after scan of " + scannedRows + " rows. Retry with a new scanner...", e);
          +            scan.setStartRow(lastRow);
          +            resultScanner = table.getScanner(scan);
          +            Result res = resultScanner.next();
          +            if (res != null) {
          +                scannedRows++;
          +                lastRow = res.getRow();
          +                return mapResultToRow(res);
          +            }
          +        }
          +        endReached = true;
          +        return null;
          +    }
          +
          +    private Row mapResultToRow(Result res) {
          +        Object[] values = new Object[fieldNames.length];
          +        int i = 0;
          +        for(String field : fieldNames) {
          +            String[] famCol = field.split(COLON);
          +            byte[] value = res.getValue(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          +            TypeInformation typeInfo = fieldTypeInfos[i];
          +            if(typeInfo.isBasicType()) {
          +                if(typeInfo.getTypeClass() == Integer.class) {
          +                    values[i] = Bytes.toInt(value);
          +                } else if(typeInfo.getTypeClass() == Short.class) {
          +                    values[i] = Bytes.toShort(value);
          +                } else if(typeInfo.getTypeClass() == Float.class) {
          +                    values[i] = Bytes.toFloat(value);
          +                } else if(typeInfo.getTypeClass() == Long.class) {
          +                    values[i] = Bytes.toLong(value);
          +                } else if(typeInfo.getTypeClass() == String.class) {
          +                    values[i] = Bytes.toString(value);
          +                } else if(typeInfo.getTypeClass() == Byte.class) {
          +                    values[i] = value[0];
          +                } else if(typeInfo.getTypeClass() == Boolean.class) {
          +                    values[i] = Bytes.toBoolean(value);
          +                } else if(typeInfo.getTypeClass() == Double.class) {
          +                    values[i] = Bytes.toDouble(value);
          +                } else if(typeInfo.getTypeClass() == BigInteger.class) {
          +                    values[i] = new BigInteger(value);
          +                } else if(typeInfo.getTypeClass() == BigDecimal.class) {
          +                    values[i] = Bytes.toBigDecimal(value);
          +                } else if(typeInfo.getTypeClass() == Date.class) {
          +                    values[i] = new Date(Bytes.toLong(value));
          +                }
          +            } else {
          +                // TODO for other types??
          — End diff –

          I think it's fine to add start/endKey into HBaseTableSchema.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97055300

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an HBase table.
          + *
          + * Note: the colNames are specified along with a familyName and they are separated by a ':'.
          + * For example, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name.
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

          `new RowTypeInfo(
              new TypeInformation[]{
                  new RowTypeInfo(
                      new TypeInformation[]{BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO},
                      new String[]{"name", "age"})
              },
              new String[]{"person"});`
          I made the above change to the HBaseTableSource's return type and then ran a simple query:
          `tableEnv.registerTableSource("test", hbaseTable);
          Table result = tableEnv
              .sql("SELECT f1.q1, f1.q2, f1.q3 FROM test");`
          It throws a SQL validation error: it treats f1 as a table name and reports that table 'f1' was not found.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97067569

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an HBase table.
          + *
          + * Note: the colNames are specified along with a familyName and they are separated by a ':'.
          + * For example, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name.
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

          Try to use `.sql("SELECT test.f1.q1, test.f1.q2t")` or the Table API instead. I think there is a problem with nested Rows while scanning.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97073974

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an HBase table.
          + *
          + * Note: the colNames are specified along with a familyName and they are separated by a ':'.
          + * For example, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name.
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

          Hi @ramkrish86, the RowTypeInfo contains the field types and also the field names (default f1~fn) of the table. By using the `RowTypeInfo(TypeInformation[] types, String[] fieldNames)` constructor, we can customize the field names of the table. So in your case, you can query the columns with `.sql("SELECT person.name, person.age FROM test")`.
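
          For illustration, a nested type along these lines (a sketch only; it assumes the table is registered under the name "test") exposes the column family as a named top-level field and the qualifiers as nested fields:

          TypeInformation<?>[] qualifierTypes = new TypeInformation<?>[]{BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO};
          String[] qualifierNames = new String[]{"name", "age"};
          RowTypeInfo familyType = new RowTypeInfo(qualifierTypes, qualifierNames);
          RowTypeInfo tableType = new RowTypeInfo(new TypeInformation<?>[]{familyType}, new String[]{"person"});

          // the nested columns can then be addressed as:
          // tableEnv.sql("SELECT person.name, person.age FROM test");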

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97074517

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an HBase table.
          + *
          + * Note: the colNames are specified along with a familyName and they are separated by a ':'.
          + * For example, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name.
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

          ` f0~fn-1 `

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3149

          Hi @ramkrish86, @tonycox, and @wuchong,

          sorry for joining the discussion a bit late. I haven't looked at the code yet, but I think the discussion is going in the right direction.

          I had a look at [how Apache Drill provides access to HBase tables](https://drill.apache.org/docs/querying-hbase/). Drill also uses a nested schema of `[rowkey, colfamily1[col1, col2, ...], colfamily2[col1, col2, ...] ...]`, so basically the same as what we are discussing here.

          Regarding the field types: the serialization is not under our control, so we should also offer to just return the raw bytes (as Drill does). If users have custom data types or serialization logic, they can use a user-defined scalar function to extract the value. I don't know what the standard serialization format for primitives with HBase is (or if there is one at all).

          Regarding restricting the scan with row keys: @tonycox's PR for [filterable TableSources](https://github.com/apache/flink/pull/3166) can be used to set the scan range. This would be much better than "hardcoding" the scan ranges in the TableSource.

          Best, Fabian
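
          As a rough illustration of the scalar-function approach mentioned above (the class, function, and column names below are made up, and the big-endian long encoding is just an assumed example):

          public static class ParseAmount extends ScalarFunction {
              public long eval(byte[] raw) {
                  // the application is assumed to have written the value as a big-endian long
                  return java.nio.ByteBuffer.wrap(raw).getLong();
              }
          }

          // tableEnv.registerFunction("parseAmount", new ParseAmount());
          // tableEnv.sql("SELECT parseAmount(cf1.amount) FROM test");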

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3149

          Hi @fhueske,

          Regarding the field type serialization, I think we can provide default deserialization for basic types (int, long, String, ...) if users used `Bytes.toBytes(...)` to serialize them. If not, users can ask for a field to be returned as raw bytes, e.g. `htableSchema.add("column_family", "qualifier", byte[].class)`, and then use a user-defined scalar function to deserialize the value.

          Regarding the row keys, I agree with you. It would be great if we could set the scan range via a WHERE clause. But FLINK-3849 (FilterableTableSource) is still a pending PR, so I would suggest breaking this issue into two: 1. add HBaseTableSource, provide access to HBase tables, and support a nested schema; 2. extend HBaseTableSource to support FilterableTableSource.
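
          Spelled out, the proposed schema could look roughly like this (HBaseTableSchema and a matching HBaseTableSource constructor do not exist yet, so the signatures below are assumptions):

          HBaseTableSchema schema = new HBaseTableSchema();
          schema.add("person", "name", String.class);     // default deserialization, e.g. via Bytes.toString
          schema.add("person", "age", Integer.class);     // default deserialization, e.g. via Bytes.toInt
          schema.add("person", "photo", byte[].class);    // kept as raw bytes, decoded later by a scalar UDF
          HBaseTableSource hbaseTable = new HBaseTableSource(conf, "test", schema);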

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3149

          You are right @wuchong, we should break it down into two issues.

          I also agree about the serialization. We should offer defaults for primitives (`byte`, `short`, `int`, `long`, `boolean`, `float`, `double`) and a set of common character encodings (UTF-8, ASCII, etc.) for `String`. Everything else can be initially handled as `byte[]`, IMO.
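
          For the `String` case, a minimal sketch of charset-aware decoding might look like this; the charset name would come from user configuration, and the class name is illustrative.

          import java.nio.charset.Charset;
          import java.nio.charset.StandardCharsets;

          // Hypothetical decoder that turns a raw HBase cell into a String with a configurable charset.
          public final class StringCellDecoder {

              private final Charset charset;

              public StringCellDecoder(String charsetName) {
                  this.charset = (charsetName == null) ? StandardCharsets.UTF_8 : Charset.forName(charsetName);
              }

              public String decode(byte[] value) {
                  return (value == null) ? null : new String(value, charset);
              }
          }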

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97304540

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,75 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.table.sources.ProjectableTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an hbase table
          + *
          + * Note : the colNames are specified along with a familyName and they are seperated by a ':'
          + * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
          + */
          +public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +
          + private Configuration conf;
          + private String tableName;
          + private byte[] rowKey;
          + private String[] colNames;
          + private TypeInformation<?>[] colTypes;
          +
          + public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          + TypeInformation<?>[] colTypes) {
          + this.conf = conf;
          + this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          + this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          — End diff –

          I tried this customization, but if I pass something like f1.q1 it still throws a validation error; it worked fine after I used it as suggested by @tonycox.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          Thanks for all the inputs here. I have been trying to make my existing code work with the composite RowTypeInfo. Once that is done I will try to introduce the HBaseTableSchema.
          Also I would like to work on FLINK-3849 (FilterableTableSource) after this first version of HBaseTableSource is accepted.
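
          For context, a nested ("composite") RowTypeInfo of the shape discussed above could be sketched roughly as follows, assuming RowTypeInfo's (types, fieldNames) constructor; family and qualifier names are only an example.

          import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
          import org.apache.flink.api.common.typeinfo.TypeInformation;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;

          // Hypothetical builder for the nested row type [cf1: [q1 INT, q2 STRING], cf2: [q3 LONG]].
          public class NestedRowTypeExample {

              public static RowTypeInfo hbaseRowType() {
                  RowTypeInfo cf1 = new RowTypeInfo(
                      new TypeInformation<?>[]{BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO},
                      new String[]{"q1", "q2"});
                  RowTypeInfo cf2 = new RowTypeInfo(
                      new TypeInformation<?>[]{BasicTypeInfo.LONG_TYPE_INFO},
                      new String[]{"q3"});
                  // Outer row: one nested row per column family.
                  return new RowTypeInfo(new TypeInformation<?>[]{cf1, cf2}, new String[]{"cf1", "cf2"});
              }
          }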

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          Good news: with the help of this composite RowType, after modifying my code accordingly and some debugging, I could get the basic case to work. Now I will work on stitching things together and submitting a PR with the updated changes.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on the issue:

          https://github.com/apache/flink/pull/3149

          Sounds good! Looking forward to it!

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          @fhueske , @tonycox , @wuchong
          I have updated the PR based on all the feedbacks here. Now you could see that we now support CompoisteRowType and we are able to specify multiple column families along with the qualifier names.
          We are able to retrieve the result by doing a full scan.
          This is not efficient and we need to specify start and end rows. I think that can be done after FilterableTableSource is done.
          I have added test cases that shows single column family and double column family.
          For now if the TypeInformation is not known we use plain byte[] type only. That happens at the validation state itself. But one main concern from my side is how to present the 'NULL' means we specify a column with a type but there is no data for that column. For now I have handled by returning the Int, Float, Long - Min_values. But that may not be right I believe. Feedback and suggestions welcome.
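
          For the start/stop row restriction mentioned above, a minimal sketch of what a pushed-down scan range could look like; the key values are placeholders.

          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.util.Bytes;

          // Hypothetical example of restricting the scan once a row-key range can be pushed down.
          public class RangeScanExample {

              public static Scan rangeScan(String startKey, String stopKey) {
                  Scan scan = new Scan();
                  scan.setStartRow(Bytes.toBytes(startKey)); // inclusive
                  scan.setStopRow(Bytes.toBytes(stopKey));   // exclusive
                  return scan;
              }
          }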

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97701834

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,160 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
          + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
          + */
          +public class HBaseTableSourceInputFormat extends TableInputFormat<Row> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private transient Connection conn;
          + private transient org.apache.hadoop.conf.Configuration conf;
          + private HBaseTableSchema schema;
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, HBaseTableSchema schema) {
          +   this.tableName = tableName;
          +   this.conf = conf;
          +   this.schema = schema;
          + }
          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null) {
          +   scan = getScanner();
          + }
          + }
          +
          + @Override
          + protected Scan getScanner() {
          + // TODO : Pass 'rowkey'. For this we need FilterableTableSource
          + Scan scan = new Scan();
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + for(String family : familyMap.keySet()) {
          + // select only the fields in the 'selectedFields'
          + List<Pair> colDetails = familyMap.get(family);
          + for(Pair<String, TypeInformation<?>> pair : colDetails) {
          +   scan.addColumn(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          + }
          + }
          + return scan;
          + }
          +
          + @Override
          + public String getTableName() {
          +   return tableName;
          + }
          +
          + @Override
          + protected Row mapResultToTuple(Result res) {
          + List<Object> values = new ArrayList<Object>();
          + int i = 0;
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + Row[] rows = new Row[familyMap.size()];
          + for(String family : familyMap.keySet()) {
          + List<Pair> colDetails = familyMap.get(family);
          + for(Pair<String, TypeInformation<?>> pair : colDetails) {
          + byte[] value = res.getValue(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          + if(value != null) {
          +   values.add(schema.deserialize(value, pair.getSecond()));
          + } else {
          +   values.add(schema.deserializeNull(pair.getSecond()));
          — End diff –

          Do we really need this method to indicate `null` using a special default value? Why not set null directly with `values.add(null)`?

          `Row` supports nullable fields, but Tuple doesn't.
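
          A tiny sketch of what setting null directly would look like when assembling the per-family row; the qualifier layout is illustrative.

          import org.apache.flink.types.Row;

          // Hypothetical assembly of one column-family row; missing cells become null fields.
          public class NullableRowExample {

              public static Row familyRow(Integer q1, String q2) {
                  Row row = new Row(2);
                  row.setField(0, q1); // may be null if the cell is missing
                  row.setField(1, q2); // may be null if the cell is missing
                  return row;
              }
          }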

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97701594

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -19,99 +19,113 @@
          package org.apache.flink.addons.hbase;

          import org.apache.flink.api.common.io.InputFormat;
          -import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          -import org.apache.flink.api.common.io.RichInputFormat;
          -import org.apache.flink.api.common.io.statistics.BaseStatistics;
          import org.apache.flink.api.common.typeinfo.TypeInformation;
          import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;
          import org.apache.flink.configuration.Configuration;
          -import org.apache.flink.core.io.InputSplitAssigner;
          import org.apache.flink.types.Row;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.TableName;
          import org.apache.hadoop.hbase.TableNotFoundException;
          -import org.apache.hadoop.hbase.client.Scan;
          -import org.apache.hadoop.hbase.client.Table;
          -import org.apache.hadoop.hbase.client.ClusterConnection;
          -import org.apache.hadoop.hbase.client.Result;
          -import org.apache.hadoop.hbase.client.ResultScanner;
          -import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Connection;
          -import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.util.Bytes;
          import org.apache.hadoop.hbase.util.Pair;
          import org.slf4j.Logger;
          import org.slf4j.LoggerFactory;

          import java.io.IOException;
          -import java.math.BigDecimal;
          -import java.math.BigInteger;
          import java.util.ArrayList;
          -import java.util.Date;
          import java.util.List;
          +import java.util.Map;

           /**
            * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
            */
          -public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +public class HBaseTableSourceInputFormat extends TableInputFormat<Row> implements ResultTypeQueryable<Row> {

          private static final long serialVersionUID = 1L;

          private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          private String tableName;

          - private TypeInformation[] fieldTypeInfos;
          - private String[] fieldNames;
          - private transient Table table;
          - private transient Scan scan;
            private transient Connection conn;
          - private ResultScanner resultScanner = null;
          -
          - private byte[] lastRow;
          - private int scannedRows;
          - private boolean endReached = false;
          - private org.apache.hadoop.conf.Configuration conf;
          - private static final String COLON = ":";
          + private transient org.apache.hadoop.conf.Configuration conf;
          + private HBaseTableSchema schema;

          - public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos) {
          -   this.conf = conf;
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, HBaseTableSchema schema) {
              this.tableName = tableName;
          -   this.fieldNames = fieldNames;
          -   this.fieldTypeInfos = fieldTypeInfos;
          +   this.conf = conf;
          +   this.schema = schema;
            }

          @Override
          public void configure(Configuration parameters) {
          LOG.info("Initializing HBaseConfiguration");
          connectToTable();
            if(table != null) {
          -   scan = createScanner();
          +   scan = getScanner();
            }
            }

          - private Scan createScanner() {
          + @Override
          + protected Scan getScanner() {
          + // TODO : Pass 'rowkey'. For this we need FilterableTableSource
              Scan scan = new Scan();
          -   for(String field : fieldNames) {
          +   Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          +   for(String family : familyMap.keySet()) {
                // select only the fields in the 'selectedFields'
          -     String[] famCol = field.split(COLON);
          -     scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          +     List<Pair> colDetails = familyMap.get(family);
          +     for(Pair<String, TypeInformation<?>> pair : colDetails) {
          +       scan.addColumn(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          +     }
              }
              return scan;
            }

          + @Override
          + public String getTableName() {
          +   return tableName;
          + }
          +
          + @Override
          + protected Row mapResultToTuple(Result res) {
          + List<Object> values = new ArrayList<Object>();
          + int i = 0;
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + Row[] rows = new Row[familyMap.size()];
          — End diff –

          Better to declare `rows` as `Object[]` to avoid confusion about whether `rows` is varargs or not.
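
          A small sketch of the `Object[]`-based assembly being suggested; the method name is illustrative.

          import org.apache.flink.types.Row;

          // Hypothetical helper: wraps the per-family rows (typed as Object[]) into the outer row.
          public class OuterRowAssembly {

              public static Row assemble(Object[] familyRows) {
                  Row outer = new Row(familyRows.length);
                  for (int i = 0; i < familyRows.length; i++) {
                      outer.setField(i, familyRows[i]);
                  }
                  return outer;
              }
          }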

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97696842

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          +   Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          +   list = new ArrayList<Pair>();
          + }
          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          +   if(classType == type.getTypeClass()) {
          +     found = true;
          +     break;
          +   }
          + }
          + if(!found) {
          +   // by default it will be byte[] type only
          +   type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          — End diff –

          I think we should throw an exception here to tell users that this class type is not supported and that they should use byte[].class instead. Otherwise, users will think the type works, select the value without deserialization, and get an unexpected result.
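
          A minimal sketch of the suggested validation; the allowed-type list mirrors `CLASS_TYPES` above, and the class and method names are illustrative.

          import java.math.BigDecimal;
          import java.math.BigInteger;
          import java.util.Arrays;
          import java.util.Date;

          // Hypothetical up-front type check that rejects unsupported classes instead of
          // silently falling back to byte[].
          public final class SchemaTypeCheck {

              private static final Class<?>[] ALLOWED_TYPES = {
                  Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class,
                  Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class, byte[].class
              };

              public static void validate(Class<?> clazz) {
                  if (!Arrays.asList(ALLOWED_TYPES).contains(clazz)) {
                      throw new IllegalArgumentException("Unsupported class type " + clazz.getName()
                          + ". Use byte[].class and deserialize with a user-defined scalar function instead.");
                  }
              }
          }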

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97696231

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          +   Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          — End diff –

          addColumns -> addColumn?

          I'm not sure whether we should use `TypeInformation` or `Class` here, because `TypeInformation` suggests that we use the Flink serialization framework to serialize primitives, but that doesn't happen here. Maybe `Class` is the simpler and better choice?
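
          A rough sketch of the `Class`-based variant being proposed; class and method names are hypothetical.

          import org.apache.hadoop.hbase.util.Pair;

          import java.io.Serializable;
          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          // Hypothetical schema that stores plain Java classes instead of Flink TypeInformation.
          public class ClassBasedSchema implements Serializable {

              private final Map<String, List<Pair<String, Class<?>>>> familyMap = new HashMap<>();

              public void addColumn(String family, String qualifier, Class<?> clazz) {
                  List<Pair<String, Class<?>>> columns = familyMap.get(family);
                  if (columns == null) {
                      columns = new ArrayList<>();
                      familyMap.put(family, columns);
                  }
                  columns.add(new Pair<>(qualifier, clazz));
              }
          }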

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97700808

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -22,54 +22,63 @@
          import org.apache.flink.api.java.ExecutionEnvironment;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;
          import org.apache.flink.table.sources.BatchTableSource;
          -import org.apache.flink.table.sources.ProjectableTableSource;
          import org.apache.flink.types.Row;
          import org.apache.flink.util.Preconditions;
          import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;

           /**
            * Creates a table source that helps to scan data from an hbase table
            *
            * Note : the colNames are specified along with a familyName and they are seperated by a ':'
            * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
            */
          -public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +// TODO : Implement ProjectableTableSource?
          +public class HBaseTableSource implements BatchTableSource<Row> {

            private Configuration conf;
            private String tableName;

          - private byte[] rowKey;
          - private String[] colNames;
          - private TypeInformation<?>[] colTypes;
          + private HBaseTableSchema schema;
          + private String[] famNames;

          - public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          -   TypeInformation<?>[] colTypes) {
          + public HBaseTableSource(Configuration conf, String tableName, HBaseTableSchema schema) {
              this.conf = conf;
              this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          -   this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          -   this.colNames = Preconditions.checkNotNull(colNames, "Field names");
          -   this.colTypes = Preconditions.checkNotNull(colTypes, "Field types");
          +   this.schema = Preconditions.checkNotNull(schema, "Schema");
          +   Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          +   famNames = familyMap.keySet().toArray(new String[familyMap.size()]);
            }

            @Override
            public TypeInformation<Row> getReturnType() {
          -   return new RowTypeInfo(colTypes);
          - }
          -
          - @Override
          - public DataSet<Row> getDataSet(ExecutionEnvironment execEnv) {
          -   return execEnv.createInput(new HBaseTableSourceInputFormat(conf, tableName, colNames, colTypes), getReturnType());
          - }
          +   // split the fieldNames
          +   Map<String, List<Pair>> famMap = schema.getFamilyMap();

          - @Override
          - public ProjectableTableSource<Row> projectFields(int[] fields) {
          -   String[] newColNames = new String[fields.length];
          -   TypeInformation<?>[] newColTypes = new TypeInformation<?>[fields.length];
          +   List<String> qualNames = new ArrayList<String>();
          — End diff –

          We can move the code that creates the `typeInfos` into `HBaseTableSchema`, as a method named `TypeInformation<?>[] getColumnTypes()`.

          And also the code that creates the family names, as `String[] getFamilyNames()`.

          This can reduce the redundant code.
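
          A minimal sketch of the two accessors being suggested, assuming the `familyMap` structure from this PR; method bodies and the class name are illustrative.

          import org.apache.flink.api.common.typeinfo.TypeInformation;
          import org.apache.hadoop.hbase.util.Pair;

          import java.io.Serializable;
          import java.util.LinkedHashMap;
          import java.util.List;
          import java.util.Map;

          // Hypothetical accessors on the schema so that the TableSource does not rebuild them itself.
          public class SchemaAccessorsExample implements Serializable {

              private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap = new LinkedHashMap<>();

              public String[] getFamilyNames() {
                  return familyMap.keySet().toArray(new String[familyMap.size()]);
              }

              public TypeInformation<?>[] getColumnTypes(String family) {
                  List<Pair<String, TypeInformation<?>>> columns = familyMap.get(family);
                  int size = (columns == null) ? 0 : columns.size();
                  TypeInformation<?>[] types = new TypeInformation<?>[size];
                  for (int i = 0; i < size; i++) {
                      types[i] = columns.get(i).getSecond();
                  }
                  return types;
              }
          }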

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97699488

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -22,54 +22,63 @@
          import org.apache.flink.api.java.ExecutionEnvironment;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;
          import org.apache.flink.table.sources.BatchTableSource;
          -import org.apache.flink.table.sources.ProjectableTableSource;
          import org.apache.flink.types.Row;
          import org.apache.flink.util.Preconditions;
          import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;

           /**
            * Creates a table source that helps to scan data from an hbase table
            *
            * Note : the colNames are specified along with a familyName and they are seperated by a ':'
            * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
            */
          -public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +// TODO : Implement ProjectableTableSource?
          +public class HBaseTableSource implements BatchTableSource<Row> {

            private Configuration conf;
            private String tableName;

          - private byte[] rowKey;
          - private String[] colNames;
          - private TypeInformation<?>[] colTypes;
          + private HBaseTableSchema schema;
          + private String[] famNames;

          - public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          -   TypeInformation<?>[] colTypes) {
          + public HBaseTableSource(Configuration conf, String tableName, HBaseTableSchema schema) {
              this.conf = conf;
              this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          -   this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          -   this.colNames = Preconditions.checkNotNull(colNames, "Field names");
          -   this.colTypes = Preconditions.checkNotNull(colTypes, "Field types");
          +   this.schema = Preconditions.checkNotNull(schema, "Schema");
          +   Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          +   famNames = familyMap.keySet().toArray(new String[familyMap.size()]);
          — End diff –

          I would like to move this code into `getReturnType()`, since the schema may change after the `HBaseTableSource` has been constructed.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97698410

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          +     Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          +     list = new ArrayList<Pair>();
          + }

          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          +     found = true;
          +     break;
          + }
          + }
          + if(!found) {
          +     // by default it will be byte[] type only
          +     type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          + }
          + list.add(new Pair(qualifier, type));
          — End diff –

          add a `<>` to avoid warning. `list.add(new Pair<>(qualifier, type));`

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97702388

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,160 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
          + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
          + */
          +public class HBaseTableSourceInputFormat extends TableInputFormat<Row> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private transient Connection conn;
          + private transient org.apache.hadoop.conf.Configuration conf;
          + private HBaseTableSchema schema;
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, HBaseTableSchema schema) {
          +     this.tableName = tableName;
          +     this.conf = conf;
          +     this.schema = schema;
          + }
          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null) {
          +     scan = getScanner();
          + }
          + }
          +
          + @Override
          + protected Scan getScanner() {
          + // TODO : Pass 'rowkey'. For this we need FilterableTableSource
          + Scan scan = new Scan();
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + for(String family : familyMap.keySet()) {
          + // select only the fields in the 'selectedFields'
          + List<Pair> colDetails = familyMap.get(family);
          + for(Pair<String, TypeInformation<?>> pair : colDetails) {
          +     scan.addColumn(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          + }
          + }
          + return scan;
          + }
          +
          + @Override
          + public String getTableName() {
          +     return tableName;
          + }
          +
          + @Override
          + protected Row mapResultToTuple(Result res) {
          + List<Object> values = new ArrayList<Object>();
          + int i = 0;
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + Row[] rows = new Row[familyMap.size()];
          + for(String family : familyMap.keySet()) {
          + List<Pair> colDetails = familyMap.get(family);
          + for(Pair<String, TypeInformation<?>> pair : colDetails) {
          + byte[] value = res.getValue(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          + if(value != null) {
          +     values.add(schema.deserialize(value, pair.getSecond()));
          + } else {
          +     values.add(schema.deserializeNull(pair.getSecond()));
          — End diff –

          >But one main concern from my side is how to represent 'NULL', i.e. when we specify a column with a type but there is no data for that column. For now I have handled it by returning the Int, Float, Long MIN_VALUEs. But that may not be right, I believe. Feedback and suggestions welcome.

          I think we can return `null` directly.
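          As a hedged illustration of that suggestion (names follow the diff above; whether `deserializeNull` disappears entirely is left open here), the branch could collapse to:

              byte[] value = res.getValue(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
              // Emit SQL NULL for a missing cell instead of a type-specific minimum value.
              values.add(value != null ? schema.deserialize(value, pair.getSecond()) : null);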

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97709469

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          +     Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          — End diff –

          Ok. Let me check that. So if we pass a Class there, we could wrap it with the corresponding TypeInformation for our internal usage?
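          A minimal sketch of that idea, assuming the Class-based overload simply delegates to the existing TypeInformation-based method and that `TypeExtractor.getForClass` does the wrapping (an assumption for illustration, not the PR's final code):

              // requires: import org.apache.flink.api.java.typeutils.TypeExtractor;
              public void addColumn(String family, String qualifier, Class<?> clazz) {
                  Preconditions.checkNotNull(family, "family name");
                  Preconditions.checkNotNull(qualifier, "qualifier name");
                  Preconditions.checkNotNull(clazz, "class type");
                  // Wrap the user-facing Class into Flink's TypeInformation for internal use.
                  TypeInformation<?> type = TypeExtractor.getForClass(clazz);
                  addColumns(family, qualifier, type);
              }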

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97709482

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          +     Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          +     list = new ArrayList<Pair>();
          + }

          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          +     found = true;
          +     break;
          + }
          + }
          + if(!found) {
          + // by default it will be byte[] type only
          + type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          — End diff –

          Ok. Got it.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97709488

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          +     Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          +     list = new ArrayList<Pair>();
          + }

          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          +     found = true;
          +     break;
          + }
          + }
          + if(!found) {
          +     // by default it will be byte[] type only
          +     type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          + }
          + list.add(new Pair(qualifier, type));
          — End diff –

          Ok

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97709503

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -22,54 +22,63 @@
          import org.apache.flink.api.java.ExecutionEnvironment;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;
          import org.apache.flink.table.sources.BatchTableSource;
          -import org.apache.flink.table.sources.ProjectableTableSource;
          import org.apache.flink.types.Row;
          import org.apache.flink.util.Preconditions;
          import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;

          /**
           * Creates a table source that helps to scan data from an hbase table
           *
           * Note : the colNames are specified along with a familyName and they are seperated by a ':'
           * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
           */
          -public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +// TODO : Implement ProjectableTableSource?
          +public class HBaseTableSource implements BatchTableSource<Row> {

            private Configuration conf;
            private String tableName;

          - private byte[] rowKey;
          - private String[] colNames;
          - private TypeInformation<?>[] colTypes;
          + private HBaseTableSchema schema;
          + private String[] famNames;

          - public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          -     TypeInformation<?>[] colTypes) {
          + public HBaseTableSource(Configuration conf, String tableName, HBaseTableSchema schema) {
                this.conf = conf;
                this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          -     this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          -     this.colNames = Preconditions.checkNotNull(colNames, "Field names");
          -     this.colTypes = Preconditions.checkNotNull(colTypes, "Field types");
          +     this.schema = Preconditions.checkNotNull(schema, "Schema");
          +     Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          +     famNames = familyMap.keySet().toArray(new String[familyMap.size()]);
          — End diff –

          Fine. Will do it.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97709518

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -22,54 +22,63 @@
          import org.apache.flink.api.java.ExecutionEnvironment;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;
          import org.apache.flink.table.sources.BatchTableSource;
          -import org.apache.flink.table.sources.ProjectableTableSource;
          import org.apache.flink.types.Row;
          import org.apache.flink.util.Preconditions;
          import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;

          /**
           * Creates a table source that helps to scan data from an hbase table
           *
           * Note : the colNames are specified along with a familyName and they are seperated by a ':'
           * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
           */
          -public class HBaseTableSource implements BatchTableSource<Row>, ProjectableTableSource<Row> {
          +// TODO : Implement ProjectableTableSource?
          +public class HBaseTableSource implements BatchTableSource<Row> {

            private Configuration conf;
            private String tableName;

          - private byte[] rowKey;
          - private String[] colNames;
          - private TypeInformation<?>[] colTypes;
          + private HBaseTableSchema schema;
          + private String[] famNames;

          - public HBaseTableSource(Configuration conf, String tableName, byte[] rowKey, String[] colNames,
          -     TypeInformation<?>[] colTypes) {
          + public HBaseTableSource(Configuration conf, String tableName, HBaseTableSchema schema) {
                this.conf = conf;
                this.tableName = Preconditions.checkNotNull(tableName, "Table name");
          -     this.rowKey = Preconditions.checkNotNull(rowKey, "Rowkey");
          -     this.colNames = Preconditions.checkNotNull(colNames, "Field names");
          -     this.colTypes = Preconditions.checkNotNull(colTypes, "Field types");
          +     this.schema = Preconditions.checkNotNull(schema, "Schema");
          +     Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          +     famNames = familyMap.keySet().toArray(new String[familyMap.size()]);
            }

            @Override
            public TypeInformation<Row> getReturnType() {
          -     return new RowTypeInfo(colTypes);
          - }
          -
          - @Override
          - public DataSet<Row> getDataSet(ExecutionEnvironment execEnv) {
          -     return execEnv.createInput(new HBaseTableSourceInputFormat(conf, tableName, colNames, colTypes), getReturnType());
          - }
          +     // split the fieldNames
          +     Map<String, List<Pair>> famMap = schema.getFamilyMap();

          - @Override
          - public ProjectableTableSource<Row> projectFields(int[] fields) {
          -     String[] newColNames = new String[fields.length];
          -     TypeInformation<?>[] newColTypes = new TypeInformation<?>[fields.length];
          +     List<String> qualNames = new ArrayList<String>();
          — End diff –

          I will look into this.
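          For context, one speculative sketch of how `ProjectableTableSource` could later be reinstated on top of the schema-based design; field indices are assumed to follow the flattened family/qualifier iteration order of `getFamilyMap()`, and constructing a new `HBaseTableSource` from a projected schema is illustrative only:

              @Override
              public ProjectableTableSource<Row> projectFields(int[] fields) {
                  // Flatten the (family, qualifier, type) triples in family-map iteration order.
                  List<String> families = new ArrayList<>();
                  List<String> qualifiers = new ArrayList<>();
                  List<TypeInformation<?>> types = new ArrayList<>();
                  for (Map.Entry<String, List<Pair>> entry : schema.getFamilyMap().entrySet()) {
                      for (Pair pair : entry.getValue()) {
                          families.add(entry.getKey());
                          qualifiers.add((String) pair.getFirst());
                          types.add((TypeInformation<?>) pair.getSecond());
                      }
                  }
                  // Keep only the projected columns in a new schema and hand it to a new source.
                  HBaseTableSchema projectedSchema = new HBaseTableSchema();
                  for (int field : fields) {
                      projectedSchema.addColumns(families.get(field), qualifiers.get(field), types.get(field));
                  }
                  return new HBaseTableSource(conf, tableName, projectedSchema);
              }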

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97709565

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -0,0 +1,160 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.io.InputFormat;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.configuration.Configuration;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.HBaseConfiguration;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.TableNotFoundException;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Connection;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.Scan;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.List;
          +import java.util.Map;
          +
          +/**
          + * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
          + */
          +public class HBaseTableSourceInputFormat extends TableInputFormat<Row> implements ResultTypeQueryable<Row> {
          +
          + private static final long serialVersionUID = 1L;
          +
          + private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          + private String tableName;
          + private transient Connection conn;
          + private transient org.apache.hadoop.conf.Configuration conf;
          + private HBaseTableSchema schema;
          +
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, HBaseTableSchema schema) {
          + this.tableName = tableName;
          + this.conf = conf;
          + this.schema = schema;
          + }
          +
          + @Override
          + public void configure(Configuration parameters) {
          + LOG.info("Initializing HBaseConfiguration");
          + connectToTable();
          + if(table != null) {
          + scan = getScanner();
          + }
          + }
          +
          + @Override
          + protected Scan getScanner() {
          + // TODO : Pass 'rowkey'. For this we need FilterableTableSource
          + Scan scan = new Scan();
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + for(String family : familyMap.keySet()) {
          + // select only the fields in the 'selectedFields'
          + List<Pair> colDetails = familyMap.get(family);
          + for(Pair<String, TypeInformation<?>> pair : colDetails) {
          + scan.addColumn(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          + }
          + }
          + return scan;
          + }
          +
          + @Override
          + public String getTableName() {
          + return tableName;
          + }
          +
          + @Override
          + protected Row mapResultToTuple(Result res) {
          + List<Object> values = new ArrayList<Object>();
          + int i = 0;
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + Row[] rows = new Row[familyMap.size()];
          + for(String family : familyMap.keySet()) {
          + List<Pair> colDetails = familyMap.get(family);
          + for(Pair<String, TypeInformation<?>> pair : colDetails) {
          + byte[] value = res.getValue(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          + if(value != null) {
          + values.add(schema.deserialize(value, pair.getSecond()));
          + } else {
          + values.add(schema.deserializeNull(pair.getSecond()));
          — End diff –

          Ya, I didn't know this. Your question answers my doubt, and the shaky method that I added can be removed.

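          (For reference, a minimal, hypothetical sketch of the simplification discussed above: once a type-specific `deserializeNull` helper is dropped, a missing HBase cell can simply be stored as `null`, since `Row` fields are plain Objects. The `addCell` helper and its arguments below are illustrative stand-ins, not code from this PR.)

          import java.util.ArrayList;
          import java.util.List;

          public class NullCellSketch {

              // Stand-in for the branch inside mapResultToTuple(): 'rawValue' is the raw HBase cell,
              // 'deserialized' is whatever schema.deserialize(rawValue, type) would have produced.
              static void addCell(List<Object> values, byte[] rawValue, Object deserialized) {
                  if (rawValue != null) {
                      values.add(deserialized);
                  } else {
                      values.add(null); // Row fields are nullable, so no type-specific default is needed
                  }
              }

              public static void main(String[] args) {
                  List<Object> values = new ArrayList<>();
                  addCell(values, new byte[] {1}, 42);
                  addCell(values, null, null);
                  System.out.println(values); // [42, null]
              }
          }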
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97709590

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSourceInputFormat.java —
          @@ -19,99 +19,113 @@
          package org.apache.flink.addons.hbase;

          import org.apache.flink.api.common.io.InputFormat;
          -import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
          -import org.apache.flink.api.common.io.RichInputFormat;
          -import org.apache.flink.api.common.io.statistics.BaseStatistics;
          import org.apache.flink.api.common.typeinfo.TypeInformation;
          import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
          import org.apache.flink.api.java.typeutils.RowTypeInfo;
          import org.apache.flink.configuration.Configuration;
          -import org.apache.flink.core.io.InputSplitAssigner;
          import org.apache.flink.types.Row;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.TableName;
          import org.apache.hadoop.hbase.TableNotFoundException;
          -import org.apache.hadoop.hbase.client.Scan;
          -import org.apache.hadoop.hbase.client.Table;
          -import org.apache.hadoop.hbase.client.ClusterConnection;
          -import org.apache.hadoop.hbase.client.Result;
          -import org.apache.hadoop.hbase.client.ResultScanner;
          -import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.Connection;
          -import org.apache.hadoop.hbase.client.HRegionLocator;
          +import org.apache.hadoop.hbase.client.ConnectionFactory;
          +import org.apache.hadoop.hbase.client.Result;
          +import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.util.Bytes;
          import org.apache.hadoop.hbase.util.Pair;
          import org.slf4j.Logger;
          import org.slf4j.LoggerFactory;

          import java.io.IOException;
          -import java.math.BigDecimal;
          -import java.math.BigInteger;
          import java.util.ArrayList;
          -import java.util.Date;
          import java.util.List;
          +import java.util.Map;

          /**
          * {@link InputFormat} subclass that wraps the access for HTables. Returns the result as {@link Row}
          */
          -public class HBaseTableSourceInputFormat extends RichInputFormat<Row, TableInputSplit> implements ResultTypeQueryable<Row> {
          +public class HBaseTableSourceInputFormat extends TableInputFormat<Row> implements ResultTypeQueryable<Row> {

          private static final long serialVersionUID = 1L;

          private static final Logger LOG = LoggerFactory.getLogger(HBaseTableSourceInputFormat.class);
          private String tableName;

          - private TypeInformation[] fieldTypeInfos;
          - private String[] fieldNames;
          - private transient Table table;
          - private transient Scan scan;
          private transient Connection conn;
          - private ResultScanner resultScanner = null;
          -
          - private byte[] lastRow;
          - private int scannedRows;
          - private boolean endReached = false;
          - private org.apache.hadoop.conf.Configuration conf;
          - private static final String COLON = ":";
          + private transient org.apache.hadoop.conf.Configuration conf;
          + private HBaseTableSchema schema;
          - public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, String[] fieldNames, TypeInformation[] fieldTypeInfos) {
          - this.conf = conf;
          + public HBaseTableSourceInputFormat(org.apache.hadoop.conf.Configuration conf, String tableName, HBaseTableSchema schema) {
          this.tableName = tableName;
          - this.fieldNames = fieldNames;
          - this.fieldTypeInfos = fieldTypeInfos;
          + this.conf = conf;
          + this.schema = schema;
          }

          @Override
          public void configure(Configuration parameters) {
          LOG.info("Initializing HBaseConfiguration");
          connectToTable();
          if(table != null) {
          - scan = createScanner();
          + scan = getScanner();
          }
          }

          - private Scan createScanner() {
          + @Override
          + protected Scan getScanner() {
          + // TODO : Pass 'rowkey'. For this we need FilterableTableSource
          Scan scan = new Scan();
          - for(String field : fieldNames) {
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + for(String family : familyMap.keySet()) {
          // select only the fields in the 'selectedFields'
          - String[] famCol = field.split(COLON);
          - scan.addColumn(Bytes.toBytes(famCol[0]), Bytes.toBytes(famCol[1]));
          + List<Pair> colDetails = familyMap.get(family);
          + for(Pair<String, TypeInformation<?>> pair : colDetails) {
          + scan.addColumn(Bytes.toBytes(family), Bytes.toBytes(pair.getFirst()));
          + }
          }
          return scan;
          }

          + @Override
          + public String getTableName() {
          + return tableName;
          + }
          +
          + @Override
          + protected Row mapResultToTuple(Result res) {
          + List<Object> values = new ArrayList<Object>();
          + int i = 0;
          + Map<String, List<Pair>> familyMap = schema.getFamilyMap();
          + Row[] rows = new Row[familyMap.size()];
          — End diff –

          Will check this too. Thanks for all the nice comments, @wuchong.

          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97710068

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          + Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          — End diff –

          Yes. We can use `TypeExtractor.getForClass(Class)` to extract the corresponding TypeInformation for internal use.

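          (A small self-contained example of that approach, assuming the column types are supplied as plain Class objects; the class name below is only for illustration.)

          import org.apache.flink.api.common.typeinfo.TypeInformation;
          import org.apache.flink.api.java.typeutils.TypeExtractor;

          public class TypeInfoFromClassExample {
              public static void main(String[] args) {
                  // Derive Flink TypeInformation from a plain Class,
                  // e.g. for an addColumn(family, qualifier, Class) style API.
                  TypeInformation<Integer> intType = TypeExtractor.getForClass(Integer.class);
                  TypeInformation<String> stringType = TypeExtractor.getForClass(String.class);
                  System.out.println(intType.getTypeClass());    // class java.lang.Integer
                  System.out.println(stringType.getTypeClass()); // class java.lang.String
              }
          }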
          githubbot ASF GitHub Bot added a comment -

          Github user ex00 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97730063

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          + Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          + list = new ArrayList<Pair>();
          + }
          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          + found = true;
          + break;
          + }
          + }
          + if(!found) {
          + // by default it will be byte[] type only
          + type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          + }
          + list.add(new Pair(qualifier, type));
          + familyMap.put(family, list);
          + }
          +
          + public Map<String, List<Pair>> getFamilyMap() {
          + return this.familyMap;
          + }
          +
          + public Object deserialize(byte[] value, TypeInformation<?> typeInfo) {
          + if (typeInfo.isBasicType()) {
          + if (typeInfo.getTypeClass() == Integer.class) {
          + return Bytes.toInt(value);
          + } else if (typeInfo.getTypeClass() == Short.class) {
          + return Bytes.toShort(value);
          + } else if (typeInfo.getTypeClass() == Float.class) {
          + return Bytes.toFloat(value);
          + } else if (typeInfo.getTypeClass() == Long.class) {
          + return Bytes.toLong(value);
          + } else if (typeInfo.getTypeClass() == String.class) {
          + return Bytes.toString(value);
          + } else if (typeInfo.getTypeClass() == Byte.class) {
          + return value[0];
          + } else if (typeInfo.getTypeClass() == Boolean.class) {
          + return Bytes.toBoolean(value);
          + } else if (typeInfo.getTypeClass() == Double.class) {
          + return Bytes.toDouble(value);
          + } else if (typeInfo.getTypeClass() == BigInteger.class) {
          + return new BigInteger(value);
          + } else if (typeInfo.getTypeClass() == BigDecimal.class) {
          + return Bytes.toBigDecimal(value);
          + } else if (typeInfo.getTypeClass() == Date.class) {
          + return new Date(Bytes.toLong(value));
          + }
          + }
          + return value;
          + }
          +
          + public Object deserializeNull(TypeInformation<?> typeInfo) {
          + // TODO : this may need better handling.
          + if(typeInfo.getTypeClass() == Integer.class) {
          — End diff –

          Why not use ``org.apache.flink.table.codegen.CodeGenUtils#primitiveDefaultValue`` here for the basic types?

          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97730820

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          + Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          + list = new ArrayList<Pair>();
          + }
          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          + found = true;
          + break;
          + }
          + }
          + if(!found) {
          + // by default it will be byte[] type only
          + type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          + }
          + list.add(new Pair(qualifier, type));
          + familyMap.put(family, list);
          + }
          +
          + public Map<String, List<Pair>> getFamilyMap() {
          + return this.familyMap;
          + }
          +
          + public Object deserialize(byte[] value, TypeInformation<?> typeInfo) {
          + if (typeInfo.isBasicType()) {
          + if (typeInfo.getTypeClass() == Integer.class) {
          + return Bytes.toInt(value);
          + } else if (typeInfo.getTypeClass() == Short.class) {
          + return Bytes.toShort(value);
          + } else if (typeInfo.getTypeClass() == Float.class) {
          + return Bytes.toFloat(value);
          + } else if (typeInfo.getTypeClass() == Long.class) {
          + return Bytes.toLong(value);
          + } else if (typeInfo.getTypeClass() == String.class) {
          + return Bytes.toString(value);
          + } else if (typeInfo.getTypeClass() == Byte.class) {
          + return value[0];
          + } else if (typeInfo.getTypeClass() == Boolean.class) {
          + return Bytes.toBoolean(value);
          + } else if (typeInfo.getTypeClass() == Double.class) {
          + return Bytes.toDouble(value);
          + } else if (typeInfo.getTypeClass() == BigInteger.class) {
          + return new BigInteger(value);
          + } else if (typeInfo.getTypeClass() == BigDecimal.class) {
          + return Bytes.toBigDecimal(value);
          + } else if (typeInfo.getTypeClass() == Date.class) {
          + return new Date(Bytes.toLong(value));
          + }
          + }
          + return value;
          + }
          +
          + public Object deserializeNull(TypeInformation<?> typeInfo) {
          + // TODO : this may need better handling.
          + if(typeInfo.getTypeClass() == Integer.class) {
          — End diff –

          I don't think so. `CodeGenUtils#primitiveDefaultValue` is used to give a default value to primitives; Flink's code generation uses primitives to improve performance. But the fields of `Row` are not primitives, so they can be `null`.

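          (A minimal example of that point about `Row`: its fields are Objects and may be set to `null` directly, so no primitive default is required. This is only an illustration, not code from the PR.)

          import org.apache.flink.types.Row;

          public class NullableRowFieldExample {
              public static void main(String[] args) {
                  Row row = new Row(2);                // a Row with two fields
                  row.setField(0, 42);                 // boxed Integer
                  row.setField(1, null);               // a missing HBase cell can simply stay null
                  System.out.println(row.getField(1)); // prints: null
              }
          }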
          githubbot ASF GitHub Bot added a comment -

          Github user ex00 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97734069

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          + Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          + list = new ArrayList<Pair>();
          + }
          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          + found = true;
          + break;
          + }
          + }
          + if(!found) {
          + // by default it will be byte[] type only
          + type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          + }
          + list.add(new Pair(qualifier, type));
          + familyMap.put(family, list);
          + }
          +
          + public Map<String, List<Pair>> getFamilyMap() {
          + return this.familyMap;
          + }
          +
          + public Object deserialize(byte[] value, TypeInformation<?> typeInfo) {
          + if (typeInfo.isBasicType()) {
          + if (typeInfo.getTypeClass() == Integer.class) {
          + return Bytes.toInt(value);
          + } else if (typeInfo.getTypeClass() == Short.class) {
          + return Bytes.toShort(value);
          + } else if (typeInfo.getTypeClass() == Float.class) {
          + return Bytes.toFloat(value);
          + } else if (typeInfo.getTypeClass() == Long.class) {
          + return Bytes.toLong(value);
          + } else if (typeInfo.getTypeClass() == String.class) {
          + return Bytes.toString(value);
          + } else if (typeInfo.getTypeClass() == Byte.class) {
          + return value[0];
          + } else if (typeInfo.getTypeClass() == Boolean.class) {
          + return Bytes.toBoolean(value);
          + } else if (typeInfo.getTypeClass() == Double.class) {
          + return Bytes.toDouble(value);
          + } else if (typeInfo.getTypeClass() == BigInteger.class) {
          + return new BigInteger(value);
          + } else if (typeInfo.getTypeClass() == BigDecimal.class) {
          + return Bytes.toBigDecimal(value);
          + } else if (typeInfo.getTypeClass() == Date.class) {
          + return new Date(Bytes.toLong(value));
          + }
          + }
          + return value;
          + }
          +
          + public Object deserializeNull(TypeInformation<?> typeInfo) {
          + // TODO : this may need better handling.
          + if(typeInfo.getTypeClass() == Integer.class) {
          — End diff –

          The `Row` type isn't a basic type. I think we could add a `typeInfo.isBasicType()` check in this method, as in `#deserialize`, and return other default values for the non-primitive types.

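          (A rough sketch of what that suggestion might look like, purely hypothetical and not the code that was merged: basic types get a type-specific default, everything else falls back to `null`. Only two branches are shown; the class and method names are illustrative.)

          import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
          import org.apache.flink.api.common.typeinfo.TypeInformation;

          public class DeserializeNullSketch {

              static Object deserializeNull(TypeInformation<?> typeInfo) {
                  if (typeInfo.isBasicType()) {
                      if (typeInfo.getTypeClass() == Integer.class) {
                          return 0;
                      } else if (typeInfo.getTypeClass() == Boolean.class) {
                          return false;
                      }
                      // ... remaining basic types elided ...
                  }
                  return null; // non-basic types: no sensible default, fall back to null
              }

              public static void main(String[] args) {
                  System.out.println(deserializeNull(BasicTypeInfo.INT_TYPE_INFO));    // 0
                  System.out.println(deserializeNull(BasicTypeInfo.STRING_TYPE_INFO)); // null (branch not shown above)
              }
          }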
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97736554

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          + Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          + list = new ArrayList<Pair>();
          + }
          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          + found = true;
          + break;
          + }
          + }
          + if(!found) {
          + // by default it will be byte[] type only
          + type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          + }
          + list.add(new Pair(qualifier, type));
          + familyMap.put(family, list);
          + }
          +
          + public Map<String, List<Pair>> getFamilyMap() {
          + return this.familyMap;
          + }
          +
          + public Object deserialize(byte[] value, TypeInformation<?> typeInfo) {
          + if (typeInfo.isBasicType()) {
          + if (typeInfo.getTypeClass() == Integer.class) {
          + return Bytes.toInt(value);
          + } else if (typeInfo.getTypeClass() == Short.class) {
          + return Bytes.toShort(value);
          + } else if (typeInfo.getTypeClass() == Float.class) {
          + return Bytes.toFloat(value);
          + } else if (typeInfo.getTypeClass() == Long.class) {
          + return Bytes.toLong(value);
          + } else if (typeInfo.getTypeClass() == String.class) {
          + return Bytes.toString(value);
          + } else if (typeInfo.getTypeClass() == Byte.class) {
          + return value[0];
          + } else if (typeInfo.getTypeClass() == Boolean.class) {
          + return Bytes.toBoolean(value);
          + } else if (typeInfo.getTypeClass() == Double.class) {
          + return Bytes.toDouble(value);
          + } else if (typeInfo.getTypeClass() == BigInteger.class) {
          + return new BigInteger(value);
          + } else if (typeInfo.getTypeClass() == BigDecimal.class) {
          + return Bytes.toBigDecimal(value);
          + } else if (typeInfo.getTypeClass() == Date.class) {
          + return new Date(Bytes.toLong(value));
          + }
          + }
          + return value;
          + }
          +
          + public Object deserializeNull(TypeInformation<?> typeInfo) {
          + // TODO : this may need better handling.
          + if(typeInfo.getTypeClass() == Integer.class) {
          — End diff –

          But returning default values may not be right. I only did that here because I was not sure about the NULL handling. When a column does not have any value, it is better to return NULL rather than a default value. @ex00 - What do you think?
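
          For illustration only, a minimal sketch (not the code in the PR) of the NULL handling discussed here, assuming the HBaseTableSchema context from the quoted diff; the class name is hypothetical:

          import org.apache.flink.api.common.typeinfo.TypeInformation;

          class NullAwareDeserialization {
          	// Hypothetical helper: whatever the declared type (Integer, String, Row, ...),
          	// a column qualifier that is missing from the HBase result maps to NULL so
          	// that SQL NULL semantics are preserved instead of returning 0, "" or similar.
          	Object deserializeNull(TypeInformation<?> typeInfo) {
          		return null;
          	}
          }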

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97738061

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,135 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.BasicArrayTypeInfo;
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair>> familyMap =
          + new HashMap<String, List<Pair>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          + private static Class[] CLASS_TYPES = {
          + Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class
          + };
          + private static byte[] EMPTY_BYTE_ARRAY = new byte[0];
          + public void addColumns(String family, String qualifier, TypeInformation<?> type) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          + Preconditions.checkNotNull(type, "type name");
          + List<Pair> list = this.familyMap.get(family);
          + if (list == null) {
          + list = new ArrayList<Pair>();
          + }
          + boolean found = false;
          + for(Class classType : CLASS_TYPES) {
          + if(classType == type.getTypeClass()) {
          + found = true;
          + break;
          + }
          + }
          + if(!found) {
          + // by default it will be byte[] type only
          + type = BasicArrayTypeInfo.BYTE_ARRAY_TYPE_INFO;
          — End diff –

          Ok, I will add byte[].class to CLASS_TYPES, and for anything other than the types in CLASS_TYPES we can throw an exception.
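
          A hedged sketch of that validation (class and method names are illustrative, not from the PR): byte[] joins the allowed types and anything else is rejected up front.

          import java.math.BigDecimal;
          import java.math.BigInteger;
          import java.util.Date;

          class AllowedTypeCheck {
          	private static final Class<?>[] CLASS_TYPES = {
          		Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class,
          		Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class, byte[].class
          	};

          	// Throws instead of silently falling back to byte[] for unknown types.
          	static void validate(Class<?> clazz) {
          		for (Class<?> allowed : CLASS_TYPES) {
          			if (allowed == clazz) {
          				return;
          			}
          		}
          		throw new IllegalArgumentException("Unsupported class type found: " + clazz);
          	}
          }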

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          Updated the code based on the comments and pushed again. I think I have addressed all the comments here. Feedback/comments welcome. I also found that it is better to use the TableInputSplit to specify the start and end row, so that the scan is restricted to the given range.
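
          As an aside, a minimal sketch of restricting a scan to a split's key range (class, method and variable names are illustrative, and the "row2"-to-end range is just an example value, not taken from the PR):

          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.util.Bytes;

          class ScanRangeSketch {
          	// Each input split only reads the rows between its start row (inclusive)
          	// and end row (exclusive); an empty end row means "until the end of the table".
          	static Scan scanForSplit(byte[] startRow, byte[] endRow) {
          		Scan scan = new Scan();
          		scan.setStartRow(startRow);
          		if (endRow != null && endRow.length > 0) {
          			scan.setStopRow(endRow);
          		}
          		return scan;
          	}

          	public static void main(String[] args) {
          		System.out.println(scanForSplit(Bytes.toBytes("row2"), new byte[0]));
          	}
          }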

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on the issue:

          https://github.com/apache/flink/pull/3149

          @fhueske , @tonycox , @wuchong - FYI.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97900030

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,137 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.TypeExtractor;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap =
          + new HashMap<String, List<Pair<String, TypeInformation<?>>>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          — End diff –

          I think if it's used only in the Table API, java.sql.Date would be the better choice,
          because Calcite uses the SQL date type.
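
          For illustration, a small sketch of that suggestion, assuming dates are stored in HBase as epoch milliseconds (the helper and class names are hypothetical):

          import java.sql.Date;

          import org.apache.hadoop.hbase.util.Bytes;

          class SqlDateDecoding {
          	// Decode a long-encoded cell into java.sql.Date, the date type used on the SQL side.
          	static Date toSqlDate(byte[] value) {
          		return new Date(Bytes.toLong(value));
          	}

          	public static void main(String[] args) {
          		System.out.println(toSqlDate(Bytes.toBytes(System.currentTimeMillis())));
          	}
          }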

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97904966

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,248 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSchema;
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.functions.MapFunction;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.tuple.Tuple;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.BeforeClass;
          +import org.junit.Test;
          +
          +import java.util.ArrayList;
          +import java.util.Arrays;
          +import java.util.List;
          +
          +import static org.junit.Assert.assertEquals;
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] F_2 = Bytes.toBytes("f2");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @BeforeClass
          + public static void activateHBaseCluster() {
          + registerHBaseMiniClusterInClasspath();
          + }
          +
          + @Test
          + public void testHBaseTableSourceWithSingleColumnFamily() throws Exception {
          + // create a table with single region
          + MapFunction<Row, String> mapFunction = new MapFunction<Row, String>() {
          +
          + @Override
          + public String map(Row value) throws Exception {
          + return value == null ? "null" : value.toString();
          + }
          + };
          + TableName tableName = TableName.valueOf("test");
          + // no split keys
          + byte[][] famNames = new byte[1][];
          + famNames[0] = F_1;
          + createTable(tableName, famNames, null);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19993l));
          + puts.add(put);
          + // add the mutations to the table
          + table.put(puts);
          + table.close();
          + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
          + BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env, new TableConfig());
          + HBaseTableSchema schema = new HBaseTableSchema();
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_1), Integer.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_2), String.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_3), Long.class);
          + // fetch row2 from the table till the end
          + BatchTableSource hbaseTable = new HBaseTableSource(getConf(), tableName.getNameAsString(), schema);
          + tableEnv.registerTableSource("test", hbaseTable);
          + Table result = tableEnv
          + .sql("SELECT test.f1.q1, test.f1.q2, test.f1.q3 FROM test");
          + DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);
          + List<Row> results = resultSet.collect();
          +
          + String expected = "100,strvalue,19991\n" +
          + "101,strvalue1,19992\n" +
          + "102,strvalue2,19993\n";
          + compareResult(results, expected, false, true);
          + }
          +
          + @Test
          + public void testHBaseTableSourceWithTwoColumnFamily() throws Exception {
          + // create a table with single region
          + TableName tableName = TableName.valueOf("test1");
          + // no split keys
          + byte[][] famNames = new byte[2][];
          + famNames[0] = F_1;
          + famNames[1] = F_2;
          + createTable(tableName, famNames, null);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_2, Q_1, Bytes.toBytes(201));
          + //2nd qual is String
          + put.addColumn(F_2, Q_2, Bytes.toBytes("newvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_2, Q_3, Bytes.toBytes(29992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19993l));
          + puts.add(put);
          + // add the mutations to the table
          + table.put(puts);
          + table.close();
          + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
          + BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env, new TableConfig());
          + HBaseTableSchema schema = new HBaseTableSchema();
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_1), Integer.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_2), String.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_3), Long.class);
          + schema.addColumn(Bytes.toString(F_2), Bytes.toString(Q_1), Integer.class);
          + schema.addColumn(Bytes.toString(F_2), Bytes.toString(Q_2), String.class);
          + schema.addColumn(Bytes.toString(F_2), Bytes.toString(Q_3), Long.class);
          + // fetch row2 from the table till the end
          + BatchTableSource hbaseTable = new HBaseTableSource(getConf(), tableName.getNameAsString(), schema);
          + tableEnv.registerTableSource("test1", hbaseTable);
          + Table result = tableEnv
          + .sql("SELECT test1.f1.q1, test1.f1.q2, test1.f1.q3, test1.f2.q1, test1.f2.q2, test1.f2.q3 FROM test1");
          + DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);
          + List<Row> results = resultSet.collect();
          +
          + String expected = "100,strvalue,19991,null,null,null\n" +
          + "null,null,null,201,newvalue1,29992\n" +
          + "102,strvalue2,19993,null,null,null\n";
          + compareResult(results, expected, false, false);
          + }

          +
          +
          + static <T> void compareResult(List<T> result, String expected, boolean asTuples, boolean sort) {
          — End diff –

          This is a big copy-paste. I think there is a better way: decouple `TestBaseUtils` into two classes - a cluster starter and a result checker. I will start a conversation on the dev list.
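
          A rough sketch of what such a stand-alone result checker might look like (class and method names are hypothetical, not an existing Flink utility):

          import java.util.ArrayList;
          import java.util.Arrays;
          import java.util.Collections;
          import java.util.List;

          import static org.junit.Assert.assertEquals;

          class ResultChecker {
          	// Compares collected rows against an expected newline-separated string,
          	// optionally sorting both sides so the comparison is order-insensitive.
          	static <T> void compareResult(List<T> result, String expected, boolean sort) {
          		List<String> actual = new ArrayList<>();
          		for (T row : result) {
          			actual.add(row == null ? "null" : row.toString());
          		}
          		List<String> expectedLines = new ArrayList<>(Arrays.asList(expected.split("\n")));
          		if (sort) {
          			Collections.sort(actual);
          			Collections.sort(expectedLines);
          		}
          		assertEquals(expectedLines, actual);
          	}
          }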

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97903962

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,248 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSchema;
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.functions.MapFunction;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.tuple.Tuple;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.BeforeClass;
          +import org.junit.Test;
          +
          +import java.util.ArrayList;
          +import java.util.Arrays;
          +import java.util.List;
          +
          +import static org.junit.Assert.assertEquals;
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] F_2 = Bytes.toBytes("f2");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @BeforeClass
          + public static void activateHBaseCluster() {
          + registerHBaseMiniClusterInClasspath();
          + }
          +
          + @Test
          + public void testHBaseTableSourceWithSingleColumnFamily() throws Exception {
          + // create a table with single region
          + MapFunction<Row, String> mapFunction = new MapFunction<Row, String>() {
          +
          + @Override
          + public String map(Row value) throws Exception {
          + return value == null ? "null" : value.toString();
          + }
          + };
          + TableName tableName = TableName.valueOf("test");
          + // no split keys
          + byte[][] famNames = new byte[1][];
          + famNames[0] = F_1;
          + createTable(tableName, famNames, null);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19993l));
          — End diff –

          Could you change it to an uppercase L?
          A lowercase l sometimes looks like a 1.
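
          The point in isolation (a tiny illustrative snippet, not PR code):

          class LongLiteralSuffix {
          	static final long LOWERCASE = 19991l; // easy to misread as 199911
          	static final long UPPERCASE = 19991L; // unambiguous
          }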

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97903708

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,248 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSchema;
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.functions.MapFunction;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.tuple.Tuple;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.BeforeClass;
          +import org.junit.Test;
          +
          +import java.util.ArrayList;
          +import java.util.Arrays;
          +import java.util.List;
          +
          +import static org.junit.Assert.assertEquals;
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] F_2 = Bytes.toBytes("f2");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          — End diff –

          why public?
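
          In other words (a tiny illustrative sketch, not the PR code), the constants could be declared with the narrowest visibility the test needs:

          import org.apache.hadoop.hbase.util.Bytes;

          class VisibilitySketch {
          	// Only used inside the test class, so package-private or private is enough.
          	private static final byte[] ROW_1 = Bytes.toBytes("row1");
          	private static final byte[] F_1 = Bytes.toBytes("f1");
          	private static final byte[] Q_1 = Bytes.toBytes("q1");
          }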

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97898531

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,137 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.TypeExtractor;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap =
          + new HashMap<String, List<Pair<String, TypeInformation<?>>>>();
          — End diff –

          Could you just put `<>` here instead of `<String, List<Pair<String, TypeInformation<?>>>>`?
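          For illustration, the diamond operator lets the compiler infer the type arguments on the right-hand side; this is a minimal sketch, not the PR code:

          ```java
          import org.apache.flink.api.common.typeinfo.TypeInformation;
          import org.apache.hadoop.hbase.util.Pair;

          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          // Sketch of the suggested simplification: `new HashMap<>()` instead of repeating the full type arguments.
          public class DiamondSketch {
              private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap = new HashMap<>();
          }
          ```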

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97901774

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,65 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an hbase table
          + *
          + * Note : the colNames are specified along with a familyName and they are seperated by a ':'
          + * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
          + */
          +// TODO : Implement ProjectableTableSource?
          +public class HBaseTableSource implements BatchTableSource<Row> {
          — End diff –

          Would a `StreamTableSource` be okay for the HBase source as well?
          I think it can be implemented the same way as the batch `getDataSet`.

          githubbot ASF GitHub Bot added a comment -

          Github user tonycox commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97897602

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,137 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.TypeExtractor;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap =
          + new HashMap<String, List<Pair<String, TypeInformation<?>>>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
           + private static Class[] CLASS_TYPES = {
           +   Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class, byte[].class
           + };
          + /**
          + * Allows specifying the family and qualifier name along with the data type of the qualifier for an HBase table
          + *
          + * @param family the family name
          + * @param qualifier the qualifier name
          + * @param clazz the data type of the qualifier
          + */
          + public void addColumn(String family, String qualifier, Class<?> clazz) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          — End diff –

          This must be `qualifier`.
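          A minimal sketch of the fix being asked for, assuming the rest of `addColumn` stays as in the PR:

          ```java
          import org.apache.flink.util.Preconditions;

          // Sketch only: the second precondition must check `qualifier`, not `family` a second time.
          public class AddColumnCheckSketch {
              public void addColumn(String family, String qualifier, Class<?> clazz) {
                  Preconditions.checkNotNull(family, "family name");
                  Preconditions.checkNotNull(qualifier, "qualifier name");
                  // ... remainder of the method as in the pull request ...
              }
          }
          ```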

          githubbot ASF GitHub Bot added a comment -

          Github user wuchong commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97927186

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,248 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSchema;
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.functions.MapFunction;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.tuple.Tuple;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.BeforeClass;
          +import org.junit.Test;
          +
          +import java.util.ArrayList;
          +import java.util.Arrays;
          +import java.util.List;
          +
          +import static org.junit.Assert.assertEquals;
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] F_2 = Bytes.toBytes("f2");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @BeforeClass
           + public static void activateHBaseCluster() {
           +   registerHBaseMiniClusterInClasspath();
           + }

          +
          + @Test
          + public void testHBaseTableSourceWithSingleColumnFamily() throws Exception {
          + // create a table with single region
          + MapFunction<Row, String> mapFunction = new MapFunction<Row, String>() {
          +
          + @Override
           + public String map(Row value) throws Exception {
           +   return value == null ? "null" : value.toString();
           + }
           + };
          + TableName tableName = TableName.valueOf("test");
          + // no split keys
          + byte[][] famNames = new byte[1][];
          + famNames[0] = F_1;
          + createTable(tableName, famNames, null);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19993l));
          + puts.add(put);
          + // add the mutations to the table
          + table.put(puts);
          + table.close();
          + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
          + BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env, new TableConfig());
          + HBaseTableSchema schema = new HBaseTableSchema();
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_1), Integer.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_2), String.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_3), Long.class);
          + // fetch row2 from the table till the end
          + BatchTableSource hbaseTable = new HBaseTableSource(getConf(), tableName.getNameAsString(), schema);
          + tableEnv.registerTableSource("test", hbaseTable);
          + Table result = tableEnv
          + .sql("SELECT test.f1.q1, test.f1.q2, test.f1.q3 FROM test");
          + DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);
          + List<Row> results = resultSet.collect();
          +
          + String expected = "100,strvalue,19991\n" +
          + "101,strvalue1,19992\n" +
          + "102,strvalue2,19993\n";
          + compareResult(results, expected, false, true);
          + }
          +
          + @Test
           + public void testHBaseTableSourceWithTwoColumnFamily() throws Exception {
           + // create a table with single region
           + TableName tableName = TableName.valueOf("test1");
           + // no split keys
           + byte[][] famNames = new byte[2][];
           + famNames[0] = F_1;
           + famNames[1] = F_2;
           + createTable(tableName, famNames, null);
           + // get the htable instance
           + HTable table = openTable(tableName);
           + List<Put> puts = new ArrayList<Put>();
           + // add some data
           + Put put = new Put(ROW_1);
           + // add 3 qualifiers per row
           + //1st qual is integer
           + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
           + //2nd qual is String
           + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
           + // 3rd qual is long
           + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
           + puts.add(put);
           +
           + put = new Put(ROW_2);
           + // add 3 qualifiers per row
           + //1st qual is integer
           + put.addColumn(F_2, Q_1, Bytes.toBytes(201));
           + //2nd qual is String
           + put.addColumn(F_2, Q_2, Bytes.toBytes("newvalue1"));
           + // 3rd qual is long
           + put.addColumn(F_2, Q_3, Bytes.toBytes(29992l));
           + puts.add(put);
           +
           + put = new Put(ROW_3);
           + // add 3 qualifiers per row
           + //1st qual is integer
           + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
           + //2nd qual is String
           + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
           + // 3rd qual is long
           + put.addColumn(F_1, Q_3, Bytes.toBytes(19993l));
           + puts.add(put);
           + // add the mutations to the table
           + table.put(puts);
           + table.close();
           + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
           + BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env, new TableConfig());
           + HBaseTableSchema schema = new HBaseTableSchema();
           + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_1), Integer.class);
           + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_2), String.class);
           + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_3), Long.class);
           + schema.addColumn(Bytes.toString(F_2), Bytes.toString(Q_1), Integer.class);
           + schema.addColumn(Bytes.toString(F_2), Bytes.toString(Q_2), String.class);
           + schema.addColumn(Bytes.toString(F_2), Bytes.toString(Q_3), Long.class);
           + // fetch row2 from the table till the end
           + BatchTableSource hbaseTable = new HBaseTableSource(getConf(), tableName.getNameAsString(), schema);
           + tableEnv.registerTableSource("test1", hbaseTable);
           + Table result = tableEnv
           + .sql("SELECT test1.f1.q1, test1.f1.q2, test1.f1.q3, test1.f2.q1, test1.f2.q2, test1.f2.q3 FROM test1");
           + DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);
           + List<Row> results = resultSet.collect();
           +
           + String expected = "100,strvalue,19991,null,null,null\n" +
           + "null,null,null,201,newvalue1,29992\n" +
           + "102,strvalue2,19993,null,null,null\n";
           + compareResult(results, expected, false, false);
           + }
          +
          +
          + static <T> void compareResult(List<T> result, String expected, boolean asTuples, boolean sort) {
          — End diff –

          @tonycox is right, this is a copy-paste. But the methods in `TestBaseUtils` are static, so we do not need to decouple them. We can use them directly like the following:

          ```java
          TestBaseUtils.compareResultAsText(results, expected);
          ```

          Of course, we should add the `flink-test-utils` dependency first.

          ```
          <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-test-utils_2.10</artifactId>
          <version>${project.version}</version>
          <scope>test</scope>
          </dependency>
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97934406

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,137 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.TypeExtractor;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap =
          + new HashMap<String, List<Pair<String, TypeInformation<?>>>>();
          — End diff –

          OK. In our other projects we used to qualify the generic type on both sides.

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97934428

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,137 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.TypeExtractor;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap =
          + new HashMap<String, List<Pair<String, TypeInformation<?>>>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
          — End diff –

          Ok. Makes sense.

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97934446

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSchema.java —
          @@ -0,0 +1,137 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.typeutils.TypeExtractor;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.apache.hadoop.hbase.util.Pair;
          +
          +import java.io.Serializable;
          +import java.math.BigDecimal;
          +import java.math.BigInteger;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.HashMap;
          +import java.util.ArrayList;
          +import java.util.Date;
          +
          +/**
          + * Helps to specify an HBase Table's schema
          + */
          +public class HBaseTableSchema implements Serializable {
          +
          + // A Map with key as column family.
          + private final Map<String, List<Pair<String, TypeInformation<?>>>> familyMap =
          + new HashMap<String, List<Pair<String, TypeInformation<?>>>>();
          +
          + // Allowed types. This may change.
          + // TODO : Check if the Date type should be the one in java.util or the one in java.sql
           + private static Class[] CLASS_TYPES = {
           +   Integer.class, Short.class, Float.class, Long.class, String.class, Byte.class, Boolean.class, Double.class, BigInteger.class, BigDecimal.class, Date.class, byte[].class
           + };
          + /**
          + * Allows specifying the family and qualifier name along with the data type of the qualifier for an HBase table
          + *
          + * @param family the family name
          + * @param qualifier the qualifier name
          + * @param clazz the data type of the qualifier
          + */
          + public void addColumn(String family, String qualifier, Class<?> clazz) {
          + Preconditions.checkNotNull(family, "family name");
          + Preconditions.checkNotNull(family, "qualifier name");
          — End diff –

          Good catch.

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97934495

          — Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/HBaseTableSource.java —
          @@ -0,0 +1,65 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase;
          +
          +import org.apache.flink.api.common.typeinfo.TypeInformation;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.typeutils.RowTypeInfo;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.flink.util.Preconditions;
          +import org.apache.hadoop.conf.Configuration;
          +
          +/**
          + * Creates a table source that helps to scan data from an hbase table
          + *
          + * Note : the colNames are specified along with a familyName and they are seperated by a ':'
          + * For eg, cf1:q1 - where cf1 is the familyName and q1 is the qualifier name
          + */
          +// TODO : Implement ProjectableTableSource?
          +public class HBaseTableSource implements BatchTableSource<Row> {
          — End diff –

          I am not sure. For now I think we will implement `BatchTableSource` only and later implement `StreamTableSource`. Is there any significant design expectation for a source to be a `StreamTableSource`?
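          To make the discussion concrete, here is a purely hypothetical fragment of what a streaming variant might add later. None of this is in the PR: the surrounding class and imports are elided, and `HBaseTableSourceInputFormat` with this constructor is an assumption made only for illustration.

          ```java
          // Hypothetical fragment, not from the pull request: a StreamTableSource would add a
          // getDataStream method alongside the batch getDataSet, reusing the same scan-based input format.
          @Override
          public DataStream<Row> getDataStream(StreamExecutionEnvironment execEnv) {
              // HBaseTableSourceInputFormat and its constructor arguments are assumed here
              return execEnv.createInput(new HBaseTableSourceInputFormat(conf, tableName, schema), getReturnType());
          }
          ```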

          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97934502

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,248 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSchema;
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.functions.MapFunction;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.tuple.Tuple;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.BeforeClass;
          +import org.junit.Test;
          +
          +import java.util.ArrayList;
          +import java.util.Arrays;
          +import java.util.List;
          +
          +import static org.junit.Assert.assertEquals;
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] F_2 = Bytes.toBytes("f2");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          — End diff –

          Ya, my bad. Will remove.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97934510

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,248 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSchema;
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.functions.MapFunction;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.tuple.Tuple;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.BeforeClass;
          +import org.junit.Test;
          +
          +import java.util.ArrayList;
          +import java.util.Arrays;
          +import java.util.List;
          +
          +import static org.junit.Assert.assertEquals;
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] F_2 = Bytes.toBytes("f2");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @BeforeClass
          + public static void activateHBaseCluster() {
          +   registerHBaseMiniClusterInClasspath();
          + }

          +
          + @Test
          + public void testHBaseTableSourceWithSingleColumnFamily() throws Exception {
          + // create a table with single region
          + MapFunction<Row, String> mapFunction = new MapFunction<Row, String>() {
          +
          + @Override
          + public String map(Row value) throws Exception {
          +   return value == null ? "null" : value.toString();
          + }

          + };
          + TableName tableName = TableName.valueOf("test");
          + // no split keys
          + byte[][] famNames = new byte[1][];
          + famNames[0] = F_1;
          + createTable(tableName, famNames, null);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19993l));
          — End diff –

          ok
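
          (Editorial note, not part of the PR: the quoted test builds every row with the same three addColumn calls, differing only in the values. A minimal sketch of one way to trim that repetition inside HBaseTableSourceITCase is shown below; the helper name createPut is made up here and is not in the PR.)

              // Hypothetical helper, not in the PR: builds one Put with an int, a String
              // and a long qualifier under the given family, so each test row becomes a
              // single call instead of six repeated lines.
              private static Put createPut(byte[] row, byte[] family, int q1, String q2, long q3) {
                  Put put = new Put(row);
                  put.addColumn(family, Q_1, Bytes.toBytes(q1));   // 1st qualifier: integer
                  put.addColumn(family, Q_2, Bytes.toBytes(q2));   // 2nd qualifier: String
                  put.addColumn(family, Q_3, Bytes.toBytes(q3));   // 3rd qualifier: long
                  return put;
              }

              // usage in the single-column-family test:
              // puts.add(createPut(ROW_1, F_1, 100, "strvalue", 19991L));
              // puts.add(createPut(ROW_2, F_1, 101, "strvalue1", 19992L));
              // puts.add(createPut(ROW_3, F_1, 102, "strvalue2", 19993L));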

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ramkrish86 commented on a diff in the pull request:

          https://github.com/apache/flink/pull/3149#discussion_r97934537

          — Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseTableSourceITCase.java —
          @@ -0,0 +1,248 @@
          +/*
          + * Copyright The Apache Software Foundation
          + *
          + * Licensed to the Apache Software Foundation (ASF) under one
          + * or more contributor license agreements. See the NOTICE file
          + * distributed with this work for additional information
          + * regarding copyright ownership. The ASF licenses this file
          + * to you under the Apache License, Version 2.0 (the
          + * "License"); you may not use this file except in compliance
          + * with the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.flink.addons.hbase.example;
          +
          +import org.apache.flink.addons.hbase.HBaseTableSchema;
          +import org.apache.flink.addons.hbase.HBaseTableSource;
          +import org.apache.flink.addons.hbase.HBaseTestingClusterAutostarter;
          +import org.apache.flink.api.common.functions.MapFunction;
          +import org.apache.flink.api.java.DataSet;
          +import org.apache.flink.api.java.ExecutionEnvironment;
          +import org.apache.flink.api.java.tuple.Tuple;
          +import org.apache.flink.table.api.Table;
          +import org.apache.flink.table.api.TableConfig;
          +import org.apache.flink.table.api.TableEnvironment;
          +import org.apache.flink.table.api.java.BatchTableEnvironment;
          +import org.apache.flink.table.sources.BatchTableSource;
          +import org.apache.flink.types.Row;
          +import org.apache.hadoop.hbase.TableName;
          +import org.apache.hadoop.hbase.client.HTable;
          +import org.apache.hadoop.hbase.client.Put;
          +import org.apache.hadoop.hbase.util.Bytes;
          +import org.junit.BeforeClass;
          +import org.junit.Test;
          +
          +import java.util.ArrayList;
          +import java.util.Arrays;
          +import java.util.List;
          +
          +import static org.junit.Assert.assertEquals;
          +
          +public class HBaseTableSourceITCase extends HBaseTestingClusterAutostarter {
          +
          + public static final byte[] ROW_1 = Bytes.toBytes("row1");
          + public static final byte[] ROW_2 = Bytes.toBytes("row2");
          + public static final byte[] ROW_3 = Bytes.toBytes("row3");
          + public static final byte[] F_1 = Bytes.toBytes("f1");
          + public static final byte[] F_2 = Bytes.toBytes("f2");
          + public static final byte[] Q_1 = Bytes.toBytes("q1");
          + public static final byte[] Q_2 = Bytes.toBytes("q2");
          + public static final byte[] Q_3 = Bytes.toBytes("q3");
          +
          + @BeforeClass
          + public static void activateHBaseCluster() {
          +   registerHBaseMiniClusterInClasspath();
          + }

          +
          + @Test
          + public void testHBaseTableSourceWithSingleColumnFamily() throws Exception {
          + // create a table with single region
          + MapFunction<Row, String> mapFunction = new MapFunction<Row, String>() {
          +
          + @Override
          + public String map(Row value) throws Exception {
          +   return value == null ? "null" : value.toString();
          + }

          + };
          + TableName tableName = TableName.valueOf("test");
          + // no split keys
          + byte[][] famNames = new byte[1][];
          + famNames[0] = F_1;
          + createTable(tableName, famNames, null);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(101));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue2"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19993l));
          + puts.add(put);
          + // add the mutations to the table
          + table.put(puts);
          + table.close();
          + ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
          + BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env, new TableConfig());
          + HBaseTableSchema schema = new HBaseTableSchema();
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_1), Integer.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_2), String.class);
          + schema.addColumn(Bytes.toString(F_1), Bytes.toString(Q_3), Long.class);
          + // fetch row2 from the table till the end
          + BatchTableSource hbaseTable = new HBaseTableSource(getConf(), tableName.getNameAsString(), schema);
          + tableEnv.registerTableSource("test", hbaseTable);
          + Table result = tableEnv
          + .sql("SELECT test.f1.q1, test.f1.q2, test.f1.q3 FROM test");
          + DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);
          + List<Row> results = resultSet.collect();
          +
          + String expected = "100,strvalue,19991\n" +
          + "101,strvalue1,19992\n" +
          + "102,strvalue2,19993\n";
          + compareResult(results, expected, false, true);
          + }
          +
          + @Test
          + public void testHBaseTableSourceWithTwoColumnFamily() throws Exception {
          + // create a table with single region
          + TableName tableName = TableName.valueOf("test1");
          + // no split keys
          + byte[][] famNames = new byte[2][];
          + famNames[0] = F_1;
          + famNames[1] = F_2;
          + createTable(tableName, famNames, null);
          + // get the htable instance
          + HTable table = openTable(tableName);
          + List<Put> puts = new ArrayList<Put>();
          + // add some data
          + Put put = new Put(ROW_1);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(100));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strvalue"));
          + // 3rd qual is long
          + put.addColumn(F_1, Q_3, Bytes.toBytes(19991l));
          + puts.add(put);
          +
          + put = new Put(ROW_2);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_2, Q_1, Bytes.toBytes(201));
          + //2nd qual is String
          + put.addColumn(F_2, Q_2, Bytes.toBytes("newvalue1"));
          + // 3rd qual is long
          + put.addColumn(F_2, Q_3, Bytes.toBytes(29992l));
          + puts.add(put);
          +
          + put = new Put(ROW_3);
          + // add 3 qualifiers per row
          + //1st qual is integer
          + put.addColumn(F_1, Q_1, Bytes.toBytes(102));
          + //2nd qual is String
          + put.addColumn(F_1, Q_2, Bytes.toBytes("strv