Beam / BEAM-10464

[HBaseIO] - Protocol message was too large. May be malicious.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P2
    • Resolution: Workaround
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: io-java-hbase
    • Labels: None

    Description

      Hi! I just got the following error performing an HBaseIO.read() from a scan.

      Caused by: org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
          at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4694)
          at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.<init>(ClientProtos.java:4658)
          at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4767)
          at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result$1.parsePartialFrom(ClientProtos.java:4762)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
          at org.apache.hadoop.hbase.shaded.com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
          at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Result.parseDelimitedFrom(ClientProtos.java:5131)
          at org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:50)
          at org.apache.beam.sdk.io.hbase.HBaseResultCoder.decode(HBaseResultCoder.java:34)
          at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
          at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:602)
          at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:593)
          at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:539)
          at org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable.getOrDecode(ValueAndCoderLazySerializable.java:73)
          ... 61 more
      

      It seems I'm scanning a column family larger than 64 MB, but HBaseIO doesn't provide any way to raise the protobuf decoder's size limit. How should we handle large datasets stored in HBase?
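
      One possible workaround is to keep each decoded Result under the 64 MB protobuf limit by bounding the number of cells returned per Result with Scan.setBatch() before handing the scan to HBaseIO. A minimal sketch; the table name and batch size below are placeholders, not values from this report:

          import org.apache.beam.sdk.Pipeline;
          import org.apache.beam.sdk.io.hbase.HBaseIO;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.Scan;

          public class HBaseScanBatching {
            public static void main(String[] args) {
              Configuration conf = HBaseConfiguration.create();

              // Cap each Result at 1000 cells so one wide row is split across
              // several Results instead of decoding as a single huge message.
              Scan scan = new Scan();
              scan.setBatch(1000);

              Pipeline p = Pipeline.create();
              p.apply(HBaseIO.read()
                  .withConfiguration(conf)
                  .withTableId("my_table") // placeholder table name
                  .withScan(scan));
              p.run().waitUntilFinish();
            }
          }

      If single rows really do exceed 64 MB, another option the exception message itself points at is raising the CodedInputStream size limit in a custom coder. A sketch, assuming the same length-delimited wire format HBaseResultCoder uses; the 256 MB limit is an arbitrary placeholder, and depending on your HBase artifacts the protobuf classes may live under the shaded package (org.apache.hadoop.hbase.shaded.com.google.protobuf) instead:

          import java.io.IOException;
          import java.io.InputStream;
          import java.io.OutputStream;
          import com.google.protobuf.CodedInputStream;
          import org.apache.beam.sdk.coders.AtomicCoder;
          import org.apache.hadoop.hbase.client.Result;
          import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
          import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;

          /** Hypothetical Result coder with a raised protobuf size limit. */
          class LargeResultCoder extends AtomicCoder<Result> {
            private static final int SIZE_LIMIT_BYTES = 256 * 1024 * 1024; // placeholder

            @Override
            public void encode(Result value, OutputStream out) throws IOException {
              // Same length-delimited format HBaseResultCoder writes.
              ProtobufUtil.toResult(value).writeDelimitedTo(out);
            }

            @Override
            public Result decode(InputStream in) throws IOException {
              CodedInputStream cis = CodedInputStream.newInstance(in);
              cis.setSizeLimit(SIZE_LIMIT_BYTES); // default limit is 64 MB
              int size = cis.readRawVarint32();   // length prefix from writeDelimitedTo
              int oldLimit = cis.pushLimit(size);
              ClientProtos.Result proto = ClientProtos.Result.parseFrom(cis);
              cis.popLimit(oldLimit);
              return ProtobufUtil.toResult(proto);
            }
          }

      The custom coder would still have to replace the one HBaseIO sets on its output, e.g. via setCoder(new LargeResultCoder()) on the read's PCollection, so the batching approach is likely the less invasive of the two.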

      Attachments

        Issue Links

        Activity


          People

            Assignee: Unassigned
            Reporter: Marc Catrisse
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
