Flume
  1. Flume
  2. FLUME-1860

Remove transaction capacity from Memory and File channels

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Transaction Capacity was primarily meant to be a memory safeguard. It ensures that we don't have queues which are massive and can cause OOMs. I wonder if there is a way of fixing this and making sure a malicious RpcClient cannot cause OOMs by sending batches of huge sizes

        Issue Links

          Activity

          Hide
          Connor Woodson added a comment -

          On the sink side you have a max-batch-size; would putting that option/enforcing it on the source side as well solve the issue?

          Show
          Connor Woodson added a comment - On the sink side you have a max-batch-size; would putting that option/enforcing it on the source side as well solve the issue?
          Hide
          Hari Shreedharan added a comment -

          But doing something like this would create a tight coupling between config params (it already does, the sink's batch size has to be <= the next hop's transaction capacity). I don't really know if there is a way to resolve this issue without such a tight coupling, but it would be nice if there was.

          Show
          Hari Shreedharan added a comment - But doing something like this would create a tight coupling between config params (it already does, the sink's batch size has to be <= the next hop's transaction capacity). I don't really know if there is a way to resolve this issue without such a tight coupling, but it would be nice if there was.
          Hide
          Mike Percy added a comment -

          Is it that bad to have a transaction capacity? Having sanity checks is typically a good thing. If we keep them, I'd advocate for bumping the default to 10,000 and clarifying the error messages to be very clear on how to resolve the configuration issue.

          Show
          Mike Percy added a comment - Is it that bad to have a transaction capacity? Having sanity checks is typically a good thing. If we keep them, I'd advocate for bumping the default to 10,000 and clarifying the error messages to be very clear on how to resolve the configuration issue.
          Hide
          Hari Shreedharan added a comment -

          No, having a transaction capacity is not bad - it is a good sanity check, I am wondering if there is a better way to have a sanity check than having a parameter which is coupled with another config param on a previous agent.

          Show
          Hari Shreedharan added a comment - No, having a transaction capacity is not bad - it is a good sanity check, I am wondering if there is a better way to have a sanity check than having a parameter which is coupled with another config param on a previous agent.
          Hide
          Brock Noland added a comment - - edited

          I like bumping it to 10K so it almost never gets in the way of normal use. It serves it's purpose as a sanity check and otherwise doesn't bother the humans.

          Show
          Brock Noland added a comment - - edited I like bumping it to 10K so it almost never gets in the way of normal use. It serves it's purpose as a sanity check and otherwise doesn't bother the humans.
          Hide
          Hari Shreedharan added a comment -

          +1 on 10K as default for File Channel. For MemoryChannel ?

          Show
          Hari Shreedharan added a comment - +1 on 10K as default for File Channel. For MemoryChannel ?
          Hide
          Mike Percy added a comment -

          +1 on 10K for both. IMHO we are protecting against doing a batch of MAX_INT or something like that.

          Show
          Mike Percy added a comment - +1 on 10K for both. IMHO we are protecting against doing a batch of MAX_INT or something like that.
          Hide
          Roshan Naik added a comment -

          Currently the transaction capacity is used by the (memory) channel to size the put/take lists. It is usually unnecessary to size them any larger than the batchsize of the source or sink that is instantiating the transaction.

          It would be useful for the transaction to be given a batch size hint at the time of createTransaction().
          Right now we dont have this ability to pass such a hint to the transaction. I think it would be worthwhile tackling this issue as well along with this.

          Show
          Roshan Naik added a comment - Currently the transaction capacity is used by the (memory) channel to size the put/take lists. It is usually unnecessary to size them any larger than the batchsize of the source or sink that is instantiating the transaction. It would be useful for the transaction to be given a batch size hint at the time of createTransaction(). Right now we dont have this ability to pass such a hint to the transaction. I think it would be worthwhile tackling this issue as well along with this.
          Hide
          Juhani Connolly added a comment -

          Roshan: The transaction size is only used as a limit to how large the queue can get, it doesn't allocate that memory in advance. The only thing giving the batch size would help with would be allowing us to fail early where batch sizes are > transaction size.

          Hari: I think the larger issue with RPC clients is the sources/protocols themselves. The thrift in scribesource will OOM you if you send it a bad packet without anything ever reaching the channel. AvroSource is somewhat better behaved in that a huge batch is detected and flagged as an error without an attempt to allocate.

          Show
          Juhani Connolly added a comment - Roshan: The transaction size is only used as a limit to how large the queue can get, it doesn't allocate that memory in advance. The only thing giving the batch size would help with would be allowing us to fail early where batch sizes are > transaction size. Hari: I think the larger issue with RPC clients is the sources/protocols themselves. The thrift in scribesource will OOM you if you send it a bad packet without anything ever reaching the channel. AvroSource is somewhat better behaved in that a huge batch is detected and flagged as an error without an attempt to allocate.
          Hide
          Roshan Naik added a comment -

          Juhani..my bad, i was mistaken. Nevertheless..

          Knowing the batch size could be used for optimizations in the transaction. For instance the put/take lists in the memory channel can be switched from a Linked list to an array based list which requires single memory allocation (batch sized) instead of one allocation for each put().

          Also as I am implementing the spillable channel FLUME-1227, i am also finding batchSize useful for deciding whether all the put()s in a transaction should go into primary memory channel or into the overflow channel .. basically a put transaction cannot span both channels (to avoid splitting into multiple transactions) so there is a need for making such a decision up front. Currently i am using the transactionCapacity as a workaround to make this decision.

          Show
          Roshan Naik added a comment - Juhani..my bad, i was mistaken. Nevertheless.. Knowing the batch size could be used for optimizations in the transaction. For instance the put/take lists in the memory channel can be switched from a Linked list to an array based list which requires single memory allocation (batch sized) instead of one allocation for each put(). Also as I am implementing the spillable channel FLUME-1227 , i am also finding batchSize useful for deciding whether all the put()s in a transaction should go into primary memory channel or into the overflow channel .. basically a put transaction cannot span both channels (to avoid splitting into multiple transactions) so there is a need for making such a decision up front. Currently i am using the transactionCapacity as a workaround to make this decision.
          Hide
          Juhani Connolly added a comment -

          Hmm, I suspect with some careful coding you could replace the transaction buffer with an array. The preconditions in MemoryChannel are pretty strict to make sure rollback is always possible, you'll need to watch out for that if you choose to switch to using an array.

          As to giving a hint of the batch size... I suppose it could be done but it would add stronger coupling between components as they would need a heightened awareness of one another.

          Show
          Juhani Connolly added a comment - Hmm, I suspect with some careful coding you could replace the transaction buffer with an array. The preconditions in MemoryChannel are pretty strict to make sure rollback is always possible, you'll need to watch out for that if you choose to switch to using an array. As to giving a hint of the batch size... I suppose it could be done but it would add stronger coupling between components as they would need a heightened awareness of one another.
          Hide
          Roshan Naik added a comment -

          I dont see any tight coupling issues if we could pass the batch size at time of creation of the transaction.( and not at config time). So the sinks and sources would instantiate a transaction based on their respective batch sizes.

          This argument would indicate the max # of events that particular transaction instance would accept.

          Show
          Roshan Naik added a comment - I dont see any tight coupling issues if we could pass the batch size at time of creation of the transaction.( and not at config time). So the sinks and sources would instantiate a transaction based on their respective batch sizes. This argument would indicate the max # of events that particular transaction instance would accept.
          Hide
          Udai Kiran Potluri added a comment - - edited

          Hari Shreedharan - Is this ticket still relevant?

          Show
          Udai Kiran Potluri added a comment - - edited Hari Shreedharan - Is this ticket still relevant?
          Hide
          Hari Shreedharan added a comment -

          I think we should close this one out as Won't fix.

          Show
          Hari Shreedharan added a comment - I think we should close this one out as Won't fix.
          Hide
          Brock Noland added a comment -

          Seems like we had an agreement on on setting the default to math.min(channel capacity, 10000)...

          This breaks many users so unless there is an objection I will submit a patch.

          Show
          Brock Noland added a comment - Seems like we had an agreement on on setting the default to math.min(channel capacity, 10000)... This breaks many users so unless there is an objection I will submit a patch.
          Hide
          Hari Shreedharan added a comment -

          Don't mind - go ahead.

          Show
          Hari Shreedharan added a comment - Don't mind - go ahead.
          Hide
          Mike Percy added a comment -

          There is a patch from Udai Kiran Potluri on FLUME-2268, I agree with Ashish Paliwal that that JIRA could be merged into this one. Also FLUME-2267

          Show
          Mike Percy added a comment - There is a patch from Udai Kiran Potluri on FLUME-2268 , I agree with Ashish Paliwal that that JIRA could be merged into this one. Also FLUME-2267
          Hide
          Mike Percy added a comment -

          Err, or "Won't Fix" this one and commit the other two, only nice thing about this one is that it provides context to the design decisions around changing the default. I don't have a strong preference, whatever you guys wanna do

          Show
          Mike Percy added a comment - Err, or "Won't Fix" this one and commit the other two, only nice thing about this one is that it provides context to the design decisions around changing the default. I don't have a strong preference, whatever you guys wanna do
          Hide
          Mike Percy added a comment -

          I just marked those two as related to this one.

          If no concerns, I will mark this JIRA as Won't Fix and commit the other two today.

          Show
          Mike Percy added a comment - I just marked those two as related to this one. If no concerns, I will mark this JIRA as Won't Fix and commit the other two today.
          Hide
          Mike Percy added a comment -

          Marking Won't Fix. See FLUME-2267 & FLUME-2268 for followup on increasing the transactionCapacity default values.

          Show
          Mike Percy added a comment - Marking Won't Fix. See FLUME-2267 & FLUME-2268 for followup on increasing the transactionCapacity default values.

            People

            • Assignee:
              Unassigned
              Reporter:
              Hari Shreedharan
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development