Cassandra / CASSANDRA-2276

Pig memory issues with default LIMIT and large rows.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Fix Version/s: 0.7.4
    • Component/s: Hadoop
    • Labels:

      Description

      Rows with many columns, especially super columns with many values, can cause OutOfMemory errors in Cassandra when queried with Pig.

      1. cassandrastorage.diff
        0.9 kB
        Matt Kennedy
      2. cassandrastorage_2.diff
        0.8 kB
        Matt Kennedy
      3. cassandrastorage3.diff
        1 kB
        Matt Kennedy

        Activity

        Hudson added a comment -

        Integrated in Cassandra-0.7 #357 (See https://hudson.apache.org/hudson/job/Cassandra-0.7/357/)

        Jonathan Ellis added a comment -

        added javadoc and committed. thanks!

        Matt Kennedy added a comment -

        OK, third time's the charm, coded this one against trunk and just successfully applied it to a fresh check-out.

        Matt Kennedy added a comment -

        D'oh! I wrote it against a checkout of the 0.7.3 tag instead of trunk. I'll port the changes to trunk tonight. Sorry for the confusion.

        Jonathan Ellis added a comment -

        patch fails to apply for me, is this against current 0.7 branch?

        Jeremy Hanna added a comment -

        okay - thank you - I just wanted to see if you found a way to dereference subcolumns even with PIG-1849. Sounds like you're not though. Thanks.

        Matt Kennedy added a comment -

        Only for the purposes of counting the super columns, no access to the subcolumns.

        Jeremy Hanna added a comment -

        Matt - you mention super columns - are you iterating over super columns successfully, that is are you able to access the data in subcolumns successfully?

        Matt Kennedy added a comment -

        Corrected patch for final limit.

        Matt Kennedy added a comment -

        new patch reflecting Jonathan Ellis' comment.

        Matt Kennedy added a comment -

        Yeah, fair point. It isn't really useful, I was just letting eclipse write code for me.

        Jonathan Ellis added a comment -

        is it actually useful to call setLimit post-construction, or should we make it final?

        Matt Kennedy added a comment -

        Adds a constructor that allows the user to modify the limit parameter used in CassandraStorage to fetch fewer rows and use less memory in Cassandra.
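
A minimal sketch of the pattern this comment describes, also reflecting the review question in this thread about making the limit final rather than exposing setLimit. The class name LimitedStorageSketch, the getLimit accessor, the argument check, and the default of 1024 are illustrative assumptions; they are not taken from the attached patches or from the real CassandraStorage.

```java
// Hypothetical stand-in for a storage/load class; NOT the real CassandraStorage.
final class LimitedStorageSketch {
    // Assumed default for illustration; the issue does not state the real value.
    static final int DEFAULT_LIMIT = 1024;

    // final: fixed at construction time, so no setLimit() is needed.
    private final int limit;

    // No-argument constructor keeps the existing default behaviour.
    LimitedStorageSketch() {
        this(DEFAULT_LIMIT);
    }

    // Added constructor: lets the caller choose a smaller limit to reduce memory use.
    LimitedStorageSketch(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + limit);
        }
        this.limit = limit;
    }

    int getLimit() {
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(new LimitedStorageSketch().getLimit());   // default limit
        System.out.println(new LimitedStorageSketch(64).getLimit()); // caller-chosen limit
    }
}
```

Because the field is final, a caller who wants a different limit constructs a new instance instead of mutating an existing one.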


          People

          • Assignee:
            Matt Kennedy
            Reporter:
            Matt Kennedy
            Reviewer:
            Jonathan Ellis
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 1h
              Remaining: 1h
              Logged: Not Specified
