  1. Apache Cassandra
  2. CASSANDRA-1898

json2sstable should support streaming


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 0.7.1
    • Component/s: Legacy/Tools
    • None

    Description

      json2sstable loads the entire JSON file into memory so that it can sort the rows before creating an sstable. If the file was produced by sstable2json and the partitioner isn't changing, the rows are already in order and the sort is unnecessary. For very large files, this means json2sstable requires a huge amount of memory.

      There should be an option to stream the file instead. A simple check for out-of-order keys will prevent writing bad sstables.

      This should be possible with the SAX-style parser available in our current JSON library.
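
The out-of-order check proposed above could look roughly like the sketch below. It is only an illustration, not the code from the attached patches: `OrderCheckingIterator` and `StreamingOrderCheckDemo` are hypothetical names, plain `String` comparison stands in for the partitioner's key ordering, and rejecting duplicate keys is an assumption. The point is that only the previously seen key has to stay in memory, so rows can be consumed one at a time as a streaming parser emits them.

```java
import java.util.Arrays;
import java.util.Iterator;

/**
 * Wraps a stream of row keys and fails fast if a key arrives out of order.
 * Hypothetical helper for illustration; not part of Cassandra.
 */
final class OrderCheckingIterator<K extends Comparable<K>> implements Iterator<K> {
    private final Iterator<K> source;
    private K previous; // last key handed out; null before the first row

    OrderCheckingIterator(Iterator<K> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public K next() {
        K current = source.next();
        // Assumption: keys must be strictly ascending, so duplicates are rejected too.
        if (previous != null && previous.compareTo(current) >= 0) {
            throw new IllegalStateException(
                "Key " + current + " is out of order after " + previous);
        }
        previous = current;
        return current;
    }
}

final class StreamingOrderCheckDemo {
    /** Consumes the keys one at a time and returns how many were accepted. */
    static int importRows(Iterator<String> keys) {
        Iterator<String> checked = new OrderCheckingIterator<>(keys);
        int written = 0;
        while (checked.hasNext()) {
            checked.next(); // a real importer would append the row to the sstable here
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        // Sorted input is accepted in full.
        System.out.println(importRows(Arrays.asList("a", "b", "c").iterator()));
        // Unsorted input is rejected as soon as the offending key is seen.
        try {
            importRows(Arrays.asList("a", "c", "b").iterator());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In a real importer the comparison would presumably be made on the partitioner's decorated keys rather than on raw strings, and the wrapped iterator would be fed by the streaming JSON parser instead of an in-memory list.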

      Attachments

        1. CASSANDRA-1898.patch
          23 kB
          Pavel Yaskevich
        2. CASSANDRA-1898-sstable2json-comma-fix-0.7.patch
          2 kB
          Pavel Yaskevich
        3. CASSANDRA-1898-sstable2json-comma-fix-0.7-v2.patch
          2 kB
          Pavel Yaskevich
        4. CASSANDRA-1898-sstable2json-comma-fix-0.8.patch
          2 kB
          Pavel Yaskevich
        5. CASSANDRA-1898-sstable2json-comma-fix-0.8-v2.patch
          2 kB
          Pavel Yaskevich
        6. CASSANDRA-1898-v2.patch
          29 kB
          Pavel Yaskevich
        7. CASSANDRA-1898-v3.patch
          33 kB
          Pavel Yaskevich
        8. CASSANDRA-1898-v4.patch
          33 kB
          Pavel Yaskevich

          People

            Assignee: Pavel Yaskevich (xedin)
            Reporter: Nick Bailey (nickmbailey)
            Pavel Yaskevich
            Jonathan Ellis
            Votes: 0
            Watchers: 0

              Time Tracking

              Estimated: 8h
              Remaining: 0h
              Logged: 8h
