Commons CSV / CSV-131

Save positions of records to enable random access


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.1
    • Fix Version/s: None
    • Component/s: Parser
    • Labels: None

      Description

      It would be good to have CSVRecord save its position in the source stream.

      Reason: If the position of each record is known, a reader can scan the source once to build an index and then retrieve any record by random access. This helps when the file is too large to be read into memory, or when we don't want to read the full file just to access a record in the middle.

      Additional info: I have created a "random access CSV reader" and a "CSV viewer" (Swing) for arbitrarily large CSV files. Both require one additional scan of the file to build an index (multi-byte charsets are supported). The index can be saved to a file, so it only needs to be built once. Because the lexer reads through a BufferedReader, the underlying stream is read ahead of the record currently being parsed, so code outside the parser cannot tell where a record starts; that "internal information" has to come from the parser itself.
      The change to "core" is minor: one field in {{CSVRecord}} and a few associated methods to store the position.
      A patch will be attached.
      Code for random access (both UI and non-UI) will be proposed (and possibly submitted) as a separate issue. It could also live in an independent add-on, but that add-on would still require this one small change to Commons CSV.
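
To illustrate the "scan once, then seek" idea described above, here is a minimal sketch that uses only the JDK. It is not the attached patch and does not use the Commons CSV API; the class `RecordIndexSketch` and its methods `buildIndex` and `readRecord` are hypothetical names chosen for this example. It assumes a UTF-8 (or other ASCII-compatible) encoding, `"` as the quote character, a doubled quote as the only escape, and records terminated by `\n` or `\r\n`.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons CSV.
final class RecordIndexSketch {

    /** Scans the file once and returns the byte offset at which each record starts. */
    public static List<Long> buildIndex(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean inQuotes = false;      // true while inside a quoted field
            boolean atRecordStart = true;  // true until the first byte of a record is seen
            int b;
            while ((b = in.read()) != -1) {
                if (atRecordStart) {
                    offsets.add(pos);
                    atRecordStart = false;
                }
                if (b == '"') {
                    // A doubled quote ("") toggles twice, so the state stays correct.
                    inQuotes = !inQuotes;
                } else if (b == '\n' && !inQuotes) {
                    // A newline outside quotes ends the record.
                    atRecordStart = true;
                }
                pos++;
            }
        }
        return offsets;
    }

    /** Seeks to the given record and returns its raw text without the line terminator. */
    public static String readRecord(Path file, List<Long> index, int recordNumber) throws IOException {
        long start = index.get(recordNumber);
        long end = recordNumber + 1 < index.size() ? index.get(recordNumber + 1) : Files.size(file);
        byte[] buf = new byte[(int) (end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        int len = buf.length;
        while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r')) {
            len--;  // drop the trailing line terminator
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }
}
```

The sketch scans raw bytes, which sidesteps the BufferedReader read-ahead problem, but only because the bytes for `"` and `\n` never occur inside a multi-byte UTF-8 sequence. For other charsets, or to reuse the real lexer's quoting and escaping rules, the position has to be reported by the parser, which is what this issue requests.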

        Attachments

        1. PositionTrackingTest_20140907.patch (3 kB, Holger Stratmann)
        2. PositionTrackingFull_v101_20140910.patch (10 kB, Holger Stratmann)
        3. PositionTracking_20140907.patch (6 kB, Holger Stratmann)
        4. ggregory-CSV-131-parser-and-record.diff (11 kB, Gary D. Gregory)
        5. CSV-131-gg-0.diff (12 kB, Gary D. Gregory)


              People

              • Assignee: Unassigned
              • Reporter: Holger Stratmann (HolgerS)
              • Votes: 0
              • Watchers: 3


                  Time Tracking

                  • Estimated: 1h
                  • Remaining: 1h
                  • Logged: Not Specified