Commons CSV / CSV-277

Review Lexer simpleToken for Performance

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      While running the Apache ORC benchmarks, which have commons-csv as a dependency, I noticed that the bulk of the running time is spent in commons-csv.

      I attached the VisualVM output and here is my test setup:

      JVM: OpenJDK 64-Bit Server VM (25.292-b10, mixed mode)
      Java: version 1.8.0_292, vendor Private Build
      Java Home: /usr/lib/jvm/java-8-openjdk-amd64/jre
      JVM Flags: <none>
      

      I suspect this is in part because ExtendedBufferedReader extends BufferedReader. BufferedReader's read methods are synchronized, which means that every call to read() has to acquire a lock. Usually that is not an issue, but for commons-csv it adds a lot of overhead because the lexer reads the input one character at a time. So even though the input is buffered, each character read goes through synchronization. Each read also has to "jump" into the parent class.

      Nothing else stands out to me as being "slow."
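
      To illustrate the suspected cost, here is a minimal sketch (not commons-csv code, and not a benchmark) that contrasts the two read patterns. The class name ReadPatternSketch and both method names are hypothetical. It assumes JDK 8 behavior, where BufferedReader.read() enters a synchronized block on every call, while a bulk read(char[], int, int) takes the lock once per chunk and leaves the per-character loop lock-free.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadPatternSketch {

    /**
     * Counts commas by calling read() once per character.
     * On JDK 8, each read() call acquires the reader's lock.
     */
    static int countDelimitersPerChar(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == ',') {
                count++;
            }
        }
        return count;
    }

    /**
     * Counts commas by reading chunks into a local char[] and scanning
     * the array directly, so the lock is taken once per chunk instead
     * of once per character.
     */
    static int countDelimitersBulk(Reader in) throws IOException {
        char[] chunk = new char[8192];
        int count = 0;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (chunk[i] == ',') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,b,c\n1,2,3\n";
        System.out.println(countDelimitersPerChar(new StringReader(csv)));
        System.out.println(countDelimitersBulk(new StringReader(csv)));
    }
}
```

      Both methods return the same count for the same input; only the number of locked calls differs. Whether this accounts for the time seen in the profile would still need to be measured, for example with JMH.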

      Attachments

        1. CSVCapture.PNG
          183 kB
          David Mollitor

          People

            Assignee: Unassigned
            Reporter: David Mollitor (belugabehr)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3.5h