Commons CSV / CSV-277

Review Lexer simpleToken for Performance

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      I ran the Apache ORC benchmarks, which have commons-csv as a dependency, and noticed that the bulk of the running time is spent in commons-csv.

      I attached the VisualVM output and here is my test setup:

      JVM: OpenJDK 64-Bit Server VM (25.292-b10, mixed mode)
      Java: version 1.8.0_292, vendor Private Build
      Java Home: /usr/lib/jvm/java-8-openjdk-amd64/jre
      JVM Flags: <none>
      

      I suspect this is in part because ExtendedBufferedReader extends BufferedReader. BufferedReader's read methods are synchronized, which means that every call to read has to acquire a lock. Usually that is not an issue, but for commons-csv it adds a lot of overhead because the lexer reads the input one character at a time. So even though the input is buffered, it goes through the synchronization process for each character read. It also has to "jump" into the parent class for each character.
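
      To make the suspected cost concrete, here is a minimal sketch of the two read patterns. It is not code from commons-csv: the class name ReadStrategies and both methods are hypothetical, and it only assumes the standard java.io API. The first method calls read() once per character, as described above; the second pulls characters in blocks and scans a local array, so the reader is entered once per block rather than once per character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not taken from commons-csv.
final class ReadStrategies {

    /** Counts delimiter characters with one read() call per character. */
    static long countPerChar(Reader source, char delimiter) {
        long count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            int c;
            // Every read() call enters BufferedReader and takes its lock
            // before handing back a single character.
            while ((c = in.read()) != -1) {
                if (c == delimiter) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Counts delimiter characters by reading blocks into a local array. */
    static long countBulk(Reader source, char delimiter) {
        long count = 0;
        char[] block = new char[8192];
        try (Reader in = source) {
            int n;
            // One read(char[], int, int) call per block; the inner loop
            // touches only the local array and never re-enters the reader.
            while ((n = in.read(block, 0, block.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (block[i] == delimiter) {
                        count++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

      Both methods return the same count for the same input; whether the block-based pattern actually removes the hotspot seen in VisualVM would need to be measured against the real Lexer.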

      Nothing else stands out to me as being "slow."

          People

            Assignee: Unassigned
            Reporter: David Mollitor (belugabehr)

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 3.5h