  1. Spark
  2. SPARK-33635

Performance regression in Kafka read


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.0.1
    • Fix Version/s: 3.0.2, 3.1.1
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      Description

      I have observed a slowdown when reading data from Kafka on all of our systems after migrating from Spark 2.4.5 to Spark 3.0.0 (and Spark 3.0.1).

      I have created a sample project to isolate the problem as much as possible. It does nothing but read all data from a Kafka topic (see https://github.com/codegorillauk/spark-kafka-read ).
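
      As a rough illustration of this kind of measurement, the sketch below batch-reads a whole topic and derives a rows-per-second figure. It is not the code from the linked project: the function names are hypothetical, `df.count()` is assumed to be an acceptable way to force the read, and the caller is assumed to supply a SparkSession started with the matching spark-sql-kafka-0-10 package.

```python
import time


def rows_per_second(row_count, elapsed_seconds):
    """Average read rate for a run that took `elapsed_seconds`."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return row_count / elapsed_seconds


def measure_read_rate(spark, bootstrap_servers, topic):
    """Batch-read an entire Kafka topic and return (rows, rows per second).

    `spark` is an existing SparkSession (assumed to have the Kafka
    connector on its classpath); the other arguments are placeholders.
    """
    df = (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")  # from the start of the topic
        .option("endingOffsets", "latest")      # up to the current end
        .load()
    )
    start = time.perf_counter()
    row_count = df.count()  # action that triggers the actual Kafka read
    elapsed = time.perf_counter() - start
    return row_count, rows_per_second(row_count, elapsed)
```

      A call such as `measure_read_rate(spark, "localhost:9092", "test-topic")` would be repeated under each Spark version. The broker address and topic name here are placeholders.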

      With 2.4.5, across multiple runs,
      I get a stable read rate of 1,120,000 (1.12 million) rows per second.

      With 3.0.0 or 3.0.1, across multiple runs,
      I get a stable read rate of 632,000 (0.632 million) rows per second.

      This represents a 44% loss in performance, which is a lot.
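
      The quoted percentage can be checked from the two reported rates. This is only the arithmetic, using the figures above; the variable names are illustrative.

```python
# Read rates reported above, in rows per second.
rate_2_4_5 = 1_120_000
rate_3_0_x = 632_000

# Relative loss of throughput going from 2.4.5 to 3.0.x.
loss = (rate_2_4_5 - rate_3_0_x) / rate_2_4_5
print(f"{loss:.1%}")  # prints 43.6%, which rounds to the 44% quoted
```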

      I have been working through the spark-sql-kafka-0-10 code base, but changes for Spark 3 have been ongoing for over a year, and it's difficult to pinpoint an exact change or reason for the degradation.

      I am happy to help fix this problem, but will need some assistance as I am unfamiliar with the spark-sql-kafka-0-10 project.

       

      A sample of the data my test reads (note: it's not parsing CSV; this is just test data):
      1606921800000,001e0610e532,lightsense,tsl250rd,intensity,21853,53.262,acceleration_z,651,ep,290,commit,913,pressure,138,pm1,799,uv_intensity,823,idletime,-372,count,-72,ir_intensity,185,concentration,-61,flags,-532,tx,694.36,ep_heatsink,-556.92,acceleration_x,-221.40,fw,910.53,sample_flow_rate,-959.60,uptime,-515.15,pm10,-768.03,powersupply,214.72,magnetic_field_y,-616.04,alphasense,606.73,AoT_Chicago,053,Racine Ave & 18th St Chicago IL,41.857959,-87.65642700000002,AoT Chicago (S) [C],2017/12/15 00:00:00,

            People

            • Assignee: Jungtaek Lim (kabhwan)
            • Reporter: David Wyles (david.wyles)
            • Votes: 0
            • Watchers: 8
