Sqoop: SQOOP-2920

Sqoop performance deteriorates significantly on wide datasets; Sqoop mappers at 100% CPU

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.5, 1.4.6
    • Fix Version/s: 1.4.7

      Description

      We frequently run sqoop export from our data lake into Oracle.
      Whenever we export "narrow" datasets, Oracle (a 3-node all-flash Oracle RAC) is the bottleneck: it normally can't keep up with more than 45-55 Sqoop mappers, while the MapReduce framework shows the mappers themselves are only lightly loaded.

      On wide datasets the picture is the opposite. Oracle shows 95% of sessions idle, waiting for new INSERTs, even when we run more than a hundred mappers. Sqoop has serious scalability issues on very wide datasets (and our company normally works with very wide datasets).

      For example, on the most recent sqoop export, which started ~2.5 hours ago, the 95 mappers had already accumulated:
      CPU time spent (ms): 1,065,858,760
      (as reported by the MapReduce framework job counters)

      That is about 1 million seconds of CPU time, or 11,219.57 seconds per mapper: roughly 3.1 hours of CPU time per mapper, which is more than the ~2.5 hours of wall-clock time that have elapsed.
      So the mappers are 100% on CPU.
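      The arithmetic above can be reproduced with a short Python sketch. It uses only the counter value, the mapper count, and the ~2.5-hour elapsed time quoted in this issue. It assumes (an assumption, not something the job counters state) that all 95 mappers ran for the entire elapsed time; under that assumption each mapper averages more than one full core.

```python
# Figures copied from the MapReduce job counters quoted in this issue.
cpu_time_ms = 1_065_858_760  # "CPU time spent (ms)" counter
mappers = 95                 # mappers that had accumulated that CPU time
elapsed_hours = 2.5          # approximate wall-clock time since the job started

cpu_seconds = cpu_time_ms / 1000        # total CPU time, ~1.07 million seconds
per_mapper_s = cpu_seconds / mappers    # average CPU seconds per mapper
per_mapper_h = per_mapper_s / 3600      # average CPU hours per mapper

# Assumption: every mapper ran for the whole elapsed time, so this ratio is
# the average number of cores each mapper kept busy.
utilization = per_mapper_h / elapsed_hours

print(f"total CPU: {cpu_seconds:,.0f} s")
print(f"per mapper: {per_mapper_s:,.2f} s = {per_mapper_h:.2f} h")
print(f"average utilization per mapper: {utilization:.0%}")
```

      A ratio above 100% only says that, on average, each mapper JVM consumed more than one core's worth of CPU time; it does not by itself show where that time went, which is what the attached jstack output is for.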

      Will also attach jstack files.

        Attachments

        1. top - sqoop mappers hog cpu.png (70 kB, Ruslan Dautkhanov)
        2. SQOOP-2920.patch (11 kB, Attila Szabo)
        3. jstack.zip (77 kB, Ruslan Dautkhanov)

              People

              • Assignee: Attila Szabo (maugli)
              • Reporter: Ruslan Dautkhanov (Tagar)
              • Votes: 0
              • Watchers: 8