Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-3714

JdbcIO.read() should create a forward-only, read-only result set

Details

    • Bug
    • Status: Resolved
    • P2
    • Resolution: Fixed
    • None
    • 2.5.0
    • io-java-jdbc
    • None

    Description

      https://stackoverflow.com/questions/48784889/streaming-data-from-cloudsql-into-dataflow/48819934#48819934 - a user is trying to load a large table from MySQL, and the MySQL JDBC driver requires special measures when loading large result sets.

      JdbcIO currently calls simply "connection.prepareStatement(query)" https://github.com/apache/beam/blob/bb8c12c4956cbe3c6f2e57113e7c0ce2a5c05009/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L508 - it should specify type TYPE_FORWARD_ONLY and concurrency CONCUR_READ_ONLY - these values should always be used.

      Seems that different databases have different requirements for streaming result sets.
      E.g. MySQL requires setting fetch size; PostgreSQL says "The Connection must not be in autocommit mode." https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor . Oracle, I think, doesn't have any special requirements but I don't know. Fetch size should probably still be set to a reasonably large value.

      Seems that the common denominator of these requirements is: set fetch size to a reasonably large but not maximum value; disable autocommit (there's nothing to commit in read() anyway).

      Attachments

        Activity

          People

            evindj Innocent
            jkff Eugene Kirpichov
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 4h
                4h