Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-2659

JdbcIOIT flaky when run using io-it-suite-local

Details

    • Bug
    • Status: Open
    • P3
    • Resolution: Unresolved
    • None
    • None
    • testing
    • None

    Description

      Note: the problem below should not affect io-it-suite and thus the jdbc jenkins job that's currently in PR. I haven't tested that exact configurations so I'm not 100% certain, but I don't have any indications that there'll be a problem.

      I've been running the postgres kubernetes scripts locally for a while and haven't seen any problems.

      However, now that I'm running it via io-it-suite-local, and I'm starting to see flakiness - clients attempting to connect to the postgres server will get "connection attempt failed" error. The difference between what was working before and now is that now the load balancer and the pod are getting set up at the same time. Before I was using it with a pre-existing load balancer - that is I haven't been tearing down/starting up the load balancer+pod at run time.

      So I think the problem is in the interaction between the two or potentially just in the LoadBalancer service (it may take a little bit longer to get fully hooked up even after it reports an IP)

      Possible causes:

      • the loadbalancer is reporting it's ready before it actually can serve traffic to the postgres instance
      • the lodabalancer has another status field that I'm not looking at - today we only check IP address, perhaps the loadbalancer exposes a status field? kubectl get/describe might be able to help. A cursory examination didn't show anything helpful.
      • the postgres instance isn't actually ready when it says it is. I don't think that's the issue since I was working with postgres pods before and they seemed fine then

      Potential solutions:

      • if cause is slow postgres pod start (unlikely): determine postgres pod health by reading from sql? (pg_ctl?), and then have pkb wait for that by adding a dynamic_pipeline_option that wait for the kubernetes status to be okay and sends to a non-existent pipeline option
      • file bug about loadbalancer not being ready when it says it is? (investigate that more
      • have some way for pkb to actually connect to and validate the connection to postgres (that seems complicated.)

      If the problem is that the loadbalancer is not ready when it says it is, while we are waiting for kubernetes to fix the issue, one workaround would be to:
      1) modify io-it-suite-local to not load any kubernetes scripts (set --beam_kubernetes_scripts equal to blank line or skip it altogether - https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml#L137)
      2) have the user run the kubernetes scripts manually beforehand, wait for it to be healthy, and then run io-it-suite-local

      To repro the problem:
      mvn verify -Dio-it-suite-local -pl sdks/java/io/jdbc -DpkbLocation="your-copy-of-PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--tempRoot=gs://sisk-test/staging"]' -DforceDirectRunner=true

      this should fail when run repeatedly.

      Attachments

        Activity

          People

            Unassigned Unassigned
            sisk Stephen Sisk
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: