BEAM-10832 (project: Beam)

ClickhouseIO's getTableSchema method is called before the pipeline starts


    Details

    • Type: Improvement
    • Status: Triage Needed
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.23.0
    • Fix Version/s: Not applicable
    • Component/s: beam-model
    • Labels:
      None

      Description

      ClickhouseIO's getTableSchema() method is called from the expand() method of the write transform (the PTransform that creates WriteFn). expand() runs at pipeline construction time, on the machine that launches the pipeline, before the pipeline is started. The problem is that getTableSchema() opens a connection to ClickHouse, so if the launching machine cannot reach a clickhouse-server at that moment, the pipeline never starts. For example, suppose ClickHouse is deployed on a production server and I want to launch a Dataflow pipeline from my local machine. I should not need a working connection to clickhouse-server from my local environment; only the Dataflow workers should need to reach it.
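      To make the timing concrete, here is a dependency-free Java model of the current and the proposed behaviour. It does not use Beam's API: SchemaLoader, EagerWrite and DeferredWrite are hypothetical names, the loader stands in for the JDBC round trip made by getTableSchema(), each constructor stands in for work done during expand(), and setup() stands in for a DoFn method annotated with @Setup.

```java
// Hypothetical stand-in for the JDBC lookup that getTableSchema() performs.
interface SchemaLoader {
  String load(String jdbcUrl, String table);
}

// Current behaviour, modelled: the schema is fetched while the transform is
// built, i.e. on the machine that constructs and submits the pipeline.
final class EagerWrite {
  private final String schema;

  EagerWrite(String jdbcUrl, String table, SchemaLoader loader) {
    // Same effect as calling getTableSchema() inside expand(): if the launcher
    // cannot reach ClickHouse, this throws and the job is never submitted.
    this.schema = loader.load(jdbcUrl, table);
  }

  String schema() {
    return schema;
  }
}

// Proposed behaviour, modelled: construction only records connection details;
// the lookup is deferred to setup(), which in Beam would run on the workers.
final class DeferredWrite {
  private final String jdbcUrl;
  private final String table;
  private final SchemaLoader loader;
  private String schema; // stays null until setup() runs

  DeferredWrite(String jdbcUrl, String table, SchemaLoader loader) {
    this.jdbcUrl = jdbcUrl;
    this.table = table;
    this.loader = loader;
  }

  // Stand-in for a DoFn method annotated with @Setup.
  void setup() {
    if (schema == null) {
      schema = loader.load(jdbcUrl, table);
    }
  }

  String schema() {
    return schema;
  }
}
```

      With an unreachable server, constructing EagerWrite throws immediately on the launcher, while DeferredWrite can be constructed and only needs a connection when setup() runs, which in the proposed design happens on the workers.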

       

      What I suggest:

      getTableSchema() should be invoked only once (singleton-style), and from the DoFn's setup() method rather than from the PTransform's expand() method, because setup() runs after the pipeline has started, on the workers (in my case on Dataflow, not locally).
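      A minimal sketch of the "singleton" part of the proposal, assuming the schema may be cached for the lifetime of a worker JVM. TableSchemaCache and WriteFnSketch are hypothetical names, the loader stands in for getTableSchema(), and the JDBC URL, table name and schema string in any usage are placeholders. In real Beam code, setup() would be the WriteFn method annotated with @DoFn.Setup, and the DoFn would capture only the JDBC URL and table name, which are serializable strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical process-wide ("singleton") schema cache: all WriteFn instances
// in one worker JVM share a single lookup per (jdbcUrl, table) pair.
final class TableSchemaCache {
  private static final Map<String, String> SCHEMAS = new ConcurrentHashMap<>();

  private TableSchemaCache() {}

  static String get(String jdbcUrl, String table, BiFunction<String, String, String> loader) {
    // ConcurrentHashMap.computeIfAbsent applies the loader at most once per key,
    // even if several setup() calls race. If the loader throws, nothing is
    // cached, so a later setup() retries the lookup.
    return SCHEMAS.computeIfAbsent(jdbcUrl + "|" + table, key -> loader.apply(jdbcUrl, table));
  }
}

// Hypothetical stand-in for WriteFn: the constructor (pipeline construction)
// stores only connection details; the schema is resolved in setup().
final class WriteFnSketch {
  private final String jdbcUrl;
  private final String table;
  private final BiFunction<String, String, String> loader;
  private String schema;

  WriteFnSketch(String jdbcUrl, String table, BiFunction<String, String, String> loader) {
    this.jdbcUrl = jdbcUrl;
    this.table = table;
    this.loader = loader;
  }

  // In Beam this would be a method annotated with @DoFn.Setup.
  void setup() {
    schema = TableSchemaCache.get(jdbcUrl, table, loader);
  }

  String schema() {
    return schema;
  }
}
```

      One trade-off of caching per JVM is that a schema change on the ClickHouse side is not picked up until the workers restart.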

       

      I would be more than happy to work on this improvement in Apache Beam (Java).


    People

    • Assignee: Unassigned
    • Reporter: vasu7052 Vasu Gupta
    • Votes: 0
    • Watchers: 1

    Time Tracking

    • Estimated: Not Specified
    • Remaining: 0h
    • Logged: 1h 40m