Flink / FLINK-22472

Partition data files are committed later than the metadata (_SUCCESS) file is produced

Details

    Description

      I tested writing some data to a CSV file with the Flink filesystem connector. After the success file was produced, the data file was still uncommitted (in-progress), which is unexpected.

      bang@mac db1.db $ll /var/folders/55/cw682b314gn8jhfh565hp7q00000gp/T/junit8642959834366044048/junit484868942580135598/test-partition-time-commit/d\=2020-05-03/e\=12/
      total 8
      drwxr-xr-x  4 bang  staff  128  4 25 19:57 ./
      drwxr-xr-x  8 bang  staff  256  4 25 19:57 ../
      -rw-r--r--  1 bang  staff   12  4 25 19:57 .part-b703d4b9-067a-4dfe-935e-3afc723aed56-0-4.inprogress.b7d9cf09-0f72-4dce-8591-b61b1d23ae9b
      -rw-r--r--  1 bang  staff    0  4 25 19:57 _MY_SUCCESS
      

       

      After some debugging, I found that I have to set the sink.rolling-policy.file-size or sink.rolling-policy.rollover-interval parameter. The default values of these two parameters are quite large (128 MB and 30 min), which is inconvenient for tests and demos. I think we can improve this.

       

      As the documentation [1] describes, for row formats (csv, json), you can set sink.rolling-policy.file-size or sink.rolling-policy.rollover-interval in the connector properties, together with execution.checkpointing.interval in flink-conf.yaml, if you don't want to wait a long time before the data becomes visible in the file system.

      [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/filesystem/#rolling-policy
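
      For illustration, a test table with small rolling thresholds might look like the sketch below. The table name, columns, path, and the concrete values are assumptions made up for this example. Only the option keys and the _MY_SUCCESS file name come from the report above and the filesystem connector documentation.

      ```sql
      -- Hypothetical test table; name, columns, and path are placeholders.
      CREATE TABLE test_sink (
        a INT,
        b STRING,
        d STRING,
        e STRING
      ) PARTITIONED BY (d, e) WITH (
        'connector' = 'filesystem',
        'path' = 'file:///tmp/test-partition-time-commit',
        'format' = 'csv',
        -- Roll in-progress files much sooner than the defaults (128 MB / 30 min).
        -- The values below are illustrative, not recommendations.
        'sink.rolling-policy.file-size' = '1MB',
        'sink.rolling-policy.rollover-interval' = '10 s',
        -- How often the time-based rolling policy is evaluated (assumed useful here).
        'sink.rolling-policy.check-interval' = '5 s',
        -- Write a success file when a partition is committed.
        'sink.partition-commit.policy.kind' = 'success-file',
        'sink.partition-commit.success-file.name' = '_MY_SUCCESS'
      );

      -- Checkpointing must also be enabled, e.g. in flink-conf.yaml:
      --   execution.checkpointing.interval: 10s
      ```

      With settings along these lines, rolled part files should be finalized on the next checkpoint instead of staying as .inprogress files for the duration of a short test.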

            People

              luoyuxia
              Leonard Xu (leonard)