Flink / FLINK-9407

Support orc rolling sink writer


    Details

      Description

      Currently, we only support StringWriter, SequenceFileWriter, and AvroKeyValueSinkWriter. I would suggest adding an ORC writer for the rolling sink.
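
      As a rough illustration of what such a writer could look like, below is a minimal sketch in Scala. It assumes the rolling/bucketing sink's Writer interface (open/write/flush/getPos/close/duplicate) from flink-connector-filesystem and the orc-core API; the class name OrcSinkWriter, the hard-coded Row field mapping, and the schema string are hypothetical and only mirror the name/age/married columns from the verification query below.

      import org.apache.flink.streaming.connectors.fs.Writer
      import org.apache.flink.types.Row
      import org.apache.hadoop.fs.{FileSystem, Path}
      import org.apache.hadoop.hive.ql.exec.vector.{BytesColumnVector, LongColumnVector, VectorizedRowBatch}
      import org.apache.orc.{OrcFile, TypeDescription, Writer => OrcFileWriter}

      // Hypothetical sketch: an ORC writer for the rolling sink mapping
      // Row(name: String, age: Int, married: Boolean) onto an ORC schema.
      class OrcSinkWriter(schemaString: String) extends Writer[Row] {

        @transient private var orcWriter: OrcFileWriter = _
        @transient private var batch: VectorizedRowBatch = _
        private var rowsWritten: Long = 0L

        override def open(fs: FileSystem, path: Path): Unit = {
          val schema = TypeDescription.fromString(schemaString)
          orcWriter = OrcFile.createWriter(path, OrcFile.writerOptions(fs.getConf).setSchema(schema))
          batch = schema.createRowBatch()
        }

        override def write(element: Row): Unit = {
          val i = batch.size
          batch.size += 1
          batch.cols(0).asInstanceOf[BytesColumnVector]
            .setVal(i, element.getField(0).asInstanceOf[String].getBytes("UTF-8"))
          batch.cols(1).asInstanceOf[LongColumnVector].vector(i) = element.getField(1).asInstanceOf[Int].toLong
          batch.cols(2).asInstanceOf[LongColumnVector].vector(i) = if (element.getField(2).asInstanceOf[Boolean]) 1L else 0L
          if (batch.size == batch.getMaxSize) { // hand a full batch to the ORC writer
            orcWriter.addRowBatch(batch)
            batch.reset()
          }
          rowsWritten += 1
        }

        override def flush(): Long = {
          if (batch.size > 0) {
            orcWriter.addRowBatch(batch)
            batch.reset()
          }
          getPos()
        }

        // The ORC writer buffers whole stripes internally, so a real implementation
        // needs a proper answer here for the sink's rolling/valid-length logic;
        // returning the row count is only a placeholder.
        override def getPos(): Long = rowsWritten

        override def close(): Unit = {
          flush()
          if (orcWriter != null) orcWriter.close()
        }

        override def duplicate(): Writer[Row] = new OrcSinkWriter(schemaString)
      }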

      For reference, I tested the PR and verified the results with Spark SQL (session below): the data previously written by the sink can be read back as expected. I will add more tests over the next couple of days, including the performance of compression with short checkpoint intervals, as well as more unit tests.

      scala> spark.read.orc("hdfs://10.199.196.0:9000/data/hive/man/2018-07-06--21")
      res1: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
      
      scala>
      
      scala> res1.registerTempTable("tablerice")
      warning: there was one deprecation warning; re-run with -deprecation for details
      
      scala> spark.sql("select * from tablerice")
      res3: org.apache.spark.sql.DataFrame = [name: string, age: int ... 1 more field]
      
      scala> res3.show(3)
      +-----+---+-------+
      | name|age|married|
      +-----+---+-------+
      |Sagar| 26|  false|
      |Sagar| 30|  false|
      |Sagar| 34|  false|
      +-----+---+-------+
      only showing top 3 rows
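
      For additional context, a hypothetical wiring of such a writer into the existing BucketingSink might look like the script-style snippet below. Everything here is an assumed usage example: OrcSinkWriter is the sketch from the description, and the "yyyy-MM-dd--HH" bucketer format is simply what would produce hourly bucket paths like 2018-07-06--21 as queried in the session above.

      import org.apache.flink.streaming.api.scala._
      import org.apache.flink.streaming.connectors.fs.bucketing.{BucketingSink, DateTimeBucketer}
      import org.apache.flink.types.Row

      val env = StreamExecutionEnvironment.getExecutionEnvironment

      // Helper building rows that match struct<name:string,age:int,married:boolean>.
      def person(name: String, age: Int, married: Boolean): Row = {
        val r = new Row(3)
        r.setField(0, name)
        r.setField(1, Int.box(age))
        r.setField(2, Boolean.box(married))
        r
      }

      val people: DataStream[Row] =
        env.fromElements(person("Sagar", 26, married = false), person("Sagar", 30, married = false))

      // Base path and bucket format chosen to match the paths read back above.
      val sink = new BucketingSink[Row]("hdfs://10.199.196.0:9000/data/hive/man")
        .setBucketer(new DateTimeBucketer[Row]("yyyy-MM-dd--HH"))
        .setWriter(new OrcSinkWriter("struct<name:string,age:int,married:boolean>"))

      people.addSink(sink)
      env.execute("orc-rolling-sink-example")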
      


            People

            Assignee: Unassigned
            Reporter: mingleizhang (zhangminglei)
