Flink / FLINK-19903

Allow to read metadata in filesystem connector


Details

    Description

      Use case:

      I have a dataset (200k files) with some information embedded in the
      filenames, and I need to extract it as a new column.

      In Spark I could use
      `.withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
      but I don't see how I can do the same with Flink.
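
      For reference, the Spark expression above takes the last `/`-separated segment of the input path and keeps the part before the first dot. A minimal plain-Python sketch of that logic follows; the helper name `id_from_path` and the example path are made up for illustration and are not part of Spark or Flink:

```python
def id_from_path(path: str) -> str:
    """Return the file name portion of `path` up to (not including) the first dot."""
    # Last path segment; equivalent to reverse(split(path, '/'))[0] in the Spark snippet.
    file_name = path.split("/")[-1]
    # Part before the first dot; equivalent to split(file_name, '\.')[0].
    return file_name.split(".")[0]


# Hypothetical example path, for illustration only.
print(id_from_path("hdfs://data/in/abc123.csv"))  # prints abc123
```

      Note that a name with several dots, such as `x.tar.gz`, yields only `x`, which matches what the Spark expression would return.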

       

      Apparently FLIP-107 [1] would allow SQL connectors and formats to expose metadata.

       

      So it would be great for the Filesystem SQL connector to expose the path. 

      Ideally, the path would be exposed via a function that reads the metadata, so I could write something akin to `SELECT input_file_name(),* FROM table1`.

      [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors

      [2]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html

      Attachments

        1. image-2020-11-03-08-53-03-714.png
          372 kB
          Ruben Laguna


            People

              slinkydeveloper Francesco Guardiani
              ecerulm Ruben Laguna