Flink / FLINK-992

Create CollectionDataSets by reading (client) local files.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DataSet API, Python API
    • Labels:

      Description

      CollectionDataSets are a nice way to feed data into programs.
      We could add support to read a client-local file at program construction time using a FileInputFormat, put its data into a CollectionDataSet, and ship its data together with the program.

      This would remove the need to upload small files to DFS when they are only used together with some large input that is already stored in DFS.
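
      A minimal sketch of the client-side step, assuming a small line-oriented text file. It uses only the Java standard library; the class and method names (`LocalFileToCollection`, `readLines`, `writeSampleFile`) are hypothetical and not part of Flink. It reads the file with plain `java.nio` rather than a FileInputFormat, so it only illustrates the "file to collection" idea, not the proposed implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class LocalFileToCollection {

    // Reads a small client-local text file fully into memory, one element per line.
    // Hypothetical helper; assumes the file is UTF-8 and small enough to fit in memory.
    public static List<String> readLines(Path localFile) {
        try {
            return Files.readAllLines(localFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a throwaway file so the example runs without any external input.
    public static Path writeSampleFile(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("small-input", ".txt");
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            tmp.toFile().deleteOnExit();
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path sample = writeSampleFile(Arrays.asList("1,apple", "2,banana"));
        List<String> records = readLines(sample);

        // In a Flink program, this list would be handed to
        // ExecutionEnvironment.fromCollection(records), so that the data
        // travels with the program instead of being uploaded to DFS first.
        System.out.println(records.size() + " records read on the client");
    }
}
```

      Because the whole file ends up inside the submitted program, this only makes sense for inputs that are small relative to the main data set.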

        Activity

        nrai niraj rai added a comment -

        Please go ahead

        On Jun 21, 2017 2:00 PM, "Neelesh Srinivas Salian (JIRA)" <jira@apache.org>

        nssalian Neelesh Srinivas Salian added a comment -

        niraj rai, if you are not working on this at the moment, can I assign it to myself?

        neelesh77 Neelesh Srinivas Salian added a comment -

        Shall I begin to work on this if no one else is?

        stefanobaghino Stefano Baghino added a comment -

        Sure, thanks for the reply. I just wanted to check whether I could be of help with this issue.

        nrai niraj rai added a comment -

        Hi Stefano, I will submit the patch by next week. Is that OK with you?
        Thanks
        Niraj

        On Fri, Jan 15, 2016 at 8:53 AM, Stefano Baghino (JIRA) <jira@apache.org>

        stefanobaghino Stefano Baghino added a comment -

        What is the status of this issue?

        fhueske Fabian Hueske added a comment -

        Hi niraj rai,

        thanks for picking up this issue. Flink's DataSet API features DataSets which are built from regular Java collections. This is done via the ExecutionEnvironment as ExecutionEnvironment.fromCollection(myCollection). Under the hood, the Java collection is submitted to the executing Flink instance (cluster, local, YARN, ...) and the collection's data is processed.

        This feature will use Flink's collection DataSets to process, on a remote cluster, a file that is local to the user's client machine. Instead of copying the small file into a file system or data store that can be accessed from the cluster, the client will be able to convert the file into a Java collection and use the collection as a DataSet in a Flink program. I would propose to read the local file by using Flink's regular InputFormats.

        Please let me know if you have further questions,
        Fabian
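
        The proposal above (drain a record-oriented reader on the client into a Java collection) could be sketched roughly as follows. This is an illustration under stated assumptions, not Flink code: `RecordSource` and `LineSource` are hypothetical stand-ins, only loosely modeled on the `reachedEnd` / `nextRecord` / `close` style of Flink's InputFormats, and the `maxRecords` guard is an added assumption to keep the shipped collection small.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class FormatToCollection {

    /** Hypothetical record-oriented source; a simplified stand-in, not Flink's InputFormat. */
    public interface RecordSource<T> extends AutoCloseable {
        boolean reachedEnd();
        T nextRecord();
        @Override
        void close();
    }

    /** Line-based source that looks one line ahead so reachedEnd() is cheap. */
    public static final class LineSource implements RecordSource<String> {
        private final BufferedReader reader;
        private String next;

        public LineSource(BufferedReader reader) {
            this.reader = reader;
            advance();
        }

        private void advance() {
            try {
                next = reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean reachedEnd() {
            return next == null;
        }

        @Override
        public String nextRecord() {
            String current = next;
            advance();
            return current;
        }

        @Override
        public void close() {
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Drains the source into a list and fails fast if it holds more than maxRecords records. */
    public static <T> List<T> collect(RecordSource<T> source, int maxRecords) {
        List<T> out = new ArrayList<>();
        try (RecordSource<T> s = source) {
            while (!s.reachedEnd()) {
                if (out.size() >= maxRecords) {
                    throw new IllegalStateException("input larger than " + maxRecords + " records");
                }
                out.add(s.nextRecord());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the client-local file.
        RecordSource<String> src = new LineSource(new BufferedReader(new StringReader("a\nb\nc")));
        List<String> records = collect(src, 1000);
        // The resulting list is what would be passed to ExecutionEnvironment.fromCollection(...).
        System.out.println(records);
    }
}
```

        Reading through a format-style interface rather than raw file APIs would let the same parsing logic serve both cluster-side and client-side inputs, which appears to be the motivation for reusing the regular InputFormats.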

        nrai niraj rai added a comment -

        Hi Fabian,
        Can you please provide more details about this feature? My understanding is that we need to read the data from the local file system.
        Are you suggesting that we should read the data from the local file system and pass it to collection DataSets?
        Thanks again.
        Niraj

        hsaputra Henry Saputra added a comment -

        Hi niraj rai, yes, you could start working on this one. Thanks!

        nrai niraj rai added a comment -

        Hi, if no one is working on this, I can take it.

        hsaputra Henry Saputra added a comment -

        Thanks! I have assigned it to myself.

        rmetzger Robert Metzger added a comment -

        Sure, take it.

        hsaputra Henry Saputra added a comment -

        If there are no objections, I would like to work on this to familiarize myself with the API between Java and Scala.


          People

          • Assignee: nssalian Neelesh Srinivas Salian
          • Reporter: fhueske Fabian Hueske
          • Votes: 0
          • Watchers: 8
