Flink / FLINK-5516

Hardcoded paths in flink-python/.../PythonPlanBinder.java

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Python API
    • Labels: None

      Description

      The PythonPlanBinder.java contains three hardcoded filesystem paths:

      public static final String FLINK_PYTHON_FILE_PATH = System.getProperty("java.io.tmpdir") + File.separator + "flink_plan";
      
      private static String FLINK_HDFS_PATH = "hdfs:/tmp";
      public static final String FLINK_TMP_DATA_DIR = System.getProperty("java.io.tmpdir") + File.separator + "flink_data";
      

      FLINK_PYTHON_FILE_PATH and FLINK_TMP_DATA_DIR can be changed by setting java.io.tmpdir.
      FLINK_HDFS_PATH, however, can only be changed by modifying the source.

      Is it possible to make all three paths configurable in the usual Flink configuration files (such as flink-conf.yaml)?

        Activity

        aljoscha Aljoscha Krettek added a comment -

        Chesnay Schepler, would this be possible?

        I think no one is currently working on these parts, but we would be very happy about contributions.

        Zentol Chesnay Schepler added a comment -

        Yeah this is easy to implement; it is pretty much a one-liner in the PythonPlanBinder.

        There is only one small thing to be wary of:

        The path we are talking about here is where we upload the Python library before registering it in the DistributedCache. The default for this is "hdfs:/tmp". However, if you execute in a local environment (i.e. the tests), this is changed to "file:<java.io.tmpdir>/flink".

        So we could either change the default to "file:..." and force the user to configure a path, or keep the current behavior but introduce a flag so that we don't override the user-specified location.
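
        To make the second option concrete, here is a rough sketch of how the lookup could behave. It is not the actual PythonPlanBinder code: the class name `UploadPathResolver`, the method `resolve`, and the use of `null` to mean "nothing configured" are all assumptions made for illustration. Only the two defaults ("hdfs:/tmp" and "file:<java.io.tmpdir>/flink") come from the comment above.

```java
/** Hypothetical helper, not part of Flink. Sketches the "keep behavior, add a flag" option. */
final class UploadPathResolver {
    // Default upload location quoted in the comment above.
    static final String DEFAULT_DC_PATH = "hdfs:/tmp";

    /**
     * @param configuredPath   user-specified path, or null if none was configured (assumption)
     * @param localEnvironment true when the plan is executed in a local environment
     * @param tmpDir           value of the java.io.tmpdir system property
     */
    static String resolve(String configuredPath, boolean localEnvironment, String tmpDir) {
        if (configuredPath != null) {
            // A user-specified location always wins; the null check plays the role of the flag.
            return configuredPath;
        }
        if (localEnvironment) {
            // Current behavior for local execution: switch to the local file system.
            return "file:" + tmpDir + "/flink";
        }
        return DEFAULT_DC_PATH;
    }

    public static void main(String[] args) {
        String tmpDir = System.getProperty("java.io.tmpdir");
        System.out.println(resolve(null, false, tmpDir));
        System.out.println(resolve(null, true, tmpDir));
        System.out.println(resolve("hdfs:/user/me/flink", true, tmpDir));
    }
}
```

        With this shape the local-environment override only applies when the user has not configured anything, so existing setups and the tests would keep working unchanged.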

        Zentol Chesnay Schepler added a comment -

        Separate config options for all paths were introduced in bdcebfda06846a1e21bb6a4678909d503ebc6333.

        `python.plan.tmp.dir` - configures the path on the client where temporary files are stored
        `python.mmap.tmp.dir` - configures the path where the memory-mapped files are stored on the TaskManagers
        `python.dc.tmp.dir` - configures the path to which the Python library, plan file and additional files are uploaded before they are registered with the DistributedCache (DC)
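
        As an illustration of how the three keys could be read, here is a minimal sketch. It uses a plain `Map` as a stand-in for the Flink configuration so that it runs without Flink on the classpath. The class `PythonPathOptions` is hypothetical, and the fallback values simply mirror the old hardcoded constants from the description; the defaults actually used in the commit, and the mapping of each key to one of the old constants, are assumptions here.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the three path options; not the code from the commit. */
final class PythonPathOptions {
    static final String PLAN_TMP_DIR_KEY = "python.plan.tmp.dir";
    static final String MMAP_TMP_DIR_KEY = "python.mmap.tmp.dir";
    static final String DC_TMP_DIR_KEY = "python.dc.tmp.dir";

    final String planTmpDir;
    final String mmapTmpDir;
    final String dcTmpDir;

    private PythonPathOptions(String planTmpDir, String mmapTmpDir, String dcTmpDir) {
        this.planTmpDir = planTmpDir;
        this.mmapTmpDir = mmapTmpDir;
        this.dcTmpDir = dcTmpDir;
    }

    /** Reads the three keys, falling back to the previously hardcoded values (assumed defaults). */
    static PythonPathOptions fromConfig(Map<String, String> config, String tmpDir) {
        return new PythonPathOptions(
                config.getOrDefault(PLAN_TMP_DIR_KEY, tmpDir + File.separator + "flink_plan"),
                config.getOrDefault(MMAP_TMP_DIR_KEY, tmpDir + File.separator + "flink_data"),
                config.getOrDefault(DC_TMP_DIR_KEY, "hdfs:/tmp"));
    }

    public static void main(String[] args) {
        // Stand-in for entries that would normally come from flink-conf.yaml.
        Map<String, String> config = new HashMap<>();
        config.put(DC_TMP_DIR_KEY, "hdfs:/user/flink/python");

        PythonPathOptions options = fromConfig(config, System.getProperty("java.io.tmpdir"));
        System.out.println(options.planTmpDir);
        System.out.println(options.mmapTmpDir);
        System.out.println(options.dcTmpDir);
    }
}
```

        Here only `python.dc.tmp.dir` is overridden, so the other two paths fall back to locations under java.io.tmpdir.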


          People

          • Assignee:
            Zentol Chesnay Schepler
            Reporter:
            felxe Felix seibert