  1. Pig
  2. PIG-4059 Pig on Spark
  3. PIG-5241

Specify the HDFS path directly to Spark and avoid the unnecessary download and upload in SparkLauncher.java


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels: None

    Description

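      SparkLauncher.java carries the following TODO: specify the HDFS path directly to Spark and avoid the unnecessary download and upload. As the code stands, cacheFiles copies each cache file to a local temporary directory and then adds that local copy to the Spark job working directory: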

        private void cacheFiles(String cacheFiles) throws IOException {
            if (cacheFiles != null && !cacheFiles.isEmpty()) {
                File tmpFolder = Files.createTempDirectory("cache").toFile();
                tmpFolder.deleteOnExit();
                for (String file : cacheFiles.split(",")) {
                    String fileName = extractFileName(file.trim());
                    Path src = new Path(extractFileUrl(file.trim()));
                    File tmpFile = new File(tmpFolder, fileName);
                    Path tmpFilePath = new Path(tmpFile.getAbsolutePath());
                    FileSystem fs = tmpFilePath.getFileSystem(jobConf);
                    // TODO: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java
                    fs.copyToLocalFile(src, tmpFilePath);
                    tmpFile.deleteOnExit();
                    LOG.info(String.format("CacheFile:%s", fileName));
                    addResourceToSparkJobWorkingDirectory(tmpFile, fileName,
                            ResourceType.FILE);
                }
            }
        }
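
      One possible first step toward the TODO, sketched in plain Java: decide which cache entries already live on a filesystem the cluster can read, so that they could be handed to Spark as-is instead of going through copyToLocalFile. This is an illustration only, not the project's fix. The class CacheFilePaths and its methods are hypothetical, the "url#name" entry format is an assumption (extractFileUrl and extractFileName are not shown in this issue), and the list of schemes is a placeholder. SparkContext.addFile is documented to accept HDFS paths, but how the "#name" alias would be preserved on that route is left open here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Pig or Spark. */
final class CacheFilePaths {
    // Assumption: schemes that Spark executors can read without a local copy.
    private static final List<String> CLUSTER_SCHEMES = Arrays.asList("hdfs", "viewfs");

    private CacheFilePaths() {}

    /** Returns the part before '#', assuming entries look like "url#name". */
    static String urlOf(String entry) {
        String trimmed = entry.trim();
        int hash = trimmed.indexOf('#');
        return hash < 0 ? trimmed : trimmed.substring(0, hash);
    }

    /** True when the URL carries one of the cluster-readable schemes. */
    static boolean isClusterReadable(String url) {
        String scheme = URI.create(url).getScheme();
        return scheme != null
                && CLUSTER_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** Collects the URLs from a comma-separated list that need no download. */
    static List<String> directCandidates(String cacheFiles) {
        List<String> result = new ArrayList<>();
        if (cacheFiles == null || cacheFiles.isEmpty()) {
            return result;
        }
        for (String entry : cacheFiles.split(",")) {
            String url = urlOf(entry);
            if (!url.isEmpty() && isClusterReadable(url)) {
                result.add(url);
            }
        }
        return result;
    }
}
```

      Entries returned by directCandidates would be the ones that skip the temporary directory. Everything else (for example, paths without a scheme) would keep the existing download-and-upload route until its handling is settled.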
      

          People

            Assignee: nkollar (Nándor Kollár)
            Reporter: kellyzly (liyunzhang)
            Votes: 0
            Watchers: 1
