  Hadoop Common > HADOOP-13204 Über-jira: S3a phase III: scale and tuning > HADOOP-14138

Remove S3A ref from META-INF service discovery, rely on existing core-default entry

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.9.0
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha4
    • Component/s: fs/s3
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      The class implementing the s3a filesystem is now defined in core-default.xml. Attempting to instantiate an S3A filesystem instance using a Configuration instance which has not included the default resources will fail. Applications should not be doing this anyway, as it will lose other critical configuration options needed by the filesystem.

      Description

      As discussed in HADOOP-14132, the shaded AWS library is killing performance at the start of all Hadoop operations, due to the classloading triggered by FS service discovery.

      This is despite the fact that there is already an entry for fs.s3a.impl in core-default.xml; we don't need service discovery here.

      Proposed:

      1. Cut the entry from /hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem.
      2. Once HADOOP-14132 is in, move to that mechanism, including declaring an XML file exclusively for s3a entries.

      I want this one in first as it's a major performance regression, and one we could actually backport to 2.7.x, just to improve load time slightly there too.
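      To make the mechanics concrete, here is a sketch of the two pieces involved (the core-default.xml description text may differ slightly in wording): the patch deletes the s3a line from the hadoop-aws service registration file, and lookup then falls back to the existing configuration entry.

        # hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
        # line removed by this patch:
        org.apache.hadoop.fs.s3a.S3AFileSystem

        <!-- existing core-default.xml entry the lookup now relies on -->
        <property>
          <name>fs.s3a.impl</name>
          <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
          <description>The implementation class of the S3A Filesystem</description>
        </property>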

      Attachments

      1. HADOOP-14138.001.patch
        0.6 kB
        Jason Lowe
      2. HADOOP-14138-branch-2-001.patch
        0.6 kB
        Steve Loughran

        Issue Links

          Activity

          stevel@apache.org Steve Loughran added a comment -

          Patch 001 cuts the service entry. This just moves to on-demand FS creation from the core-default entry.

          testing: s3 ireland

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 6m 37s branch-2 passed
          +1 compile 0m 17s branch-2 passed with JDK v1.8.0_121
          +1 compile 0m 18s branch-2 passed with JDK v1.7.0_121
          +1 mvnsite 0m 27s branch-2 passed
          +1 mvneclipse 0m 14s branch-2 passed
          +1 javadoc 0m 13s branch-2 passed with JDK v1.8.0_121
          +1 javadoc 0m 15s branch-2 passed with JDK v1.7.0_121
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 13s the patch passed with JDK v1.8.0_121
          +1 javac 0m 13s the patch passed
          +1 compile 0m 16s the patch passed with JDK v1.7.0_121
          +1 javac 0m 16s the patch passed
          +1 mvnsite 0m 24s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 javadoc 0m 10s the patch passed with JDK v1.8.0_121
          +1 javadoc 0m 13s the patch passed with JDK v1.7.0_121
          +1 unit 0m 21s hadoop-aws in the patch passed with JDK v1.7.0_121.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          12m 31s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:b59b8b7
          JIRA Issue HADOOP-14138
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12855613/HADOOP-14138-branch-2-001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit
          uname Linux d5cc96dcf8eb 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / cacaa29
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11744/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11744/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          No tests needed: removing the service entry means the only way the s3a FS will be found is through on-demand loading of the class when an s3a scheme is encountered. The s3a tests do exactly that.
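          (For illustration, a minimal sketch of that on-demand path from a client, with a hypothetical bucket name:)

            import java.net.URI;
            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;

            // new Configuration() loads core-default.xml and core-site.xml,
            // so fs.s3a.impl is defined before any lookup happens.
            Configuration conf = new Configuration();

            // No ServiceLoader scan is required for this to resolve: the FS lookup
            // finds fs.s3a.impl in the configuration and loads S3AFileSystem on demand.
            FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);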

          stevel@apache.org Steve Loughran added a comment -

          Did run one test with the FileSystem log set to DEBUG, to have it print what goes on during discovery.

          s3 and s3n are coming in from the service loader; s3a is coming in from the core-default file. It probably always did, given that the entry existed ... that service load has always been unneeded.

          2017-03-02 12:54:07,050 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3147)) - Loading filesystems
          2017-03-02 12:54:07,057 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - s3:// = class org.apache.hadoop.fs.s3.S3FileSystem from null
          2017-03-02 12:54:07,062 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - s3n:// = class org.apache.hadoop.fs.s3native.NativeS3FileSystem from null
          2017-03-02 12:54:07,070 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - file:// = class org.apache.hadoop.fs.LocalFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-common/2.9.0-SNAPSHOT/hadoop-common-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,075 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - viewfs:// = class org.apache.hadoop.fs.viewfs.ViewFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-common/2.9.0-SNAPSHOT/hadoop-common-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,077 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - ftp:// = class org.apache.hadoop.fs.ftp.FTPFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-common/2.9.0-SNAPSHOT/hadoop-common-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,079 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - har:// = class org.apache.hadoop.fs.HarFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-common/2.9.0-SNAPSHOT/hadoop-common-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,085 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - hdfs:// = class org.apache.hadoop.hdfs.DistributedFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-hdfs-client/2.9.0-SNAPSHOT/hadoop-hdfs-client-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,201 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - webhdfs:// = class org.apache.hadoop.hdfs.web.WebHdfsFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-hdfs-client/2.9.0-SNAPSHOT/hadoop-hdfs-client-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,202 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - swebhdfs:// = class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-hdfs-client/2.9.0-SNAPSHOT/hadoop-hdfs-client-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,206 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - hftp:// = class org.apache.hadoop.hdfs.web.HftpFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-hdfs-client/2.9.0-SNAPSHOT/hadoop-hdfs-client-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,206 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:loadFileSystems(3159)) - hsftp:// = class org.apache.hadoop.hdfs.web.HsftpFileSystem from /Users/stevel/.m2/repository/org/apache/hadoop/hadoop-hdfs-client/2.9.0-SNAPSHOT/hadoop-hdfs-client-2.9.0-SNAPSHOT.jar
          2017-03-02 12:54:07,206 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3202)) - Looking for FS supporting s3a
          2017-03-02 12:54:07,206 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3206)) - looking for configuration option fs.s3a.impl
          2017-03-02 12:54:07,250 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3216)) - Filesystem s3a defined in configuration option
          2017-03-02 12:54:07,251 [Thread-0] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3222)) - FS for s3a is class org.apache.hadoop.fs.s3a.S3AFileSystem
          
          // and the actual test itself
          
          2017-03-02 12:54:08,231 [Thread-0] INFO  contract.AbstractFSContractTestBase (AbstractFSContractTestBase.java:setup(184)) - Test filesystem = s3a://hwdev-steve-ireland-new implemented by S3AFileSystem{uri=s3a://hwdev-steve-ireland-new, workingDir=s3a://hwdev-steve-ireland-new/user/stevel, inputPolicy=normal, partSize=8000000, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='SSE_S3', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@be1e9c6, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=30, available=30, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@503f1b5a[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0], statistics {0 bytes read, 0 bytes written, 0 read ops, 0 large read ops, 0 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=1408be8b-d462-4388-91fa-90b4022356fc-hwdev-steve-ireland-new} {fsURI=s3a://hwdev-steve-ireland-new/} {files_created=0} {files_copied=0} {files_copied_bytes=0} {files_deleted=0} {fake_directories_deleted=0} {directories_created=0} {directories_deleted=0} {ignored_errors=0} {op_copy_from_local_file=0} {op_exists=0} {op_get_file_status=0} {op_glob_status=0} {op_is_directory=0} {op_is_file=0} {op_list_files=0} {op_list_located_status=0} {op_list_status=0} {op_mkdirs=0} {op_rename=0} {object_copy_requests=0} {object_delete_requests=0} {object_list_requests=0} {object_continue_list_requests=0} {object_metadata_requests=0} {object_multipart_aborted=0} {object_put_bytes=0} {object_put_requests=0} {object_put_requests_completed=0} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=0} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=0} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=0} {stream_bytes_skipped_on_seek=0} {stream_closed=0} {stream_bytes_backwards_on_seek=0} {stream_bytes_read=0} {stream_read_operations_incomplete=0} {stream_bytes_discarded_in_abort=0} {stream_close_operations=0} {stream_read_operations=0} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=0} {stream_seek_operations=0} {stream_bytes_read_in_close=0} {stream_read_exceptions=0} }}
          2017-03-02 12:54:08,628 [JUnit-testCreateNonRecursiveSuccess] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3202)) - Looking for FS supporting file
          2017-03-02 12:54:08,628 [JUnit-testCreateNonRecursiveSuccess] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3206)) - looking for configuration option fs.file.impl
          2017-03-02 12:54:08,628 [JUnit-testCreateNonRecursiveSuccess] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3213)) - Looking in service filesystems for implementation class
          2017-03-02 12:54:08,628 [JUnit-testCreateNonRecursiveSuccess] DEBUG fs.FileSystem (FileSystem.java:getFileSystemClass(3222)) - FS for file is class org.apache.hadoop.fs.LocalFileSystem
          2017-03-02 12:54:09,392 [JUnit-testCreateNonRecursiveSuccess] INFO  contract.AbstractFSContractTestBase (AbstractFSContractTestBase.java:describe(255)) - closing file system
          
          
          jlowe Jason Lowe added a comment -

          +1, patch looks good to me. I agree that this would be a good candidate for 2.8 and 2.7. Will commit this tomorrow if there are no objections.

          Curious why the patch wasn't against trunk instead of branch-2. Is there a reason this shouldn't go into trunk? It applies (mostly) cleanly there, and I see the fs.s3a.impl property is still in core-default there.

          stevel@apache.org Steve Loughran added a comment -

          Thanks

          Why against branch-2? The reason I went for branch-2 is that branch-2 registers s3:// as a URL; we've dropped that from trunk, so the patches won't backport cleanly. A branch-2 patch works against what I'm trying to do internally, and I can adjust the trunk patch as it goes in.

          liuml07 Mingliang Liu added a comment -

          +1

          jlowe Jason Lowe added a comment -

          Attaching essentially the same patch for trunk to get a Jenkins run against trunk before the commit tomorrow.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 13m 7s trunk passed
          +1 compile 0m 18s trunk passed
          +1 mvnsite 0m 19s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 javadoc 0m 14s trunk passed
          +1 mvninstall 0m 15s the patch passed
          +1 compile 0m 14s the patch passed
          +1 javac 0m 14s the patch passed
          +1 mvnsite 0m 17s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 javadoc 0m 10s the patch passed
          +1 unit 0m 19s hadoop-aws in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          16m 51s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-14138
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12855695/HADOOP-14138.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit
          uname Linux cbc3f557766e 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b3ec531
          Default Java 1.8.0_121
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11746/testReport/
          modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11746/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          Thanks, committed to 2.8.0+, just to save some trouble there. There's no penalty for backporting to 2.7 if you want to do this, Jason.

          jlowe Jason Lowe added a comment -

          I committed this to branch-2.7 as well.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11333 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11333/)
          HADOOP-14138. Remove S3A ref from META-INF service discovery, rely on (stevel: rev a97833e0ed4b31f0403ee3d789163615c7cdd9af)

          • (edit) hadoop-tools/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
          stevel@apache.org Steve Loughran added a comment -

          Just flagging that this change broke some downstream code which was creating a config without picking up core-default.xml, then expecting s3a:// URLs to work. That's a pretty bad thing to do, as it means all the standard config options are missing.

          Adding a note "stop this" to the JIRA & doing the same for the azure/swift equivalents
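          (For anyone hitting this, a sketch of the broken pattern versus the supported one, using the standard Configuration constructors; bucket name hypothetical:)

            // Broken: skips core-default.xml/core-site.xml, so fs.s3a.impl, and every
            // other fs.s3a.* option including credentials, is missing.
            Configuration bad = new Configuration(false);

            // Supported: picks up the default resources, so s3a:// URLs resolve and
            // the filesystem sees the full set of site options.
            Configuration good = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), good);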

          sseth Siddharth Seth added a comment -

          Steve Loughran - why should s3a entries exist in core-default.xml?
          core-default is supposed to contain defaults for most config values, and serves as documentation.

          If someone wants to use s3a, I'd expect them to explicitly set it up in their Configuration, or rely on the ServiceLoader approach - which this jira reverses.

          stevel@apache.org Steve Loughran added a comment -

          why should s3a entries exist in core-default.xml?

          Because that's where you set default values which are then overridden in core-site.xml. We don't have any notion of per-FS resources other than hdfs-default.xml and hdfs-site.xml. By putting the defaults in core-default.xml, they can be overridden in one consistent, predictable place.

          core-default is supposed to contain defaults for most config values, and serves as documentation.

          Exactly. And because it is loaded before core-site.xml, there is a straightforward, easy to understand override mechanism.

          If someone wants to use s3a, I'd expect them to explicitly set it up in their Configuration,

          Well, no. Because that removes the ability to set options in core-site or elsewhere, including but not limited to fs.s3a.endpoint and all the fs.s3a.* security settings, along with many others.
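          (For example, a typical site-level override which only works because core-default.xml is loaded first and core-site.xml is layered on top; the endpoint value here is purely illustrative:)

            <!-- core-site.xml -->
            <property>
              <name>fs.s3a.endpoint</name>
              <value>s3.eu-west-1.amazonaws.com</value>
            </property>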

          or rely on the ServiceLoader approach - which this jira reverses.

          The service loader mechanism for S3A was pulled because:

          1. It was a performance hit, especially once we shifted to the fully shaded AWS JAR, that is, the one which stops breaking downstream apps through forced Jackson upgrades.
          2. The service loader itself was a bit of trouble. HADOOP-12636 is the key one: in Hadoop 2.7.2, if you had hadoop-aws.jar on the classpath but not amazon-s3-sdk, clients would fail on startup with a class-not-found exception during FS static init, even if s3a wasn't used; HADOOP-13323 dropped the caught-but-logged entry from logging at warn to debug, because even that stack trace was causing confusion.
          3. Finally, the service loader doesn't register FileContext bindings, so if you used that API to talk to filesystems, those core-default entries were mandatory anyway.

          Because the fs.s3a.impl declaration was already in core-default, the consequence of this introspection was, at best, startup delays and, at worst, startup failures. So we pulled it. Now any classloader delays are postponed until the first s3a, wasb, adl, swift FS instance is created, which happens if and only if the caller uses the class.
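          (On the FileContext point above: the FileContext/AbstractFileSystem API resolves its binding from the fs.AbstractFileSystem.<scheme>.impl configuration entry rather than from the ServiceLoader, so the core-default declarations are the only route in. A sketch, bucket name hypothetical:)

            Configuration conf = new Configuration();  // must include the default resources
            // Resolution goes through fs.AbstractFileSystem.s3a.impl in the configuration;
            // there is no service-discovery fallback on this code path.
            FileContext fc = FileContext.getFileContext(URI.create("s3a://example-bucket/"), conf);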

          You have to consider the current service loader a first pass; HADOOP-14132 discusses how to do it better: scan a zero-dependency class file which declares its schemes. It could list a per-FS XML resource, but the problem which arises there is the ordering of resources: the FS scan always takes place after the core-default/core-site load, and as Configuration.addDefaultResource() doesn't let you declare an ordering of defaults, any per-FS resource load would stamp over core-default. We'd need to change addDefaultResource() to permit a list of before-resources and after-resources to be defined.
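          (The ordering problem in code terms, mirroring Configuration's own static registration; "s3a-default.xml" is purely hypothetical:)

            // Default resources are loaded in registration order, later ones overriding
            // earlier ones, with core-site.xml meant to have the last word:
            Configuration.addDefaultResource("core-default.xml");   // shipped defaults
            Configuration.addDefaultResource("core-site.xml");      // site overrides

            // A per-filesystem resource registered during a later FS scan would land after
            // both of the above and override them, with no way to slot it in between:
            // Configuration.addDefaultResource("s3a-default.xml");  // hypothetical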

          Yes, the consequence of this change is that the fs.s3a.impl class isn't picked up automatically, but if core-default isn't loading then your code is inevitably going to break in some other way; I'd suspect security being a key point.

          liuml07 Mingliang Liu added a comment -

          Steve gave us very good reasons behind this decision. I second his points. Thanks.

          sseth Siddharth Seth added a comment -

          Steve Loughran - I understand the mechanics behind *-default.xml and *-site.xml. When I said "If someone wants to use s3a, I'd expect them to explicitly set it up in their Configuration," - their own Configuration could well be core-site.xml, which will then be loaded by all Hadoop services.

          What I'm asking is why s3a gets special treatment, and an entry in core-default.xml. Along with that, the 5+ additional s3a settings - why do they need to be defined in core-default.xml? It should be possible to have the default values in code. This could be a separate template, which users can include, to get all relevant settings (if custom settings are required). Without custom settings, the service loader approach is sufficient to get s3a functional, as long as the jar is available.

          HDFS does not have an entry in core-default, and relies upon the ServiceLoader approach. (fs.hdfs.impl does not exist; fs.AbstractFileSystem.hdfs.impl exists - I don't know what this is used for.)

          core-default.xml, to me at least, serves more as documentation of defaults. The files can go out of sync with the default values defined in code, YarnConfiguration for example. It takes additional effort to keep the files in sync. There are JIRAs to remove all the *-default.xml files, in favor of code defaults (I don't expect these to be fixed soon since such changes would be incompatible). For most parameters in these files, the code has default values (all the IPC defaults).
          I suspect nothing has broken so far, because the defaults exist in code.

          In terms of the s3a and service loader problems, HADOOP-14132 sounds like a very good fix to have. If I'm understanding this correctly, general FS operations will be faster if we don't load all filesystems in the classpath. I'm worried that we're introducing a new dependency on core-default by making this change, while I think we should be going in the opposite direction and getting rid of dependencies on these files.

          stevel@apache.org Steve Loughran added a comment -

          It doesn't get special treatment: all the filesystems we bundle within the Hadoop JARs have long had an explicit core-default entry: s3n, swift, etc. That way, you add the JAR and there's no need to configure anything new. The dynamic stuff really went in for third-party libs, I recall.

          HDFS does different magic. When you create an HdfsConfiguration instance, hdfs-default.xml is registered as a default resource and hdfs-site.xml as an overlay. It's why every server-side bit of HDFS code explicitly creates instances, and it's what clients need to do if they want to read in the HDFS defaults. YARN does the same; AMRMClientImpl creates a YarnConfiguration instance to force in all its stuff. I do not want to replicate such ugliness elsewhere.
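          (Roughly what that magic looks like, as a sketch rather than the exact source: HdfsConfiguration registers extra default resources globally in a static initializer, which is why code has to touch the class at least once to pick those files up.)

            public class HdfsConfiguration extends Configuration {
              static {
                // deprecated-key registration elided
                Configuration.addDefaultResource("hdfs-default.xml");
                Configuration.addDefaultResource("hdfs-site.xml");
              }
            }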

          I'm worried that we're introducing a new dependency on core-default by making this change, while I think we should be going in the opposite direction and getting rid of dependencies on these files.

          If you start ignoring core-default and core-site, you stop picking up the site Kerberos options, whether UGI should init, and so on, and at that point I don't think things will work. I don't think people should be doing that.

          sseth Siddharth Seth added a comment -

          I do not want to replicate such ugliness elsewhere.

          Absolutely. The way HDFS and YARN insert configuration (global default resources) into every instance needs to be avoided.

          You start ignoring core-default and core-site, you stop picking up site kerberos options

          core-site.xml - yes. Core-default.xml is code defaults. Most places access this via - conf.get(PARAMETER_NAME, PARAMETER_DEFAULT). This is what I mean by setting defaults in code, rather than having them picked up from core-default.xml.
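          (The two styles side by side; the key and constant names below follow the usual Hadoop convention but are illustrative rather than quoted from a specific class:)

            // Code default: used only when no loaded resource defines the key.
            public static final String READAHEAD_RANGE = "fs.s3a.readahead.range";
            public static final long DEFAULT_READAHEAD_RANGE = 64 * 1024;

            long readahead = conf.getLong(READAHEAD_RANGE, DEFAULT_READAHEAD_RANGE);
            // If core-default.xml (or core-site.xml) defines fs.s3a.readahead.range,
            // that value wins and DEFAULT_READAHEAD_RANGE is never consulted.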

          In terms of the FileSystems - looks like we disagree on where they belong. A clean / fast service loader would be the correct approach as far as I'm concerned, and the mechanism already exists. core-default seems like a workaround.

          stevel@apache.org Steve Loughran added a comment -

          The problem with the current mechanism is that it forces a transitive classload of all dependencies, which is O(filesystems) * O(dependencies), and it doesn't handle failure well.

          What I'm trying to understand here is: why do you want to ignore core-default.xml, given that it sacrifices the ability for installations to override it at the core-site layer? It seems you've made that policy decision and are now suffering the consequences. And while the service discovery of S3A was the first place to break, you are still left with the problem of how to pick up credentials.

          sseth Siddharth Seth added a comment -

          Not reading core-default was a bug, and is fixed. core-site.xml is independent of what is in core-default.xml, if the code defaults were used.

          See HDFS-4820 and HADOOP-7956. While searching for these bugs, there are several others which relate to removing deprecated entries, fixing default values, etc. in the *-default files (this was within the HDFS project itself).
          My concern is that an explicit requirement on the *-default files is being introduced, while the files should, IMO, be removed.

          I get the problem with the service loaders, and the additional time introduced. I thought there was an alternate plan to reduce this cost while retaining service loaders.

          Could you please elaborate on "How to pick up credentials" - and why that needs to be part of core-default? (And not a conf.get(PARAMETER, DEFAULT)).

          stevel@apache.org Steve Loughran added a comment -

          those JIRAs are so old they are implicitly dead.

          Right now things are, well, confused, as we have both code and XML defaults. The XML ones win. Benefit: an easy place to see what is there, one single spot to track down the declarations without having access to the source.

          price:

          • the defaults in the source files need to be kept in sync; when they aren't, they diverge and people loading in the configs without the defaults get undefined outcomes
          • sometimes those defaults are actually broken (S3AOutputStream temp dir setup prior to HADOOP-13560), and nobody notices
          • if someone has mistyped an entry (HADOOP-12735) then it's not valid and nobody knows why.

          Personally, I like the XML file as a one-stop shop, but I don't like the maintenance/sync problem.

          Thinking about it, maybe the way to go is an IDL mechanism which generates the XML, the Java constant names and the default value strings. That way there is one single thing to work off.

          Either XML -> XSLT -> .java constants file, or see if we can't use protoc, avroc or even, dare I say it, CORBA and the idlj compiler, to go from a .idl declaration to Java code. Though none of the protoc/avroc/idlj workflows would generate the core-default.xml file. Our own XML file format, adding in the name of the generated constant and maybe the desired value type if not String, could be used via <xslt> to generate everything. It could be done, though it'd need XSL skills.

          jzhuge John Zhuge added a comment -

          I'd like source code with annotations as the source of truth over XML, if possible.

          DeprecatedProperties.md should be auto-generated as well.

          fabbri Aaron Fabbri added a comment -

          Steve Loughran's original proposal here seems like the right approach to me for the short-term goal (a serious perf regression), especially given the points about the core-default config being required for things like FileContext anyway. Changing core-default.xml behavior is a big, incompatible change, so I'm glad we're not tackling that up front.

          I personally think the XML default approach works well in practice. Having all the defaults in one spot makes things easy to find, as well as isolating vendor-specific defaults in an easy-to-change place (versus having to keep committing changes throughout the code). To me, having to change defaults is pretty common (we frequently have to tweak core-default settings for a shipping product), and being able to do that in a default config is very low-friction compared to code changes.

          For the problem of keeping code defaults and core-default.xml up to date, there are some good ideas here. Another idea: let humans keep them in sync, and enforce it with a pre-commit check script.

          stevel@apache.org Steve Loughran added a comment -

          Plus: use an XSD to define the schema, and allow the type of the default value to be declared too.

          <option name="fs.s3a.experimental.input.fadvise"
                  type="String"
                  field="INPUT_FADVISE"
                  default="INPUT_FADVISE_DEFAULT"
                  deprecated="false"
                  stability="unstable"
                  scope="public">
            <description>
              Which input strategy to use for buffering, seeking and similar when reading data.
            </description>
            <value>normal</value>
          </option>
          

          (Or, if you declare the field name, the DEFAULT constant name falls out of it.) The scope attribute (public, private) would map to the audience annotation; deprecated would set the deprecated tag on both.

          
            /**
             * Which input strategy to use for buffering, seeking and similar when
             * reading data.
             * Value: {@value}
             */
            @InterfaceStability.Unstable
            @InterfaceAudience.Public
            public static final String INPUT_FADVISE =
                "fs.s3a.experimental.input.fadvise";
          
            /** Default value for {@link #INPUT_FADVISE}: {@value}. */
            @InterfaceAudience.Public
            @InterfaceStability.Unstable
            public static final String INPUT_FADVISE_DEFAULT = "normal";
          

          Build-wise, you'd need a new src/xml area and a build section calling Ant's <schemavalidate> then <xslt> to generate things. Oh, and we'd need somebody who understands XSL.

          sseth Siddharth Seth added a comment -

          those JIRAs are so old they are implicitly dead.

          I don't think they're any less relevant today than they were when they were filed.
          Realistically, though, the JIRAs will likely not be fixed: 1) the change is incompatible, and incompatible in a manner that is not easy to detect since it is not a compilation breakage; 2) someone needs to actually put in the work to make it happen.

          To me, having to change defaults is pretty common (we frequently have to tweak core-default settings for a shipping product), and being able to do that in a default config is very low-friction compared to code changes.

          Isn't that what the site files are for?

          A lot of people treat the core-default files as documentation: available config, default value, description.
          In Tez we took the approach of explicitly not having a default file, and generated an output file from the code defaults.
          Hive uses a nice approach where HiveConf.get(ParamName) implicitly picks up default values. No *-default.xml file here either.

          That said, if we're moving toward discussing core-default.xml vs code defaults, it probably needs a wider audience.

          The change helps with performance, so that's really good. I think this affects simple invocations like hadoop fs -ls, and it's really good to see those run faster. I'm hoping a longer-term change to fix service loaders goes in. Unfortunately I won't be able to contribute to the patch in any case.

          stevel@apache.org Steve Loughran added a comment -

          A lot of people consider the core-default files as documentation

          They are, but given that you need to create a Configuration(true) for the basics of talking to anything in a Hadoop cluster, they are effectively a declarative configuration point.

          Hive uses a nice approach where HiveConf.get(ParamName) implicitly picks up default values. No *-default.xml file here either.

          The big issue in Hadoop core is that there is no central config point: Configuration, HdfsConfiguration, YarnConfiguration, JTConf, plus lots of other bits scattered through the code. For bonus fun, different projects have different world views on the visibility of even field names, with the HDFS project considering even HdfsClientConfigKeys to be private data.

          W.r.t. opening this up: if someone actually makes a commitment to allocate time to do this, and others to review it, then it's worth doing. Otherwise it'll just be another abandoned wish-list item. While I find the current situation annoying, I have enough half-complete spare-time work items to get in before worrying about this. Just load Configuration with loadDefaults=true and don't worry about it.
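
          For reference, the pattern this relies on, as a minimal sketch (the bucket name is made up, and it assumes AWS credentials are available):

          // Minimal sketch: with the META-INF service entry removed, resolving the s3a scheme
          // depends on the fs.s3a.impl entry in core-default.xml, so the defaults must be loaded.
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;

          import java.net.URI;

          public class S3AFromDefaults {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration(true);  // true => load core-default.xml and core-site.xml
              // fs.s3a.impl comes from the loaded defaults; a Configuration built without the
              // default resources would fail here with a "No FileSystem for scheme" error.
              FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
              System.out.println("Filesystem implementation: " + fs.getClass().getName());
            }
          }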

          stevel@apache.org Steve Loughran added a comment -

          Actually, doing it in code would work with some annotations: have one or more classes which define the constants, then have something else which scans for the annotated classes, enumerates all the constants declared there, and generates the XML and MD from them.

          @DefaultValues
          public class CoreDefaults {

            @Attribute
            public static final Attribute FS_DEFAULT_NAME =
                new Attribute("fs.default.name", "file://", "default filesystem", "URI");
          }
          

          You could then programmatically get at FS_DEFAULT_NAME.key, FS_DEFAULT_NAME.value, FS_DEFAULT_NAME.type and FS_DEFAULT_NAME.description, and use those for generating things, or just for hooking up the IDEs.
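
          A possible shape for that Attribute type, hypothetical of course, since no such class exists in hadoop-common today:

          // Hypothetical sketch of the Attribute value type referenced above: an immutable
          // holder the generator could enumerate (via the @Attribute annotation) to emit
          // core-default.xml entries and Markdown documentation.
          public final class Attribute {
            public final String key;          // e.g. "fs.default.name"
            public final String value;        // default value, e.g. "file://"
            public final String description;  // human-readable description
            public final String type;         // declared type, e.g. "URI"

            public Attribute(String key, String value, String description, String type) {
              this.key = key;
              this.value = value;
              this.description = description;
              this.type = type;
            }
          }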

          And you could transition from the existing declarations by having the existing code reference the specific fields.

          This would work for core, HDFS and MR. Where it would be somewhat complicated is in things like S3A, where the constants are currently defined in the hadoop-aws module. All the S3A constants would have to be defined in hadoop-common instead, so that they would go into core-default.


            People

            • Assignee:
              stevel@apache.org Steve Loughran
            • Reporter:
              stevel@apache.org Steve Loughran
            • Votes: 0
            • Watchers: 8
