Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-6813

WebHdfsFileSystem#OffsetUrlInputStream should implement PositionedReadable with thead-safe.

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: webhdfs
    • Labels:
      None

      Description

      PositionedReadable definition requires the implementations for its interfaces should be thread-safe.

      OffsetUrlInputStream(WebHdfsFileSystem inputstream) doesn't implement these interfaces with tread-safe, this JIRA is to fix this.

        Issue Links

          Activity

          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
          org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7546//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7546//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7546//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7546//console This message is automatically generated.
          Hide
          umamaheswararao Uma Maheswara Rao G added a comment - - edited

          I think from PositionedReadable doc, this looks reasonable to me. But I also noticed FsInputStream also have the APIs with out synchronized. Also the read api in DFSInputStream also not synchronized, but not sure that was left without synchronization intentionally.

          Tsz Wo Nicholas Sze, can you please confirm if there is any reason for not synchronized and did not follow the PositionedReadable java doc? Thanks

          Show
          umamaheswararao Uma Maheswara Rao G added a comment - - edited I think from PositionedReadable doc, this looks reasonable to me. But I also noticed FsInputStream also have the APIs with out synchronized. Also the read api in DFSInputStream also not synchronized, but not sure that was left without synchronization intentionally. Tsz Wo Nicholas Sze , can you please confirm if there is any reason for not synchronized and did not follow the PositionedReadable java doc? Thanks
          Hide
          szetszwo Tsz Wo Nicholas Sze added a comment -

          The javadoc in PositionedReadable may be out dated. Even DFSInputStream does not seem to be thread safe.

          Show
          szetszwo Tsz Wo Nicholas Sze added a comment - The javadoc in PositionedReadable may be out dated. Even DFSInputStream does not seem to be thread safe.
          Hide
          hitliuyi Yi Liu added a comment -

          Hi Tsz Wo Nicholas Sze, thanks for your comments. I think the original intention in DFSInputStream is to make thread-safe for PositionedReadable implementation, we can see there are synchronized for other methods; normal read is synchronized, and there is no reason that positioned read is not synchronized (few object variables are used in this method, maybe forget to synchronize?). I have checked other inputstreams in Hadoop which implement PositionedReadable almost are thread-safe. If we don't want thread-safe, it's better to remove synchronized, then it's a bit more efficient.

          What's your suggestion about this?

          Show
          hitliuyi Yi Liu added a comment - Hi Tsz Wo Nicholas Sze , thanks for your comments. I think the original intention in DFSInputStream is to make thread-safe for PositionedReadable implementation, we can see there are synchronized for other methods; normal read is synchronized, and there is no reason that positioned read is not synchronized (few object variables are used in this method, maybe forget to synchronize?). I have checked other inputstreams in Hadoop which implement PositionedReadable almost are thread-safe. If we don't want thread-safe, it's better to remove synchronized, then it's a bit more efficient. What's your suggestion about this?
          Hide
          szetszwo Tsz Wo Nicholas Sze added a comment -

          I think requiring PositionedReadable to be thread-safe may be overkill. I believe few user applications need thread-safe and, if it is needed, it is very easy to be synchronized in the user applications. Just have checked Java InputStream and DataInputStream, they are not enforcing thread-safe. The DataInputStream javadoc even emphasizes that it is not thread-safe:

          DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class.

          I suggest simple update the javadoc of PositionedReadable. What do you think?

          Show
          szetszwo Tsz Wo Nicholas Sze added a comment - I think requiring PositionedReadable to be thread-safe may be overkill. I believe few user applications need thread-safe and, if it is needed, it is very easy to be synchronized in the user applications. Just have checked Java InputStream and DataInputStream, they are not enforcing thread-safe. The DataInputStream javadoc even emphasizes that it is not thread-safe: DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class. I suggest simple update the javadoc of PositionedReadable. What do you think?
          Hide
          stevel@apache.org Steve Loughran added a comment -

          Nicholas, can you look at HADOOP-10938 and see if we can pull this discussion together.

          For HBase they are looking at what could be done for concurrent HDFS operations, that is what can be relaxed/unrelaxed.

          The native IO streams are all generally wrapped by RawLocal and checksummed filesystems, so they get their sync semantics from there.

          What we do have, however is a model in which {{getPos() }} can block until a potentially remote, long-haul IO operation completes, which I think is an anomaly.

          Show
          stevel@apache.org Steve Loughran added a comment - Nicholas, can you look at HADOOP-10938 and see if we can pull this discussion together. For HBase they are looking at what could be done for concurrent HDFS operations, that is what can be relaxed/unrelaxed. The native IO streams are all generally wrapped by RawLocal and checksummed filesystems, so they get their sync semantics from there. What we do have, however is a model in which {{getPos() }} can block until a potentially remote, long-haul IO operation completes, which I think is an anomaly.
          Hide
          hitliuyi Yi Liu added a comment -

          Thanks Tsz Wo Nicholas Sze

          if it is needed, it is very easy to be synchronized in the user applications

          Agree.

          I suggest simple update the javadoc of PositionedReadable. What do you think?

          I opened a new JIRA HADOOP-10938 for javadoc. Let's discuss with Steve Loughran and other guys together to see what thread safety mode we want. Then we can decide whether to modify the javadoc or make implementations thread-safe.

          Show
          hitliuyi Yi Liu added a comment - Thanks Tsz Wo Nicholas Sze if it is needed, it is very easy to be synchronized in the user applications Agree. I suggest simple update the javadoc of PositionedReadable. What do you think? I opened a new JIRA HADOOP-10938 for javadoc. Let's discuss with Steve Loughran and other guys together to see what thread safety mode we want. Then we can decide whether to modify the javadoc or make implementations thread-safe.
          Hide
          hitliuyi Yi Liu added a comment -

          Hi Tsz Wo Nicholas Sze and Steve Loughran, after further thinking, I think we need to make thread-safe in DFSInputStream, my points are:

          • Pread in org.apache.hadoop.hdfs.DFSInputStream doesn't modify the pos, so we have opportunity to make read and pread concurrent. We need to synchronize few object variables like failures, we can define local variable to resolve this problem.
            If Pread doesn't access object variable, then it's thread-safe and we don't need to synchronize it with other Ops like read, seek.
          • If we leave the synchronization to user applications, then read and pread cannot be concurrent.
          Show
          hitliuyi Yi Liu added a comment - Hi Tsz Wo Nicholas Sze and Steve Loughran , after further thinking, I think we need to make thread-safe in DFSInputStream, my points are: Pread in org.apache.hadoop.hdfs.DFSInputStream doesn't modify the pos, so we have opportunity to make read and pread concurrent. We need to synchronize few object variables like failures , we can define local variable to resolve this problem. If Pread doesn't access object variable, then it's thread-safe and we don't need to synchronize it with other Ops like read, seek. If we leave the synchronization to user applications, then read and pread cannot be concurrent.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / f1a152c
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10640/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 patch 0m 0s The patch command could not apply the patch during dryrun. Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / f1a152c Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10640/console This message was automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / f1a152c
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10645/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 patch 0m 0s The patch command could not apply the patch during dryrun. Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / f1a152c Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10645/console This message was automatically generated.
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Moving bugs out of previously closed releases into the next minor release 2.8.0.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Moving bugs out of previously closed releases into the next minor release 2.8.0.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 5s HDFS-6813 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue HDFS-6813
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18072/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 5s HDFS-6813 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue HDFS-6813 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12659533/HDFS-6813.001.patch Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18072/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.

            People

            • Assignee:
              hitliuyi Yi Liu
              Reporter:
              hitliuyi Yi Liu
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:

                Development