Hadoop Common
  HADOOP-6909

Umbrella ticket for reducing the number of libraries in Common

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      There seems to be some consensus that moving libraries out of Common, with the eventual goal of getting rid of the Common package altogether, is a good idea: http://hadoop.markmail.org/thread/jhiactdekhywznac. Hopefully we can collect the incremental tasks needed to achieve that goal here.

          Activity

          Jeff Hammerbacher added a comment -

          Doug also mentions on that thread that the metrics and fs packages might be moved to Jakarta Commons.

          Arun C Murthy added a comment -

          Moving fs to HDFS might be reasonable, but moving metrics to Commons means we will have another dependency, plus it isn't clear metrics is usable in a wider context.

          Arun C Murthy added a comment -

          "There seems to be some consensus that moving libraries out of Common"

          I'm not sure I'd call it 'consensus' yet; we need a discussion before we reach consensus.

          Jeff Hammerbacher added a comment -

          "I'm not sure I'd call it 'consensus' yet; we need a discussion before we reach consensus."

          Sure, and hopefully that discussion can take place here, to avoid derailing the thread on combining the committer lists.

          Arun C Murthy added a comment -

          Yep.

          Jeff Hammerbacher added a comment -

          Also, I apologize if I jumped the gun a little; I became irrationally exuberant when I thought about moving to a generic configuration system.

          Jay Booth added a comment -

          Well, if we're encouraged to derail this thread with config-related stuff, what do people think about a variable-substituted properties file instead of XML?

          #comment
          key=value
          #other comment
          key2=${key}
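
To make the proposal concrete, here is a minimal Java sketch of how a properties file with `${name}` references could be read. It assumes only the standard `java.util.Properties` loader; the class name `SubstitutingProperties`, its `get` method, and the depth limit are hypothetical and are not part of Hadoop or of anything proposed in this ticket.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not an existing Hadoop class. */
class SubstitutingProperties {
    // Matches ${name} and captures the name.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    // Assumed limit that stops cyclic references such as a=${b}, b=${a}.
    private static final int MAX_DEPTH = 20;

    private final Properties raw = new Properties();

    SubstitutingProperties(String text) {
        try {
            raw.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the value for key with ${name} references expanded, or null if unset. */
    String get(String key) {
        String value = raw.getProperty(key);
        return value == null ? null : expand(value, 0);
    }

    private String expand(String value, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Substitution nested too deeply: " + value);
        }
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String referenced = raw.getProperty(m.group(1));
            // Unknown references are left exactly as written.
            String replacement = (referenced == null) ? m.group(0) : expand(referenced, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        SubstitutingProperties conf = new SubstitutingProperties(
            "#comment\nkey=value\n#other comment\nkey2=${key}\n");
        System.out.println(conf.get("key2")); // prints "value"
    }
}
```

Expansion happens on read rather than on load, so a value can refer to a key defined later in the file. Whether that is the desired behavior is a design choice this sketch does not settle.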
          Arun C Murthy added a comment -

          Jay, can we please use HADOOP-6910 for discussions on configuration? Thanks.

          Owen O'Malley added a comment -

          Moving FileSystem and FileContext to Jakarta is a non-starter. Having them in Common, where we are all committers, is hard enough.

          We've discussed moving FileSystem to HDFS, but that is problematic (it moves a lot of non-HDFS stuff into HDFS) and it also breaks the abstraction that HDFS implements the contract required by FileSystem.

          I don't think this is a good direction.

          Konstantin Shvachko added a comment -
          • Getting rid of Common, as Doug proposed, is a good idea, imo.
          • I think it makes sense to move fs APIs like FileSystem and FileContext into HDFS, in the same way that MapReduce APIs like Mapper and Reducer are in the MapReduce project.
          • I looked at the commons.vfs packages. There are clearly some similarities with fs.FileSystem, but the differences are substantial. I believe one can implement commons.vfs.HDFS using hadoop.fs.FileSystem. But using commons.vfs as a new API for HDFS would be a big, incompatible, hard-to-justify change, and commons.vfs also lacks APIs intrinsic to HDFS, such as getBlockLocations().
            So at this point I don't understand what the proposal of moving fs packages into Jakarta Commons means.
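
The adapter direction described in the last bullet can be sketched in Java as follows. This is an illustration under loud assumptions: `HadoopLikeFileSystem` and `GenericFileObject` are hypothetical stand-ins written for this example, not the real `hadoop.fs.FileSystem` or commons.vfs types, and the method shapes are simplified. It shows a generic file handle delegating to the Hadoop-like API, and why block locations drop out of view behind the generic interface.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: every type here is a hypothetical stand-in, not a real Hadoop or Commons VFS class. */
class VfsAdapterSketch {

    /** Stand-in for the slice of a Hadoop-style file system API under discussion. */
    interface HadoopLikeFileSystem {
        boolean exists(String path);
        long getLength(String path);
        /** HDFS-flavoured call: which hosts store the blocks of this file. */
        List<String> getBlockLocations(String path);
    }

    /** Stand-in for a generic VFS-style file handle that has no notion of blocks. */
    interface GenericFileObject {
        boolean exists();
        long size();
    }

    /** Adapter: the generic API implemented on top of the Hadoop-like API. */
    static final class HadoopBackedFileObject implements GenericFileObject {
        private final HadoopLikeFileSystem fs;
        private final String path;

        HadoopBackedFileObject(HadoopLikeFileSystem fs, String path) {
            this.fs = fs;
            this.path = path;
        }

        @Override public boolean exists() { return fs.exists(path); }
        @Override public long size() { return fs.getLength(path); }
        // getBlockLocations has no counterpart in GenericFileObject, so callers
        // that only see the generic interface cannot ask where blocks live.
    }

    /** Tiny in-memory fake so the sketch runs without a cluster. */
    static final class InMemoryFileSystem implements HadoopLikeFileSystem {
        private final Map<String, Long> lengths = new HashMap<>();
        private final Map<String, List<String>> hosts = new HashMap<>();

        void put(String path, long length, String... blockHosts) {
            lengths.put(path, length);
            hosts.put(path, Arrays.asList(blockHosts));
        }

        @Override public boolean exists(String path) { return lengths.containsKey(path); }

        @Override public long getLength(String path) {
            Long length = lengths.get(path);
            if (length == null) throw new IllegalArgumentException("No such file: " + path);
            return length;
        }

        @Override public List<String> getBlockLocations(String path) {
            return hosts.getOrDefault(path, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        InMemoryFileSystem fs = new InMemoryFileSystem();
        fs.put("/data/part-0", 128L, "host-a", "host-b");

        GenericFileObject file = new HadoopBackedFileObject(fs, "/data/part-0");
        System.out.println(file.exists() + " " + file.size());
        // Locality information is only reachable through the Hadoop-like interface.
        System.out.println(fs.getBlockLocations("/data/part-0"));
    }
}
```

Going the other way, with the generic interface as the primary API, would require either extending it with block-location calls or giving up locality-aware scheduling, which is the incompatibility the comment points at.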
          Ranjit Mathew added a comment -

          "I think it makes sense to move fs APIs like FileSystem and FileContext into HDFS. The same way as mapreduce APIs like Mapper and Reducer are in mapreduce project."

          I believe HDFS is one file system supported by Hadoop, and Hadoop could potentially end up supporting others (e.g., Ceph via HADOOP-6253).
          So it makes sense to have a common FS API that then has a concrete implementation in HDFS.
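
The layering argued for here can be sketched in Java as a shared contract with one implementation per storage system, selected by URI scheme. All names below (`FileStore`, `HdfsStore`, `CephStore`, `Registry`) are hypothetical and exist only for this example; they are not Hadoop classes and do not describe how HADOOP-6253 was implemented.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: hypothetical types, not real Hadoop classes. */
class CommonFsSketch {

    /** The contract that would live in the shared module. */
    interface FileStore {
        String scheme();
        String describe(URI path);
    }

    /** One concrete implementation, as it might live in the HDFS module. */
    static final class HdfsStore implements FileStore {
        @Override public String scheme() { return "hdfs"; }
        @Override public String describe(URI path) { return "HDFS file " + path.getPath(); }
    }

    /** A second implementation, showing that the contract is not HDFS-specific. */
    static final class CephStore implements FileStore {
        @Override public String scheme() { return "ceph"; }
        @Override public String describe(URI path) { return "Ceph file " + path.getPath(); }
    }

    /** Callers depend only on FileStore and look up an implementation by URI scheme. */
    static final class Registry {
        private final Map<String, FileStore> byScheme = new HashMap<>();

        void register(FileStore store) {
            byScheme.put(store.scheme(), store);
        }

        FileStore forUri(URI uri) {
            FileStore store = byScheme.get(uri.getScheme());
            if (store == null) {
                throw new IllegalArgumentException("No file system for scheme: " + uri.getScheme());
            }
            return store;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(new HdfsStore());
        registry.register(new CephStore());

        URI uri = URI.create("ceph://monitor/pool/object");
        System.out.println(registry.forUri(uri).describe(uri)); // prints "Ceph file /pool/object"
    }
}
```

If the contract itself moved into the HDFS module, a second implementation such as `CephStore` would have to depend on HDFS just to see the interface it implements.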

          Allen Wittenauer added a comment -

          Resolved? Won't Fix?

          You decide.

          But with the 3 or 4 project splits, mavenization, etc., that have happened since this was filed, it isn't particularly relevant anymore.


            People

            • Assignee: Unassigned
            • Reporter: Jeff Hammerbacher
            • Votes: 0
            • Watchers: 12
