Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      The current MapReduce API uses the RawComparator interface in:

      setGroupingComparatorClass
      setSortComparatorClass

      This interface has no lifecycle support. I propose changing these methods to take a new class that implements RawComparator and adds setup and cleanup methods.

      This leaves existing code that uses RawComparator alone, providing some backward compatibility.

      New class:

      class SortComparator implements RawComparator
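
A minimal sketch of what the proposed class could look like. The issue only names the class and says it has setup and cleanup methods, so everything else here is an assumption: the no-argument signatures, the no-op defaults, and the example subclass BytesSortComparator are hypothetical. RawComparator is declared locally as a stand-in with the same shape as Hadoop's org.apache.hadoop.io.RawComparator so that the sketch compiles without Hadoop on the classpath.

```java
import java.util.Comparator;

// Local stand-in with the same shape as Hadoop's RawComparator,
// declared here only so the sketch compiles without Hadoop.
interface RawComparator<T> extends Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Hypothetical base class from the proposal. The no-arg signatures and
// the no-op defaults are assumptions, not part of any Hadoop release.
abstract class SortComparator<T> implements RawComparator<T> {
    // Intended to run once before the first comparison.
    public void setup() {}

    // Intended to run once after the last comparison.
    public void cleanup() {}
}

// Hypothetical example: unsigned lexicographic byte comparison that
// tracks whether its (imaginary) external resource is currently open.
final class BytesSortComparator extends SortComparator<byte[]> {
    private boolean open;

    @Override
    public void setup() {
        open = true; // e.g. connect to a datagrid here
    }

    @Override
    public void cleanup() {
        open = false; // e.g. shut the datagrid down here
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // treat bytes as unsigned
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2; // shorter range sorts first on a shared prefix
    }

    @Override
    public int compare(byte[] a, byte[] b) {
        return compare(a, 0, a.length, b, 0, b.length);
    }
}
```

Because setup and cleanup default to no-ops in this sketch, a subclass that needs no external resources would only have to implement the two compare methods.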

        Activity

        Radim Kolar added a comment -

        Yes, a task-attempt-level lifecycle (the same as the mapper lifecycle) will be enough. I am doing secondary sorting based on live data from a datagrid.

        Luke Lu added a comment -

        You still haven't explained why a lifecycle is needed for comparators. It seems to me that you only need a task-level lifecycle, which should work with JVM reuse as well.

        Radim Kolar added a comment -

        Actually, it can be made backward compatible: accept old-style comparators, check at runtime whether they are instances of the new class, and invoke the lifecycle methods only if they are.
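
The runtime check described above could be sketched roughly as follows. This is not Hadoop code: RawComparator is a local stand-in with the same shape as Hadoop's interface, SortComparator is the hypothetical class from the issue description with assumed no-argument setup and cleanup methods, and ComparatorLifecycle.sortWith is an invented helper that stands in for the place where the framework would drive the sort.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Local stand-in with the same shape as Hadoop's RawComparator.
interface RawComparator<T> extends Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Hypothetical lifecycle-aware class from the proposal (assumed signatures).
abstract class SortComparator<T> implements RawComparator<T> {
    public void setup() {}
    public void cleanup() {}
}

// Hypothetical helper standing in for the framework code that sorts.
final class ComparatorLifecycle {
    private ComparatorLifecycle() {}

    static <T> void sortWith(RawComparator<T> cmp, List<T> items) {
        // Old-style comparators fail this check and are used unchanged.
        SortComparator<T> managed =
            (cmp instanceof SortComparator) ? (SortComparator<T>) cmp : null;
        if (managed != null) {
            managed.setup();
        }
        try {
            items.sort(cmp);
        } finally {
            // Runs even if the sort throws, so resources are released.
            if (managed != null) {
                managed.cleanup();
            }
        }
    }
}

// Old-style comparator: no lifecycle methods at all.
class PlainStringComparator implements RawComparator<String> {
    @Override
    public int compare(String a, String b) {
        return a.compareTo(b);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return compare(new String(b1, s1, l1, StandardCharsets.UTF_8),
                       new String(b2, s2, l2, StandardCharsets.UTF_8));
    }
}

// New-style comparator that records its lifecycle calls.
final class TrackedStringComparator extends SortComparator<String> {
    final List<String> events = new ArrayList<>();
    private final PlainStringComparator delegate = new PlainStringComparator();

    @Override
    public void setup() {
        events.add("setup");
    }

    @Override
    public void cleanup() {
        events.add("cleanup");
    }

    @Override
    public int compare(String a, String b) {
        return delegate.compare(a, b);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return delegate.compare(b1, s1, l1, b2, s2, l2);
    }
}
```

Passing a TrackedStringComparator to sortWith records "setup" and then "cleanup" around the sort, while a PlainStringComparator is sorted with no lifecycle calls, which is the backward-compatible behaviour the comment describes.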

        Radim Kolar added a comment -

        For example, you cannot use a datagrid in a comparator: if you start it in setConf(), you cannot shut it down, which is a huge memory leak if JVM reuse is enabled. It will also block JVM exit unless Hadoop calls System.exit().

        Radim Kolar added a comment -

        The point of having a lifecycle for comparators is that you can do more elaborate things in them. They can become first-class components instead of mere helper classes for comparing bytes. Most important is being able to link with other components at runtime.

        Why do you have a lifecycle in the mapper/reducer?

        Radim Kolar added a comment -

        Actually, the entire API, including mapper/reducer, could be reworked into standard JEE components, which means:

        1. make it a Java bean
        2. a @PostConstruct-annotated init method
        3. a @PreDestroy-annotated shutdown method

        The container would then handle superclasses and initialize and shut them down in the right order. But I do not think the community is capable of making such design decisions, so I am aiming for something simple and similar to the existing code.

        Harsh J added a comment -

        "Use case for cleanup() method is in my case: to disconnect from transaction manager, shutdown datagrid, shutdown spring context."

        This sounds generic and tied to an external framework. How does it benefit a simple comparator, though?

        Radim Kolar added a comment -

        No, the partitioner change is backward compatible. This one is not. Also, I am not merging tickets because that would delay the review process, which is slow enough already. They do not depend on each other.

        Use case for cleanup() method is in my case: to disconnect from transaction manager, shutdown datagrid, shutdown spring context.

        Harsh J added a comment -

        I feel all your proposed "setup and cleanup" method additions may as well go into a single JIRA, because they all border on interface changes and seem to share common motives (just Spring again, or is there also a valid use case for a cleanup method on a simple class such as a comparator?).


          People

          • Assignee:
            Unassigned
            Reporter:
            Radim Kolar
          • Votes:
            0
            Watchers:
            5
