Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: java
    • Labels:
      None
    1. AVRO-606.v5.txt
      28 kB
      Patrick Wendell
    2. AVRO-606.v4.txt
      28 kB
      Patrick Wendell
    3. AVRO-606.v3.txt
      28 kB
      Patrick Wendell
    4. AVRO-606.v2.txt
      27 kB
      Patrick Wendell
    5. AVRO-606.v1.txt
      107 kB
      Patrick Wendell

      Issue Links

        Activity

        Hide
        Patrick Wendell added a comment -

        This is the full diff from origin/HEAD and includes outstanding code for AVRO-595.

        The exact set of new files (but only a subset of all of the changes) are shown in this git hash:
        http://github.com/pwendell/Avro/commit/c3127daa936a74db66b817ea97c6b9f5c1e24997

        This patch is a preliminary file-based storage mechanism for Avro Trace.

        Show
        Patrick Wendell added a comment - This is the full diff from origin/HEAD and includes outstanding code for AVRO-595 . The exact set of new files (but only a subset of all of the changes) are shown in this git hash: http://github.com/pwendell/Avro/commit/c3127daa936a74db66b817ea97c6b9f5c1e24997 This patch is a preliminary file-based storage mechanism for Avro Trace.
        Hide
        Patrick Wendell added a comment -

        This is the same patch, but only the incremental diff from AVRO-595 which has been committed.

        Show
        Patrick Wendell added a comment - This is the same patch, but only the incremental diff from AVRO-595 which has been committed.
        Hide
        Philip Zeyliger added a comment -

        simply drop oldest file if

        Nit: missing "the", and perhaps insert a "we" in there, to indicate that this is what the code does.

        private Long currentTimestamp

        Why not "long"?

        private static int SECONDS_PER_FILE = 60 * 10; // ten minute chunks

        Consider making this configurable.

        catch (InterruptedException e1) { I think (http://www.ibm.com/developerworks/java/library/j-jtp05236.html) you're supposed to call Thread.currentThread().interrupt(); at this point. I kind of hate this part of Java. You could also allow yourself to get interrupted, as a flow control mechanism. Look around to see what other Hadoop worker threads do. bq. } catch (IOException e) {

        You should probably log the exception.

        You should also probably catch all Throwables, and log those too. If you don't, your thread tends to die, and nobody tends to know where it went.

        this.spansSoFar -= spansPerFile.get(oldest);

        Is this always the same as this.spansSoFar = 0?

        if (createNewFile) {

        Wouldn't you want to set spansSoFar to 0 if you're writing a new file?

        + private static String TRACE_FILE_DIR = "/tmp";

        This stuff ought to be configurable.

        Thread writer;

        Why not private?

        + Boolean writerEnabled;

        Why not primitive type?

        private static void addFileSpans

        this feels like "readFileSpans()" more so than "add".

        private static void addFileSpans

        You could do binary search here, but for now it's probably fine to just indicate that it's linear.

        public static void setFileGranularityForTesting(int granularity) {

        Package-private (default) is probably good enough here.

        protected void finalize() {

        Finalize is almost always a bad idea; why is this here?

        // TODO Auto-generated method stub

        Throw UnsupportedOperationException() here.

        * @param start UNIX time (in nanoseconds) as a long

        Huh? I thought this was millis/1000, i.e., seconds and not nanos.

        Show
        Philip Zeyliger added a comment - simply drop oldest file if Nit: missing "the", and perhaps insert a "we" in there, to indicate that this is what the code does. private Long currentTimestamp Why not "long"? private static int SECONDS_PER_FILE = 60 * 10; // ten minute chunks Consider making this configurable. catch (InterruptedException e1) { I think (http://www.ibm.com/developerworks/java/library/j-jtp05236.html) you're supposed to call Thread.currentThread().interrupt(); at this point. I kind of hate this part of Java. You could also allow yourself to get interrupted, as a flow control mechanism. Look around to see what other Hadoop worker threads do. bq. } catch (IOException e) { You should probably log the exception. You should also probably catch all Throwables, and log those too. If you don't, your thread tends to die, and nobody tends to know where it went. this.spansSoFar -= spansPerFile.get(oldest); Is this always the same as this.spansSoFar = 0? if (createNewFile) { Wouldn't you want to set spansSoFar to 0 if you're writing a new file? + private static String TRACE_FILE_DIR = "/tmp"; This stuff ought to be configurable. Thread writer; Why not private? + Boolean writerEnabled; Why not primitive type? private static void addFileSpans this feels like "readFileSpans()" more so than "add". private static void addFileSpans You could do binary search here, but for now it's probably fine to just indicate that it's linear. public static void setFileGranularityForTesting(int granularity) { Package-private (default) is probably good enough here. protected void finalize() { Finalize is almost always a bad idea; why is this here? // TODO Auto-generated method stub Throw UnsupportedOperationException() here. * @param start UNIX time (in nanoseconds) as a long Huh? I thought this was millis/1000, i.e., seconds and not nanos.
        Hide
        Philip Zeyliger added a comment -

        Pressed "add" quite early.

        Looks reasonable, for the most part. It feels a bit odd that the reader and writer of these log directories are the same class, still. That might be worth splitting up at some point. Instead of keeping a list of all previous files, you could also simply scan the directory and look around. Note that it's possible to add some metadata to AVRO data files when you open them, so you might store the open time in the header. (The close time would require a seek...). That way, you could retrieve spans across restarts.

        Show
        Philip Zeyliger added a comment - Pressed "add" quite early. Looks reasonable, for the most part. It feels a bit odd that the reader and writer of these log directories are the same class, still. That might be worth splitting up at some point. Instead of keeping a list of all previous files, you could also simply scan the directory and look around. Note that it's possible to add some metadata to AVRO data files when you open them, so you might store the open time in the header. (The close time would require a seek...). That way, you could retrieve spans across restarts.
        Hide
        Patrick Wendell added a comment -

        this.spansSoFar -= spansPerFile.get(oldest);
        Is this always the same as this.spansSoFar = 0?

        No. spansSoFar tracks spans stored in all files which is what the limit applies to. So this is just subtracting the spans of the most recently deleted file.

        Wouldn't you want to set spansSoFar to 0 if you're writing a new file?

        Same answer.

        Boolean writerEnabled;
        Why not primitive type?

        This is passed by reference to the thread so that the parent thread can signal the child to stop if it wishes. Is there a better way to do this?

        @param start UNIX time (in nanoseconds) as a long
        Huh? I thought this was millis/1000, i.e., seconds and not nanos.

        I changed from milli to nanoseconds to support implementations which have better clock granularity. Currently I don't think any systems offer granularity better than microseconds (1*10^-6) but don't see any reason not to be more specific if there is no additional overhead.

        Show
        Patrick Wendell added a comment - this.spansSoFar -= spansPerFile.get(oldest); Is this always the same as this.spansSoFar = 0? No. spansSoFar tracks spans stored in all files which is what the limit applies to. So this is just subtracting the spans of the most recently deleted file. Wouldn't you want to set spansSoFar to 0 if you're writing a new file? Same answer. Boolean writerEnabled; Why not primitive type? This is passed by reference to the thread so that the parent thread can signal the child to stop if it wishes. Is there a better way to do this? @param start UNIX time (in nanoseconds) as a long Huh? I thought this was millis/1000, i.e., seconds and not nanos. I changed from milli to nanoseconds to support implementations which have better clock granularity. Currently I don't think any systems offer granularity better than microseconds (1*10^-6) but don't see any reason not to be more specific if there is no additional overhead.
        Hide
        Patrick Wendell added a comment -

        Hey Phil,

        I addressed all of the small things.

        For persistent storage - this gets very difficult to do if we don't enforce that there is only one TracePlugin running at a time. If multiple TracePlugin's are running at the same time, and they both are managing the files in that directory, we don't have any good way to coordinate the deletion of files. It turns into a consistency nightmare.

        I have some code that does persistent storage in a local branch, but I'm going to punt it until we decide to make this into a Factory method, or figure out in more detail the context in which a TracePlugin is going to run.

        • Patrick
        Show
        Patrick Wendell added a comment - Hey Phil, I addressed all of the small things. For persistent storage - this gets very difficult to do if we don't enforce that there is only one TracePlugin running at a time. If multiple TracePlugin's are running at the same time, and they both are managing the files in that directory, we don't have any good way to coordinate the deletion of files. It turns into a consistency nightmare. I have some code that does persistent storage in a local branch, but I'm going to punt it until we decide to make this into a Factory method, or figure out in more detail the context in which a TracePlugin is going to run. Patrick
        Hide
        Philip Zeyliger added a comment -

        Looks good.

        I don't see "writerEnabled" being used, except to set it to true once. Am I missing something? You probably need a volatile for it. I tend to use methods instead of fields for this sort of control (disableWriter() or what-not), so as to abstract away the implementation.

        – Philip

        Show
        Philip Zeyliger added a comment - Looks good. I don't see "writerEnabled" being used, except to set it to true once. Am I missing something? You probably need a volatile for it. I tend to use methods instead of fields for this sort of control (disableWriter() or what-not), so as to abstract away the implementation. – Philip
        Hide
        Patrick Wendell added a comment -

        Yep I was actually in the process of deleting that variable but must have left a few references around. For now, we actually don't have any case where we need to explicitly disable the storage, so just keeping it out.

        Show
        Patrick Wendell added a comment - Yep I was actually in the process of deleting that variable but must have left a few references around. For now, we actually don't have any case where we need to explicitly disable the storage, so just keeping it out.
        Hide
        Patrick Wendell added a comment -

        Tweaked one test which was failing occasionally. Also, patched this on a clean checkout and tests/checksyle pass, FYI.

        Show
        Patrick Wendell added a comment - Tweaked one test which was failing occasionally. Also, patched this on a clean checkout and tests/checksyle pass, FYI.
        Hide
        Philip Zeyliger added a comment -

        I just committed this as r985363. Thanks for your contribution, Patrick!

        Show
        Philip Zeyliger added a comment - I just committed this as r985363. Thanks for your contribution, Patrick!

          People

          • Assignee:
            Patrick Wendell
            Reporter:
            Patrick Wendell
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development