Lucene - Core
  1. Lucene - Core
  2. LUCENE-6564

Fix thread safety in PrintStreamInfoStream, unify logging format with tests

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: core/index, general/test
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Noticed while debugging some IW output in an unit test that milliseconds were not output in the date, changed this to reuse the date format used by PrintStreamInfoStream.

      1. LUCENE-6564.patch
        3 kB
        Uwe Schindler
      2. LUCENE-6564.patch
        2 kB
        Michael McCandless
      3. LUCENE-6564.patch
        2 kB
        Ramkumar Aiyengar
      4. LUCENE-6564-5x.patch
        3 kB
        Uwe Schindler
      5. LUCENE-6564-trunk.patch
        3 kB
        Uwe Schindler

        Activity

        Hide
        Ramkumar Aiyengar added a comment -

        Trivial patch to change this..

        Show
        Ramkumar Aiyengar added a comment - Trivial patch to change this..
        Hide
        ASF subversion and git services added a comment -

        Commit 1685431 from Ramkumar Aiyengar in branch 'dev/trunk'
        [ https://svn.apache.org/r1685431 ]

        LUCENE-6564: Add milliseconds to timestamp in IW output for tests

        Show
        ASF subversion and git services added a comment - Commit 1685431 from Ramkumar Aiyengar in branch 'dev/trunk' [ https://svn.apache.org/r1685431 ] LUCENE-6564 : Add milliseconds to timestamp in IW output for tests
        Hide
        ASF subversion and git services added a comment -

        Commit 1685432 from Ramkumar Aiyengar in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1685432 ]

        LUCENE-6564: Add milliseconds to timestamp in IW output for tests

        Show
        ASF subversion and git services added a comment - Commit 1685432 from Ramkumar Aiyengar in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1685432 ] LUCENE-6564 : Add milliseconds to timestamp in IW output for tests
        Hide
        Robert Muir added a comment -

        I don't think we should make this change. SimpleDateFormat is not thread safe.

        Show
        Robert Muir added a comment - I don't think we should make this change. SimpleDateFormat is not thread safe.
        Hide
        Ramkumar Aiyengar added a comment -

        Will revert. PrintStreamInfoStream does the same, it's not required to be thread safe? I can just instantiate this with every message, is that fine? This is used only with tests..

        Show
        Ramkumar Aiyengar added a comment - Will revert. PrintStreamInfoStream does the same, it's not required to be thread safe? I can just instantiate this with every message, is that fine? This is used only with tests..
        Hide
        Robert Muir added a comment -

        What was the problem with 'new Date()'. To me that is the simplest. In general this is just a diagnostics mechanism for IW. we shouldnt be writing tests where you have to read massive amounts of logs.

        Show
        Robert Muir added a comment - What was the problem with 'new Date()'. To me that is the simplest. In general this is just a diagnostics mechanism for IW. we shouldnt be writing tests where you have to read massive amounts of logs.
        Hide
        Ramkumar Aiyengar added a comment -

        In this particular case, `new Date()` in my locale at least was not showing milliseconds, and that was hindering investigation.

        Show
        Ramkumar Aiyengar added a comment - In this particular case, `new Date()` in my locale at least was not showing milliseconds, and that was hindering investigation.
        Hide
        Robert Muir added a comment -

        The bug was introduced here: LUCENE-6119

        It should be 'new Date' in all cases, and avoid any sync or use of simpledateformat. I really really really really do not want this logging shit hiding bugs.

        Show
        Robert Muir added a comment - The bug was introduced here: LUCENE-6119 It should be 'new Date' in all cases, and avoid any sync or use of simpledateformat. I really really really really do not want this logging shit hiding bugs.
        Hide
        Robert Muir added a comment -

        I am also fine with removing any time information entirely: it doesnt really seem necessary to print times here. Its just a diagnostics thing and a sequential order of events.

        Show
        Robert Muir added a comment - I am also fine with removing any time information entirely: it doesnt really seem necessary to print times here. Its just a diagnostics thing and a sequential order of events.
        Hide
        Ramkumar Aiyengar added a comment -

        Frankly, removing date information seems better, either we should have something with a reasonable accuracy, or not have it at all.. Second resolution is not reasonable IMO..

        The other option here is to use a threadlocal, but you could argue that adds complexity as well..

        Show
        Ramkumar Aiyengar added a comment - Frankly, removing date information seems better, either we should have something with a reasonable accuracy, or not have it at all.. Second resolution is not reasonable IMO.. The other option here is to use a threadlocal, but you could argue that adds complexity as well..
        Hide
        ASF subversion and git services added a comment -

        Commit 1685461 from Ramkumar Aiyengar in branch 'dev/trunk'
        [ https://svn.apache.org/r1685461 ]

        Revert r1685431: LUCENE-6564: Add milliseconds to timestamp in IW output for tests

        Show
        ASF subversion and git services added a comment - Commit 1685461 from Ramkumar Aiyengar in branch 'dev/trunk' [ https://svn.apache.org/r1685461 ] Revert r1685431: LUCENE-6564 : Add milliseconds to timestamp in IW output for tests
        Hide
        ASF subversion and git services added a comment -

        Commit 1685462 from Ramkumar Aiyengar in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1685462 ]

        Revert r1685432: LUCENE-6564: Add milliseconds to timestamp in IW output for tests

        Show
        ASF subversion and git services added a comment - Commit 1685462 from Ramkumar Aiyengar in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1685462 ] Revert r1685432: LUCENE-6564 : Add milliseconds to timestamp in IW output for tests
        Hide
        Michael McCandless added a comment -

        Oh that's bad bad bad that SimpleDateFormat is not thread safe.

        But I do find msec time to be helpful when debugging tests ... I don't think we should make tests even harder to debug.

        What if we switch to a simple "seconds since start"? E.g.:

        IW 2 [0.0327s; main]: now flush at close
        IW 2 [0.0328s; main]:   start flush: applyAllDeletes=true
        IW 2 [0.0329s; main]:   index before flush
        DW 2 [0.0330s; main]: startFullFlush
        DW 2 [0.0331s; main]: anyChanges? numDocsInRam=1 deletes=false hasTickets:false pendingChangesInFullFlush: false
        DWFC 2 [0.0333s; main]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_0, aborted=false, numDocsInRAM=1, deleteQueue=DWDQ: [ generation: 0 ]]
        DWPT 2 [0.0351s; main]: flush postings as segment _0 numDocs=1
        DWPT 2 [0.0604s; main]: new segment has 0 deleted docs
        DWPT 2 [0.0606s; main]: new segment has no vectors; norms; no docValues; prox; freqs
        DWPT 2 [0.0607s; main]: flushedFiles=[_0.nvd, _0.fdx, _0_Memory_0.ram, _0.fdt, _0.nvm, _0.fnm]
        
        Show
        Michael McCandless added a comment - Oh that's bad bad bad that SimpleDateFormat is not thread safe. But I do find msec time to be helpful when debugging tests ... I don't think we should make tests even harder to debug. What if we switch to a simple "seconds since start"? E.g.: IW 2 [0.0327s; main]: now flush at close IW 2 [0.0328s; main]: start flush: applyAllDeletes=true IW 2 [0.0329s; main]: index before flush DW 2 [0.0330s; main]: startFullFlush DW 2 [0.0331s; main]: anyChanges? numDocsInRam=1 deletes=false hasTickets:false pendingChangesInFullFlush: false DWFC 2 [0.0333s; main]: addFlushableState DocumentsWriterPerThread [pendingDeletes=gen=0, segment=_0, aborted=false, numDocsInRAM=1, deleteQueue=DWDQ: [ generation: 0 ]] DWPT 2 [0.0351s; main]: flush postings as segment _0 numDocs=1 DWPT 2 [0.0604s; main]: new segment has 0 deleted docs DWPT 2 [0.0606s; main]: new segment has no vectors; norms; no docValues; prox; freqs DWPT 2 [0.0607s; main]: flushedFiles=[_0.nvd, _0.fdx, _0_Memory_0.ram, _0.fdt, _0.nvm, _0.fnm]
        Hide
        Michael McCandless added a comment -

        How about this? I just do elapsed time since the info stream was created, using System.nanoTime, and I cutover to String.format.

        Show
        Michael McCandless added a comment - How about this? I just do elapsed time since the info stream was created, using System.nanoTime, and I cutover to String.format.
        Hide
        Michael McCandless added a comment -

        Probably doesn't matter but the patch is against 5.x...

        Show
        Michael McCandless added a comment - Probably doesn't matter but the patch is against 5.x...
        Hide
        Uwe Schindler added a comment -

        There is also another possibility to get milliseconds without a date format: http://docs.oracle.com/javase/7/docs/api/java/nio/file/attribute/FileTime.html#toString():

        FileTime.fromMillies(...).toString()
        

        (this is new since Java 7 and somehow "misuses" the FileTime API, but it might be useful here. It also uses UTC and uses the generic ISO8601 format!

        The other possibility is using a ThreadLocal:

        static final ThreadLocal<DateFormat> FORMAT = ThreadLocal.withInitial(() -> new SimpleDateFormat(..., Locale, Timezone));
        
        Show
        Uwe Schindler added a comment - There is also another possibility to get milliseconds without a date format: http://docs.oracle.com/javase/7/docs/api/java/nio/file/attribute/FileTime.html#toString() : FileTime.fromMillies(...).toString() (this is new since Java 7 and somehow "misuses" the FileTime API, but it might be useful here. It also uses UTC and uses the generic ISO8601 format! The other possibility is using a ThreadLocal: static final ThreadLocal<DateFormat> FORMAT = ThreadLocal.withInitial(() -> new SimpleDateFormat(..., Locale, Timezone));
        Hide
        Mark Miller added a comment -

        The other possibility is using a ThreadLocal:

        That seems to be a fairly std practice for SimpleDateFormat usage - kind of silly though.

        Show
        Mark Miller added a comment - The other possibility is using a ThreadLocal: That seems to be a fairly std practice for SimpleDateFormat usage - kind of silly though.
        Hide
        Uwe Schindler added a comment - - edited

        Because of this, I would suggest to keep everything like it is and just "misuse" java.nio.file.attribute.FileTime for formatting as ISO-8601 timestamp. So patch is quite easy...

        Show
        Uwe Schindler added a comment - - edited Because of this, I would suggest to keep everything like it is and just "misuse" java.nio.file.attribute.FileTime for formatting as ISO-8601 timestamp. So patch is quite easy...
        Hide
        Ramkumar Aiyengar added a comment -

        +1 for Uwe's suggestion, even though it's misuse, it perfectly fits our use. And even if Java changes this, we can always remove or go to something more complicated..

        Show
        Ramkumar Aiyengar added a comment - +1 for Uwe's suggestion, even though it's misuse, it perfectly fits our use. And even if Java changes this, we can always remove or go to something more complicated..
        Hide
        Uwe Schindler added a comment -

        And even if Java changes this, we can always remove or go to something more complicated..

        It won't change as the behaviour is specified in the Javadocs, so the format of FileTime#toString() cannot change without breaking backwards!

        I will provide a patch in a minute!

        Show
        Uwe Schindler added a comment - And even if Java changes this, we can always remove or go to something more complicated.. It won't change as the behaviour is specified in the Javadocs, so the format of FileTime#toString() cannot change without breaking backwards! I will provide a patch in a minute!
        Hide
        Uwe Schindler added a comment -

        Here my patch.

        The output looks like this:

        $ ant test -Dtestcase=TestIndexWriter -Dtests.verbose=true
        ...
           [junit4]   1> DW 108 [2015-06-15T17:06:45.2Z; main]: publishFlushedSegment seg-private updates=null
           [junit4]   1> IW 108 [2015-06-15T17:06:45.201Z; main]: publishFlushedSegment
           [junit4]   1> IW 108 [2015-06-15T17:06:45.201Z; main]: publish sets newSegment delGen=3 seg=_1(6.0.0):C1
           [junit4]   1> IFD 108 [2015-06-15T17:06:45.201Z; main]: now checkpoint "_0(6.0.0):C1 _1(6.0.0):C1" [2 segments ; isCommit = false]
           [junit4]   1> IFD 108 [2015-06-15T17:06:45.202Z; main]: 0 msec to checkpoint
           [junit4]   1> IW 108 [2015-06-15T17:06:45.202Z; main]: apply all deletes during flush
           [junit4]   1> IW 108 [2015-06-15T17:06:45.203Z; main]: now apply all deletes for all segments maxDoc=2
           [junit4]   1> BD 108 [2015-06-15T17:06:45.203Z; main]: applyDeletes: open segment readers took 0 msec
           [junit4]   1> BD 108 [2015-06-15T17:06:45.203Z; main]: applyDeletes: no segments; skipping
           [junit4]   1> BD 108 [2015-06-15T17:06:45.203Z; main]: prune sis=segments_1: _0(6.0.0):C1 _1(6.0.0):C1 minGen=1 packetCount=0
           [junit4]   1> DW 108 [2015-06-15T17:06:45.204Z; main]: TEST-TestIndexWriter.testDeleteUnusedFiles2-seed#[4A62A887A4681DB2] finishFullFlush success=true
           [junit4]   1> TMP 108 [2015-06-15T17:06:45.204Z; main]: findMerges: 2 segments
           [junit4]   1> TMP 108 [2015-06-15T17:06:45.205Z; main]:   seg=_0(6.0.0):C1 size=0.001 MB [floored]
           [junit4]   1> TMP 108 [2015-06-15T17:06:45.205Z; main]:   seg=_1(6.0.0):C1 size=0.001 MB [floored]
           [junit4]   1> TMP 108 [2015-06-15T17:06:45.206Z; main]:   allowedSegmentCount=1 vs count=2 (eligible count=2) tooBigCount=0
           [junit4]   1> IW 108 [2015-06-15T17:06:45.206Z; main]: startCommit(): start
           [junit4]   1> IW 108 [2015-06-15T17:06:45.206Z; main]: startCommit index=_0(6.0.0):C1 _1(6.0.0):C1 changeCount=7
           [junit4]   1> IW 108 [2015-06-15T17:06:45.208Z; main]: startCommit: wrote pending segments file "pending_segments_2"
        
        Show
        Uwe Schindler added a comment - Here my patch. The output looks like this: $ ant test -Dtestcase=TestIndexWriter -Dtests.verbose=true ... [junit4] 1> DW 108 [2015-06-15T17:06:45.2Z; main]: publishFlushedSegment seg-private updates=null [junit4] 1> IW 108 [2015-06-15T17:06:45.201Z; main]: publishFlushedSegment [junit4] 1> IW 108 [2015-06-15T17:06:45.201Z; main]: publish sets newSegment delGen=3 seg=_1(6.0.0):C1 [junit4] 1> IFD 108 [2015-06-15T17:06:45.201Z; main]: now checkpoint "_0(6.0.0):C1 _1(6.0.0):C1" [2 segments ; isCommit = false] [junit4] 1> IFD 108 [2015-06-15T17:06:45.202Z; main]: 0 msec to checkpoint [junit4] 1> IW 108 [2015-06-15T17:06:45.202Z; main]: apply all deletes during flush [junit4] 1> IW 108 [2015-06-15T17:06:45.203Z; main]: now apply all deletes for all segments maxDoc=2 [junit4] 1> BD 108 [2015-06-15T17:06:45.203Z; main]: applyDeletes: open segment readers took 0 msec [junit4] 1> BD 108 [2015-06-15T17:06:45.203Z; main]: applyDeletes: no segments; skipping [junit4] 1> BD 108 [2015-06-15T17:06:45.203Z; main]: prune sis=segments_1: _0(6.0.0):C1 _1(6.0.0):C1 minGen=1 packetCount=0 [junit4] 1> DW 108 [2015-06-15T17:06:45.204Z; main]: TEST-TestIndexWriter.testDeleteUnusedFiles2-seed#[4A62A887A4681DB2] finishFullFlush success=true [junit4] 1> TMP 108 [2015-06-15T17:06:45.204Z; main]: findMerges: 2 segments [junit4] 1> TMP 108 [2015-06-15T17:06:45.205Z; main]: seg=_0(6.0.0):C1 size=0.001 MB [floored] [junit4] 1> TMP 108 [2015-06-15T17:06:45.205Z; main]: seg=_1(6.0.0):C1 size=0.001 MB [floored] [junit4] 1> TMP 108 [2015-06-15T17:06:45.206Z; main]: allowedSegmentCount=1 vs count=2 (eligible count=2) tooBigCount=0 [junit4] 1> IW 108 [2015-06-15T17:06:45.206Z; main]: startCommit(): start [junit4] 1> IW 108 [2015-06-15T17:06:45.206Z; main]: startCommit index=_0(6.0.0):C1 _1(6.0.0):C1 changeCount=7 [junit4] 1> IW 108 [2015-06-15T17:06:45.208Z; main]: startCommit: wrote pending segments file "pending_segments_2"
        Hide
        Uwe Schindler added a comment -

        By the way: In Lucene Trunk we can use the new Java 8 Datetime API (Jodatime fork) to get an ISO datestamp without misusing an API. We should maybe use the new API in Java 8 to implement getTimeStamp(): LocalDateTime.now().toString() (in local time), or by giving a timezone.

        Show
        Uwe Schindler added a comment - By the way: In Lucene Trunk we can use the new Java 8 Datetime API (Jodatime fork) to get an ISO datestamp without misusing an API. We should maybe use the new API in Java 8 to implement getTimeStamp(): LocalDateTime.now().toString() (in local time), or by giving a timezone.
        Hide
        Mark Miller added a comment -

        I would love to calculate the developer time / cost ratio of this ticket Gotta love Open Source.

        Show
        Mark Miller added a comment - I would love to calculate the developer time / cost ratio of this ticket Gotta love Open Source.
        Hide
        Michael McCandless added a comment -

        +1 to Uwe Schindler's patch! But can you add a comment explaining why we abuse the FileTime API?

        Separately it's kinda cool that TestIndexWriter makes at least 108 IW instances...

        Show
        Michael McCandless added a comment - +1 to Uwe Schindler 's patch! But can you add a comment explaining why we abuse the FileTime API? Separately it's kinda cool that TestIndexWriter makes at least 108 IW instances...
        Hide
        Uwe Schindler added a comment -

        I would love to calculate the developer time / cost ratio of this ticket Gotta love Open Source.

        LOL. Yes, oh yes! But we also fixed a bug. Because the SImpleDateFormat used in the PrintStreamInfoStream was not thread safe at all! So we also fixed a bug. Thanks to Robert for pointing this out (unfortunately ForbiddenApis cannot detect multithreaded use of SimpleDateFormat....)

        But can you add a comment explaining why we abuse the FileTime API?

        Will do. I am out of office, will prepare this later.

        Show
        Uwe Schindler added a comment - I would love to calculate the developer time / cost ratio of this ticket Gotta love Open Source. LOL. Yes, oh yes! But we also fixed a bug. Because the SImpleDateFormat used in the PrintStreamInfoStream was not thread safe at all! So we also fixed a bug. Thanks to Robert for pointing this out (unfortunately ForbiddenApis cannot detect multithreaded use of SimpleDateFormat....) But can you add a comment explaining why we abuse the FileTime API? Will do. I am out of office, will prepare this later.
        Hide
        Uwe Schindler added a comment -

        Here patch for 5.x and trunk. In trunk I use the Java 8 Datetime API with Instant.now().toString(), in 5.x I added a comment why misusing FileTime.

        Please have in mind that the output is always in UTC. In Java 8 we could use local time, too, but this would make it identical for both cases. In addition, while running tests, we randomize the timezone, so the "local" date is bit useless...

        Any comments?

        Show
        Uwe Schindler added a comment - Here patch for 5.x and trunk. In trunk I use the Java 8 Datetime API with Instant.now().toString(), in 5.x I added a comment why misusing FileTime. Please have in mind that the output is always in UTC. In Java 8 we could use local time, too, but this would make it identical for both cases. In addition, while running tests, we randomize the timezone, so the "local" date is bit useless... Any comments?
        Hide
        Erick Erickson added a comment -

        Never, never, never use local time ... Since I'm always running the log file through a program to figure things out, I'll put any conversions I need there.

        FWIW

        Show
        Erick Erickson added a comment - Never, never, never use local time ... Since I'm always running the log file through a program to figure things out, I'll put any conversions I need there. FWIW
        Hide
        ASF subversion and git services added a comment -

        Commit 1685675 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1685675 ]

        LUCENE-6564: Change PrintStreamInfoStream to use thread safe Java 8 ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString as workaround); fix output of tests to use same format

        Show
        ASF subversion and git services added a comment - Commit 1685675 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1685675 ] LUCENE-6564 : Change PrintStreamInfoStream to use thread safe Java 8 ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString as workaround); fix output of tests to use same format
        Hide
        ASF subversion and git services added a comment -

        Commit 1685676 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1685676 ]

        Merged revision(s) 1685675 from lucene/dev/trunk:
        LUCENE-6564: Change PrintStreamInfoStream to use thread safe Java 8 ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString as workaround); fix output of tests to use same format

        Show
        ASF subversion and git services added a comment - Commit 1685676 from Uwe Schindler in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1685676 ] Merged revision(s) 1685675 from lucene/dev/trunk: LUCENE-6564 : Change PrintStreamInfoStream to use thread safe Java 8 ISO-8601 date formatting (in Lucene 5.x use Java 7 FileTime#toString as workaround); fix output of tests to use same format
        Hide
        Uwe Schindler added a comment -

        I committed the Java 8 and Java 7 variants to Trunk and 5.x

        Show
        Uwe Schindler added a comment - I committed the Java 8 and Java 7 variants to Trunk and 5.x
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Ramkumar Aiyengar
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development