Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved.

      1. LUCENE-3218.patch
        62 kB
        Simon Willnauer
      2. LUCENE-3218.patch
        83 kB
        Simon Willnauer
      3. LUCENE-3218.patch
        104 kB
        Simon Willnauer
      4. LUCENE-3218.patch
        104 kB
        Simon Willnauer
      5. LUCENE-3218_3x.patch
        101 kB
        Simon Willnauer
      6. LUCENE-3218_tests.patch
        3 kB
        Robert Muir
      7. LUCENE-3218_test_fix.patch
        3 kB
        Simon Willnauer
      8. LUCENE-3218.patch
        63 kB
        Simon Willnauer
      9. LUCENE-3218.patch
        69 kB
        Simon Willnauer
      10. LUCENE-3218.patch
        69 kB
        Simon Willnauer

        Issue Links

          Activity

          Hide
          Simon Willnauer added a comment -

          I am closing this - we are not backporting this to 3.x and its committed & stable on trunk

          Show
          Simon Willnauer added a comment - I am closing this - we are not backporting this to 3.x and its committed & stable on trunk
          Hide
          Robert Muir added a comment -

          not a blocker, it was pulled from 3.x (and fixed in trunk)

          Show
          Robert Muir added a comment - not a blocker, it was pulled from 3.x (and fixed in trunk)
          Hide
          Simon Willnauer added a comment -

          I committed this to trunk. I will leave this issue open until we decide to backport to 3.x.

          simon

          Show
          Simon Willnauer added a comment - I committed this to trunk. I will leave this issue open until we decide to backport to 3.x. simon
          Hide
          Simon Willnauer added a comment -

          new patch, I renamed IndexInputHandle to IndexInputSlicer and made the createSlicer method public otherwise Directory impls outside of o.a.l.store can not delegate to it.

          I would also rename CFIndexInput to SliceIndexInput, it's private so does not matter, but wozuld be nice to have.

          done

          Otherwise I agree with committing to trunk. As far as I see, the format did not change in trunk, so once we get this back into 3.x we are at the state pre-revert?

          yes that's true.

          I think is ready to commit, if nobody objects I am going to commit this later today.

          Show
          Simon Willnauer added a comment - new patch, I renamed IndexInputHandle to IndexInputSlicer and made the createSlicer method public otherwise Directory impls outside of o.a.l.store can not delegate to it. I would also rename CFIndexInput to SliceIndexInput, it's private so does not matter, but wozuld be nice to have. done Otherwise I agree with committing to trunk. As far as I see, the format did not change in trunk, so once we get this back into 3.x we are at the state pre-revert? yes that's true. I think is ready to commit, if nobody objects I am going to commit this later today.
          Hide
          Uwe Schindler added a comment -

          I would also rename CFIndexInput to SliceIndexInput, it's private so does not matter, but wozuld be nice to have.

          Otherwise I agree with committing to trunk. As far as I see, the format did not change in trunk, so once we get this back into 3.x we are at the state pre-revert?

          Show
          Uwe Schindler added a comment - I would also rename CFIndexInput to SliceIndexInput, it's private so does not matter, but wozuld be nice to have. Otherwise I agree with committing to trunk. As far as I see, the format did not change in trunk, so once we get this back into 3.x we are at the state pre-revert?
          Hide
          Simon Willnauer added a comment -

          I don't really like the name IndexInputHandle what about

          • IndexInputFactory
          • IndexInputProducer
          • IndexInputSlicer

          more ideas?

          Show
          Simon Willnauer added a comment - I don't really like the name IndexInputHandle what about IndexInputFactory IndexInputProducer IndexInputSlicer more ideas?
          Hide
          Simon Willnauer added a comment -

          Why don't we get it committed to trunk and let it chill for a while, let it hit random testing for a while, get used by adventurous users, and then make the decision?

          +1

          Show
          Simon Willnauer added a comment - Why don't we get it committed to trunk and let it chill for a while, let it hit random testing for a while, get used by adventurous users, and then make the decision? +1
          Hide
          Mark Miller added a comment -

          this seems close, the question is if we want to backport this to 3.x too?

          Why don't we get it committed to trunk and let it chill for a while, let it hit random testing for a while, get used by adventurous users, and then make the decision?

          Show
          Mark Miller added a comment - this seems close, the question is if we want to backport this to 3.x too? Why don't we get it committed to trunk and let it chill for a while, let it hit random testing for a while, get used by adventurous users, and then make the decision?
          Hide
          Michael McCandless added a comment -

          This approach looks nice! Maybe rename IndexInputHandle to
          IndexInputProvider? IndexInputSlicer? SliceCreator?

          Maybe rename CSIndexInput -> SlicedIndexInput?

          In SimpleFSDir we may as well move that static Descriptor class out?
          Rather than having to import it to itself.

          Show
          Michael McCandless added a comment - This approach looks nice! Maybe rename IndexInputHandle to IndexInputProvider? IndexInputSlicer? SliceCreator? Maybe rename CSIndexInput -> SlicedIndexInput? In SimpleFSDir we may as well move that static Descriptor class out? Rather than having to import it to itself.
          Hide
          Simon Willnauer added a comment -

          next iteration

          • removed 3.4 support for the current format (cfe file) in SI
          • regenerated the BW test indices (not in the patch)
          • add javadoc for IndexInputHandle
          • added IndexInputHandle#openFullSlice to get a slice spanning the entire file.
          • Track indexInputHandle instances in MockDirectoryWrapper to ensure they are closed.
          • Use the IndexInputHandle ie. the underlying file handle to create all streams in CFS (uwes suggestion - thanks for that)

          I didn't include the generated indices for bw tests in the patch for size / readability. Yet, if you want to run the tests you need to generate them otherwise TestBackwardsCompatibility will fail.

          this seems close, the question is if we want to backport this to 3.x too?

          Show
          Simon Willnauer added a comment - next iteration removed 3.4 support for the current format (cfe file) in SI regenerated the BW test indices (not in the patch) add javadoc for IndexInputHandle added IndexInputHandle#openFullSlice to get a slice spanning the entire file. Track indexInputHandle instances in MockDirectoryWrapper to ensure they are closed. Use the IndexInputHandle ie. the underlying file handle to create all streams in CFS (uwes suggestion - thanks for that) I didn't include the generated indices for bw tests in the patch for size / readability. Yet, if you want to run the tests you need to generate them otherwise TestBackwardsCompatibility will fail. this seems close, the question is if we want to backport this to 3.x too?
          Hide
          Robert Muir added a comment -

          This looks really nice and easier to understand.

          I agree! At a glance, it seems this design looks much better to and avoids the sneaky delegation problems.

          Only one minor thing glancing at the patch: in MockDirectoryWrapper, can we track that the handle is actually closed?

          Show
          Robert Muir added a comment - This looks really nice and easier to understand. I agree! At a glance, it seems this design looks much better to and avoids the sneaky delegation problems. Only one minor thing glancing at the patch: in MockDirectoryWrapper, can we track that the handle is actually closed?
          Hide
          Uwe Schindler added a comment -

          Hi Simon,

          thanks for taking care. This looks really nice and easier to understand. I agree, the problem with the RAF open file is hard to manage (especially when to close it).

          One small suggestion: Currently the CFS file is opened twice: One time to read the contents and a second time to read the actual files using the handle (and for new format to read the CFE file, but thats unavoidable - once we nuke old index support in Lucene 5, we can always open the cfe first and read the contents, but until then we need to do both). Why not open the IndexInputHandle at the beginning and then simply request a full slice for the directory initialization (or ideally only that part that contains the directory)? The slice can then be closed afterwards as before.

          So very cool work!
          Greetings from Berkeley!

          Show
          Uwe Schindler added a comment - Hi Simon, thanks for taking care. This looks really nice and easier to understand. I agree, the problem with the RAF open file is hard to manage (especially when to close it). One small suggestion: Currently the CFS file is opened twice: One time to read the contents and a second time to read the actual files using the handle (and for new format to read the CFE file, but thats unavoidable - once we nuke old index support in Lucene 5, we can always open the cfe first and read the contents, but until then we need to do both). Why not open the IndexInputHandle at the beginning and then simply request a full slice for the directory initialization (or ideally only that part that contains the directory)? The slice can then be closed afterwards as before. So very cool work! Greetings from Berkeley!
          Hide
          Simon Willnauer added a comment -

          here is a rough patch against trunk that detaches CFDirectory from other dir impls but still extends Directory. all optimizations still remain while always the top-level directory is used to create / open the CFS.
          I didn't follow uwes last approach since I could figure out when to close the RAF reference since like in the MMap case we want to forcefully unmap the file and therefor would also need to close the RAF reference in the base stream. I use a helper construct (IndexInputHandle - yes need to find a better name) that can only be pulled from another directory (protected). So this is private to the directory impls but solves our cases here nicely I think.

          still rough but its a start.

          Show
          Simon Willnauer added a comment - here is a rough patch against trunk that detaches CFDirectory from other dir impls but still extends Directory. all optimizations still remain while always the top-level directory is used to create / open the CFS. I didn't follow uwes last approach since I could figure out when to close the RAF reference since like in the MMap case we want to forcefully unmap the file and therefor would also need to close the RAF reference in the base stream. I use a helper construct (IndexInputHandle - yes need to find a better name) that can only be pulled from another directory (protected). So this is private to the directory impls but solves our cases here nicely I think. still rough but its a start.
          Hide
          Simon Willnauer added a comment -

          Showuld we send an email to java-user as the index format in the stable branch changed by this (indexes with new CFS files can no longer be read)?

          I will do

          Show
          Simon Willnauer added a comment - Showuld we send an email to java-user as the index format in the stable branch changed by this (indexes with new CFS files can no longer be read)? I will do
          Hide
          Uwe Schindler added a comment -

          Showuld we send an email to java-user as the index format in the stable branch changed by this (indexes with new CFS files can no longer be read)?

          Show
          Uwe Schindler added a comment - Showuld we send an email to java-user as the index format in the stable branch changed by this (indexes with new CFS files can no longer be read)?
          Hide
          Simon Willnauer added a comment -

          FYI - I backed out the changes from 3.x

          Show
          Simon Willnauer added a comment - FYI - I backed out the changes from 3.x
          Hide
          Mark Miller added a comment -

          Isn't there a lot of middle ground here? Why don't we just back out of 3.x for now and keep pushing towards a consensus implementation on trunk?

          Show
          Mark Miller added a comment - Isn't there a lot of middle ground here? Why don't we just back out of 3.x for now and keep pushing towards a consensus implementation on trunk?
          Hide
          Simon Willnauer added a comment -

          None of this is reasonable.

          your unreasonable comments here are totally counter productive IMO. Just my $0.05

          Show
          Simon Willnauer added a comment - None of this is reasonable. your unreasonable comments here are totally counter productive IMO. Just my $0.05
          Hide
          Simon Willnauer added a comment -

          its all yours do whatever you think needs to be done. have fun

          Show
          Simon Willnauer added a comment - its all yours do whatever you think needs to be done. have fun
          Hide
          Mark Miller added a comment -

          +1 on backing out of 3.x at least - this is our stable branch...I can't imagine this optimization belongs in our stable branch given all of this discussion...

          Show
          Mark Miller added a comment - +1 on backing out of 3.x at least - this is our stable branch...I can't imagine this optimization belongs in our stable branch given all of this discussion...
          Hide
          Michael McCandless added a comment -

          I think both Simon's and Uwe's ideas are good and should be explored! With all these ideas we will find a clean way to get CFS reading/writing integrated into Directory.

          But I think that exploration should just be outside of trunk and 3.x, eg on a branch. Once we iterate to a good point again we can commit it back to trunk, let it bake/age, then merge back to 3.x if it seems stable.

          Show
          Michael McCandless added a comment - I think both Simon's and Uwe's ideas are good and should be explored! With all these ideas we will find a clean way to get CFS reading/writing integrated into Directory. But I think that exploration should just be outside of trunk and 3.x, eg on a branch. Once we iterate to a good point again we can commit it back to trunk, let it bake/age, then merge back to 3.x if it seems stable.
          Hide
          Uwe Schindler added a comment -

          But we can still consider this as solutions to solve the issue later? I just dont want to make suggestions with lots of brainwork and sleepless nights involved, if it's not considered and just be backed out with "None of this is reasonable.".

          Show
          Uwe Schindler added a comment - But we can still consider this as solutions to solve the issue later? I just dont want to make suggestions with lots of brainwork and sleepless nights involved, if it's not considered and just be backed out with "None of this is reasonable.".
          Hide
          Robert Muir added a comment -

          None of this is reasonable.

          When something goes wrong with an optimization, and multiple people ask for you to back it out, back it out.

          then later we can discuss how to re-implement it.

          Show
          Robert Muir added a comment - None of this is reasonable. When something goes wrong with an optimization, and multiple people ask for you to back it out, back it out. then later we can discuss how to re-implement it.
          Hide
          Uwe Schindler added a comment -

          Hi when thinking about the whole stuff one more time again, I may have a solution to again decouple CFS from the parent directory, so one can create any CFS using one single class (but perhaps the factory in directory is still an idea to make it customizable). There are several solutions, but most of them have customization problems:

          • The current approach was discussed already, nothing more to say
          • A possibility to make it possible for MMap to map certain parts of the file is to move the getIndexInputSlice up to the abstract Directory base class and make the default implementation the current CFIndexInput from the default CFS impl. This would be even backwards compatible. So the CFS impl can simply ask the parent directory it warps for a slice. The problem here is easy: Current CFS impl opens the CFS file exactly one time and consumes exactly one file handle. The slices work on the same file handle. If we move the slice handling up to the directory, the "state" is gone, so handling the all-the-time open CFS file cannot be managed anymore. When using a new file handle for each slice, we gain nothing (CFS is to reduce file handles).
          • Last night I had one idea that might fix this issue. Lets move the slice handling into the abstract IndexInput base class, again the default impl would simply use the current CFIndexInput to return a slice. In the case of MMapIndexInput it would simply return a remapped slice on the current file handle. The only thing that would change is that the RAF would kept open the wohle time (like MMapCFDirectory does), in contrast to curren, where th RAF is closed directly after mapping. This approach would allow it for the CFS impl to simply ask it parant directory for an IndexInput to handle the SFC file itsself and for each sub-slice ask this IndexInput for this.

          The last approach seems reasonable, but we need some more checks how to implement that. The last approach keeps both "features" of CFS:

          • One OS file handle
          • possibility for certain directory implementations to return sliced IndexInputs in an optimal way. The current IndexInput have a clone method, in this case we would need a similar method, where you can give offset and length.

          On the other hand, we can remove the "factory" for CFS files from directory, we can go back to a simple new CFSDirectory(parentDirectory, cfsName).

          Does this sound reasonable?

          Show
          Uwe Schindler added a comment - Hi when thinking about the whole stuff one more time again, I may have a solution to again decouple CFS from the parent directory, so one can create any CFS using one single class (but perhaps the factory in directory is still an idea to make it customizable). There are several solutions, but most of them have customization problems: The current approach was discussed already, nothing more to say A possibility to make it possible for MMap to map certain parts of the file is to move the getIndexInputSlice up to the abstract Directory base class and make the default implementation the current CFIndexInput from the default CFS impl. This would be even backwards compatible. So the CFS impl can simply ask the parent directory it warps for a slice. The problem here is easy: Current CFS impl opens the CFS file exactly one time and consumes exactly one file handle. The slices work on the same file handle. If we move the slice handling up to the directory, the "state" is gone, so handling the all-the-time open CFS file cannot be managed anymore. When using a new file handle for each slice, we gain nothing (CFS is to reduce file handles). Last night I had one idea that might fix this issue. Lets move the slice handling into the abstract IndexInput base class, again the default impl would simply use the current CFIndexInput to return a slice. In the case of MMapIndexInput it would simply return a remapped slice on the current file handle. The only thing that would change is that the RAF would kept open the wohle time (like MMapCFDirectory does), in contrast to curren, where th RAF is closed directly after mapping. This approach would allow it for the CFS impl to simply ask it parant directory for an IndexInput to handle the SFC file itsself and for each sub-slice ask this IndexInput for this. The last approach seems reasonable, but we need some more checks how to implement that. The last approach keeps both "features" of CFS: One OS file handle possibility for certain directory implementations to return sliced IndexInputs in an optimal way. The current IndexInput have a clone method, in this case we would need a similar method, where you can give offset and length. On the other hand, we can remove the "factory" for CFS files from directory, we can go back to a simple new CFSDirectory(parentDirectory, cfsName). Does this sound reasonable?
          Hide
          Simon Willnauer added a comment -

          I think we should back out these CFS changes for now.

          -1

          I think we are over reacting here, especially robert gets too crazy about this. Honestly I think CFS should be detached from directory and we should make it a delegating directory if at all. That way we would always operate on the right directory, can safely create two files and keep Directory itself clean. We can still add the ability to partially map a certain file (offset, length) into memory like we do now in the specialized CFS Dirs. This entire think is not a problem of appending at all IMO.

          how does that sound? I think this would solve all the problems we are having and keeps it appendable.

          simon

          Show
          Simon Willnauer added a comment - I think we should back out these CFS changes for now. -1 I think we are over reacting here, especially robert gets too crazy about this. Honestly I think CFS should be detached from directory and we should make it a delegating directory if at all. That way we would always operate on the right directory, can safely create two files and keep Directory itself clean. We can still add the ability to partially map a certain file (offset, length) into memory like we do now in the specialized CFS Dirs. This entire think is not a problem of appending at all IMO. how does that sound? I think this would solve all the problems we are having and keeps it appendable. simon
          Hide
          Michael McCandless added a comment -

          I think we should back out these CFS changes for now.

          +1

          Generally if we add a cool optimization and it turns out that optimization risks even just apparent index corruption and/or adds scary traps / confusing complexity to the API I think we should pull the change and iterate on the issue / branch until these problems are addressed?

          We had a similar experience with copyBytes, but that time it was real corruption.

          Optimizations aren't worth such risks I think, especially if it's only an index-time opto?

          Show
          Michael McCandless added a comment - I think we should back out these CFS changes for now. +1 Generally if we add a cool optimization and it turns out that optimization risks even just apparent index corruption and/or adds scary traps / confusing complexity to the API I think we should pull the change and iterate on the issue / branch until these problems are addressed? We had a similar experience with copyBytes, but that time it was real corruption. Optimizations aren't worth such risks I think, especially if it's only an index-time opto?
          Hide
          Robert Muir added a comment -

          I think the situation here is too complicated already, we are discussing all kinds of complicated stuff and I dont think "appendable" CFS is worth any of this.

          I think we should back out these CFS changes for now.

          Show
          Robert Muir added a comment - I think the situation here is too complicated already, we are discussing all kinds of complicated stuff and I dont think "appendable" CFS is worth any of this. I think we should back out these CFS changes for now.
          Hide
          Uwe Schindler added a comment -

          When looking into the CompoundFileDirectory code I also found a small bug in version handling.
          readEntries() reads the first VInt and uses it for version checking (if negative). This check has 2 problems:

          • if the VInt is smaller then FORMAT_CURRENT it should throw IndexTooNewException
          • the comparison should not be against FORMAT_CURRENT itsself (this constant should only be used for writing CFS files), it should compare against real version numbers. This would otherwise break on later additions of new formats.
          Show
          Uwe Schindler added a comment - When looking into the CompoundFileDirectory code I also found a small bug in version handling. readEntries() reads the first VInt and uses it for version checking (if negative). This check has 2 problems: if the VInt is smaller then FORMAT_CURRENT it should throw IndexTooNewException the comparison should not be against FORMAT_CURRENT itsself (this constant should only be used for writing CFS files), it should compare against real version numbers. This would otherwise break on later additions of new formats.
          Hide
          Robert Muir added a comment -

          So eventually this is about creating the cfe and cfs file from the "right" directory.

          That's not the only issue: while that is the primary reason I reopened this issue I also have concerns about the API being complicated and non-intuitive.

          Making the API even more complicated because Filesystem X can only write WingDings or cannot seek doesn't seem to be a good solution to me.

          Show
          Robert Muir added a comment - So eventually this is about creating the cfe and cfs file from the "right" directory. That's not the only issue: while that is the primary reason I reopened this issue I also have concerns about the API being complicated and non-intuitive. Making the API even more complicated because Filesystem X can only write WingDings or cannot seek doesn't seem to be a good solution to me.
          Hide
          Uwe Schindler added a comment -

          If we want append only, we should also remove seek methods from IndexOutput... I DISAGREE, too!

          Show
          Uwe Schindler added a comment - If we want append only, we should also remove seek methods from IndexOutput... I DISAGREE, too!
          Hide
          Robert Muir added a comment -

          I disagree, we don't need to compensate for hadoop's problems.

          Show
          Robert Muir added a comment - I disagree, we don't need to compensate for hadoop's problems.
          Hide
          Simon Willnauer added a comment -

          personally I think we should try to be append only on general. So eventually this is about creating the cfe and cfs file from the "right" directory. What we could do to use the parent ie. FileSwitchDir etc. is add a protected method that allows passing the parent dir to the createCompoundOutput / openCompoundInput which is then in turn used to create the actual files. We can call this method from the public createCompoundOutput / openCompoundInput versions with "this" as the directory to create files. How does that sound? Lemme know if I miss something...

          Show
          Simon Willnauer added a comment - personally I think we should try to be append only on general. So eventually this is about creating the cfe and cfs file from the "right" directory. What we could do to use the parent ie. FileSwitchDir etc. is add a protected method that allows passing the parent dir to the createCompoundOutput / openCompoundInput which is then in turn used to create the actual files. We can call this method from the public createCompoundOutput / openCompoundInput versions with "this" as the directory to create files. How does that sound? Lemme know if I miss something...
          Hide
          Uwe Schindler added a comment - - edited

          The trick with the latest updates to compound files is that the CompoundFileWriter/Reader is returned by the directory implementation - and this is broken and the discussion is about this.
          So this would be the place, where you theoretically could completely make another CFS on-disk format or e.g. write the stuff to a ZIP file

          See here: https://builds.apache.org/job/Lucene-3.x/javadoc/core/org/apache/lucene/store/Directory.html#createCompoundOutput(java.lang.String)

          Show
          Uwe Schindler added a comment - - edited The trick with the latest updates to compound files is that the CompoundFileWriter/Reader is returned by the directory implementation - and this is broken and the discussion is about this. So this would be the place, where you theoretically could completely make another CFS on-disk format or e.g. write the stuff to a ZIP file See here: https://builds.apache.org/job/Lucene-3.x/javadoc/core/org/apache/lucene/store/Directory.html#createCompoundOutput(java.lang.String)
          Hide
          Andrzej Bialecki added a comment -

          I think append-only filesystems (eg HDFS) can make their own impl that uses the file length instead (like AppendingCodecc).

          AppendingCodec solves only one issue, that of postings and SegmentInfos. I'm worried that adding seek+rewrite tricks in other places that are not under the control of Codec or under any other configurable implementation (such as CFS) will ultimately prevent the efficient use of Lucene on Hadoop. Unless we put those places under the control of a Codec (or some other configurable interface).

          Show
          Andrzej Bialecki added a comment - I think append-only filesystems (eg HDFS) can make their own impl that uses the file length instead (like AppendingCodecc). AppendingCodec solves only one issue, that of postings and SegmentInfos. I'm worried that adding seek+rewrite tricks in other places that are not under the control of Codec or under any other configurable implementation (such as CFS) will ultimately prevent the efficient use of Lucene on Hadoop. Unless we put those places under the control of a Codec (or some other configurable interface).
          Hide
          Robert Muir added a comment -

          Anyone have opinions on 3.x?

          I think it might be wise to think about pulling these CFS changes (LUCENE-3201, too) back from 3.x and instead
          give things some time to settle in trunk?

          We could always then backport at a later release, but we got lucky here that we still haven't released anything.

          Show
          Robert Muir added a comment - Anyone have opinions on 3.x? I think it might be wise to think about pulling these CFS changes ( LUCENE-3201 , too) back from 3.x and instead give things some time to settle in trunk? We could always then backport at a later release, but we got lucky here that we still haven't released anything.
          Hide
          Michael McCandless added a comment -

          Why not put that file pointer at the very end of the
          file? So that the read-time sequence of actions is: seek to 8 bytes before the
          end, read the file pointer, seek back to beginning of metadata.

          I would rather not rely on metadata (file length) when reading, only the contents of the file.

          I think append-only filesystems (eg HDFS) can make their own impl that uses the file length instead (like AppendingCodecc).

          Show
          Michael McCandless added a comment - Why not put that file pointer at the very end of the file? So that the read-time sequence of actions is: seek to 8 bytes before the end, read the file pointer, seek back to beginning of metadata. I would rather not rely on metadata (file length) when reading, only the contents of the file. I think append-only filesystems (eg HDFS) can make their own impl that uses the file length instead (like AppendingCodecc).
          Hide
          Robert Muir added a comment -

          Right, the fix I applied is really a hack, but I didnt want to leave our codebase broken while we figure this out.

          Its not just a problem from a performance perspective, I think its just bad to make assumptions about how the inner directory works.
          In this case with fileswitchdirectory etc, it really should be fully delegating this stuff down, and be clueless about how its implemented by the underlying sub directory.

          Show
          Robert Muir added a comment - Right, the fix I applied is really a hack, but I didnt want to leave our codebase broken while we figure this out. Its not just a problem from a performance perspective, I think its just bad to make assumptions about how the inner directory works. In this case with fileswitchdirectory etc, it really should be fully delegating this stuff down, and be clueless about how its implemented by the underlying sub directory.
          Hide
          Uwe Schindler added a comment -

          i think we should prevent the seek if not absolutely necessary.

          You have a very small optimization here that only affects opening the CFS.

          But because we need to fix the wrong behaviour in FileSwitch (and also NRTCaching dir, which is in my opinion more serious!!!!), FileSwitch and NRTCachingDir now use the default CompoundFileImpl. If you wrap MMapDir by FileSwitch or NRTCaching, the whole custom impl of the compound file in MMap that speeds up even further is obsolete, as not used (you can use the compound file with really no slowdown at all as we can map parts of the CFS file into memeory and need no offset calculations and can also save mapping costs).

          This is gone now, just because a one-time seek at opening time is prevented.

          Show
          Uwe Schindler added a comment - i think we should prevent the seek if not absolutely necessary. You have a very small optimization here that only affects opening the CFS. But because we need to fix the wrong behaviour in FileSwitch (and also NRTCaching dir, which is in my opinion more serious!!!!), FileSwitch and NRTCachingDir now use the default CompoundFileImpl. If you wrap MMapDir by FileSwitch or NRTCaching, the whole custom impl of the compound file in MMap that speeds up even further is obsolete, as not used (you can use the compound file with really no slowdown at all as we can map parts of the CFS file into memeory and need no offset calculations and can also save mapping costs). This is gone now, just because a one-time seek at opening time is prevented.
          Hide
          Robert Muir added a comment -

          Maybe we can avoid making a separate _X.cfe file?

          +1, this sounds great (however it can be done, ideally with Marvin's idea to support appendable-only filesystems also), and would end the confusion here.

          Show
          Robert Muir added a comment - Maybe we can avoid making a separate _X.cfe file? +1, this sounds great (however it can be done, ideally with Marvin's idea to support appendable-only filesystems also), and would end the confusion here.
          Hide
          Uwe Schindler added a comment - - edited

          Thanks Robert for explaining this again, I agree 100% with you, the current cfe/cfs discussion is really serious and the current impl is heavy broken.

          Show
          Uwe Schindler added a comment - - edited Thanks Robert for explaining this again, I agree 100% with you, the current cfe/cfs discussion is really serious and the current impl is heavy broken.
          Hide
          Robert Muir added a comment -

          This is definitely not a bug in the directory, and its a serious issue (i think a blocker for release myself).

          I'll try to explain the issue again a little better than I did on https://issues.apache.org/jira/browse/LUCENE-3380?focusedCommentId=13086872&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13086872

          This is just an example of the API problem, with FileSwitchDirectory.

          In Lucene we have FileSwitchDirectory which is a Directory that lets you "switch" between 2 different directory implementations based on file extension.
          So conceptually it looks like this:

          FileSwitchDirectory extends Directory {
            Directory a;
            Directory b;
            Set extensions; // these are the file extensions that go to "a", all other ones are handled by "b"
          }
          

          Imagine you configure this directory to put all "*.cfs" in "a", and everything else in "b".

          So when FileSwitchDirectory is asked where to put "1.cfs", it forwards the request to "a".

          But the "1.cfe" file is actually wrongly created in "a" also, causing FileNotFoundExceptions later when the file is to be read, because its in the wrong directory. This is because of how the compound file mechanism works now, it calls a.createOutput("1.cfe") instead of fileswitchdirectory.createOutput("1.cfe").

          So this is a serious problem for any Directories that delegate responsibility like this, not just the ones in Lucene.

          Show
          Robert Muir added a comment - This is definitely not a bug in the directory, and its a serious issue (i think a blocker for release myself). I'll try to explain the issue again a little better than I did on https://issues.apache.org/jira/browse/LUCENE-3380?focusedCommentId=13086872&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13086872 This is just an example of the API problem, with FileSwitchDirectory. In Lucene we have FileSwitchDirectory which is a Directory that lets you "switch" between 2 different directory implementations based on file extension. So conceptually it looks like this: FileSwitchDirectory extends Directory { Directory a; Directory b; Set extensions; // these are the file extensions that go to "a" , all other ones are handled by "b" } Imagine you configure this directory to put all "*.cfs" in "a", and everything else in "b". So when FileSwitchDirectory is asked where to put "1.cfs", it forwards the request to "a". But the "1.cfe" file is actually wrongly created in "a" also, causing FileNotFoundExceptions later when the file is to be read, because its in the wrong directory. This is because of how the compound file mechanism works now, it calls a.createOutput("1.cfe") instead of fileswitchdirectory.createOutput("1.cfe"). So this is a serious problem for any Directories that delegate responsibility like this, not just the ones in Lucene.
          Hide
          Marvin Humphrey added a comment -

          I don't fully grok Robert's concern, but with regards to Mike's suggestion of
          inlining the metadata: Why not put that file pointer at the very end of the
          file? So that the read-time sequence of actions is: seek to 8 bytes before the
          end, read the file pointer, seek back to beginning of metadata.

          That way you don't need to seek backwards during writing, which IIRC used to
          be an issue for Hadoop.

          Show
          Marvin Humphrey added a comment - I don't fully grok Robert's concern, but with regards to Mike's suggestion of inlining the metadata: Why not put that file pointer at the very end of the file? So that the read-time sequence of actions is: seek to 8 bytes before the end, read the file pointer, seek back to beginning of metadata. That way you don't need to seek backwards during writing, which IIRC used to be an issue for Hadoop.
          Hide
          Simon Willnauer added a comment -

          We did this because previously the CFS stored the header in the front of the file (I think)?

          is this really the problem here? I mean this problem is in FileSwitch / NRT Directory. The CFS uses a directory to write files, I would expect that if we use for instance NRT directory it gets the NRT directory instead of either of of its sub directories. its not really a CFS problem IMO and we should rather fix the actual directory rather than reverting the small optimization having the header in a separate file. i think we should prevent the seek if not absolutely necessary.

          Show
          Simon Willnauer added a comment - We did this because previously the CFS stored the header in the front of the file (I think)? is this really the problem here? I mean this problem is in FileSwitch / NRT Directory. The CFS uses a directory to write files, I would expect that if we use for instance NRT directory it gets the NRT directory instead of either of of its sub directories. its not really a CFS problem IMO and we should rather fix the actual directory rather than reverting the small optimization having the header in a separate file. i think we should prevent the seek if not absolutely necessary.
          Hide
          Michael McCandless added a comment -

          Maybe we can avoid making a separate _X.cfe file?

          We did this because previously the CFS stored the header in the front of the file (I think)?

          Could we, instead, put the header at the end of the file, but place a long pointer at the start of the file saying where the header is located (I'd rather not rely on file.length())? Then we could have a single (_X.cfs) file again and we can not use the Dir impl for delegation?

          Show
          Michael McCandless added a comment - Maybe we can avoid making a separate _X.cfe file? We did this because previously the CFS stored the header in the front of the file (I think)? Could we, instead, put the header at the end of the file, but place a long pointer at the start of the file saying where the header is located (I'd rather not rely on file.length())? Then we could have a single (_X.cfs) file again and we can not use the Dir impl for delegation?
          Hide
          Robert Muir added a comment -

          Can CFS reading/writing not take a parent directory, instead of:

          CompoundFileDirectory(Directory parent, ....)

          I think it should be
          CompoundFileDirectory(IndexInput cfs, IndexInput cfe)

          And directory.createOutput etc should take both filenames, this would remove this backdooring completely.

          Show
          Robert Muir added a comment - Can CFS reading/writing not take a parent directory, instead of: CompoundFileDirectory(Directory parent, ....) I think it should be CompoundFileDirectory(IndexInput cfs, IndexInput cfe) And directory.createOutput etc should take both filenames, this would remove this backdooring completely.
          Hide
          Robert Muir added a comment -

          See LUCENE-3374, LUCENE-3380.

          There are some serious traps here with how the CFE files are created for "delegator"-like Directory impls such as FileSwitchDirectory and NRTCachingDirectory.

          With such dirs that might have policies the writer is "backdooring" their decision about where files should go since it only has a reference to the "sub" directory.

          I think this is non-intuitive and we should do something to try to prevent surprises.

          Show
          Robert Muir added a comment - See LUCENE-3374 , LUCENE-3380 . There are some serious traps here with how the CFE files are created for "delegator"-like Directory impls such as FileSwitchDirectory and NRTCachingDirectory. With such dirs that might have policies the writer is "backdooring" their decision about where files should go since it only has a reference to the "sub" directory. I think this is non-intuitive and we should do something to try to prevent surprises.
          Hide
          Simon Willnauer added a comment -

          backported to 3.x - thanks guys

          Show
          Simon Willnauer added a comment - backported to 3.x - thanks guys
          Hide
          Robert Muir added a comment -

          Thanks Simon, I feel better now that we get our open-files-for-write tracking back.

          Show
          Robert Muir added a comment - Thanks Simon, I feel better now that we get our open-files-for-write tracking back.
          Hide
          Simon Willnauer added a comment -

          thank you robert, while this has actually been tested since its in the base class though its now cleaner. The test failure came from RAMDirectory simply overriding existing files. I added an explicit check for it.

          Show
          Simon Willnauer added a comment - thank you robert, while this has actually been tested since its in the base class though its now cleaner. The test failure came from RAMDirectory simply overriding existing files. I added an explicit check for it.
          Hide
          Robert Muir added a comment -

          Hi Simon, currently this attached patch fails... not sure why yet.

          But I think we should resolve this tests issue before backporting

          Show
          Robert Muir added a comment - Hi Simon, currently this attached patch fails... not sure why yet. But I think we should resolve this tests issue before backporting
          Hide
          Simon Willnauer added a comment -

          here is a patch against 3.x. I had to change one test in lucene/backwards and remove some tests from there which used the CFW / CFR.

          A review would be good here!

          Show
          Simon Willnauer added a comment - here is a patch against 3.x. I had to change one test in lucene/backwards and remove some tests from there which used the CFW / CFR. A review would be good here!
          Hide
          Simon Willnauer added a comment -

          Committed in revision 1138063.
          I will try to backport this to 3.x if possible

          Show
          Simon Willnauer added a comment - Committed in revision 1138063. I will try to backport this to 3.x if possible
          Hide
          Simon Willnauer added a comment -

          final patch.

          • fixed javadocs + several javadoc warnings
          • renamed openCompoundOutput to createCompoundOutput
          • fixed file extensions in test CSF LOL!!
          • copyFileEntry now deletes files that are separately written once copied into the CFS.
          • converted asserts to exceptions in CFW

          I plan to commit this today if nobody objects.

          Show
          Simon Willnauer added a comment - final patch. fixed javadocs + several javadoc warnings renamed openCompoundOutput to createCompoundOutput fixed file extensions in test CSF LOL!! copyFileEntry now deletes files that are separately written once copied into the CFS. converted asserts to exceptions in CFW I plan to commit this today if nobody objects.
          Hide
          Michael McCandless added a comment -

          Patch looks great!

          Can we name it createCompoundOutput? Emphasizes that we are
          write-once (this file shouldn't exist), and matches createOutput.

          On checkAbort... we could not send that to the CFW and instead call
          checkAbort in the outer loops? (Ie, where we .copy the files in).
          The existing CFW already only checks once-per-file anyway...

          Maybe instead of asserts for the mis-use of the CFD API (eg no
          entries, something is still open), we should make these real
          exceptions (ie, thrown even when assertions are off)?

          This comment looks stale (in CFW.java)?:

                // Close the output stream. Set the os to null before trying to
                // close so that if an exception occurs during the close, the
                // finally clause below will not attempt to close the stream
                // the second time.
          

          openCompoundOutput needs javadoc.

          CFD.createOutput's jdoc says Not Implememented but it is.

          The new test cases in TestCompoundFile names its file d.csf Column
          stride fields lives on!! Too many tlas...

          Show
          Michael McCandless added a comment - Patch looks great! Can we name it createCompoundOutput? Emphasizes that we are write-once (this file shouldn't exist), and matches createOutput. On checkAbort... we could not send that to the CFW and instead call checkAbort in the outer loops? (Ie, where we .copy the files in). The existing CFW already only checks once-per-file anyway... Maybe instead of asserts for the mis-use of the CFD API (eg no entries, something is still open), we should make these real exceptions (ie, thrown even when assertions are off)? This comment looks stale (in CFW.java)?: // Close the output stream. Set the os to null before trying to // close so that if an exception occurs during the close, the // finally clause below will not attempt to close the stream // the second time. openCompoundOutput needs javadoc. CFD.createOutput's jdoc says Not Implememented but it is. The new test cases in TestCompoundFile names its file d.csf Column stride fields lives on!! Too many tlas...
          Hide
          Simon Willnauer added a comment -

          updated patch NOW containing all files

          sorry for the missing files in the last patch

          Show
          Simon Willnauer added a comment - updated patch NOW containing all files sorry for the missing files in the last patch
          Hide
          Simon Willnauer added a comment -

          next iteration - seems close.

          • moved CFW to o.a.l.store and made package private.
          • added createCompoundOutput to Directory instead of passing OpenMode
          • added write support to CompundFileDirectory
          • Separately written file are appended during close if possible (no other file is currently written directly to the CF). If files is locked append happens once that file is closed.
          • IW uses Directory methods only, addFile has been converted to Directory#copy

          once thing which still bugs me is the setAbortCheck on CFDirectory.. I wonder if we can solve that differently, ideas?

          Show
          Simon Willnauer added a comment - next iteration - seems close. moved CFW to o.a.l.store and made package private. added createCompoundOutput to Directory instead of passing OpenMode added write support to CompundFileDirectory Separately written file are appended during close if possible (no other file is currently written directly to the CF). If files is locked append happens once that file is closed. IW uses Directory methods only, addFile has been converted to Directory#copy once thing which still bugs me is the setAbortCheck on CFDirectory.. I wonder if we can solve that differently, ideas?
          Hide
          Michael McCandless added a comment -

          Patch looks cool!

          So the CFW will take the first output opened against it and let it write
          directly into the "actual" CFS file, and then if another file is
          opened while that first one is still open, the 2nd file will write to
          separate file and then will copy in on close. We may want to delegate
          the separate files too? So that on close they copy themselves into
          the CFS and remove the original? This way IW won't have to separately
          create CFS in the end.

          Somehow we need IW to add the biggest sub-file first...

          s/compund/compound

          CFW.close should assert currentOutput != null (and, if we delegate sep
          entries, that they are also all closed)?

          You might need to sync the CompoundFileWriter.this.currentOutput test
          / setting to null? Though... Lucene is always single threaded in
          writing files for the same segment, today anyway.

          Can we make a separate createCompoundOutput? (Ie, instaed of passing
          OpenMode to openCompoundInput). And: I'm assuming a given compound
          output can only be opened once, appended to / separate files copied
          into, closed and then never opened again for writing? (Ie, still
          "write once" at the file level).

          Show
          Michael McCandless added a comment - Patch looks cool! So the CFW will take the first output opened against it and let it write directly into the "actual" CFS file, and then if another file is opened while that first one is still open, the 2nd file will write to separate file and then will copy in on close. We may want to delegate the separate files too? So that on close they copy themselves into the CFS and remove the original? This way IW won't have to separately create CFS in the end. Somehow we need IW to add the biggest sub-file first... s/compund/compound CFW.close should assert currentOutput != null (and, if we delegate sep entries, that they are also all closed)? You might need to sync the CompoundFileWriter.this.currentOutput test / setting to null? Though... Lucene is always single threaded in writing files for the same segment, today anyway. Can we make a separate createCompoundOutput? (Ie, instaed of passing OpenMode to openCompoundInput). And: I'm assuming a given compound output can only be opened once, appended to / separate files copied into, closed and then never opened again for writing? (Ie, still "write once" at the file level).
          Hide
          Simon Willnauer added a comment -

          first sketch still some nocommits - this patch includes the latest patch from LUCENE-3201 which made the CFS part of directory. This patch adds write support to the CompoundFileDirectory. The CFWriter tries to write files directly to the CFS if possible like when no other file is currently open for writing it opens a stream directly on the CFS. Yet, this change also adds a new file to the CFS (.cfe) which only holds the entry table which makes all seeks unneeded (plays better with AppendingCodec).

          I currently don't use it during indexing since we decided after flush if we use CFS or not. Yet this might change with this optimization but I will leave this to another issue.

          Show
          Simon Willnauer added a comment - first sketch still some nocommits - this patch includes the latest patch from LUCENE-3201 which made the CFS part of directory. This patch adds write support to the CompoundFileDirectory. The CFWriter tries to write files directly to the CFS if possible like when no other file is currently open for writing it opens a stream directly on the CFS. Yet, this change also adds a new file to the CFS (.cfe) which only holds the entry table which makes all seeks unneeded (plays better with AppendingCodec). I currently don't use it during indexing since we decided after flush if we use CFS or not. Yet this might change with this optimization but I will leave this to another issue.

            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Simon Willnauer
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development