Solr
  1. Solr
  2. SOLR-440

Should use longs for internal DateField storage

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 1.2, 1.3
    • Fix Version/s: None
    • Component/s: search
    • Labels:
      None

      Description

      The current DateField implementation uses formatted Strings internally to store dates.

      As a user creating a schema, I assumed that using the DateField type would be more efficient than using an integer field to store seconds. In fact, the DateField type is currently significantly less efficient: ~20 extra bytes are required per value, and String sorting requires that all values fit in memory.

      As soon as sorting on long fields has been implemented (SOLR-324), I'd suggest that the date implementation be switched to use long values internally, representing milliseconds since the epoch in UTC. Unfortunately, this will cause index incompatibilities, so the schema version will need to be bumped.

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          using longs for millis since epoch in addition to "long" sorting might be more efficient if all you care about is strict date sorting, but "range queries" wouldn't work in that case.

          that plus index back compatibility are reason enough to leave String as the internal representation of DateField ... but there is no reason we can't add a new FieldType that uses a long as the internal representation.

          before a lot of work is invested in this issue however, it might be a good worthwhile to revist and consider some of the previous discussions about potential changes/improvements to Solr date handling...

          http://www.nabble.com/dates---times-to10417533.html#a10417533

          Show
          Hoss Man added a comment - using longs for millis since epoch in addition to "long" sorting might be more efficient if all you care about is strict date sorting, but "range queries" wouldn't work in that case. that plus index back compatibility are reason enough to leave String as the internal representation of DateField ... but there is no reason we can't add a new FieldType that uses a long as the internal representation. before a lot of work is invested in this issue however, it might be a good worthwhile to revist and consider some of the previous discussions about potential changes/improvements to Solr date handling... http://www.nabble.com/dates---times-to10417533.html#a10417533
          Hide
          Hoss Man added a comment -

          From an email reply i'm just now looking at...

          Date: Sat, 22 Dec 2007 22:59:29 -0500 (EST)
          From: Stu Hood 
          
          If sorting was working, why couldn't range queries be supported?
          

          RangeQueries (or RangeFilters more specifically) require that a single in order walk of the TermEnum from "low" to "high" include all values in the index that are truly inclusive, and none that aren't .... Sorting works differently, the first time a field is sorted on, all the values are walked and an "inverted-invertedindex" (as i like to call it) FieldCache is built mapping docIds to values.

          Although .... assuming the FieldCache support for Longs supports a "LongParser" the same way the Int and Float support does (i just checked, and it does) then a "new SortableLongDateField" could be created which uses the same tricks as SortableLongField to "index" values that sort lexigraphicaly and builds a FieldCache using a long[] ... but knows how to parse/return ISO formated dates (and do DateMath).

          Show
          Hoss Man added a comment - From an email reply i'm just now looking at... Date: Sat, 22 Dec 2007 22:59:29 -0500 (EST) From: Stu Hood If sorting was working, why couldn't range queries be supported? RangeQueries (or RangeFilters more specifically) require that a single in order walk of the TermEnum from "low" to "high" include all values in the index that are truly inclusive, and none that aren't .... Sorting works differently, the first time a field is sorted on, all the values are walked and an "inverted-invertedindex" (as i like to call it) FieldCache is built mapping docIds to values. Although .... assuming the FieldCache support for Longs supports a "LongParser" the same way the Int and Float support does (i just checked, and it does) then a "new SortableLongDateField" could be created which uses the same tricks as SortableLongField to "index" values that sort lexigraphicaly and builds a FieldCache using a long[] ... but knows how to parse/return ISO formated dates (and do DateMath).
          Hide
          Stu Hood added a comment - - edited

          Thanks a lot for the explanation, I think I get it now.

          That sounds like a good plan. Perhaps it could replace the DateField in Solr 2.0.

          Show
          Stu Hood added a comment - - edited Thanks a lot for the explanation, I think I get it now. That sounds like a good plan. Perhaps it could replace the DateField in Solr 2.0.
          Hide
          Hoss Man added a comment -

          depending on what the final implementation looks like, i could easily be convinced to use a new class like this in the example configs, and maybe even go so far as to deprecate the existing DateFiled ... but i can't imagine being convinced to completley replaced DateField with it at any point ... a RAM performance improvement in sorting doesn't seem worth breaking back compatibility and forcing people to reindex ... not when it would be just as easy to add a new (subclass) FieldType.

          consider the people who currently use DateField for range queries but never for sorting – they'll be pretty upset if we tell them they have to reindex and they won't see any noticeable benefit from doing so.

          Show
          Hoss Man added a comment - depending on what the final implementation looks like, i could easily be convinced to use a new class like this in the example configs, and maybe even go so far as to deprecate the existing DateFiled ... but i can't imagine being convinced to completley replaced DateField with it at any point ... a RAM performance improvement in sorting doesn't seem worth breaking back compatibility and forcing people to reindex ... not when it would be just as easy to add a new (subclass) FieldType. consider the people who currently use DateField for range queries but never for sorting – they'll be pretty upset if we tell them they have to reindex and they won't see any noticeable benefit from doing so.
          Hide
          Stu Hood added a comment -

          Is there no way to make DateField backwards compatible? Perhaps with 'magic' values, or by looking at the byte length of the field before attempting to parse it as a long or string?

          I'm probably misunderstanding how significant a difference in memory usage it would be, but I kindof feel like I was suckered into using DateField. I have a field (the timestamp field in the dist solrconfig.xml that defaults to NOW) that I currently cannot sort on, because it runs instances out of memory trying to do a string sort.

          Thanks for considering it!

          Show
          Stu Hood added a comment - Is there no way to make DateField backwards compatible? Perhaps with 'magic' values, or by looking at the byte length of the field before attempting to parse it as a long or string? I'm probably misunderstanding how significant a difference in memory usage it would be, but I kindof feel like I was suckered into using DateField. I have a field (the timestamp field in the dist solrconfig.xml that defaults to NOW) that I currently cannot sort on, because it runs instances out of memory trying to do a string sort. Thanks for considering it!
          Hide
          Jan Høydahl added a comment -

          This is fixed way back, closing.

          Show
          Jan Høydahl added a comment - This is fixed way back, closing.

            People

            • Assignee:
              Unassigned
              Reporter:
              Stu Hood
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development