  1. Lucene - Core
  2. LUCENE-8830

DefaultIndexingChain.getOrAddField method ignores omitNorms from FieldType

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.6.1
    • Fix Version/s: None
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Norms are computed and written even when omitNorms is set to true in the FieldType. I traced the issue and found that the method getOrAddField creates a FieldInfo object on the first pass, when the field is first seen in the segment. By default this object has omitNorms set to false. The method copies the indexOptions from the fieldType onto the newly created object but does not do the same for omitNorms. The flag from the FieldType is therefore ignored, which causes problems down the line.
       
      Here is the relevant part of the method, including the fieldInfos.getOrAdd call:
       
       

      private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {

        // Make sure we have a PerField allocated
        final int hashPos = name.hashCode() & hashMask;
        PerField fp = fieldHash[hashPos];
        while (fp != null && !fp.fieldInfo.name.equals(name)) {
          fp = fp.next;
        }

        if (fp == null) {
          // First time we are seeing this field in this segment

          FieldInfo fi = fieldInfos.getOrAdd(name);

          // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the
          // initial IndexOptions to decide what arrays it must create). Then, we also must
          // set it in PerField.invert to allow for later downgrading of the index options:

          fi.setIndexOptions(fieldType.indexOptions());

          fp = new PerField(fi, invert);
          ...

       
       
       
      The getOrAdd method below instantiates a new FieldInfo with omitNorms set to false (the 4th constructor argument).
       

      /** Create a new field, or return existing one. */
      public FieldInfo getOrAdd(String name) {
        FieldInfo fi = fieldInfo(name);

        if (fi == null) {
          // This field wasn't yet added to this in-RAM
          // segment's FieldInfo, so now we get a global
          // number for this field. If the field was seen
          // before then we'll get the same name and number,
          // else we'll allocate a new one:

          final int fieldNumber = globalFieldNumbers.addOrGet(name, -1, DocValuesType.NONE, 0, 0);

          fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE, DocValuesType.NONE, -1, new HashMap<>(), 0, 0);

          assert !byName.containsKey(fi.name);
          globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name, DocValuesType.NONE);
          byName.put(fi.name, fi);
        }

        return fi;
      }

       
      As a result, norms are always computed. This not only produces incorrect scores but also increases disk usage when many documents have multiple fields whose omitNorms flag is set to true but ignored.
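
      To make the reported behaviour easier to follow, here is a minimal, self-contained sketch of it. It does not use Lucene's classes: OmitNormsSketch, SimpleFieldType, SimpleFieldInfo and both getOrAddField... variants are hypothetical stand-ins invented for illustration, and the "with flag" variant is only one possible way the flag could be carried over, not a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the behaviour described in this report.
// All names here are hypothetical stand-ins, not Lucene classes.
final class OmitNormsSketch {

    /** Stand-in for the field type supplied by the caller. */
    static final class SimpleFieldType {
        final boolean indexed;
        final boolean omitNorms;

        SimpleFieldType(boolean indexed, boolean omitNorms) {
            this.indexed = indexed;
            this.omitNorms = omitNorms;
        }
    }

    /** Stand-in for the per-segment field metadata. */
    static final class SimpleFieldInfo {
        final String name;
        boolean indexed = false;   // default, analogous to IndexOptions.NONE
        boolean omitNorms = false; // default, analogous to the 'false' passed in getOrAdd

        SimpleFieldInfo(String name) {
            this.name = name;
        }

        /** Norms are written only for indexed fields that do not omit them. */
        boolean hasNorms() {
            return indexed && !omitNorms;
        }
    }

    private final Map<String, SimpleFieldInfo> byName = new HashMap<>();

    /** Create a new field with default flags, or return the existing one. */
    SimpleFieldInfo getOrAdd(String name) {
        return byName.computeIfAbsent(name, SimpleFieldInfo::new);
    }

    /** Behaviour as described in the report: only the index options are copied. */
    SimpleFieldInfo getOrAddFieldAsReported(String name, SimpleFieldType type) {
        SimpleFieldInfo fi = getOrAdd(name);
        fi.indexed = type.indexed;
        return fi;
    }

    /** One possible change (an assumption): also carry over omitNorms. */
    SimpleFieldInfo getOrAddFieldWithFlag(String name, SimpleFieldType type) {
        SimpleFieldInfo fi = getOrAdd(name);
        fi.indexed = type.indexed;
        if (type.indexed && type.omitNorms) {
            fi.omitNorms = true;
        }
        return fi;
    }

    public static void main(String[] args) {
        SimpleFieldType noNorms = new SimpleFieldType(true, true);
        OmitNormsSketch chain = new OmitNormsSketch();

        // Flag is lost: the field still reports norms.
        System.out.println(chain.getOrAddFieldAsReported("title", noNorms).hasNorms()); // prints "true"

        // Flag is carried over: the field reports no norms.
        System.out.println(chain.getOrAddFieldWithFlag("body", noNorms).hasNorms());    // prints "false"
    }
}
```

      In this model the only difference between the two variants is whether omitNorms is copied next to the index options, which is the step this report says is missing from getOrAddField.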

            People

            • Assignee:
              Unassigned
              Reporter:
              ishansri Ishan Sri
            • Votes:
              0
              Watchers:
              2
