Hadoop HDFS
  1. HDFS-2006 ability to support storing extended attributes per file
  2. HDFS-6346

Optimize OP_SET_XATTRS by persisting single Xattr entry per setXattr/removeXattr api call

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When a new xattr is set on an inode, it is logged with OP_SET_XATTRS, say as [user.name1:value1].
      On the next call, if we set another xattr, it may be stored along with the already existing xattrs again, for example [user.name1:value1, user.name2:value2].
      So, as more xattrs are added to the same inode, that list grows and we keep storing the name/value pairs that already exist.
      Right now the maximum number of xattrs on an inode defaults to 32 and is configurable. If a user raises it to a much larger value and starts setting xattrs, then edits loading may become an issue, as in my example above.
      But I have not come across any use case with a large number of xattrs; this is just an assumed case to consider. My biggest doubt is whether we will have real use cases with a huge number of xattrs on a single inode.
      So, here is a thought: log an OP_SET_XATTR for each setXAttr operation and consolidate them when we replay. This is an initial thought; we can think more about it if others also feel we need to handle this case.
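
A minimal sketch of the replay-time consolidation idea, assuming each logged record carries exactly one xattr change. The class `XAttrReplaySketch` and its nested `XAttrOp` type are hypothetical names used for illustration only; they are not the actual NameNode edit-log classes, and the patch attached to this issue may do this differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the real HDFS edit-log code.
final class XAttrReplaySketch {

    // One logged record per setXAttr/removeXAttr call (assumed shape).
    static final class XAttrOp {
        enum Kind { SET, REMOVE }

        final Kind kind;
        final String name;
        final String value; // null for REMOVE

        private XAttrOp(Kind kind, String name, String value) {
            this.kind = kind;
            this.name = name;
            this.value = value;
        }

        static XAttrOp set(String name, String value) {
            return new XAttrOp(Kind.SET, name, value);
        }

        static XAttrOp remove(String name) {
            return new XAttrOp(Kind.REMOVE, name, null);
        }
    }

    // Replays the records in log order and returns the consolidated
    // xattrs of a single inode.
    static Map<String, String> replay(List<XAttrOp> log) {
        Map<String, String> xattrs = new LinkedHashMap<>();
        for (XAttrOp op : log) {
            if (op.kind == XAttrOp.Kind.SET) {
                xattrs.put(op.name, op.value); // add or overwrite
            } else {
                xattrs.remove(op.name);
            }
        }
        return xattrs;
    }

    public static void main(String[] args) {
        List<XAttrOp> log = new ArrayList<>();
        log.add(XAttrOp.set("user.name1", "value1"));
        log.add(XAttrOp.set("user.name2", "value2"));
        log.add(XAttrOp.remove("user.name1"));
        System.out.println(replay(log)); // prints {user.name2=value2}
    }
}
```

Each record stays constant in size regardless of how many xattrs the inode already has; the cost moves to replay, which has to merge the records in order.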

      Otherwise we end up storing n(n+1)/2 xattr entries in the edit log file, where n is the number of xattrs for a file/dir. This may be an issue only when a large maximum number of xattrs per inode is configured.
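
To make the n(n+1)/2 figure concrete, here is a small counting sketch. It assumes n distinct xattrs are set one call at a time on an inode that starts with none, and that nothing is removed in between; `XAttrEditLogSizeSketch` and its method names are hypothetical.

```java
// Hypothetical counting model, not HDFS code.
final class XAttrEditLogSizeSketch {

    // Every call re-logs the inode's whole xattr list:
    // call k writes k entries, so the total is 1 + 2 + ... + n = n(n+1)/2.
    static long entriesWithFullListPerCall(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            total += k;
        }
        return total;
    }

    // Every call logs only the single xattr it touched.
    static long entriesWithSingleEntryPerCall(int n) {
        return n;
    }

    public static void main(String[] args) {
        int n = 32; // the default per-inode maximum mentioned in the description
        System.out.println(entriesWithFullListPerCall(n));    // prints 528
        System.out.println(entriesWithSingleEntryPerCall(n)); // prints 32
    }
}
```

At the default limit of 32 the gap is small, which matches the point that this matters mainly when the limit is configured much higher.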

      1. HDFS-6346.patch
        17 kB
        Yi Liu
      2. editsStored
        5 kB
        Yi Liu
      3. HDFS-6346.1.patch
        18 kB
        Yi Liu
      4. HDFS-6346.2.patch
        18 kB
        Yi Liu

        Activity

        Uma Maheswara Rao G created issue -
        Uma Maheswara Rao G made changes -
        Field Original Value New Value
        Description Namespace prefix in the example xattrs changed from USER. to user. (for example, [USER.name1:value1] became [user.name1:value1]); the rest of the text is unchanged.
        Yi Liu made changes -
        Assignee Yi Liu [ hitliuyi ]
        Uma Maheswara Rao G made changes -
        Summary Optimize OP_SET_XATTRS by persisting single Xattr entry per setXattr api call Optimize OP_SET_XATTRS by persisting single Xattr entry per setXattr/removeXattr api call
        Yi Liu made changes -
        Attachment HDFS-6346.patch [ 12643914 ]
        Attachment editsStored [ 12643915 ]
        Yi Liu made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Yi Liu made changes -
        Attachment HDFS-6346.1.patch [ 12644110 ]
        Yi Liu made changes -
        Attachment HDFS-6346.2.patch [ 12644129 ]
        Uma Maheswara Rao G made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Yi Liu
            Reporter:
            Uma Maheswara Rao G
          • Votes:
            0
            Watchers:
            5

