Apache Ozone / HDDS-5461

Move old objects to delete table on overwrite


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: OM

    Description

      HDDS-5243 was a patch that omitted key locations in responses to clients on reads, but the same large-response-size warning has been observed in our cluster when putting data. This is harmful in terms of a retry storm: hadoop-rpc treats this large-response exception as retryable, so RetryInvocationHandler retries 15 times even though the failure cannot be recovered by retrying, each time receiving a response message that exceeds the default RPC message size limit of 128MB, as follows.

      2021-06-21 19:23:10,717 [IPC Server handler 65 on default port 9862] WARN org.apache.hadoop.ipc.Server: Large response size 134538349 for call Call#2037538 Retry#15 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.192.17.172:34070
      2021-06-21 19:23:10,722 [IPC Server handler 65 on default port 9862] WARN org.apache.hadoop.ipc.Server: IPC Server handler 65 on default port 9862, call Call#2037538 Retry#15 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.192.17.172:34070: output error
      2021-06-21 19:23:10,722 [IPC Server handler 65 on default port 9862] INFO org.apache.hadoop.ipc.Server: IPC Server handler 65 on default port 9862 caught an exception
      java.nio.channels.AsynchronousCloseException
      at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
      at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:478)
      at org.apache.hadoop.ipc.Server.channelIO(Server.java:3642)
      at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3594)
      at org.apache.hadoop.ipc.Server.access$1700(Server.java:139)
      at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1657)
      at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1727)
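      For reference, the 128MB figure above is the Hadoop RPC client's response-size cap. A minimal sketch of reading it, assuming the hadoop-common constants IPC_MAXIMUM_RESPONSE_LENGTH / IPC_MAXIMUM_RESPONSE_LENGTH_DEFAULT:

      // Minimal sketch: the client-side cap that the 134538349-byte
      // response above exceeds; once the call fails, the default retry
      // policy replays it, producing the Retry#15 seen in the log.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.CommonConfigurationKeys;

      public class ResponseCap {
        public static void main(String[] args) {
          Configuration conf = new Configuration();
          int maxResponseLength = conf.getInt(
              CommonConfigurationKeys.IPC_MAXIMUM_RESPONSE_LENGTH,
              CommonConfigurationKeys.IPC_MAXIMUM_RESPONSE_LENGTH_DEFAULT);
          System.out.println(maxResponseLength); // 134217728 (128MB) by default
        }
      }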

      The suggestion in HDDS-5393 was wrong; this should instead be fixed by making the old blocks eligible for the deletion service, moving them to the deletion table on overwrite (see the sketch below). This is only needed for normal object puts and not for MultipartUpload objects, if I understand correctly.
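      A minimal sketch of that flow, assuming hypothetical stand-ins (Table, keyTable, deletedTable) for the real OM metadata structures; this shows the shape of the fix, not the actual patch:

      // Hypothetical sketch only: the names below stand in for the real
      // OM key table and deleted table.
      interface Table<K, V> {
        V get(K key);
        void put(K key, V value);
      }

      class OverwriteSketch {
        Table<String, OmKeyInfo> keyTable;      // live keys
        Table<String, OmKeyInfo> deletedTable;  // input to the deleting service

        void commitKey(String dbKey, OmKeyInfo newKeyInfo) {
          OmKeyInfo previous = keyTable.get(dbKey);
          if (previous != null) {
            // Make the overwritten version's blocks eligible for the
            // background deleting service instead of appending them to
            // the live key's location-version list.
            deletedTable.put(dbKey, previous);
          }
          keyTable.put(dbKey, newKeyInfo);  // only the new blocks stay live
        }
      }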

      Keeping old blocks and key locations after an overwrite might be intended to support an object versioning API, but IMO the current design will not scale beyond, say, thousands of objects. The value size in the key table grows as O(n(versions) * n(blocks)), which can easily exceed the current RPC message limit (128MB by default) or the intended value size in RocksDB; a back-of-envelope estimate follows. Although the current implementation is effective for concurrency control, object versioning should be implemented in some different way.
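      A rough estimate, assuming ~100 bytes per serialized block location (an assumed figure, not a measurement):

      // Back-of-envelope only; bytesPerLocation is an assumption.
      long bytesPerLocation = 100L;     // assumed serialized size of one block location
      long versions = 1_000L;           // n(versions)
      long blocksPerVersion = 10_000L;  // n(blocks), e.g. a ~2.5TB key at 256MB blocks
      long valueSize = versions * blocksPerVersion * bytesPerLocation;
      System.out.println(valueSize);    // 1,000,000,000 bytes, roughly 1GB >> 128MB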

    People

      Assignee: UENISHI Kota (kuenishi)
      Reporter: UENISHI Kota (kuenishi)