Flink / FLINK-19056

Investigate multipart upload performance regression


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.12.0
    • Fix Version/s: None
    • Component/s: Runtime / REST
    • Labels: None

    Description

      When using Netty 4.1.50, the multipart upload of files is more than 100 times slower in the FileUploadHandlerTest.

      This test has traditionally been somewhat heavy, since it repeatedly tests the upload of 60 MB files.

      On my machine this test currently finishes in 2-3 seconds, but with the upgraded Netty version it runs for several minutes instead. I have not yet verified whether this is purely an issue with the test, but I would consider that unlikely.

      This would make Flink effectively unusable when uploading larger JARs or JobGraphs.

       

      My theory is that this is due to this change in Netty.

      Before this change, the HttpPostMultipartRequestDecoder always created unpooled heap buffers; after the change, the buffer type depends on the input buffer. The input buffer is a direct buffer, so my conclusion is that with the upgrade we end up allocating more direct buffers than we did previously.
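
      As an illustration of the effect (a minimal sketch, not the actual decoder code; the class name BufferTypeSketch and the 8 KB size are made up for the example): copying an inbound chunk into an Unpooled heap buffer always yields heap memory, while a copy that follows the input buffer becomes direct whenever the input is direct.

      import io.netty.buffer.ByteBuf;
      import io.netty.buffer.PooledByteBufAllocator;
      import io.netty.buffer.Unpooled;

      public class BufferTypeSketch {
          public static void main(String[] args) {
              // Simulate an inbound chunk: the server's allocator typically hands
              // out direct (off-heap) buffers for socket reads.
              ByteBuf inbound = PooledByteBufAllocator.DEFAULT.directBuffer(8 * 1024);
              inbound.writeZero(8 * 1024);

              // Pre-change style (roughly): copy the chunk into an unpooled heap
              // buffer, regardless of how the input arrived.
              ByteBuf heapCopy = Unpooled.buffer(inbound.readableBytes());
              heapCopy.writeBytes(inbound, inbound.readerIndex(), inbound.readableBytes());
              System.out.println("heapCopy.isDirect()     = " + heapCopy.isDirect());     // false

              // Post-change style (roughly): the copy follows the input buffer's
              // type, so a direct input chunk produces another direct buffer.
              ByteBuf followsInput = inbound.copy();
              System.out.println("followsInput.isDirect() = " + followsInput.isDirect()); // true

              inbound.release();
              heapCopy.release();
              followsInput.release();
          }
      }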

       

      One solution I found was to explicitly create an UnpooledByteBufAllocator for the RestServerEndpoint that prefers heap buffers, which results in the input buffer being a heap buffer, so we never allocate direct ones.

      However, this should also imply that we are creating more heap buffers than we did previously; I don't know how much of a problem that is. Maybe this is even a good thing if it means fewer copies from direct to heap memory, but this comment seems to indicate otherwise.

      +// preferDirect = false: buffer() calls on this allocator return heap buffers
      +final ByteBufAllocator byteBufAllocator = new UnpooledByteBufAllocator(false);
      +
       bootstrap = new ServerBootstrap();
       bootstrap
           .group(bossGroup, workerGroup)
           .channel(NioServerSocketChannel.class)
      -    .childHandler(initializer);
      +    .childHandler(initializer)
      +    .option(ChannelOption.ALLOCATOR, byteBufAllocator)
      +    .childOption(ChannelOption.ALLOCATOR, byteBufAllocator);
      
      

       

      On a somewhat related note, we could think about increasing the chunkSize from 8 KB to 64 KB to reduce GC pressure a bit, along with adding some arenas for the REST API.
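
      As a rough sketch of what that could look like (assuming the chunkSize in question is the HTTP decoder's maxChunkSize, and "arenas" means giving the REST endpoint its own small pooled allocator; RestEndpointTuningSketch and all concrete numbers below are placeholders, not a proposed final configuration):

      import io.netty.bootstrap.ServerBootstrap;
      import io.netty.buffer.PooledByteBufAllocator;
      import io.netty.channel.ChannelInitializer;
      import io.netty.channel.ChannelOption;
      import io.netty.channel.nio.NioEventLoopGroup;
      import io.netty.channel.socket.SocketChannel;
      import io.netty.channel.socket.nio.NioServerSocketChannel;
      import io.netty.handler.codec.http.HttpServerCodec;

      public class RestEndpointTuningSketch {
          public static void main(String[] args) {
              NioEventLoopGroup bossGroup = new NioEventLoopGroup(1);
              NioEventLoopGroup workerGroup = new NioEventLoopGroup();
              try {
                  // A small pooled allocator dedicated to the REST endpoint:
                  // preferDirect = false, 2 heap arenas, 0 direct arenas,
                  // 8 KiB pages, maxOrder 11 (16 MiB allocator chunks per arena).
                  PooledByteBufAllocator restAllocator =
                          new PooledByteBufAllocator(false, 2, 0, 8192, 11);

                  ServerBootstrap bootstrap = new ServerBootstrap()
                          .group(bossGroup, workerGroup)
                          .channel(NioServerSocketChannel.class)
                          .childHandler(new ChannelInitializer<SocketChannel>() {
                              @Override
                              protected void initChannel(SocketChannel ch) {
                                  // Raise the HTTP decoder chunk size from the 8 KiB
                                  // default to 64 KiB; the first two arguments keep
                                  // maxInitialLineLength / maxHeaderSize at their defaults.
                                  ch.pipeline().addLast(
                                          new HttpServerCodec(4096, 8192, 64 * 1024));
                              }
                          })
                          .option(ChannelOption.ALLOCATOR, restAllocator)
                          .childOption(ChannelOption.ALLOCATOR, restAllocator);

                  // bootstrap.bind(port) and the rest of the endpoint setup as usual.
              } finally {
                  bossGroup.shutdownGracefully();
                  workerGroup.shutdownGracefully();
              }
          }
      }

      Whether pooling actually pays off for the REST API would of course have to be measured; the sketch only shows where the knobs live.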

      People

        Assignee: Unassigned
        Reporter: Chesnay Schepler