KYLIN-5539

Kylin4 JobServer does not close the HDFS file stream it opens after each cube build, leaving a large number of connections in CLOSE_WAIT

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v4.0.1, v4.0.2, v4.0.3
    • Fix Version/s: None
    • Component/s: Job Engine
    • Labels: None

    Description

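      1. Initial discovery: a production alert reported a large number of CLOSE_WAIT connections on one node. netstat -anp showed that they belonged to the Kylin4 JobServer process, which held more than 9,000 of them. The remote port of every CLOSE_WAIT connection was 50010, the port Hadoop DataNodes use for data transfer, so we suspected that the JobServer calls fileSystem.open() on a stream during each build job and never closes it.
      2. Reproduction: we submitted cube build jobs in the test environment and watched the CLOSE_WAIT count. It increased by 1 after every cube build finished, which confirmed that an unclosed stream in the JobServer code is the cause.
      3. Locating the code: debugging the Kylin4 build code led to line 94 of org.apache.kylin.engine.spark.utils.UpdateMetadataUtil#syncLocalMetadataToRemote (Apache Kylin main branch):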

      String resKey = toUpdateSeg.getStatisticsResourcePath();
      String statisticsDir = config.getJobTmpDir(currentInstanceCopy.getProject()) + "/"
              + nsparkExecutable.getParam(MetadataConstants.P_JOB_ID) + "/" + ResourceStore.CUBE_STATISTICS_ROOT + "/"
              + cubeId + "/" + segmentId + "/";
      Path statisticsFile = new Path(statisticsDir, BatchConstants.CFG_STATISTICS_CUBOID_ESTIMATION_FILENAME);
      FileSystem fs = HadoopUtil.getWorkingFileSystem();
      if (fs.exists(statisticsFile)) {
          FSDataInputStream is = fs.open(statisticsFile); // the stream is never closed
          ResourceStore.getStore(config).putBigResource(resKey, is, System.currentTimeMillis());
      }

      CubeUpdate update = new CubeUpdate(currentInstanceCopy);
      update.setCuboids(distCube.getCuboids());
      List<CubeSegment> toRemoveSegs = Lists.newArrayList();
      4. Verification in the test environment:

      Path statisticsFile = new Path(statisticsDir, BatchConstants.CFG_STATISTICS_CUBOID_ESTIMATION_FILENAME);
      FileSystem fs = HadoopUtil.getWorkingFileSystem();
      if (fs.exists(statisticsFile)) {
          try (FSDataInputStream is = fs.open(statisticsFile)) { // the stream is closed on exit
              ResourceStore.getStore(config).putBigResource(resKey, is, System.currentTimeMillis());
          }
      }

      After this change, several rounds of cube builds in the test environment produced no increase in CLOSE_WAIT, which confirms the fix.
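
      The difference between the two code paths can be shown without a Hadoop cluster. The following is only a minimal sketch: TrackingStream and consume are hypothetical stand-ins for FSDataInputStream and putBigResource, and it assumes, as the findings in this issue suggest, that the consumer reads the stream but does not close it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamCloseSketch {

    /** Hypothetical stand-in for FSDataInputStream: records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical stand-in for putBigResource: drains the stream but never closes it. */
    static int consume(InputStream in) throws IOException {
        int total = 0;
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Mirrors the original code path: the caller opens the stream and nobody closes it. */
    static TrackingStream leakyUpload(byte[] data) throws IOException {
        TrackingStream is = new TrackingStream(data);
        consume(is);
        return is;
    }

    /** Mirrors the patched code path: try-with-resources closes the stream on exit. */
    static TrackingStream safeUpload(byte[] data) throws IOException {
        TrackingStream stream = new TrackingStream(data);
        try (TrackingStream is = stream) {
            consume(is);
        }
        return stream;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "cuboid statistics".getBytes(StandardCharsets.UTF_8);
        System.out.println("leaky path closed = " + leakyUpload(data).closed);
        System.out.println("safe path closed  = " + safeUpload(data).closed);
    }
}
```

      With a real HDFS stream, the unclosed case would correspond to a DataNode socket that stays open on the client side, which matches the CLOSE_WAIT growth of 1 per build described above.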

          People

            Assignee: Unassigned
            Reporter: Liu Zhao (zhaoliu4)