Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
v2.1.0, v2.2.0, v2.3.0, v2.3.1, v2.4.0
kylin v2.2.0 jdk7
Patch
Description
Log:
2018-06-26 15:50:24,032 INFO [main] org.apache.kylin.dict.DictionaryManager: DictionaryManager(1499050426) loading DictionaryInfo(loadDictObj:true) at /dict/xxx.xxx/C7/036b7ca0-8733-4c0c-99f5-5122919fd3dd.dict
2018-06-26 15:50:25,586 ERROR [main] org.apache.kylin.engine.mr.KylinMapper:
com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
    at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:118)
    at org.apache.kylin.cube.CubeManager.getDictionary(CubeManager.java:271)
    at org.apache.kylin.cube.CubeSegment.getDictionary(CubeSegment.java:320)
    at org.apache.kylin.cube.kv.CubeDimEncMap.getDictionary(CubeDimEncMap.java:86)
    at org.apache.kylin.cube.kv.CubeDimEncMap.get(CubeDimEncMap.java:65)
    at org.apache.kylin.cube.kv.RowKeyColumnIO.getColumnLength(RowKeyColumnIO.java:43)
    at org.apache.kylin.cube.kv.RowKeyEncoder.<init>(RowKeyEncoder.java:59)
    at org.apache.kylin.cube.kv.AbstractRowKeyEncoder.createInstance(AbstractRowKeyEncoder.java:48)
    at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.<init>(BaseCuboidBuilder.java:84)
    at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.doSetup(BaseCuboidMapperBase.java:70)
    at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.doSetup(HiveToBaseCuboidMapper.java:36)
    at org.apache.kylin.engine.mr.KylinMapper.setup(KylinMapper.java:48)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1793)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
    at org.apache.kylin.common.persistence.FileResourceStore.getResourceImpl(FileResourceStore.java:123)
    at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:154)
    at org.apache.kylin.dict.DictionaryManager.load(DictionaryManager.java:418)
    at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:101)
    at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:98)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
    at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:118)
    at org.apache.kylin.cube.CubeManager.getDictionary(CubeManager.java:271)
    at org.apache.kylin.cube.CubeSegment.getDictionary(CubeSegment.java:320)
    at org.apache.kylin.cube.kv.CubeDimEncMap.getDictionary(CubeDimEncMap.java:86)
    at org.apache.kylin.cube.kv.CubeDimEncMap.get(CubeDimEncMap.java:65)
    at org.apache.kylin.cube.kv.RowKeyColumnIO.getColumnLength(RowKeyColumnIO.java:43)
    at org.apache.kylin.cube.kv.RowKeyEncoder.<init>(RowKeyEncoder.java:59)
    at org.apache.kylin.cube.kv.AbstractRowKeyEncoder.createInstance(AbstractRowKeyEncoder.java:48)
    at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.<init>(BaseCuboidBuilder.java:84)
    at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.doSetup(BaseCuboidMapperBase.java:70)
    at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.doSetup(HiveToBaseCuboidMapper.java:36)
    at org.apache.kylin.engine.mr.KylinMapper.setup(KylinMapper.java:48)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
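Root cause analysis:
1. C7 is a high-cardinality dimension whose values are long on average; the dictionary file is 1085484823 bytes.
2. Kylin loads the dictionary file in FileResourceStore.getResourceImpl(). The ByteArrayOutputStream used there starts with a capacity of 1000 and is grown repeatedly during the copy. The growth logic is shown below (each step at least doubles the capacity, up to a maximum of Integer.MAX_VALUE):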
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = buf.length;
    int newCapacity = oldCapacity << 1;
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity < 0) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    buf = Arrays.copyOf(buf, newCapacity);
}
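3. The JVM limits the length of an array. The limit can differ between environments and can be measured with byte[] bytes = new byte[length]; it is typically Integer.MAX_VALUE - 2. Once grow() falls back to Integer.MAX_VALUE, Arrays.copyOf requests an array above that limit and fails with "OutOfMemoryError: Requested array size exceeds VM limit", as seen in the log.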
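To make the growth sequence concrete, here is a small simulation of the capacity arithmetic only; it allocates no buffers. It is a sketch under assumptions: the class and method names are made up for illustration, and the 4096-byte chunk size is assumed to match the copy buffer used by IOUtils.copy, which this report does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: replays the capacity arithmetic of the grow() method quoted above.
class GrowthSimulation {

    // Same arithmetic as grow(), but returns the requested capacity instead of allocating.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity << 1;
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity < 0) {
            if (minCapacity < 0) // overflow
                throw new OutOfMemoryError();
            newCapacity = Integer.MAX_VALUE;
        }
        return newCapacity;
    }

    // Simulates writing totalBytes in chunks of chunkSize (an assumed value)
    // and records every capacity that would be requested from Arrays.copyOf.
    static List<Integer> simulate(int initialCapacity, int chunkSize, long totalBytes) {
        List<Integer> requested = new ArrayList<Integer>();
        int capacity = initialCapacity;
        int count = 0;
        long remaining = totalBytes;
        while (remaining > 0) {
            int len = (int) Math.min((long) chunkSize, remaining);
            int minCapacity = count + len;
            if (minCapacity - capacity > 0) { // growth is needed before this write
                capacity = nextCapacity(capacity, minCapacity);
                requested.add(capacity);
            }
            count += len;
            remaining -= len;
        }
        return requested;
    }

    public static void main(String[] args) {
        // 1085484823 is the dictionary file size from this report; 1000 is the initial capacity.
        List<Integer> requested = simulate(1000, 4096, 1085484823L);
        System.out.println(requested);
    }
}
```

Under these assumptions the capacity doubles up to 2^30 (1073741824), which is still smaller than the file. The next doubling overflows int, so grow() asks for Integer.MAX_VALUE, which matches the Arrays.copyOf failure in the stack trace.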
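Suggested fixes:
1. Set the initial capacity of the ByteArrayOutputStream to the file's byte length so that it never has to grow. This is still subject to the JVM array length limit.
2. Stop copying the file through a ByteArrayOutputStream and use a FileInputStream directly.
Fix result:
The second option was adopted.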
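The idea behind the second option can be sketched as follows. This is not the actual Kylin patch: the class and method names are hypothetical, and the consumer only counts bytes, standing in for whatever reads the dictionary from the stream.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of reading a resource as a stream
// instead of buffering the whole file in a single byte[].
class StreamingResourceRead {

    // Returns a stream over the file; no full-size in-memory copy is made here.
    static InputStream open(File file) throws IOException {
        return new BufferedInputStream(new FileInputStream(file));
    }

    // Example consumer: reads in fixed-size chunks and returns the total byte count.
    static long consume(InputStream in) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }
}
```

With this shape, reading the file itself only needs the chunk buffer, so the copy no longer depends on the JVM array length limit. Whether the data structures built from the stream stay within that limit is a separate question that this sketch does not address.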