Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- v2.5.2
- Fusion Insight
Description
Hi dear team,
I'm developing an OLAP platform based on Kylin 2.5.2. As part of this work, I built a streaming cube from a Kafka source using the Kafka demo.
In my streaming project, I set country and currency as dimensions and userId as the metric. The cube build failed at the 3rd step ("Extract Fact Table Distinct Columns") with a java.lang.ArrayIndexOutOfBoundsException.
Here are the logs:
2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Total rows: 127
2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.YarnChild: Exception running child: java.lang.ArrayIndexOutOfBoundsException:2
2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.doMap(FactDistinctColumnsMapper.java:177)
at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
at org.apache.hadoop.mapreduce.Mapper.run(MapperTask.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:187)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:180)
I then found that some of the streaming data in the Kafka source lacks the userId field. Most records (country, currency, userId) look like ("China","CNY","843c4d"), but a small number have no userId and look like ("China","CNY"). When the 3rd step ("Extract Fact Table Distinct Columns") runs, the MR engine throws the exception for any record that lacks userId.
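The failure mode can be reproduced outside Kylin. The sketch below is a minimal illustration, assuming the parsed row is a String[] holding only the fields present in the message and that the job looks fields up through an int[] of column positions. The class name ShortRowRepro and the helper readField are hypothetical and are not part of Kylin.

```java
// Hypothetical stand-alone reproduction; not Kylin code.
class ShortRowRepro {

    // Reads the i-th configured column from the row with no bounds check.
    static String readField(String[] row, int[] columnIndex, int i) {
        return row[columnIndex[i]];
    }

    public static void main(String[] args) {
        int[] columnIndex = {0, 1, 2};        // country, currency, userId
        String[] shortRow = {"China", "CNY"}; // record without userId

        try {
            readField(shortRow, columnIndex, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message text depends on the Java version.
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

Looking up position 2 in a two-element row is what raises the exception. On Java 8 the exception message is just the offending index, which matches the "2" in the log above.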
Then I checked the Kylin source, FactDistinctColumnsMapper.java:
public void doMap(KEYIN key, Object record, Context context) throws IOException, InterruptedException {
    Collection<String[]> rowCollection = flatTableInputFormat.parseMapperInput(record);
    for (String[] row : rowCollection) {
        context.getCounter(RawDataCounter.BYTES).increment(countSizeInBytes(row));
        for (int i = 0; i < allCols.size(); i++) {
            String fieldValue = row[columnIndex[i]];
            if (fieldValue == null)
                continue;
            final DataType type = allCols.get(i).getType();
...
I found that when a record lacks one column, columnIndex[i] equals the length of row, so row[columnIndex[i]] throws the ArrayIndexOutOfBoundsException. I changed this code to compare columnIndex[i] with the length of row: if columnIndex[i] is equal to or larger than the length of row, I set fieldValue to an empty value. After this change, the 3rd step ("Extract Fact Table Distinct Columns") runs successfully.
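The bounds check described above can be sketched as a small helper. This is only an illustration of the idea, not the patch that was applied to Kylin. The class name SafeFieldAccess and the method fieldValueOrEmpty are hypothetical, and the empty string stands in for the "empty value" mentioned in the report.

```java
// Hypothetical sketch of the guarded lookup; not the actual Kylin fix.
class SafeFieldAccess {

    // Returns row[index], or "" when the row is too short to contain that column.
    static String fieldValueOrEmpty(String[] row, int index) {
        if (index < 0 || index >= row.length) {
            return "";
        }
        return row[index];
    }

    public static void main(String[] args) {
        int[] columnIndex = {0, 1, 2}; // country, currency, userId
        String[] complete = {"China", "CNY", "843c4d"};
        String[] missingUserId = {"China", "CNY"};

        for (String[] row : new String[][] {complete, missingUserId}) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < columnIndex.length; i++) {
                sb.append('[').append(fieldValueOrEmpty(row, columnIndex[i])).append(']');
            }
            System.out.println(sb);
        }
        // Prints:
        // [China][CNY][843c4d]
        // [China][CNY][]
    }
}
```

Returning an empty string means the short record is still processed, with an empty userId. Returning null instead would make the existing null check in doMap skip the missing column. Which behavior is appropriate depends on how missing values should be treated in the cube.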
That is what I found; it could cause problems for other developers.
What do you think?
Best regards,
jintao