Kylin / KYLIN-1834

java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: v1.5.2, v1.5.2.1
    • Fix Version/s: v1.6.0

    Description

      Getting exception in Step 4 - Build Dimension Dictionary:

      java.lang.IllegalArgumentException: Value not exists!
      at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
      at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
      at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
      at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
      at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
      at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
      at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
      at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
      at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
      at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
      at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
      at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      result code:2

      The code which generates the exception is:

      org.apache.kylin.dimension.Dictionary.java:

      /**
       * A lower level API, return ID integer from raw value bytes. In case of not found
       * <p>
       * - if roundingFlag=0, throw IllegalArgumentException; <br>
       * - if roundingFlag<0, the closest smaller ID integer if exist; <br>
       * - if roundingFlag>0, the closest bigger ID integer if exist. <br>
       * <p>
       * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
       *
       * @throws IllegalArgumentException
       *             if value is not found in dictionary and rounding is off;
       *             or if rounding cannot find a smaller or bigger ID
       */
      final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
          if (isNullByteForm(value, offset, len))
              return nullId();
          else {
              int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
              if (id < 0)
                  throw new IllegalArgumentException("Value not exists!");
              return id;
          }
      }
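To make the roundingFlag contract in the Javadoc above concrete, here is a minimal sketch of the same lookup rules over a plain sorted array. It is not Kylin's TrieDictionary: the class RoundingLookupSketch, its getId method, and the use of String values with array indexes as IDs are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical stand-in for a dictionary: sorted distinct values, ID = array index.
// This only illustrates the documented roundingFlag rules; it is not Kylin code.
final class RoundingLookupSketch {
    private final String[] sortedValues;

    RoundingLookupSketch(String... values) {
        // TreeSet sorts the values and removes duplicates.
        this.sortedValues = new TreeSet<>(Arrays.asList(values)).toArray(new String[0]);
    }

    int getId(String value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0) {
            return pos; // exact match, rounding is irrelevant
        }
        int insertion = -(pos + 1); // index of the first element greater than value
        int id;
        if (roundingFlag < 0) {
            id = insertion - 1; // closest smaller value, -1 if none exists
        } else if (roundingFlag > 0) {
            id = insertion < sortedValues.length ? insertion : -1; // closest bigger value
        } else {
            id = -1; // rounding is off and the value is missing
        }
        if (id < 0) {
            throw new IllegalArgumentException("Value not exists!");
        }
        return id;
    }

    public static void main(String[] args) {
        RoundingLookupSketch dict = new RoundingLookupSketch("apple", "cherry", "banana");
        System.out.println(dict.getId("banana", 0));     // exact match
        System.out.println(dict.getId("blueberry", -1)); // rounds down to "banana"
        System.out.println(dict.getId("blueberry", 1));  // rounds up to "cherry"
        try {
            dict.getId("blueberry", 0);                  // missing value, rounding off
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, a lookup with roundingFlag=0 for a value that was never added fails in the same way as the stack trace above: the exception is raised whenever the value being encoded is absent from the set the dictionary was built from.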

      ==========================================================

      The cube is big: the fact table has 110 million rows, and the largest dimension (customer) has 10 million rows. I increased the JVM -Xmx to 16 GB and set kylin.table.snapshot.max_mb=2048 in kylin.properties so that the cube build doesn't fail (previously we were getting an exception complaining about the 300 MB limit on the dimension dictionary size; approx. 700 MB was required).

      ==========================================================

      Before that we were getting an exception complaining about a dictionary encoding problem: "Too high cardinality is not suitable for dictionary – cardinality: 10873977". We resolved this by changing the encoding of the affected dimension/row key from "dict" to "int; length=8" in the Advanced Settings of the cube.

      ==========================================================

      We have 2 high-cardinality fields, one from the fact table and one from the big dimension (customer, see above), which we need to use in count_distinct measures for our calculations. I wonder whether the "Value not exists!" exception is somehow related to them. Of those count_distinct measures, one is defined with return type "bitmap" (exact precision, only for int columns) and the second with return type "hllc16" (error rate <= 1.22%).
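As a side note on the 1.22% figure: the standard error of a HyperLogLog estimate with m registers is roughly 1.04 / sqrt(m). The sketch below assumes that "hllc16" means 16 precision bits (m = 2^16 registers) and that the quoted bound is three standard errors. Both are assumptions that happen to match the number, not something confirmed in this report. The class name HllErrorSketch is hypothetical.

```java
// Rough arithmetic behind the "hllc16" error rate.
// Assumption: hllc16 uses 2^16 registers, and the quoted bound is 3x the standard error.
final class HllErrorSketch {
    // Standard error of a HyperLogLog estimate with 2^precisionBits registers.
    static double standardError(int precisionBits) {
        double registers = Math.pow(2, precisionBits);
        return 1.04 / Math.sqrt(registers);
    }

    public static void main(String[] args) {
        double se = standardError(16);
        System.out.printf("standard error:     %.4f%%%n", se * 100);
        System.out.printf("3 x standard error: %.4f%%%n", 3 * se * 100);
    }
}
```

Under these assumptions the standard error comes out at about 0.41%, and three times that is about 1.22%.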

      ==========================================================

      I am looking for any clues to debug the cause of this error and a way to circumvent it.

            People

              Assignee: liyang (liyang.gmt8@gmail.com)
              Reporter: Richard Calaba (calaba@gmail.com)