  1. Kylin
  2. KYLIN-3644

NumberFormatException on null values when building cube with Spark


Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • v2.5.0
    • v2.5.1
    • Spark Engine
    • None

    Description

      We encounter an error every time we try to build a cube with the following steps:

      • upload a CSV file to AWS S3 in which the column used for the measure contains some null values (cf. attachment)
      • create a Hive table with Spark
      • create a model on top of this table
      • create a cube with a SUM measure
      • choose Spark as the engine
      • launch the build

      Result: the build process fails at '#7 Step Name: Build Cube with Spark' with the following error:

       

      """
      18/10/23 09:25:39 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopDataset at SparkCubingByLayer.java:253, took 7,277136 s
      Exception in thread "main" java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkCubingByLayer. Root cause: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4, ip-172-31-35-113.eu-west-1.compute.internal, executor 4): java.lang.NumberFormatException: For input string: "\N"
         at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
         at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
         at java.lang.Double.parseDouble(Double.java:538)
         at org.apache.kylin.measure.basic.DoubleIngester.valueOf(DoubleIngester.java:38)
         at org.apache.kylin.measure.basic.DoubleIngester.valueOf(DoubleIngester.java:28)
         at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.buildValueOf(BaseCuboidBuilder.java:162)
         at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.buildValueObjects(BaseCuboidBuilder.java:127)
         at org.apache.kylin.engine.spark.SparkCubingByLayer$EncodeBaseCuboid.call(SparkCubingByLayer.java:297)
         at org.apache.kylin.engine.spark.SparkCubingByLayer$EncodeBaseCuboid.call(SparkCubingByLayer.java:257)
         at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
         at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
      """
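
The trace shows `Double.parseDouble` receiving the two-character string `\N`, which is the marker Hive's default text serialization writes for NULL. The sketch below reproduces that failure in isolation and shows one possible null-tolerant parse. The class `NullTolerantDouble` and the method `parseOrNull` are hypothetical names used for illustration, and treating an empty string as null is an assumption; this is not the actual Kylin fix.

```java
public class NullTolerantDouble {
    // Marker that Hive's default text serialization writes for NULL values.
    public static final String HIVE_NULL = "\\N";

    // Hypothetical helper: returns null instead of throwing on null-like input.
    // Treating "" as null is an assumption made for this sketch.
    public static Double parseOrNull(String raw) {
        if (raw == null || raw.isEmpty() || HIVE_NULL.equals(raw)) {
            return null;
        }
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        try {
            // The call that DoubleIngester.valueOf reaches in the trace.
            Double.parseDouble(HIVE_NULL);
        } catch (NumberFormatException e) {
            System.out.println("unguarded parse failed: " + e.getMessage());
        }
        System.out.println("guarded parse of null marker: " + parseOrNull(HIVE_NULL));
        System.out.println("guarded parse of a number: " + parseOrNull("838.47034"));
    }
}
```

Returning `null` rather than `0.0` keeps a missing value distinguishable from a real zero, so the caller can decide how the measure should aggregate it.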

      Note 1: the build process succeeds when run with the MapReduce engine.

      Note 2: the error does not seem to be related to the AWS environment.

       

      Sample of the CSV:

      ID;CATEGORIE;TEL;MONTANT;MAGASIN;MATRICULE;VILLE;

      970;161;6-98-6-6-42;838.47034;Magasin_19;Client_Matricule_28;MARSEILLE;
      971;89;62-15-2-64-86;;;Client_Matricule_1;LYON;
      972;87;17-64-97-74-42;;;Client_Matricule_105;ORBEC;
      973;174;79-33-90-0-55;;Magasin_7;Client_Matricule_55;AJACCIO;
      974;172;89-95-71-6-49;141.64174;Magasin_9;Client_Matricule_105;BASTIA;
      975;83;7-27-95-28-7;897.28204;;Client_Matricule_199;AJACCIO;
      976;170;67-72-18-29-34;164.07967;Magasin_3;Client_Matricule_137;LILLE;
      977;130;14-69-4-23-27;1928.9557;Magasin_1;Client_Matricule_17;NOMNOM;
      978;43;55-91-84-98-49;891.2691;Magasin_0;Client_Matricule_22;NOMNOM;
      979;117;98-96-0-54-39;1636.3994;Magasin_9;Client_Matricule_142;MARSEILLE;
      980;163;37-55-76-53-38;;;Client_Matricule_64;NEWYORK;
      981;106;32-40-6-46-15;;Magasin_2;Client_Matricule_158;NOMNOM;
      982;56;95-60-83-89-90;;;Client_Matricule_102;NOMNOM;
      983;168;21-56-62-0-58;;;Client_Matricule_160;NOMNOM;
      984;154;92-67-37-94-60;;;Client_Matricule_137;PARIS;
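
To make the expected behaviour concrete, the following sketch sums the MONTANT column over a few rows of the sample above while skipping empty fields, which mirrors how SQL `SUM` ignores NULLs. The class `MontantSum` and its methods are hypothetical and unrelated to Kylin's code; the column index is taken from the header line of the sample.

```java
import java.util.Arrays;
import java.util.List;

public class MontantSum {
    // Zero-based position of MONTANT in the header ID;CATEGORIE;TEL;MONTANT;...
    public static final int MONTANT = 3;

    // Sums MONTANT, skipping rows where the field is empty (a NULL measure).
    public static double sum(List<String> rows) {
        double total = 0.0;
        for (String row : rows) {
            // A limit of -1 keeps empty fields, including trailing ones.
            String[] fields = row.split(";", -1);
            String raw = fields[MONTANT];
            if (raw.isEmpty()) {
                continue;
            }
            total += Double.parseDouble(raw);
        }
        return total;
    }

    // Counts rows whose MONTANT field is empty.
    public static int countMissing(List<String> rows) {
        int missing = 0;
        for (String row : rows) {
            if (row.split(";", -1)[MONTANT].isEmpty()) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "970;161;6-98-6-6-42;838.47034;Magasin_19;Client_Matricule_28;MARSEILLE;",
            "971;89;62-15-2-64-86;;;Client_Matricule_1;LYON;",
            "974;172;89-95-71-6-49;141.64174;Magasin_9;Client_Matricule_105;BASTIA;");
        System.out.println("rows with missing MONTANT: " + countMissing(rows));
        System.out.println("sum of MONTANT: " + sum(rows));
    }
}
```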

       

       

      Attachments

        1. 03_measure_cube.jpg
          100 kB
          Hubert STEFANI
        2. 02_dimension_cube.jpg
          130 kB
          Hubert STEFANI
        3. 01_overview_table.jpg
          128 kB
          Hubert STEFANI
        4. sortieData.csv
          60 kB
          Hubert STEFANI
        5. 00_zeppelin_notebook.jpg
          275 kB
          Hubert STEFANI

            People

              Assignee: Chao Long (Wayne0101)
              Reporter: Hubert STEFANI (hstefani)
