Kylin / KYLIN-4299

Issue with building real-time segment cache into HBase when using S3 as working dir

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v3.0.0-alpha2
    • Fix Version/s: v3.1.0
    • Component/s: Real-time Streaming
    • Labels:
      None

      Description

      We hit an issue when using S3 as the Kylin working directory together with real-time streaming. We want to keep no state in HDFS, so that the runtime environment running Kylin is stateless.
      Our HBase data is already on S3, but kylin.env.hdfs-working-dir also holds persistent data (cube dictionaries), so it has to be on S3 as well if we are to fail over to a new cluster without rebuilding all cubes.

      We use the real-time streaming feature in Kylin, which persists segment caches hourly; an MR job then merges those hourly segments into HBase. These MR jobs fail with the following exception:

      Error: java.lang.IllegalArgumentException: Wrong FS: s3://kylin-XXXXX/kylin-dev/hdfs-rootdir/kylin_metadata/stream/tops_jaywalks/20191206010000_20191206020000/1/1, expected: hdfs://ip-24-0-3-243.us-west-2.compute.internal:8020
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:897)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
        at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1551)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1577)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1625)
        at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1808)
        at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1807)
        at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1785)
        at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1887)
        at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1885)
        at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.checkPath(ColumnarFilesReader.java:46)
        at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.<init>(ColumnarFilesReader.java:41)
        at org.apache.kylin.engine.mr.streaming.DictsReader.<init>(DictsReader.java:43)
        at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.init(ColumnarSplitDictReader.java:65)
        at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.<init>(ColumnarSplitDictReader.java:52)
        at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictInputFormat.createRecordReader(ColumnarSplitDictInputFormat.java:32)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:524)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
        at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
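
      The trace shows an s3:// path being listed through DistributedFileSystem, which suggests the reader obtained the cluster's default filesystem instead of one resolved from the path itself. The sketch below imitates that failure mode using only the JDK; it is not Kylin or Hadoop code and not the actual fix. The class WrongFsSketch, the BoundFs type, both helper methods, and the example URIs are hypothetical, and the check is deliberately stricter and simpler than Hadoop's real FileSystem.checkPath.

```java
import java.net.URI;
import java.util.Objects;

public class WrongFsSketch {

    /** Hypothetical stand-in for a filesystem client bound to a single URI. */
    static final class BoundFs {
        private final URI fsUri;

        BoundFs(URI fsUri) {
            this.fsUri = fsUri;
        }

        /** Simplified check: reject paths whose scheme or authority differs from this filesystem's. */
        void checkPath(URI path) {
            String scheme = path.getScheme();
            if (scheme == null) {
                return; // a scheme-less path is treated as belonging to this filesystem
            }
            boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
            boolean sameAuthority = Objects.equals(path.getAuthority(), fsUri.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
            }
        }
    }

    /** Failure mode: always hand back the default filesystem, whatever the path is. */
    static BoundFs defaultFs(URI defaultFsUri) {
        return new BoundFs(defaultFsUri);
    }

    /** Alternative: derive the filesystem from the path's own scheme and authority. */
    static BoundFs fsForPath(URI path, URI defaultFsUri) {
        if (path.getScheme() == null) {
            return new BoundFs(defaultFsUri);
        }
        String authority = path.getAuthority() == null ? "" : path.getAuthority();
        return new BoundFs(URI.create(path.getScheme() + "://" + authority + "/"));
    }

    public static void main(String[] args) {
        // Example values only; they are not taken from the report.
        URI defaultFsUri = URI.create("hdfs://namenode.example:8020");
        URI segmentDir = URI.create("s3://example-bucket/kylin/stream/segment");

        try {
            defaultFs(defaultFsUri).checkPath(segmentDir);
        } catch (IllegalArgumentException e) {
            System.out.println("default filesystem rejected the path: " + e.getMessage());
        }

        fsForPath(segmentDir, defaultFsUri).checkPath(segmentDir);
        System.out.println("path-derived filesystem accepted the path");
    }
}
```

      In Hadoop terms, the first helper corresponds roughly to calling FileSystem.get(Configuration) and the second to Path.getFileSystem(Configuration); whether the eventual fix for this issue took that form is not stated here.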
      

            People

            • Assignee: Xiaoxiang Yu (hit_lacus)
            • Reporter: Andras Istvan Nagy (ainagy)
            • Votes: 0
            • Watchers: 3
