[SPARK-4040] Update spark documentation for local mode and spark-streaming. - ASF JIRA

Details

Type: Documentation
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0
Component/s: Documentation
Labels: None

Description

Note: this JIRA has changed since its inception. It's not a bug, but something that can be tricky to surmise from the existing docs, so the attached patch is a doc improvement.

Below is the original JIRA which was filed:

Please note that I'm somewhat new to Spark Streaming's API and am not a Spark expert, so I've done my best to write up and reproduce this "bug". If it's not a bug, I hope an expert will help explain why and promptly close it. However, after discussing it with rnowling, who is a Spark contributor, it appears it could be a bug.

CC rnowling willbenton

It appears that in a DStream context, a call to MappedRDD.count() blocks progress and prevents emission of RDDs from a stream.

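    // `checks` is presumably a var counter declared outside the closure; `ssc` is the StreamingContext.
    tweetStream.foreachRDD((rdd, lent) => {
      tweetStream.repartition(1)
      // val count = rdd.count()  // DONT DO THIS !
      checks += 1
      if (checks > 20) {
        ssc.stop()
      }
    })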

The above code block should inevitably halt after 20 intervals of RDDs. However, if we uncomment the call to rdd.count(), we get an infinite stream that emits no RDDs, and thus our program runs forever (ssc.stop() is unreachable), because foreachRDD doesn't receive any more entries.

I suspect this is because the foreach block never completes: count() winds up calling compute, which ultimately just reads from the stream.

I haven't put together a minimal reproducer or unit test yet, but I can work on one if more info is needed.

I guess this could be seen as an application bug, but I think Spark could be made smarter and throw its hands up when people execute blocking code in a stream processor.
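The issue title and the related SPARK-4381 point at the master setting rather than at count() itself. In local mode a receiver occupies one thread, so a master of "local" or "local[1]" leaves no thread to run the batch jobs, and an action such as rdd.count() never finishes. The following is a minimal sketch of that reading, not the code from the attached patch. The helper localThreads, the object name, and the socket source on localhost:9999 (standing in for tweetStream) are assumptions made for illustration, and awaitTerminationOrTimeout assumes Spark 1.3 or later.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LocalModeStreamingSketch {
  private val LocalN = """local\[(\d+)\]""".r

  // Hypothetical helper (not part of Spark): the number of threads a local
  // master URL asks for, or None for non-local masters and for "local[*]".
  def localThreads(master: String): Option[Int] = master match {
    case "local"   => Some(1)
    case LocalN(n) => Some(n.toInt)
    case _         => None
  }

  def main(args: Array[String]): Unit = {
    // "local[2]": one thread for the receiver, one for processing batches.
    val master = "local[2]"
    require(localThreads(master).forall(_ >= 2),
      s"master '$master' leaves no thread free for processing received data")

    val conf = new SparkConf().setMaster(master).setAppName("LocalModeStreamingSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed receiver-based source; any receiver-based stream behaves alike here.
    val lines = ssc.socketTextStream("localhost", 9999)

    // With a spare thread available, the count() action can complete.
    lines.foreachRDD { rdd =>
      println(s"records in this batch: ${rdd.count()}")
    }

    ssc.start()
    // Stop from the driver after roughly 20 one-second batches instead of
    // calling ssc.stop() inside foreachRDD.
    ssc.awaitTerminationOrTimeout(20 * 1000L)
    ssc.stop()
  }
}
```

Changing master to "local" in this sketch trips the require check before any context is created, which is roughly the kind of early warning SPARK-4381 asks for.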

Attachments

Issue Links

is related to

SPARK-4381 User should get warned when set spark.master with local in Spark Streaming

Resolved

links to

[Github] Pull Request #2964 (jayunit100)

People

Assignee: jay vyas

Reporter: jay vyas

Votes: 0

Watchers: 3

Dates

Created: 21/Oct/14 20:50

Updated: 13/Nov/14 13:36

Resolved: 05/Nov/14 23:45