Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Version: 0.13.0
Description
This JIRA submits a patch in which we propose Pig as the primary language for expressing real-time stream-processing logic, and provide a working prototype on Storm. The prototype runs existing Pig UDFs seamlessly on Storm. Although neither Pig nor Storm takes a position on state, this system also provides built-in support for advanced state semantics, such as sliding windows and global mutable state, which real-world applications require.
Associated talk and slides:
https://www.youtube.com/watch?v=fd-I5EtxSuI&feature=youtu.be
http://www.slideshare.net/Hadoop_Summit/t-435p230-cjain
-----------
Setup
-----------
Prerequisites:
Storm and Hadoop23 are installed on the host
- Unpack the PigOnStorm tarball (it bundles Pig as well)
- export PATH=<storm_home>/bin:<hadoop_home>/bin:$PATH
- export JAVA_HOME=<java_home>
- export PIG_HOME=<directory for PigOnStorm installation>
- export PIG_CLASSPATH=<storm_home>/conf:<hadoop_home>/conf (picks up the Storm and Hadoop configuration, e.g. the nimbus host, Hadoop site.xml, etc.)
- "pig -x storm_local" launches Pig in Storm local mode (no remote Storm cluster needed)
- "pig -x storm" launches Pig against a remote Storm cluster
To use HBase:
- export HBASE_HOME=<hbase_home_dir>
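The setup steps above can be collected into a small shell script. This is only a sketch: the /opt/... install paths and the pig_on_storm helper function are hypothetical and not part of the patch; only the environment variable names and the two -x modes come from the steps listed above.

```shell
#!/bin/sh
# Hypothetical install locations: replace with the real paths on your host.
STORM_HOME="${STORM_HOME:-/opt/storm}"
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
PIG_HOME="${PIG_HOME:-/opt/pigonstorm}"

# Put the storm and hadoop launchers on the PATH.
export PATH="$STORM_HOME/bin:$HADOOP_HOME/bin:$PATH"
export PIG_HOME
# Lets Pig pick up the Storm and Hadoop configuration directories.
export PIG_CLASSPATH="$STORM_HOME/conf:$HADOOP_HOME/conf"

# JAVA_HOME is expected to be set already; warn instead of guessing a path.
[ -n "${JAVA_HOME:-}" ] || echo "warning: JAVA_HOME is not set" >&2

# Optional, only needed for the HBase-backed window store:
# export HBASE_HOME=/opt/hbase

# Hypothetical helper: accepts only the two modes named in the setup notes
# and passes any remaining arguments through to pig.
pig_on_storm() {
    mode="${1:-storm_local}"
    case "$mode" in
        storm_local|storm) ;;
        *) echo "unknown mode: $mode (expected storm_local or storm)" >&2
           return 2 ;;
    esac
    [ $# -gt 0 ] && shift
    pig -x "$mode" "$@"
}
```

Sourcing this file and calling pig_on_storm storm_local would be equivalent to typing "pig -x storm_local" after the exports; pig_on_storm storm targets the remote cluster instead.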
-----------
Available UDFs
-----------
- File loader
A = load '/tmp/props' using org.apache.pig.backend.hadoop.executionengine.storm.emitters.POSFileEmitter;
- HBase WindowStore
store A into 'testTable' using org.apache.pig.backend.hadoop.executionengine.storm.state.hbase.WindowHbaseStore('fam');
- HBase WindowLoad
B = load 'testTable,-10,-1' using org.apache.pig.backend.hadoop.executionengine.storm.state.hbase.WindowHbaseStore('fam');
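As an illustration of how the file loader and the HBase window store listed above might be combined, the following sketch writes a two-statement Pig script to disk. The file name window_store.pig is hypothetical, and chaining the two UDFs this way is an assumption based on the alias A used in the store example, not a tested pipeline from the patch.

```shell
#!/bin/sh
# Hypothetical script name; any path works.
SCRIPT=window_store.pig

# Quoted heredoc delimiter so the shell does not expand anything inside.
cat > "$SCRIPT" <<'EOF'
-- Read tuples from a local file (path taken from the loader example).
A = load '/tmp/props' using org.apache.pig.backend.hadoop.executionengine.storm.emitters.POSFileEmitter;
-- Persist them into the HBase table 'testTable', column family 'fam'.
store A into 'testTable' using org.apache.pig.backend.hadoop.executionengine.storm.state.hbase.WindowHbaseStore('fam');
EOF

# To try it in Storm local mode (requires the setup described earlier
# and HBASE_HOME to be exported):
#   pig -x storm_local "$SCRIPT"
```

Generating the script from the shell keeps the example in one place; the same two statements could equally be typed at the interactive Pig prompt.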