Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
0.9.1
-
OS version
================================
Linux version 2.6.18-194.3.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48))
Software installed
=======================
hadoop-1.0.4
pig-0.9.1
Hardware details
====================================
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
stepping : 4
cpu MHz : 2800.098
cache size : 8192 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
-
Patch Available
-
Urgent
Description
Hi,
I have the following use case in my project.
The input file consists of lines in the following pattern:
192.168.90.36 - - [16/May/2012:16:00:11 -0700] "GET /img/explore/encyclopedia/characters/yoda_card.jpg HTTP/1.1" 200 22620 "http://www.starwars.com/explore/encyclopedia/characters/2/featured/" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" "Wookie-Cookie=474ca6b302a46696a1ec55f4b656f8c3; __utma=181359608.119611689.1337206567.1337206567.1337206567.1; __utmb=181359608.79.9.1337209104786; __utmc=181359608; __utmz=181359608.1337206567.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); JSESSIONID=aHX_NQheRq08" "-" 0
I want to run an aggregate function together with REGEX_EXTRACT_ALL to extract the desired data.
The input file parses correctly, but the aggregate function does not work on the parsed result.
Please find the Pig script below:
**************Ip_adress-count***********************
Ip_adress_count.pig
A = LOAD 'starwar_log1' USING TextLoader AS (line:chararray);
B = FOREACH A GENERATE FLATTEN (REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" "([^"]*)" (\\S+) ') ) AS
(
remoteAddr: chararray,
remoteLogname: chararray,
user: chararray,
time: chararray,
request: chararray,
status: int,
bytes_string: chararray,
referrer: chararray,
Mozilla: chararray,
wookie_cookie: chararray,
browser3: chararray,
acess_status:int
);
C = group B by remoteAddr;
D = foreach C generate COUNT(B) as ip_adress_count;
E = order D by ip_adress_count;
STORE E INTO 'ip_adress_count/' USING PigStorage(',');
Expected output
===========================
ip_adress_count
remoteAddr,ip_adress_count
192.168.90.36,19
192.168.90.37,1
There is no parsing issue, but the aggregate function COUNT() does not work on the output of REGEX_EXTRACT_ALL.
Please look into this. The requirement is to get the number of occurrences of each IP address in the input data.
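One thing worth checking is whether the pattern matches the sample line from start to end. The sketch below uses Python's `re.fullmatch` as a stand-in for Pig's Java regex engine, on the assumption that REGEX_EXTRACT_ALL returns null when the whole line does not match. `PATTERN_11` is the 11-group pattern as best it can be read from the script, including its trailing space. `PATTERN_12` is a hypothetical variant with a 12th group, matching the 12 fields declared in the AS clause. The names `parse` and `count_by_addr` are illustrative only. Note also that in Pig, `group` has to appear in the GENERATE list of the COUNT step for the address to show up next to its count.

```python
import re
from collections import Counter

# Sample line from the report, split across literals only for readability.
LINE = (
    r'192.168.90.36 - - [16/May/2012:16:00:11 -0700] '
    r'"GET /img/explore/encyclopedia/characters/yoda_card.jpg HTTP/1.1" 200 22620 '
    r'"http://www.starwars.com/explore/encyclopedia/characters/2/featured/" '
    r'"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" '
    r'"Wookie-Cookie=474ca6b302a46696a1ec55f4b656f8c3; '
    r'__utma=181359608.119611689.1337206567.1337206567.1337206567.1; '
    r'__utmb=181359608.79.9.1337209104786; __utmc=181359608; '
    r'__utmz=181359608.1337206567.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); '
    r'JSESSIONID=aHX_NQheRq08" "-" 0'
)

# 11 capture groups followed by a trailing space (as read from the script).
PATTERN_11 = (
    r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(.+?)" (\S+) (\S+) '
    r'"([^"]*)" "([^"]*)" "([^"]*)" (\S+) '
)
# Hypothetical variant: a 12th group consumes the final field of the line.
PATTERN_12 = PATTERN_11 + r'(\S+)$'


def parse(line, pattern):
    """Return the captured groups if the whole line matches, else None."""
    match = re.fullmatch(pattern, line)
    return match.groups() if match else None


def count_by_addr(lines, pattern):
    """Count lines per remote address, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        fields = parse(line, pattern)
        if fields is not None:
            counts[fields[0]] += 1  # key kept next to its count
    return counts


print(parse(LINE, PATTERN_11))            # None: the final "0" is left over
print(len(parse(LINE, PATTERN_12)))       # 12
print(count_by_addr([LINE], PATTERN_12))  # Counter({'192.168.90.36': 1})
```

If the same behaviour holds in Pig, every field produced by the 11-group pattern would be null for lines like the sample, which could explain a COUNT that looks wrong even though the script runs without a parse error.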
thanks,
siddharth
contact -8763666372