[HIVE-672] Integrate weka with Hive - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Weka is one of the most popular data mining packages and is used by many people around the world. Since Weka is written in Java, it should be fairly straightforward to integrate it with Hive.

We just need to create some GenericUDAF functions that map to the Weka classifier training process. The output of each GenericUDAF can simply be the serialized form of the trained classifier.
We should also add a GenericUDF that loads the serialized classifier and uses it to classify new instances.
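
As a rough illustration of the serialize-then-load idea above, here is a minimal sketch that uses only standard Java serialization. ToyLogisticModel and ModelCodec are hypothetical names invented for this example and are not part of Weka or Hive. A real implementation would presumably wrap an actual Weka classifier (Weka classifiers are generally java.io.Serializable) and call the equivalent of toBytes from the GenericUDAF's final step and fromBytes from the GenericUDF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a trained classifier; NOT a Weka class. */
final class ToyLogisticModel implements Serializable {
    private static final long serialVersionUID = 1L;

    final double intercept;
    final double[] weights;

    ToyLogisticModel(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights.clone();
    }

    /** Returns the predicted probability of the positive class (e.g. a click). */
    double predict(double[] features) {
        double z = intercept;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }
}

/** Hypothetical helper: converts a model to and from the bytes stored in a table column. */
final class ModelCodec {
    private ModelCodec() {}

    /** Roughly what the aggregate function could return once training is done. */
    static byte[] toBytes(Serializable model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Roughly what the evaluation function could do before classifying a row. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Model class not on the classpath", e);
        }
    }
}
```

In this sketch the byte array plays the role of the model column: the training side writes it once per group, and the evaluation side deserializes it and calls predict on each row's features.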

The Hive syntax can be as simple as the example below. (Note: in the example, most of the "table." prefixes can be omitted. I included them only to make the query semantics easier to follow.)

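The first query builds a logistic regression model for predicting the click-through rate (CTR) of each link on each page, based on user information. The second query evaluates the model on some data; it reads the models from a classifiers table, which is presumably where the output of the first query is stored.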

SELECT logdata.pageid, logdata.linkid, LogisticRegression( logdata.clicked, userinfo.age, userinfo.gender, userinfo.country, userinfo.interests ) as model
FROM logdata JOIN userinfo
ON logdata.userid = userinfo.userid
GROUP BY logdata.pageid, logdata.linkid;

SELECT logdata.pageid, logdata.linkid, logdata.clicked, LogisticRegressionEvaluate(classifiers.model, userinfo.age, userinfo.gender, userinfo.country, userinfo.interests) AS predicted
FROM logdata JOIN userinfo
ON logdata.userid = userinfo.userid
JOIN classifiers
ON logdata.pageid = classifiers.pageid AND logdata.linkid = classifiers.linkid;

References:
Use Weka in your Java Code: http://weka.wiki.sourceforge.net/Use+Weka+in+your+Java+code

Note:
Weka is licensed under the GPL. We won't be able to include the code directly in Hive, but we can keep the discussion here.

Attachments

- HIVE-672.1.not.to.be.included.patch (28/Jul/09 06:05, 20 kB, Zheng Shao)
- HIVE-672.2.not.to.be.included.patch (20/Aug/09 21:02, 12 kB, Zheng Shao)
- weka.jar (20/Aug/09 21:02, 5.09 MB, Zheng Shao)

People

Assignee: Zheng Shao
Reporter: Zheng Shao
Votes: 3
Watchers: 9

Dates

Created: 22/Jul/09 06:17
Updated: 20/Aug/09 21:02