[SPARK-7559] Bucketizer should include the right most boundary in the last bucket. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.4.0
Fix Version/s: 1.4.0
Component/s: ML
Labels: None
Target Version/s: 1.4.0

Description

Bucketizer currently handles +inf as a special case. This could be simplified by including the largest split value in the last bucket. For example, splits (x1, x2, x3) define the buckets [x1, x2) and [x2, x3]. This change shouldn't affect much user code, and some applications need the right-most value to be included. For example, ratings from 0 to 10 can be bucketized into bad, neutral, and good with splits 0, 4, 6, 10, which gives the buckets [0, 4), [4, 6), and [6, 10], so a rating of exactly 10 is labeled good.
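A minimal, stdlib-only sketch of the proposed bucketing rule follows. It is not Spark's Bucketizer code; `bucket_index`, `RATING_SPLITS`, and `LABELS` are hypothetical names used only for illustration. Every bucket is left-closed and right-open except the last, which also includes the largest split, so the rule needs no special handling for +inf.

```python
import math
from bisect import bisect_right
from typing import Sequence


def bucket_index(x: float, splits: Sequence[float]) -> int:
    """Hypothetical helper: map x to a bucket defined by strictly increasing splits.

    Buckets are [s0, s1), [s1, s2), ..., [s_{n-2}, s_{n-1}]: the last bucket
    also includes the right-most split value.
    """
    if len(splits) < 2:
        raise ValueError("need at least two splits to define one bucket")
    if math.isnan(x) or x < splits[0] or x > splits[-1]:
        raise ValueError(f"{x} is outside [{splits[0]}, {splits[-1]}]")
    if x == splits[-1]:
        # The largest split belongs to the last bucket instead of starting a new one.
        return len(splits) - 2
    # bisect_right counts the splits <= x; subtracting 1 gives the half-open bucket.
    return bisect_right(splits, x) - 1


# The ratings example from the description (labels are illustrative).
RATING_SPLITS = [0, 4, 6, 10]
LABELS = ["bad", "neutral", "good"]

ratings = [0, 3.5, 4, 5.9, 6, 10]
print([LABELS[bucket_index(r, RATING_SPLITS)] for r in ratings])
# ['bad', 'bad', 'neutral', 'neutral', 'good', 'good']
```

Using `bisect_right` keeps each lookup at O(log n) in the number of splits; the single equality check on the last split is the only extra case the rule requires.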

Issue Links

Links to: [GitHub] Pull Request #6075 (mengxr)

People

Assignee: Xiangrui Meng

Reporter: Xiangrui Meng

Votes: 0

Watchers: 3

Dates

Created: 12/May/15 07:40

Updated: 13/May/15 00:38

Resolved: 13/May/15 00:38