[JENA-1313] Language-specific collation in ARQ - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: Jena 3.2.0
Fix Version/s: Jena 3.3.0
Component/s: ARQ
Labels: None

Description

As discussed on the users mailing list in October 2016, I would like to change ARQ collation of literal values to be language-aware and respect language-specific collation rules.

This would probably involve changing at least the NodeUtils.compareLiteralsBySyntax method.

It currently sorts by lexical value first, then by language tag. Since the collation order needs to be stable across all possible literal values, I think the safest way would be to sort by language tag first, then by lexical value according to the collation rules for that language.
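To make the proposed ordering concrete, here is a minimal Java sketch that compares by language tag first and then by lexical value, using `java.text.Collator` for that tag's locale. The `LangLiteral` and `LangAwareOrder` classes are hypothetical stand-ins for illustration, not Jena APIs, and this is not the actual change to `NodeUtils.compareLiteralsBySyntax`. Lower-casing the tag and the final code-point tie-break are assumptions of the sketch.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a language-tagged literal; not a Jena class. */
final class LangLiteral {
    final String lexical;
    final String lang; // lower-cased language tag, "" when the literal has no tag

    LangLiteral(String lexical, String lang) {
        this.lexical = lexical;
        this.lang = lang == null ? "" : lang.toLowerCase(Locale.ROOT);
    }
}

/** Sketch of "language tag first, then language-specific collation". */
final class LangAwareOrder {
    static final Comparator<LangLiteral> COMPARATOR = (a, b) -> {
        // 1. Group by language tag so the order stays stable across all literals.
        int byLang = a.lang.compareTo(b.lang);
        if (byLang != 0) {
            return byLang;
        }
        // 2. Same tag: compare lexical values with that language's collation rules.
        //    An empty tag maps to Locale.ROOT.
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byCollation = collator.compare(a.lexical, b.lexical);
        if (byCollation != 0) {
            return byCollation;
        }
        // 3. Assumed tie-break: plain code-unit order, so only identical
        //    strings compare as equal.
        return a.lexical.compareTo(b.lexical);
    };

    /** Returns the lexical values of the given literals in sorted order. */
    static List<String> sortedLexicals(List<LangLiteral> literals) {
        List<LangLiteral> copy = new ArrayList<>(literals);
        copy.sort(COMPARATOR);
        List<String> out = new ArrayList<>();
        for (LangLiteral literal : copy) {
            out.add(literal.lexical);
        }
        return out;
    }
}
```

With this comparator, every `@en` literal sorts before every `@fi` literal, and within `@en` the collator places "apple" before "Banana", where a plain code-point comparison would put the capitalised word first.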

But what about subtags like @en-US or @pt-BR? Can they have different collation rules than the main language? It would be a bit strange if all @en-US literals sorted after @en literals...
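One possible way to avoid sorting all @en-US literals after all @en literals is to group by the primary language subtag only and use the full tag as a last tie-break. The Java sketch below illustrates that option. It does not settle whether regional variants have their own collation rules, and the `SubtagOrder` class and its method names are hypothetical, not part of Jena.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical sketch: order by primary language subtag, then collate. */
final class SubtagOrder {
    /** Returns the primary language subtag, e.g. "en" for "en-US"; "" if absent. */
    static String primaryLanguage(String langTag) {
        if (langTag == null) {
            return "";
        }
        return Locale.forLanguageTag(langTag).getLanguage();
    }

    /** Compares two (lexical value, language tag) pairs. */
    static int compare(String lexA, String tagA, String lexB, String tagB) {
        // 1. Group by primary language, so "en" and "en-US" share one group.
        String primaryA = primaryLanguage(tagA);
        String primaryB = primaryLanguage(tagB);
        int byPrimary = primaryA.compareTo(primaryB);
        if (byPrimary != 0) {
            return byPrimary;
        }
        // 2. Collate with the rules of the primary language (assumption: region
        //    subtags are ignored when choosing the collator).
        Collator collator = Collator.getInstance(Locale.forLanguageTag(primaryA));
        int byCollation = collator.compare(lexA, lexB);
        if (byCollation != 0) {
            return byCollation;
        }
        // 3. Assumed tie-breaks: code-unit order, then the full lower-cased tag.
        int byCodeUnits = lexA.compareTo(lexB);
        if (byCodeUnits != 0) {
            return byCodeUnits;
        }
        String fullA = tagA == null ? "" : tagA.toLowerCase(Locale.ROOT);
        String fullB = tagB == null ? "" : tagB.toLowerCase(Locale.ROOT);
        return fullA.compareTo(fullB);
    }
}
```

Under these assumptions "apple"@en-US sorts before "zebra"@en, so the two tags interleave, and identical strings are ordered by their full tag.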

It would be good to check how Dydra does this and possibly take the same approach. See the message linked above for further background.

I've been talking with kinow about this and he may be interested in implementing it.

Issue Links

links to

GitHub Pull Request #237

GitHub Pull Request #262

People

Assignee: Bruno P. Kinoshita

Reporter: Osma Suominen

Votes: 0

Watchers: 7

Dates

Created: 30/Mar/17 07:49

Updated: 21/Jul/17 09:29

Resolved: 08/Jul/17 20:28