# Simplify or provide alternative similarity arithmetic (AbstractDistributedVectorSimilarity) for boolean data

#### Details

• Type: Improvement
• Status: Closed
• Priority: Major
• Resolution: Not A Problem
• 0.4
• None

#### Description

For boolean data the prefValue is always 1.0f, so the similarity arithmetic can be simplified.

For example:
1) DistributedEuclideanDistanceVectorSimilarity

```java
/**
 * distributed implementation of euclidean distance as vector similarity measure
 */
public class DistributedEuclideanDistanceVectorSimilarity extends AbstractDistributedVectorSimilarity {

  @Override
  protected double doComputeResult(int rowA, int rowB, Iterable<Cooccurrence> cooccurrences, double weightOfVectorA,
      double weightOfVectorB, int numberOfColumns) {

    double n = 0.0;
    double sumXYdiff2 = 0.0;

    for (Cooccurrence cooccurrence : cooccurrences) {
      double diff = cooccurrence.getValueA() - cooccurrence.getValueB();
      sumXYdiff2 += diff * diff;
      n++;
    }

    return n / (1.0 + Math.sqrt(sumXYdiff2));
  }

}
```

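For boolean data this one always returns n (the number of cooccurrences): every value is 1.0, so every diff is 0, sumXYdiff2 is 0, and the result is n / (1.0 + 0) = n.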
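
As a quick check of that reduction, here is a minimal standalone sketch (not Mahout code) that repeats the arithmetic of doComputeResult on plain arrays. The arrays `valuesA[i]`/`valuesB[i]` are an assumption standing in for `getValueA()`/`getValueB()` of each Cooccurrence, and the class name and the boolean shortcut `euclideanBoolean` are hypothetical.

```java
// Hypothetical standalone sketch: double arrays stand in for Mahout's
// Iterable<Cooccurrence>; valuesA[i] and valuesB[i] are the two preference
// values of the i-th cooccurrence.
public class EuclideanBooleanSketch {

  // Same arithmetic as DistributedEuclideanDistanceVectorSimilarity.doComputeResult.
  static double euclidean(double[] valuesA, double[] valuesB) {
    double n = 0.0;
    double sumXYdiff2 = 0.0;
    for (int i = 0; i < valuesA.length; i++) {
      double diff = valuesA[i] - valuesB[i];
      sumXYdiff2 += diff * diff;
      n++;
    }
    return n / (1.0 + Math.sqrt(sumXYdiff2));
  }

  // Hypothetical boolean shortcut: all values are 1.0, every diff is 0,
  // so the formula collapses to the number of cooccurrences.
  static double euclideanBoolean(int numCooccurrences) {
    return numCooccurrences;
  }

  public static void main(String[] args) {
    double[] ones = {1.0, 1.0, 1.0, 1.0};
    System.out.println(euclidean(ones, ones));          // 4.0
    System.out.println(euclideanBoolean(ones.length));  // 4.0
  }
}
```

The shortcut needs only the cooccurrence count, so the loop over values can be skipped entirely for boolean data.
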
2) DistributedUncenteredCosineVectorSimilarity

```java
/**
 * distributed implementation of cosine similarity that does not center its data
 */
public class DistributedUncenteredCosineVectorSimilarity extends AbstractDistributedVectorSimilarity {

  @Override
  protected double doComputeResult(int rowA, int rowB, Iterable<Cooccurrence> cooccurrences, double weightOfVectorA,
      double weightOfVectorB, int numberOfColumns) {

    int n = 0;
    double sumXY = 0.0;
    double sumX2 = 0.0;
    double sumY2 = 0.0;

    for (Cooccurrence cooccurrence : cooccurrences) {
      double x = cooccurrence.getValueA();
      double y = cooccurrence.getValueB();
      sumXY += x * y;
      sumX2 += x * x;
      sumY2 += y * y;
      n++;
    }

    if (n == 0) {
      return Double.NaN;
    }

    double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
    if (denominator == 0.0) {
      // One or both vectors has -all- the same values;
      // can't really say much similarity under this measure
      return Double.NaN;
    }

    return sumXY / denominator;
  }

}
```

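For boolean data this one always returns 1.0 whenever there is at least one cooccurrence: with x = y = 1.0, sumXY = sumX2 = sumY2 = n, so the result is n / (sqrt(n) * sqrt(n)) = 1.0 (up to floating-point rounding).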
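
A minimal standalone sketch (not Mahout code) of the same check, again assuming plain arrays in place of Cooccurrence values. The class name and the shortcut `uncenteredCosineBoolean` are hypothetical. Note that for some n the general formula yields a value one ulp away from 1.0 (for example, sqrt(3) * sqrt(3) is not exactly 3 in double precision), so the comparison in practice needs a tolerance.

```java
// Hypothetical standalone sketch: double arrays stand in for the values of
// each Cooccurrence (valuesA[i] = getValueA(), valuesB[i] = getValueB()).
public class UncenteredCosineBooleanSketch {

  // Same arithmetic as DistributedUncenteredCosineVectorSimilarity.doComputeResult.
  static double uncenteredCosine(double[] valuesA, double[] valuesB) {
    int n = 0;
    double sumXY = 0.0;
    double sumX2 = 0.0;
    double sumY2 = 0.0;
    for (int i = 0; i < valuesA.length; i++) {
      double x = valuesA[i];
      double y = valuesB[i];
      sumXY += x * y;
      sumX2 += x * x;
      sumY2 += y * y;
      n++;
    }
    if (n == 0) {
      return Double.NaN;
    }
    double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
    if (denominator == 0.0) {
      return Double.NaN;
    }
    return sumXY / denominator;
  }

  // Hypothetical boolean shortcut: NaN when nothing cooccurs, otherwise 1.0.
  static double uncenteredCosineBoolean(int numCooccurrences) {
    return numCooccurrences == 0 ? Double.NaN : 1.0;
  }

  public static void main(String[] args) {
    double[] ones = {1.0, 1.0, 1.0, 1.0};
    System.out.println(uncenteredCosine(ones, ones));          // 1.0
    System.out.println(uncenteredCosineBoolean(ones.length));  // 1.0
  }
}
```

Since the result carries no information beyond "at least one cooccurrence", this measure cannot rank item pairs on boolean data at all.
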
3) DistributedUncenteredZeroAssumingCosineVectorSimilarity
If n users like ItemA, m users like ItemB, and p users like both ItemA and ItemB, then

DistributedUncenteredZeroAssumingCosineVectorSimilarity returns p / sqrt(m*n).

It can also be used for boolean data, but we could provide a simpler one that returns (p*p)/(m*n), the square of that value, which needs less computation (no square root).
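
A minimal sketch of the two formulas, assuming each item's boolean preferences are available as a set of user IDs. That representation, the class name and the methods `cosine` and `squaredCosine` are hypothetical illustrations; the Mahout class itself works on Cooccurrence values and vector weights. The sketch does no special handling for empty sets (0 / 0 yields NaN).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: n = usersA.size(), m = usersB.size(),
// p = number of users who like both items.
public class ZeroAssumingCosineBooleanSketch {

  // Cosine of two boolean vectors: p / sqrt(m * n).
  static double cosine(Set<Long> usersA, Set<Long> usersB) {
    int p = intersectionSize(usersA, usersB);
    return p / Math.sqrt((double) usersA.size() * usersB.size());
  }

  // Proposed simpler variant: (p * p) / (m * n), the square of the cosine.
  static double squaredCosine(Set<Long> usersA, Set<Long> usersB) {
    long p = intersectionSize(usersA, usersB);
    return (double) (p * p) / ((double) usersA.size() * usersB.size());
  }

  // Counts users present in both sets without modifying the inputs.
  static int intersectionSize(Set<Long> a, Set<Long> b) {
    Set<Long> common = new HashSet<>(a);
    common.retainAll(b);
    return common.size();
  }

  public static void main(String[] args) {
    Set<Long> likesA = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));                        // n = 4
    Set<Long> likesB = new HashSet<>(Arrays.asList(3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L)); // m = 9, p = 2
    System.out.println(cosine(likesA, likesB));        // 2 / sqrt(36) = 2 / 6
    System.out.println(squaredCosine(likesA, likesB)); // 4 / 36
  }
}
```

Because p / sqrt(m*n) is never negative, squaring it preserves the ordering of item pairs, so the simpler value ranks neighbours the same way even though its magnitude differs.
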

#### People

• Sean R. Owen
• Han Hui Wen