Finding repeated images - Part 2 - More on vector similarity
Saturday, July 19th, 2008This post took more time to write than the previous ones. Reason is when we write down our ideas, we see where are the problems, and that’s what exactly happened here. After writing what I was doing I figured out mistakes, which could lead to enhancing my current implementation.
Objective: Our objective is to figure out a way to compare elements of two vectors A & B. We have no knowledge about the range occupied by those elements. They can be positive or negative, and can be of different orders of magnitude.
Problem: Tanimoto’s coefficient discussed in Part 1 will not work due to the difference in orders of magnitude. Bigger elements will mask smaller ones.
Possible solution: We should normalize the value of elements to something that we know and is of the same order of magnitude. Mahalanobis distance accomplishes this by dividing the difference between a point and the mean by the variance. An approach similar to this would work, it is just that we have only two points to compare and thus the variance is 0.25 the square of the distance. So if the two elements were
, the the variance is
, which results in a constant Mahalanobis distance of 0.5.
Old and wrong ways of comparison
So the old and wrong idea was to divide the modulus of the difference by the modulus of the mean. By this way if the two values are similar to each other, the metric is small and if they are different the metric is big and is normalized against their order of magnitude. In order to make the value of this similarity metric between 0 and 1, we do this:

Actaully I used this similarity measure, and it seemed to improve my results than before. However, after I plotted it I saw a big flaw.
















