
Archive for the ‘Sci/Research’ Category

Finding repeated images - Part 2 - More on vector similarity

Saturday, July 19th, 2008

Series Part 1, Part 2, Part 3

This post took more time to write than the previous ones. The reason is that when we write down our ideas, we see where the problems are, and that is exactly what happened here. While writing down what I was doing, I found mistakes whose fixes could improve my current implementation.

Objective: Find a way to compare the elements of two vectors A and B. We have no prior knowledge of the range those elements occupy: they can be positive or negative, and they can differ by orders of magnitude.

Problem: Tanimoto’s coefficient, discussed in Part 1, will not work because of the difference in orders of magnitude: bigger elements will mask smaller ones.
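To see the masking effect concretely, here is a minimal sketch using the common real-valued form of the Tanimoto coefficient, T(A,B) = A·B / (|A|² + |B|² − A·B). The vectors are made-up illustrations, not real image features.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for real-valued vectors: A.B / (|A|^2 + |B|^2 - A.B).

    Undefined (division by zero) when both vectors are all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom

# Hypothetical vectors: one large element followed by two small ones.
a = [1000.0, 1.0, 2.0]
b = [1000.0, 5.0, -3.0]   # the small elements differ completely, even in sign
c = [1000.0, 1.0, 2.0]    # identical to a

print(tanimoto(a, c))                    # 1.0
print(tanimoto(a, b))                    # still above 0.9999
print(tanimoto([1.0, 2.0], [5.0, -3.0])) # the small parts alone: -0.025
```

The small elements on their own are essentially unrelated (a slightly negative coefficient), yet once the large first element is included the two vectors look almost identical.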

Possible solution: We should normalize the elements to a known scale, so that they are all of the same order of magnitude. In one dimension, the Mahalanobis distance does this by dividing the difference between a point and the mean by the standard deviation (the square root of the variance). A similar approach would work, except that we have only two points to compare, so the variance is simply one quarter of their squared distance. If the two elements are a_i and b_i, the variance is 0.25(a_i-b_i)^2 and the standard deviation is |a_i-b_i|/2, which is exactly the distance of either point from the mean. The Mahalanobis distance is therefore always 1, whatever the values, and tells us nothing.
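A quick numerical check of why two points are not enough: this sketch computes the one-dimensional Mahalanobis distance |x − μ|/σ using the population variance of the two samples. The function name `mahalanobis_1d` is an illustrative choice; it assumes the two values differ, since equal values give zero variance.

```python
import math

def mahalanobis_1d(x, samples):
    """1-D Mahalanobis distance |x - mean| / std, using the population variance.

    Raises ZeroDivisionError if all samples are equal (zero variance).
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return abs(x - mean) / math.sqrt(var)

# Very different pairs, same answer: the distance carries no information.
for a, b in [(1.0, 3.0), (-50.0, 0.002), (1e6, 2e6)]:
    print(mahalanobis_1d(a, [a, b]))  # 1.0 each time, up to rounding
```

Using the sample variance (dividing by n − 1) instead only changes the constant to 1/√2; it is still the same for every pair.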

Old and wrong ways of comparison

The old (and wrong) idea was to divide the absolute value of the difference by the absolute value of the mean. This way, if the two values are similar the metric is small, if they differ it is large, and in both cases it is normalized by their order of magnitude. To map this into a similarity between 0 and 1, we take:

Sim(a_i,b_i)=\left(1+\frac{|a_i-b_i|}{(|a_i+b_i|)/2}\right)^{-1}
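The formula translates directly into code. One edge case is visible straight from the formula: when a_i = −b_i the mean is 0 and the ratio is undefined. The sketch below treats a_i = b_i (including 0, 0) as similarity 1 and a_i = −b_i ≠ 0 as similarity 0 (the limit as the denominator shrinks). This handling is an assumption of the sketch, not necessarily what the original implementation did, and it is not necessarily the flaw mentioned below.

```python
def sim(a_i, b_i):
    """Element-wise similarity (1 + |a_i - b_i| / (|a_i + b_i| / 2)) ** -1."""
    if a_i == b_i:
        return 1.0  # identical values, including 0 and 0 where the mean is 0
    mean_abs = abs(a_i + b_i) / 2
    if mean_abs == 0:
        return 0.0  # a_i == -b_i: the ratio grows without bound, similarity tends to 0
    return 1.0 / (1.0 + abs(a_i - b_i) / mean_abs)

print(sim(1, 3))      # 0.5
print(sim(100, 300))  # 0.5: the same relative difference at a larger scale
print(sim(2, -1))     # 1/7: opposite signs with a small mean
print(sim(1, -1))     # 0.0
```

The second line shows the intended scale invariance: only the ratio between the values matters, not their magnitude.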

I actually used this similarity measure, and it seemed to improve my results compared to before. However, after I plotted it, I saw a big flaw.


Finding repeated images - Part 1 - Vector Similarity Measures

Wednesday, July 16th, 2008


I’ve been playing around with spatial matching on clker.com. My goal was to determine, very quickly, whether an image being submitted already exists. Titles, tags, and other information attached to the image can change, so they are essentially useless for deciding with high confidence whether an image is a repeat. What is really needed is a set of features that can be extracted quickly, stored in the database, and indexed in a way that is practical to search.

