Archive for the ‘development’ Category
Saturday, July 19th, 2008
Series Part 1, Part 2
This post took more time to write than the previous ones. The reason is that writing ideas down exposes where the problems are, and that is exactly what happened here. After writing down what I was doing, I spotted mistakes, and fixing them should improve my current implementation.
Objective: Our objective is to figure out a way to compare elements of two vectors A & B. We have no knowledge about the range occupied by those elements. They can be positive or negative, and can be of different orders of magnitude.
Problem: Tanimoto’s coefficient discussed in Part 1 will not work due to the difference in orders of magnitude. Bigger elements will mask smaller ones.
Possible solution: We should normalize the elements to something of a known, common order of magnitude. Mahalanobis distance accomplishes this by dividing the difference between a point and the mean by the standard deviation (the square root of the variance). A similar approach would work, except that we have only two points to compare, so the variance is always 0.25 times the square of the distance between them. If the two elements are a and b, the variance is 0.25·(a − b)², so each element lies exactly one standard deviation from the mean and the Mahalanobis distance is the same constant for every pair, which tells us nothing.
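The two-point argument can be checked numerically. The sketch below is only an illustration: the function names and the sample pairs are made up, and it assumes the population variance (dividing by n) and the usual one-dimensional form of the distance, |x − mean| divided by the standard deviation. The exact constant depends on those conventions; what matters is that it is the same for every pair.

```php
<?php
// Population variance of a list of numbers (divides by n, not n - 1).
function populationVariance(array $values): float
{
    $mean = array_sum($values) / count($values);
    $sum = 0.0;
    foreach ($values as $v) {
        $sum += ($v - $mean) ** 2;
    }
    return $sum / count($values);
}

// One-dimensional Mahalanobis distance of $x from the mean of $values:
// |x - mean| / standard deviation.
function mahalanobis1d(float $x, array $values): float
{
    $mean = array_sum($values) / count($values);
    return abs($x - $mean) / sqrt(populationVariance($values));
}

// Hypothetical element pairs of very different magnitudes.
foreach ([[1.0, 3.0], [-200.0, 1000.0], [0.001, 0.002]] as [$a, $b]) {
    printf(
        "a=%g b=%g variance=%g distance=%g\n",
        $a,
        $b,
        populationVariance([$a, $b]),
        mahalanobis1d($a, [$a, $b])
    );
}
```

Under these assumptions the distance comes out as 1 (up to floating-point rounding) for every pair, however far apart or large the elements are, so it cannot tell similar pairs from dissimilar ones.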
Old and wrong ways of comparison
The old and wrong idea was to divide the modulus of the difference by the modulus of the mean. That way, if the two values are similar the metric is small, and if they are different it is big, and in both cases it is normalized against their order of magnitude. To bring the value of this similarity metric between 0 and 1, we do this:

Actually, I used this similarity measure, and it seemed to give better results than before. However, after I plotted it I saw a big flaw.
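As a rough illustration of the ratio described above, here is a minimal sketch of the modulus of the difference divided by the modulus of the mean. The function name and the sample values are hypothetical, and the step that maps the result into the range 0 to 1 is left out because its formula is not shown in this excerpt.

```php
<?php
// Ratio described in the text: |a - b| / |mean(a, b)|.
// Returns 0 when the elements are equal and INF when their mean is zero
// (for example a = -b); both special cases are choices made for this sketch.
function relativeDifference(float $a, float $b): float
{
    $diff = abs($a - $b);
    $mean = abs(($a + $b) / 2);
    if ($diff == 0.0) {
        return 0.0;
    }
    if ($mean == 0.0) {
        return INF;
    }
    return $diff / $mean;
}

// Same relative gap at two very different orders of magnitude.
printf("%g\n", relativeDifference(100.0, 110.0));
printf("%g\n", relativeDifference(0.001, 0.0011));
// Opposite signs: the mean is close to zero, so the ratio grows very large.
printf("%g\n", relativeDifference(1.0, -0.9));
```

The first two calls give about the same value (roughly 0.095), which is the order-of-magnitude independence the ratio was meant to provide. The third call shows that the ratio is unbounded when the elements have opposite signs; this is an observation about the ratio itself and may or may not be the flaw the full post goes on to describe.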
(more…)
Posted in Internet general, Internet programming, Sci/Research, clker.com, development, linux | 1 Comment »
Friday, July 18th, 2008
Plots GNUPlot charts without GNUPlot on your server. This plugin sends your code to our custom version of GNUPlot hosted at clker.com, which responds with a PNG chart, or with error messages if the code fails.
Write your GNUPlot code between [ gplot] and [/ gplot] (without spaces). Maximum chart size is 1×1.
To install
- Copy the file ( gnuplot plugin ) into your wp-content/plugins directory, and rename it so that it ends in .php instead of .phptxt.
- Create the wp-content/cache directory, and make sure it is writable by the web server
- Activate the plugin from the plugins tab inside wordpress
Example:
[ gplot]
set size 1,0.7
set dummy u,v
unset key
set parametric
set view 60, 30, 1.1, 1.33
set isosamples 50, 20
set title "Interlocking Tori - PM3D surface with depth sorting"
set urange [ -3.14159 : 3.14159 ] noreverse nowriteback
set vrange [ -3.14159 : 3.14159 ] noreverse nowriteback
set pm3d depthorder
splot cos(u)+.5*cos(u)*cos(v),sin(u)+.5*sin(u)*cos(v),.5*sin(v) with pm3d,\
1+cos(u)+.5*cos(u)*cos(v),.5*sin(v),sin(u)+.5*sin(u)*cos(v) with pm3d
[/ gplot]
would produce this:
… Enjoy
Tags: GNUPlot plugin Posted in GNUPlot plugin, Internet general, Internet programming, development, linux | No Comments »
Wednesday, July 16th, 2008
While I was writing the repeated images identification post, I adapted the mimetex WordPress plugin into a GNUPlot WordPress plugin.
The plugin executes GNUPlot over any portion of the text enclosed between [ gplot] and [/ gplot] tags, without the spaces of course.
Example:
[ gplot]
set size 0.75, 0.3
set xrange [0:5]
plot sin(x) title "sin(x)", sin(2*x) title "sin(2x)"
[/ gplot]
would generate:
Download: Download the GNUPlot plugin for wordpress
Installation:
- Make sure that your server has gnuplot installed
- Create the directory <wordpress>/wp-content/cache, and make sure it is writable by the web server
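For readers curious how tag handling of this kind can work, here is a hedged sketch. It is not the plugin's actual code: the function name, the cache file naming scheme and the generated markup are all assumptions made for illustration, and the step that would actually run gnuplot is left as a comment.

```php
<?php
// Hypothetical sketch: find [gplot]...[/gplot] blocks in post text and
// replace each one with an <img> tag pointing at a cached chart file.
// $cacheDir and the file naming scheme are assumptions, not the plugin's own.
function replaceGplotTags(string $text, string $cacheDir = 'wp-content/cache'): string
{
    return preg_replace_callback(
        '/\[gplot\](.*?)\[\/gplot\]/s',
        function (array $m) use ($cacheDir): string {
            $script = trim($m[1]);
            // Name the chart after a hash of the script, so an unchanged
            // script always maps to the same cached PNG.
            $file = $cacheDir . '/gplot_' . md5($script) . '.png';
            // A real plugin would run gnuplot here if $file does not exist yet.
            return '<img src="/' . $file . '" alt="gnuplot chart" />';
        },
        $text
    );
}

echo replaceGplotTags("Before [gplot]plot sin(x)[/gplot] after"), "\n";
```

Hashing the script is one simple way to use a writable cache directory like the one created above: the chart only needs to be rendered the first time a given script is seen.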
Enjoy
Tags: chart, gnuplot, GNUPlot plugin, plugin, wordpress Posted in GNUPlot plugin, Internet general, Internet programming, clker.com, development | No Comments »
Wednesday, July 16th, 2008
Series Part 1, Part 2
I’ve been playing around with spatial matching on clker.com . My goal was to figure out, very quickly, whether an image being submitted already exists. Titles, tags and all other information attached to the image can change, so they are basically useless for deciding with high confidence whether an image is a repeat. What is really needed is a set of features that can be extracted fast enough, stored in the database, and indexed in a practically searchable manner.
(more…)
Tags: images, indexing, media, pictures, repeated images, search, similarity, spatial comparison, spatial search Posted in Sci/Research, Uncategorized, clker.com, development, linux | 2 Comments »
Monday, May 12th, 2008
I’ve been looking around lately for the best way to cache SQL results in PHP. I found some interesting articles posted in lots of places, but none that exactly matched my needs. The problem I have on hand is basically the one every growing website faces: decreasing the mean resource usage per page request.
Now, this is my plan A to keep up with the website’s growth without a lot of hardware upgrades. There is a plan B, but I will keep that for a later post.
(more…)
Tags: apache, bench, cache, optimization, php, server, sql Posted in development | No Comments »
Thursday, March 27th, 2008
A while ago, I thought about creating a tar.gz file for every download, so that someone who runs a search can then download all the images in the results. After a little bit of research, I found that PHP has a function for gzip. I also knew that the tar format just sticks files one after another, each preceded by a header, so if I can implement the tar format in PHP then I can gzip all images in the results.
I found this LGPL code that implemented the tar format. I used it (and modified it a little bit) to produce the online tar.gz functions:
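    // Computes the unsigned checksum of a file's header
    // to try to ensure a valid file
    // PRIVATE ACCESS FUNCTION
    function __computeUnsignedChecksum($bytestring)
    {
        $unsigned_chksum = 0;
        // Sum all 512 bytes of the header
        for ($i = 0; $i < 512; $i++)
            $unsigned_chksum += ord($bytestring[$i]);
        // Count the 8-byte checksum field (offset 148) as if it held spaces
        for ($i = 0; $i < 8; $i++)
            $unsigned_chksum -= ord($bytestring[148 + $i]);
        $unsigned_chksum += ord(" ") * 8;

        return $unsigned_chksum;
    }

    // Generates the TAR section (header plus padded contents) for one file
    // PRIVATE ACCESS FUNCTION
    function tarSection($Name, $Data, $information = NULL)
    {
        // Default the owner fields so the function also works without $information
        $information = (array) $information
            + array("user_id" => 0, "group_id" => 0, "user_name" => "", "group_name" => "");

        // Generate the TAR header for this file.
        // Lines marked [reconstructed] were missing from the published listing and
        // are filled in from the ustar header layout; treat them as an assumption.
        $header  = str_pad($Name, 100, chr(0));                                    // [reconstructed] file name
        $header .= str_pad("777", 7, "0", STR_PAD_LEFT) . chr(0);                  // mode
        $header .= str_pad(decoct($information["user_id"]), 7, "0", STR_PAD_LEFT) . chr(0);
        $header .= str_pad(decoct($information["group_id"]), 7, "0", STR_PAD_LEFT) . chr(0);
        $header .= str_pad(decoct(strlen($Data)), 11, "0", STR_PAD_LEFT) . chr(0); // [reconstructed] size
        $header .= str_pad(decoct(time()), 11, "0", STR_PAD_LEFT) . chr(0);        // [reconstructed] modification time
        $header .= str_repeat(" ", 8);                                             // [reconstructed] checksum placeholder
        $header .= "0";                                                            // type flag: regular file
        $header .= str_repeat(chr(0), 100);                                        // [reconstructed] link name
        $header .= str_pad("ustar", 6, chr(32)) . chr(32) . chr(0);                // [reconstructed] magic and version
        $header .= str_pad($information["user_name"], 32, chr(0));
        $header .= str_pad($information["group_name"], 32, chr(0));
        $header .= str_repeat(chr(0), 8 + 8 + 155 + 12);                           // [reconstructed] devmajor, devminor, prefix, padding

        // Compute header checksum
        $checksum = str_pad(decoct(__computeUnsignedChecksum($header)), 6, "0", STR_PAD_LEFT);
        for ($i = 0; $i < 6; $i++) {
            $header[(148 + $i)] = substr($checksum, $i, 1);
        }
        $header[154] = chr(0);                                                     // [reconstructed]
        $header[155] = chr(32);                                                    // [reconstructed]

        // Pad file contents to byte count divisible by 512
        $file_contents = str_pad($Data, (int) ceil(strlen($Data) / 512) * 512, chr(0)); // [reconstructed]

        // Add new tar formatted data to tar file contents
        $tar_file = $header . $file_contents;

        return $tar_file;
    }

    function targz($Name, $Data)
    {
        return gzencode(tarSection($Name, $Data), 9);
    }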
To use these functions, send a header with the MIME type for tar.gz ( application/x-gzip ) using the PHP header function. To add a tar.gz section for a file, read the file into a string using file_get_contents, pass the filename and the data to the targz function, and echo what is returned. That’s it!
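The steps just described can be sketched as follows. The helper name buildTarGzBody, the download file name and the example paths are hypothetical. The sketch assumes the targz function from the listing above is already loaded, but it also accepts a replacement encoder so it can be tried on its own.

```php
<?php
// Hypothetical helper: builds the response body for a list of file paths.
// $encode turns (name, data) into one tar.gz section; by default it is the
// targz() function from the listing, which must already be loaded.
function buildTarGzBody(array $paths, ?callable $encode = null): string
{
    $encode = $encode ?? 'targz';
    $body = '';
    foreach ($paths as $path) {
        $data = file_get_contents($path); // a string, or false on failure
        if ($data === false) {
            continue; // skip unreadable files
        }
        $body .= $encode(basename($path), $data);
    }
    return $body;
}

// In a web request (the paths are placeholders):
// header('Content-Type: application/x-gzip');
// header('Content-Disposition: attachment; filename="results.tar.gz"');
// echo buildTarGzBody(['/path/to/first.svg', '/path/to/second.svg']);
```

The second argument of gzencode is the compression level, and 9 is the slowest setting. Passing a lower level inside targz is one way to trade file size for CPU time, which matters given the load problem described next.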
So why is it not active on the clker.com website? I actually tried it and found that compression consumes a lot of CPU. In the first 20 minutes I had more than one hundred connections from different users downloading their results, and the CPU was saturated. This basically left no CPU for searching. So use it carefully, and only if you really need that functionality.
Tags: compress, gz, online, php, tar Posted in Internet general, development, linux | No Comments »
Monday, March 3rd, 2008
Many websites today rely on media, be it images or videos. One reason is that humans are visual creatures: they prefer seeing things to reading long articles.
Images on websites can be divided into two types: images used in the website theme (logos, rounded corners, backgrounds, etc.) and images used as content. The ones discussed here are the content images. Images used in the website theme are accessed frequently, with almost every page view, and they are usually few in number and better managed by storing them in a directory.
(more…)
Tags: access speed, binary storage, database server, file system, images, pictures, sql server, storage Posted in Internet general, development | No Comments »
Saturday, March 1st, 2008
I love using Linux to do my work. My favorite use of Linux is my web server, although I recall reading once that Linus never intended the kernel to be used as a server. He was more focused on using the kernel in desktops. I’ve been running my own web server for almost a year now. It runs two websites: mibrahim.net, my real estate website, and clker.com, a soon-to-be online clipart website (we’re halfway there).
The fun part is that everything simply works. You have all the tools you need, from database engines like postgres and mysql to scripting languages like php and ruby, with different web servers such as apache, lighttpd and others. All the tools you might think of are there and under your own hand. Building your own server is not expensive: around $100 will do it. You don’t need a super quad core machine to produce extremely fast websites, unless you are already getting more than 50 page requests per second, and at that point you will need something faster.
The performance bottleneck is rarely the CPU; it is usually the hard drives’ read or write speed. You can improve on that using fake RAID. Almost all Linux distros offer fake RAID, and it is the cheapest way to improve the read speed.
Setting up your server is not a hard process. The distributions that I recommend are Debian and Ubuntu. The reason is the very large library of software that comes with each. I believe that the full distribution has now grown to more than 11 CDs. I used to run Debian and switched to Ubuntu a year ago. The reason for the switch is the faster updates I get from Ubuntu, which let me use more recent versions of PHP and the database engines.
The easiest setup is using the Ubuntu server CD, which is no different from the desktop CD in terms of binaries. The only differences are that it won’t install the X11-server (GUI) or the window managers (gnome or kde), and that the install program itself runs in the console rather than in VGA graphics. I use the server installation and connect to my server using ssh. I have another old machine that runs Ubuntu as well, and is used to run freenx. That way I keep the server’s memory for the running services, and I can add all the GUI programs I want on this old machine.
Since I greatly benefited from running my own web server, I will share my experiences every now and then when I’ve got time to write.
Tags: apache, database, database server, linux, server, sql server, ubuntu, web server Posted in Internet general, development, linux, server | No Comments »