Many websites today rely on media be it images or videos. One reason is humans are visual creatures, they like seeing things versus reading long articles.
Images on websites can be divided into two different types: Images used in the website theme including logos, rounded corners, backgrounds …etc. and images used as content. Obviously, the ones discussed here are the content images. Images used in the website theme will be accessed frequently and almost with every page view, and usually they are very few and better managed by storing them in a directory.
However, when dealing with images that build the content of the websites developers will tend sometimes to store them in their sql databases rather than just storing references to them. There are pros and cons for each approach:
Storing the images outside the database:
For a very long time, this approach has been considered the most appropriate approach. The binaries itself gets stores outside the database, while references are stored inside the database. This type of implementation is driven by the ideas that:
- It will be faster to access images off the hard drives, than through the database.
- It will take less space, since the database servers will usually add more records for locking and several other extra functionalities.
While actually true, problems start once the number of images grow. Although one can maintain referential integrity inside the database, there is nothing that will stop an image from being deleted from the directory structure. Also, nothing will stop and image from being deleted while its record still exists in the database resulting in dangling pointers.
A third and more serious problem is enforcing access rules. One of the biggest implementers of this approach is the famous image sharing website flickr.com . Flickr uses mysql to store image pointers, while the images itself are stored on disk and can be accessed through the lighthttpd server. While flickr enforces several rules to filter the pictures returned in its search results using geographic and login usernames/password any image can be viewed without logging in using its URL regardless of the fact that it needs to be filtered.
Storing images inside the sql database server:
That approach solves the previous problems. Since the image is stored inside the database:
- Authentication and access rights can be verified before the image is returned.
- On the fly thumbnails can be created thus more space saved.
- Referential integrity can be maintained.
- Cascaded deletions can be implemented. So when a user deletes an album, all the images inside the album will also be deleted.
- The whole application will be less complex.
Database engines use several types to store binary data. Postgresql uses byta, mysql uses blob types. Both achieve the same objective of storing binary object. This approach has its own problems as well. The database will now occupy a larger portion of the hard drive, larger than the size of the binary images itself. Retrieving the images will be slightly slower. My experiments in my other website (mibrahim.net) showed that retrieving images from postgresql was actually not bad, and for users accessing the website they won’t even feel the difference. One major disadvantage is database backup/restore. Since the size of the database is larger, dumping the database for backup can sometimes take days. If you are using WALs then most likely your WALs will be bloated and big from binary inserts. Bottom line, you’ll gain a lot but you’ll have to pay more money and buy larger drives to store the larger data.
There is a nice article written by Oracle having more details about this issue.
So in terms of clker.com, this is a planned functionality – to move all images and pictures inside the postgresql database. I believe that my server has the cpu and HD cycles to carry this load, and I know from my previous experience it will definitely help in decreasing the complications I have now.