Our highest priority is to satisfy the customer through early and continuous delivery of valuable and working software.

Wednesday, April 14, 2010

Webpage screen capturing using khtml2png

Recently we were working on a PHP project that needed "webpage screen capturing" functionality. I googled around and found some tools: some Windows-based, some paid... obviously I was looking for a *FREE* tool :). As we work on LAMP (Linux, Apache, MySQL & PHP), I wanted a Linux-based tool.

One solution I found was to use html2ps and then ps2png/ps2jpg/ps2gif to convert the result to an image, followed by ImageMagick for image manipulation. I got stuck on some weird memory-related errors, package conflicts, formatting issues and so on. So, after spending a day on nothing, I dropped this solution.

Then I tried khtml2png (http://khtml2png.sourceforge.net) and, after some R&D, it worked for us.

Some points to remember...
- You need VPS/dedicated hosting to set up these tools. On shared hosting it's not possible to install them because of various restrictions imposed by hosting providers.

- This tool requires some libraries and tools: g++, KDE 3.x, kdelibs for KDE 3.x, zlib (zlib1g-dev) and cmake

- This tool uses KDE (K Desktop Environment), which means that whenever you use khtml2png it opens a window for *a while* during the capture. We can avoid this by using "Xvfb". We will see how to install and configure it later.

- These links will be helpful if you are planning to develop a web application with webpage screen capturing using khtml2png
http://khtml2png.sourceforge.net/index.php?page=faq
http://www.mysql-apache-php.com/website_screenshot.htm

Here is a step-by-step guide to installing the various dependencies and packages. (I installed these tools successfully on Fedora 7 & RHEL 5.)

I used the "yum" command to install and auto-configure these tools. If "yum" is not available on your machine, get it from http://yum.baseurl.org/ and install it.

Step:1

yum install ImageMagick

yum install Xvfb

yum install gcc gcc-c++ automake autoconf nano zlib zlib-devel

yum groupinstall "X Window System" "KDE (K Desktop Environment)"

yum install kdelibs kdelibs-devel

yum install xorg 'xorg-x11-font*'
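
Before moving on, it can save time to confirm that the commands from Step 1 actually ended up on the PATH. The helper below is only a small sketch; the function name missing_commands is my own and is not part of any of these packages.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# Returns 0 when all of them are found, 1 otherwise.
# (missing_commands is a hypothetical helper name, not part of any package.)
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, "missing_commands convert Xvfb g++ make" prints nothing when all four are installed, and otherwise lists the ones that still need attention.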

Step:2 Install *cmake*
Go to the share directory by typing
cd /usr/local/share/
or any directory where you prefer to download packages. (Check http://www.cmake.org for the latest "cmake" version.)

wget http://www.cmake.org/files/v2.8/cmake-2.8.1.tar.gz

tar -xzvf cmake-2.8.1.tar.gz

cd cmake-2.8.1

./bootstrap

make

make install

Step:3 Download & install *khtml2png* on your server, following the instructions at this link.
http://khtml2png.sourceforge.net/index.php?page=download

Step:4 Check if *khtml2png* is working

/usr/local/bin/khtml2png2 'http://www.yahoo.com' yahoo.png

(this will capture the Yahoo homepage in yahoo.png)
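
A file named yahoo.png is not proof that the capture worked, so a quick sanity check is to look at the first 8 bytes, which for any PNG file are always 89 50 4E 47 0D 0A 1A 0A. The sketch below does only that; is_png is a helper name I made up, and it does not validate the rest of the image.

```shell
#!/bin/sh
# Return 0 if the file starts with the 8-byte PNG signature, 1 otherwise.
# (is_png is a hypothetical helper; it checks only the signature, not the image data.)
is_png() {
    [ -f "$1" ] || return 1
    # Dump the first 8 bytes as hex and strip the spaces and newline od adds.
    sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "89504e470d0a1a0a" ]
}
```

Usage would be something like: is_png yahoo.png && echo "capture looks fine"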

Step:5 Install *khtmld* (a daemon that is required to run khtml2png in the background)
http://wiki.goatpr0n.de/projects/khtmld

I faced a couple of problems while setting up *khtmld*, but they were solved by reading the suggestions at the link above.

I installed all of the above tools as the *root* user.

Once you are done with the above steps, let's play with *khtml2png*

How to start?
Run the following commands to run khtml2png without a visible X session

Xvfb :2 -screen 0 1024x768x24&
export DISPLAY=localhost:2.0
(you can put the above 2 lines in rc.local so that they run automatically whenever the server restarts)
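
If these lines may be run more than once (for example by hand and again from rc.local), it helps to start Xvfb only when the display is still free. An X server normally leaves a lock file named /tmp/.X<n>-lock while it owns display <n>, so the sketch below checks for that file first. The function names are my own, and the lock-file check is an assumption about a standard X setup rather than something khtml2png requires.

```shell
#!/bin/sh
# Extract the display number from a DISPLAY value such as ":2", ":2.0" or "localhost:2.0".
display_number() {
    d=${1##*:}        # drop everything up to the last colon, e.g. "2.0"
    echo "${d%%.*}"   # drop the screen suffix, e.g. "2"
}

# Path of the lock file an X server usually creates for that display.
xvfb_lock_file() {
    echo "/tmp/.X$(display_number "$1")-lock"
}

# Start Xvfb in the background only if no lock file exists for the display.
# (start_xvfb_once is a hypothetical helper; the screen size matches the command above.)
start_xvfb_once() {
    if [ -e "$(xvfb_lock_file "$1")" ]; then
        echo "display $1 already in use"
    else
        Xvfb ":$(display_number "$1")" -screen 0 1024x768x24 &
    fi
}
```

Calling start_xvfb_once localhost:2.0 from rc.local would then be safe to repeat.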

Then start the *khtmld* daemon as your webserver user (for me it is *apache*) so that the PHP script has permission to talk to this daemon. (Run the command below after logging in as the *root* user.)

khtmld -K /usr/local/bin/khtml2png2 -c /etc/khtmldrc --user apache&

"-K /usr/local/bin/khtml2png2" is the path to khtml2png2; by default "khtmld" looks for the old "khtml2png" (khtml2png2 is the latest version). Find the khtml2png2 path using

whereis khtml2png2

"-c /etc/khtmldrc" is the config file path for khtmld (you can create this config file if it's not already there)
Sample content for khtmldrc

width=1024
height=768
display=:2.0
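
Since Xvfb was started on display :2 above, the display entry presumably needs to name that same display (that is my assumption; check the khtmld page if captures fail). To keep the two in sync, the file can be generated instead of typed by hand. This is only a sketch: write_khtmldrc is a made-up helper, and it writes just the three keys shown in the sample.

```shell
#!/bin/sh
# Write a khtmldrc file containing the three keys from the sample config.
# $1 = config path, $2 = width, $3 = height, $4 = display (e.g. ":2.0")
# (write_khtmldrc is a hypothetical helper, not part of khtmld.)
write_khtmldrc() {
    printf 'width=%s\nheight=%s\ndisplay=%s\n' "$2" "$3" "$4" > "$1"
}
```

As root, "write_khtmldrc /etc/khtmldrc 1024 768 :2.0" would produce a file matching the Xvfb command used earlier.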

Capture image using *khtmld*

echo "http://www.yahoo.com /tmp/yahoo.png" >/tmp/khtmldspool
(for more details - http://wiki.goatpr0n.de/projects/khtmld)
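
The capture happens in the background, so a script has to wait for the output file before using it. Here is a rough sketch of queueing a request and waiting with a timeout. The function name, the 60-second default and the one-second polling interval are my own choices; the spool path and the "URL output-file" line format are the ones used above.

```shell
#!/bin/sh
# queue_capture URL OUTPUT [SPOOL] [TIMEOUT_SECONDS]
# Writes "URL OUTPUT" to the spool, then polls until OUTPUT exists and is
# non-empty. Returns 1 if that does not happen within the timeout.
# (hypothetical helper; the defaults are assumptions, adjust them to your setup)
queue_capture() {
    url=$1
    out=$2
    spool=${3:-/tmp/khtmldspool}
    timeout=${4:-60}

    echo "$url $out" > "$spool"

    waited=0
    while [ ! -s "$out" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $out" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

For example: queue_capture "http://www.yahoo.com" /tmp/yahoo.png. A non-empty file does not strictly guarantee that khtml2png has finished writing it, so for large pages a short extra pause before reading the file may still be wise.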

We also used the ImageMagick "convert" command (http://www.imagemagick.org/script/convert.php) to trim the image and remove the surrounding whitespace.

convert /tmp/yahoo.png -fuzz 1% -trim /tmp/new.yahoo.png

Sample PHP code for capturing & displaying a PNG image using "khtml2png"

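<?php
$webpage_url = "http://www.yahoo.com";

$out_put_file = "/tmp/yahoo.png"; // captured screen
$new_out_put_file = "/tmp/new.yahoo.png"; // whitespace removed

// queue the request: write "URL output-file" to the khtmld spool
// (escapeshellarg keeps a hostile URL from injecting shell commands)
$cmd = "echo ".escapeshellarg($webpage_url." ".$out_put_file)." >/tmp/khtmldspool";
exec($cmd);

// wait till khtml2png has captured the screen, but give up after 60 seconds
$waited = 0;
while (!file_exists($out_put_file) && $waited < 60) { sleep(3); $waited += 3; }
if (!file_exists($out_put_file)) {
    header("HTTP/1.1 500 Internal Server Error");
    exit("Screen capture timed out");
}

// exec() returns only after "convert" has finished, so no second wait loop is needed
exec("convert ".escapeshellarg($out_put_file)." -fuzz 1% -trim ".escapeshellarg($new_out_put_file));
if (!file_exists($new_out_put_file)) {
    header("HTTP/1.1 500 Internal Server Error");
    exit("Image trimming failed");
}

// send the image headers only once the image is ready, then display it on browser
if (ob_get_level()) { ob_clean(); }
header("Cache-Control: no-cache");
header("Pragma: no-cache");
header("Content-type: image/png");
readfile($new_out_put_file);

unlink($out_put_file);
unlink($new_out_put_file);
exit;
?>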


Hope this will be helpful.

That's all for now.