Sunday 16 December 2012

PHP, Wkhtmltopdf and Win 7: Eleven steps

Proof of Concept tutorial

This is a barebones guide to getting HTML to PDF / HTML to image conversion working on Windows 7 with PHP and wkhtmltopdf. It is a POC only: this tutorial does not describe server installation or configuration (a package like WAMP makes that easy anyway). I might add those in a later revision or an extended version of this guide.

The steps
  1. Download wkhtmltopdf
  2. Installation step 1
  3. Installation step 2
  4. Installation step 3
  5. Start WampServer
  6. Open www-directory
  7. Create a test file
  8. Call the test file
  9. Examine generated files
  10. Compare file contents

Download wkhtmltopdf


Download and run the "wkhtmltox" installer, which contains both wkhtmltopdf and wkhtmltoimage.

Installation step 1


After running the executable, read the license agreement and click "I agree" if you do.

Installation step 2


Select both wkhtmltopdf and wkhtmltoimage in the installer, but don't let it modify the path, because it breaks the PATH variable. If you want the install folder in %PATH%, add it yourself.

Installation step 3


Choose the destination folder. I like the simple custom path C:\wkhtmltopdf.

Start your WWW Server

I use WAMP. It is simple, easy to install and has sensible defaults, yet it is also very configurable.

Open www-directory


WampServer opens with an icon in the system tray. Use its menu, or navigate manually, to open your www root directory.

Create a test file


Create the following (or a similar) test page and save it in your www root directory. The only PHP function used here is shell_exec(...), which executes the given command. The 2>>err1.txt part appends the standard error stream (stderr) to a file called err1.txt, which is created if it doesn't exist. Likewise, 1>>out1.txt appends the standard output stream (stdout) to out1.txt. Capturing both errors and normal output is useful when debugging. You need to call wkhtmltopdf by its full path because it is not in your PATH environment variable. Add the folder there and you can omit it from the command.


<?php
// Test correct and failed output
shell_exec('c:\wkhtmltopdf\wkhtmltopdf --asdasdsadsad 2>> err1.txt 1>> out1.txt');
shell_exec('c:\wkhtmltopdf\wkhtmltopdf --version 2>> err2.txt 1>> out2.txt');
?>
<html>
<head>
</head>
<body>
<p>Magical ponies!</p>
</body>
</html>

Here we intentionally force an error with the bogus parameter --asdasdsadsad so we can check that we really do get stderr output.
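If you would rather have the output in PHP variables than in text files, one option is PHP's proc_open. The following is only a minimal sketch and not part of the original test page: run_command is a made-up helper name, and it assumes the command prints a small amount of output (it reads stdout completely before stderr, which could stall on very chatty commands).

```php
<?php
// Hypothetical helper (not from the original post): runs a command and
// returns its exit code, stdout and stderr as an array.
function run_command($command)
{
    $descriptors = [
        1 => ['pipe', 'w'], // the child's stdout, readable from PHP
        2 => ['pipe', 'w'], // the child's stderr, readable from PHP
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start: ' . $command);
    }

    // Assumption: output is small enough that reading the streams
    // one after the other does not block.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $exitCode = proc_close($process);

    return ['exit' => $exitCode, 'stdout' => $stdout, 'stderr' => $stderr];
}

// Example call, using the install path from this guide:
// $result = run_command('c:\wkhtmltopdf\wkhtmltopdf --version');
// echo $result['stdout'];
```

With the bogus parameter you would expect the message to show up in $result['stderr'], just as it lands in err1.txt in the file-based version.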

Call the test file


Examine generated files


Compare file contents


Examine your various outputs. The first test wrote an almost empty file from standard output (out1.txt), but did write some text to the error log err1.txt.
shell_exec('c:\wkhtmltopdf\wkhtmltopdf --asdasdsadsad 2>> err1.txt 1>> out1.txt');

The second test wrote the standard version output to out2.txt as expected and nothing to standard error (err2.txt).
shell_exec('c:\wkhtmltopdf\wkhtmltopdf --version 2>> err2.txt 1>> out2.txt');

If this is not the case, check that you are pointing to the right path when calling wkhtmltopdf and that your PHP server is configured correctly. If no error messages appear at all, check that PHP error reporting is on and look at what the Apache and PHP error logs say. If your WAMP is installed in C:\wamp, your Apache log will by default be at C:\wamp\logs\apache_error.log and your PHP log at C:\wamp\logs\php_error.log.
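A quick way to rule out the two most common problems is to turn on error display and confirm that the executable is where you think it is. This is a rough sketch: describe_binary is a hypothetical helper, and the .exe path is an assumption based on the install folder chosen earlier in this guide.

```php
<?php
// Show every PHP error while debugging (switch this off again in production).
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper (not a PHP or wkhtmltopdf API): reports whether
// the given path points at an existing regular file.
function describe_binary($path)
{
    if (!file_exists($path)) {
        return 'missing';
    }
    if (!is_file($path)) {
        return 'not a file';
    }
    return 'ok';
}

// Assumed install location from the installation steps; adjust to your setup.
echo describe_binary('C:\wkhtmltopdf\wkhtmltopdf.exe'), "\n";
```

If this prints "missing", fix the path in your shell_exec call before digging through the server logs.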

Finally

This sort of testing is a proof of concept and is not directly usable in a production environment. Remember to validate your inputs and to keep a tight watch on security. Still, this method will get wkhtmltopdf up and running. You will likely need to experiment with the stderr and stdout streams a little to get it working the way you wish, but after a few experiments it really isn't that complicated. Working with streams in PHP is a different issue.
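As one illustration of input validation, here is a minimal sketch that checks a URL and escapes every argument before it reaches the shell. The helper name build_pdf_command is made up for this example, and it assumes the usual "input first, output file last" argument order of wkhtmltopdf. It is a starting point, not a complete security measure.

```php
<?php
// Hypothetical helper (not from the original post): builds a shell command
// for converting a URL to a PDF, rejecting anything that is not http(s).
function build_pdf_command($binary, $url, $outputFile)
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        throw new InvalidArgumentException('Not a valid URL');
    }

    // Only allow web URLs, so local files cannot be requested.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        throw new InvalidArgumentException('Only http and https are allowed');
    }

    // escapeshellarg() quotes each argument for the current platform.
    return escapeshellarg($binary) . ' '
        . escapeshellarg($url) . ' '
        . escapeshellarg($outputFile);
}

// Example call, using the install path from this guide:
// $command = build_pdf_command('c:\wkhtmltopdf\wkhtmltopdf', 'http://example.com/', 'test.pdf');
// shell_exec($command . ' 2>> err3.txt 1>> out3.txt');
```

Note that escapeshellarg behaves differently on Windows and on Unix-like systems, so test the resulting command on the machine that will actually run it.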

Tuesday 11 December 2012

0x1F, 0x1F, wherefore art thou 0x1F?

When working with XML in .NET, you might run into some pesky characters that throw errors during conversion. One offender is 0x1F (the unit separator control character), which can be hard to remove. Replacing the character can lead to surprising results. Check the variables below and their values:

// string x contains the bad char.
// Note: Convert.ToString((byte)0x1F) returns the decimal text "31", not the control character.
var a = x.IndexOf('\u001f');                     // 513
var b = x.IndexOf(Convert.ToString((byte)0x1F)); // -1 (searches for "31")
var c = x.Contains(Convert.ToChar((byte)0x1F));  // true
var d = x.Contains('\u001f');                    // true
var e = x.Contains(Convert.ToString((byte)0x1F));// false (searches for "31")
var f = x.Contains(Convert.ToChar((byte)0x1F));  // true (same call as c)


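If you simply want to get rid of the character:

x = x.Replace(Convert.ToChar((byte)0x1F), ' '); // Works: swaps it for a space
x = x.Replace(Convert.ToString((byte)0x1F), "");// Fails: targets the text "31" instead
x = x.Replace("\u001f", "");                    // Also works: removes it outright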

Related
  • http://stackoverflow.com/questions/9949921/
  • http://stackoverflow.com/questions/6728329/