Wednesday, March 10, 2010

Fixing a Perl Script Saved in UTF-8 Format Using Notepad

If your Perl script needs to display non-English text, say Japanese or Chinese, you will usually save the script in UTF-8 format. A problem arises if you use Notepad to save it as UTF-8, because Notepad adds a BOM (Byte Order Mark) header to the start of the file.

The problem is that Perl won't run a UTF-8 file with a BOM header once it is uploaded to the server. A likely reason is that the three BOM bytes sit in front of the #! line, so a Unix-like server no longer recognizes the interpreter line. However, the script runs fine if you run it locally on a Windows-based PC with a later version of Apache.
To solve this problem (if you are not running locally with the latest Apache), the following Perl script strips the three BOM bytes from the start of a UTF-8 file saved with Notepad:

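#!c:\perl\bin\perl.exe
# Adjust the line above to the path of your perl executable.
use strict;
use warnings;
my $input  = "";   # file saved by Notepad (UTF-8 with BOM)
my $output = "";   # BOM-free copy to write
print "Content-type: text/html\n\n";
binmode(STDOUT, ":utf8");
open(my $in, '<:raw', $input) or die "Cannot open $input: $!";
my $all = do { local $/; <$in> };   # read the whole file as raw bytes
close($in);
$all =~ s/^\xEF\xBB\xBF//;          # strip the BOM only if it is present
open(my $out, '>:raw', $output) or die "Cannot write $output: $!";
print {$out} $all;
close($out);
print "Conversion done! File is output to $output.";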

Set the $input and $output variables to the file names you want. The conversion script itself can be saved in the normal ANSI format.

After the conversion, if the output file doesn't contain any non-English characters, Notepad will assume it is an ANSI file. Don't worry. If you put some non-English characters in your $input file and run the conversion again, Notepad will show the $output file as UTF-8 (you can see the format in the Save As… dialog).
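Since Notepad's guess is not a reliable indicator, one way to confirm that the BOM is really gone is to look at the first three bytes of the file directly. The following is only a minimal sketch: the sub name has_utf8_bom and the script name check_bom.pl are made up for illustration and are not part of the conversion script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the file at $path starts with the
# UTF-8 BOM (bytes EF BB BF), otherwise 0.
sub has_utf8_bom {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 3);   # read at most the first three bytes
    close($fh);
    return (defined $got && $got == 3 && $head eq "\xEF\xBB\xBF") ? 1 : 0;
}

# Report on every file named on the command line.
for my $file (@ARGV) {
    print "$file: ", (has_utf8_bom($file) ? "BOM present" : "no BOM"), "\n";
}
```

Run it as "perl check_bom.pl yourfile.pl" against both the $input and the $output file; the first should report a BOM and the second should not.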

Upload the $output file to your server using ASCII transfer mode (not Binary mode), then chmod (change file permission) the file to 755. After that, it will run without any Internal Server Error message.
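ASCII transfer mode matters because it converts the Windows line endings (CRLF) to Unix line endings (LF) on the way up; a stray carriage return after the #! line is another common cause of an Internal Server Error. If your FTP client only offers Binary mode, the same effect could be approximated with a small script like the one below. This is a sketch under assumptions: the sub names crlf_to_lf and prepare_for_unix are hypothetical, and the chmod call only helps when the script is run on the server itself (for example over a shell login), since permissions set on a Windows PC do not travel with the upload.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert Windows line endings (CRLF) to Unix (LF).
sub crlf_to_lf {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# Hypothetical helper: rewrite $path in place with LF line endings,
# then set its permissions to 755 so the web server may execute it.
sub prepare_for_unix {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    my $text = do { local $/; <$in> };   # slurp the file as raw bytes
    close($in);
    open(my $out, '>:raw', $path) or die "Cannot write $path: $!";
    print {$out} crlf_to_lf($text);
    close($out);
    chmod(0755, $path) or die "Cannot chmod $path: $!";
    return 1;
}

# Process every file named on the command line.
prepare_for_unix($_) for @ARGV;
```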

Please also make sure your $input/$output Perl script prints the "Content-type: text/html; charset=utf-8\n\n" HTTP header, so that the browser displays the UTF-8 content properly.
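For reference, a minimal CGI script that sends this header might look like the following. It is only a sketch: the sub name utf8_header and the Japanese greeting are illustrative, and it assumes the file is saved as UTF-8 without a BOM, which is why "use utf8" is declared for the literal text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;   # the source below contains UTF-8 literals

# Hypothetical helper: build the header that declares the body as UTF-8.
sub utf8_header {
    return "Content-type: text/html; charset=utf-8\n\n";
}

binmode(STDOUT, ':encoding(UTF-8)');   # encode wide characters on output
print utf8_header();
print "<html><body><p>こんにちは (hello)</p></body></html>\n";
```

Without the charset in the header, the browser has to guess the encoding and may show the non-English text as garbage.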

Note: Do not save the $output file again with Notepad. Notepad will add the BOM back, and the file will lose its "charm". :)
