phparchitect - 01.(38).2006

PDFLib’s BLOCK TOOL
Templating PDFs for maximum reusability
FPDI in Detail
Importing existing documents with Free PDF Import
2005 Look Back
Reflecting on last year’s events in the PHP world
with PHP guru Derick Rethans
i18n
Internationalize your web application
with less PHP code
VOLUME 5 ISSUE 1
Secure your applications against Email Injection
Tips on Output Buffering
KOMODO - reviewed
and much more...
CONTENTS

Columns
EDITORIAL
php|news
TEST PATTERN
Why is it Taking so Long?
Lead times and the rationale behind them
by MARCUS BAKER
SECURITY CORNER
Email Injection
by CHRIS SHIFLETT
TIPS & TRICKS
Output Buffering
by BEN RAMSEY
PRODUCT REVIEW
Komodo
The Web Development IDE for All Platforms?
by PETER MacINTYRE
exit(0);
2006: A Look Forward
by MARCO TABINI

Features
2005 Look Back
Reflecting on last year’s events in the PHP world
by DERICK RETHANS
PDFlib’s Block Tool
Templating PDFs for Maximum Reusability
by RON GOFF
FPDI in Detail
Importing existing documents with Free PDF Import
by JAN SLABON
i18n
Internationalize Your Web applications with less PHP code
by CARL McDADE
Download this month’s code at: http://www.phparch.com/code/
WRITE FOR US!
If you want to bring a php-related topic to the attention of the professional php community, whether it
is personal research, company software, or anything else, why not write an article for php|architect?
If you would like to contribute, contact us and one of our editors will be happy to help you hone
your idea and turn it into a beautiful article for our magazine. Visit www.phparch.com/writeforus.php
or contact our editorial team at write@phparch.com and get started!
EDITORIAL
PLATFORM
DIVERSITY
In the past five (or so) years especially, the desktop landscape has changed
severely. Desktops have traditionally been dominated by Windows, but
alternatives are making their way into both the office and home.
Apple’s hit operating systems in the OS X series, and other chic products
(like the iPod) have not only fueled the sales of Macintosh computers, but
have opened consumers’ minds to the reality that there are alternatives to Windows.
The market is still strongly clutched by Microsoft, but more and more users are
making the “switch” to Mac (and to a much lesser extent, alternatives like Linux).
This diversity, while good, can cause portability problems, and as I’ve touched
on in past issues, developers can no longer target a single browser, but must become
more and more aware of standards and cross-browser/cross-platform compatibility
issues.
For the most part, developers seem to have the browser issue under control. I
personally never use Internet Explorer for anything but testing (I’m a Firefox fanboy),
and it’s very rare that I still run into sites that simply won’t work with FF. Even in
cases where it seems I’m out of luck, I can often spoof the User-Agent header, and
get a working site. Since Firefox is available on many platforms, it seems that the
HTML issue is (mostly) behind us—I say “mostly” because standards-compliance
and portability are things that we always need to strive for.
If you’ve tried to distribute a printable, offline-viewable, and well laid out
document, in the past, you know that HTML doesn’t cut it. There’s little provision
for the features that are necessary to build a professional document (there is hope
with CSS, though). This often leaves websites delivering “richer” documents, such
as MS Word documents or RTF files.
The distribution of proprietary format documents leads to its own set of
problems, primarily: document creation and portability. Have you tried to build a
Word document from your non-Windows Web server? It’s not fun. Equally tedious
is trying to get that document to render properly in different versions of Word, on
different platforms—worse is the rendering in non-Microsoft applications, such as
OpenOffice. Enter PDF.
Now, PDF is certainly not new technology. It does, however, seem to be becoming
more and more the de facto standard for document distribution. PDF is no stranger
to php|architect readers: if you’re not reading this on paper, you’re reading a PDF,
and we’ve brought you much PDF-centric content in the past, but we’ve certainly not
drained the PDF knowledge pool.
This month, we’re happy to focus on PDF, once again, but this time with a twist:
using PHP to modify existing PDFs, through various means.
It’s also our pleasure to be running Derick Rethans’ PHP Lookback, 2005. Marco
will touch more on this in exit(0).
On that note, we at php|architect wish you and your business a happy and
successful 2006. Here’s to another great year of PHP!
Volume 5 - Issue 1
Publisher
Marco Tabini
Editor-in-Chief
Sean Coates
Editorial Team
Arbi Arzoumani
Peter MacIntyre
Eddie Peloke
Graphics & Layout
Aleksandar Ilievski
Managing Editor
Emanuela Corso
News Editor
Leslie Hill
news@phparch.com
Authors
Marcus Baker, Ron Goff,
Peter B. MacIntyre, Carl McDade,
Ben Ramsey, Derick Rethans,
Chris Shiflett, Jan Slabon
php|architect (ISSN 1709-7169) is published
twelve times a year by Marco Tabini & Associates,
Inc., P.O. Box 54526, 1771 Avenue Road, Toronto,
ON M5M 4N5, Canada.
Although all possible care has been placed in
assuring the accuracy of the contents of this
magazine, including all associated source code,
listings and figures, the publisher assumes
no responsibilities with regards of use of the
information contained herein or in all associated
material.
php|architect, php|a, the php|architect logo, Marco
Tabini & Associates, Inc. and the Mta Logo are
trademarks of Marco Tabini & Associates, Inc.
Contact Information:
General mailbox: info@phparch.com
Editorial: editors@phparch.com
Sales & advertising: sales@phparch.com
Printed in Canada
Copyright © 2003-2006
Marco Tabini & Associates, Inc.
All Rights Reserved
6 • php|architect • Volume 5 Issue 1
news
PHP 5.1.2 RC1
Ilia Alshanetsky announces the release of PHP
5.1.2 RC1.
“I’ve just packaged PHP 5.1.2RC1, the first
release candidate for the next 5.1 version. A
small holiday present for all PHP users, from
the PHP developers. This is primarily a bug
fixing release with its major points being:
• Many fixes to the strtotime() function,
over 10 bugs have been resolved.
• A fair number of fixes to PDO and its
drivers
• New OCI8 that fixes large number of
bugs backported from head.
• A final fix for Apache 2 crash when
SSI includes are being used.
• A number of crash fixes in extensions
and core components.
• XMLwriter & Hash extensions were
added and enabled by default.”
Get all the info at http://ilia.ws/archives/
97-PHP-5.1.2RC1-Released!.html
FUDforum 2.7.4RC1 Released
The FUDforum team has announced the latest
release of their open source forum package,
version 2.7.4 RC1. Some of the new features
include:
• Added subscribed forum filter to
message navigator
• Added handling for in-lined
attachments in mailing list
import
• Added the ability to supply
custom signature to message
synchronized from the forum
back to mailing list or a news
group
• Added support for allowing the
user to select how many threads
they want to see per page
• Much more…
Visit FUDforum.org for all the latest info.
ez components 1.0 beta2
ez.no is proud to announce the release
of ez components. ez.no announces: “Ez
components is an enterprise ready, general
purpose PHP platform. As a collection of high
quality independent building blocks for PHP
application development, ez components
will both speed up development and reduce
risks. An application can use one or more
components effortlessly, as they all adhere to
the same naming conventions and follow the
same structure. All components are based on
PHP 5.1, except for the ones that require the
new Unicode support that will be available
from PHP 6 on.”
Need to speed up your development?
Check out ez.no for more info.
xajax 0.2
xajaxproject.org announces the release of
version 0.2. What is it? The site describes it
as: “an open source PHP class library that
allows you to easily create powerful, web-based, Ajax applications using HTML, CSS,
JavaScript, and PHP. Applications developed
with xajax can asynchronously call server-side
PHP functions and update content without
reloading the page.”
To start working with xajax, visit
xajaxproject.org.
SQLiteManager 1.2.0RC2
If SQLite is the db of choice for your PHP
application, you may be interested in the latest
release of SQLiteManager. SQLiteManager.
org lists the features as:
• Management of several
databases (creation, access or
upload)
• Management of the attached
databases
• Create, edit and delete tables
and indexes
• Insert, edit, delete records in
these tables
• Management of views; create
views from SELECTs
• Management of triggers
• Management of user defined
functions
• Manual request and from file,
it is possible to define the
format of the requests, sqlite or
MySQL; a conversion is done in
order to directly import a MySQL
database in SQLite
• Importing of records from a
formatted text file
• Export of structure and the data
• Choice of several display skins
Check out SQLiteManager.org to start managing
your SQLite DB, today.
php|architect Releases New
PDFlib Book
We are proud to announce the release of our latest book
in the “Nanobooks” series called Beginning PDF Programming
with PHP and PDFlib.
Authored by Ron Goff, this book provides a thorough
introduction to the great capabilities provided by the PDFlib
library for the creation and manipulation of PDF files.
The book features a foreword by Thomas Merz, the
original author of PDFlib and founder of PDFlib GmbH, and
tackles topics like PDF file creation, fonts, text, shapes and
much more, including PDFlib’s Block Tool, which allows for
the manipulation of existing PDF documents.
For more information, visit http://www.phparch.com/pppp
Check out the hottest new releases from PEAR.
Image_Color2 0.1.4
PHP 5 color conversion and basic mixing.
Currently supported color models:
• CMYK - Used in printing
• Grayscale - Perceptively weighted
grayscale
• Hex - Hex RGB colors i.e. #abcdef
• HSL - Used in CSS3 to define colors
• HSV - Used by Photoshop and other
graphics packages
• Named - RGB value for named colors
like black, khaki, etc.
• WebsafeHex - Just like Hex but rounds
to websafe colors
Config 1.10.5
The Config package provides methods for
configuration manipulation.
• Creates configurations from scratch
• Parses and outputs different formats
(XML, PHP, INI, Apache...)
• Edits existing configurations
• Converts configurations to other
formats
• Allows manipulation of sections,
comments, directives...
• Parses configurations into a tree
structure
• Provides XPath like access to
directives
MDB2_Drivers
MDB2 drivers were released for:
• SQLite
• postgreSQL
• mysqli
• mysql
• Oracle
MDB2 2.0.0RC3
PEAR MDB2 is a merge of PEAR DB and
Metabase php database abstraction layers.
Note that the API will be adapted to better fit
with the new PHP 5-only PDO before the first
stable release.
It provides a common API for all supported
RDBMS. The main difference to most other
DB abstraction packages is that MDB2 goes
much further to ensure portability. Among
other things MDB2 features:
• An OO-style query API
• A DSN (data source name) or array
format for specifying database
servers
• Datatype abstraction and on demand
datatype conversion
• Various optional fetch modes to fix
portability issues
• Portable error codes
• Sequential and non sequential row
fetching as well as bulk fetching
• Ability to make buffered and
unbuffered queries
• Ordered array and associative array
for the fetched rows
• Prepare/execute (bind) emulation
• Sequence emulation
• Replace emulation
• Limited sub select emulation
• Row limit support
• Transactions support
• Large Object support
• Index/Unique Key/Primary Key support
• Autoincrement emulation
• Module framework to load advanced
functionality on demand
• Ability to read the information
schema
• RDBMS management methods
(creating, dropping, altering)
• Reverse engineering schemas from
an existing DB
• SQL function call abstraction
• Full integration into the PEAR
Framework
• PHPDoc API documentation
GDChart 0.2.0
The GDChart extension provides an interface
to the bundled gdchart library. This library
uses the (bundled) GD library to generate 20
different types of graphs, based on supplied
parameters.
The extension provides an OO interface
to gdchart exposing majority of options via
properties and complex (array) options via a
series of methods.
To use the current version of the extension
PHP 5.0.0 is required; an older PHP 4 only
version can be downloaded from CVS, by
checking out the extension with PECL_4_3
tag.
yaz 1.0.6
This extension implements a Z39.50 client for
PHP using the YAZ toolkit.
Fileinfo 1.0.3
This extension allows retrieval of information
regarding vast majority of files. This
information may include dimensions, quality,
length etc...
Additionally, it can also be used to retrieve
the mime type for a particular file and for text
files, the proper language encoding.
pecl_http 0.21.0
It eases handling of HTTP URLs, dates,
redirects, headers and messages, provides
means for negotiation of clients preferred
language and charset, as well as a convenient
way to send any arbitrary data with caching
and resuming capabilities.
It provides powerful request functionality,
if built with CURL support. Parallel requests
are available for PHP-5 and greater.
PHP-5 classes: HttpUtil, HttpMessage,
HttpRequest, HttpRequestPool,
HttpDeflateStream, HttpInflateStream
PHP-5.1 classes: HttpResponse
MDB2_Schema 0.4.1
PEAR::MDB2_Schema enables users to
maintain RDBMS independent schema files
in XML that can be used to create, alter and
drop database entities and insert data into
a database. Reverse engineering database
schemas from existing databases is also
supported. The format is compatible with
both PEAR::MDB and Metabase.
Validate_ptBR 0.5.2
Package contains locale validation for
ptBR such as:
• Postal Code
• CNPJ
• CPF
• Region (brazilian states)
• Phone Number
• Vehicle plates
Xdebug 2.0.0beta5
The Xdebug extension helps you debugging
your script by providing a lot of valuable
debug information. The debug information
that Xdebug can provide includes the
following:
• stack and function traces in error
messages with:
• full parameter display for user defined
functions
• function name, file name and line
indications
• support for member functions
• memory allocation
• protection for infinite recursions
Xdebug also provides:
• profiling information for PHP scripts
• script execution analysis
• capabilities to debug your scripts
interactively with a debug client
FEATURE
2005 PHP LOOK BACK
by DERICK RETHANS
A new year is upon us, and as is customary in the PHP
world, it is time to reflect on the events of the past
year. Derick Rethans, a PHP internals developer, has been
publishing a PHP Look Back for a few years, now, and
this year, we saw it fitting to publish it, here. Happy 2006!
Welcome to the fourth installment of the PHP
Look Back. Just as in previous years, we’ll
look back on PHP development discussions,
bloopers and accomplishments of the last
year. This is not supposed to be a fully
objective review of last year; note that the opinions in
this article are those of the author, and not of the PHP
development team (nor of php|architect).
January
January was a quiet month, with not much going on.
After about 8 months [001], we finally added [002] a PIC/non-PIC detection mechanism to the configure script, that
will select non-PIC object generation for supported
platforms (Linux and FreeBSD). Non-PIC code is about
30% faster, as measured in earlier benchmarks.
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/281
A week later, Leonardo [003] was wondering whether we
planned on adding type hints for scalar types to PHP. As
PHP is a weakly-typed language, this is not something
we wanted to add, although we did add support for an
“array” type hint, later in the year. With PHP 5.1’s new
GOTO execution method (added last August), variable
name lookups are cached internally. This caused some
problems for Xdebug [004], as it needs some information to
find out which variables are used in a specific scope. Andi
committed [005] a patch that made Xdebug work properly,
again.
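For readers who have not met the “array” type hint mentioned above, here is a minimal sketch. The function name sumPrices is made up for illustration, and the mismatch behaviour shown (a catchable TypeError) is that of PHP 7 and later, not of the PHP 5.1 discussed in this article.

```php
<?php
// Hypothetical helper: the "array" hint rejects any non-array argument.
function sumPrices(array $prices)
{
    return array_sum($prices);
}

echo sumPrices([1.50, 2.25]), "\n";

try {
    sumPrices('not an array');
} catch (TypeError $e) {
    // PHP 7+ reports the mismatch as a catchable TypeError.
    echo "rejected: ", get_class($e), "\n";
}
```

Scalar hints such as int or string are deliberately left out of the sketch, since the paragraph above describes them as not wanted at the time.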
Michael started working on his HTTP extension (which
generates way too many commit mails ;-) and encountered
a problem with a naming clash [006] between PEAR’s HTTP
class and his PECL extension. Greg responded [007], and said
that this problem will be solved when PEAR 1.4 comes
out, with its channel support.
February
Andi started discussions in February by pointing out a
date for the first beta of PHP 5.1: March 1st. He declared
that “both PDO and Date should be included in the default
distribution”[008] and others suggested that XML Reader[009]
should be included by default, as well. In reply to Andi,
Rasmus mentioned [010] that he would like to see the
filter extension included, as well. The discussion about
this extension quickly transitioned to data mangling
of input request variables, and how they could not be
influenced by the script authors, but only by the system
administrator. In the end, this discussion made way
for the topic of Operator overloading [011], where certain
people kept reiterating that operator overloading is a
“good thing [012].”
Andrei tried to stop this discussion by being funny [013],
but it didn’t work very well [014]. Around the same time, Wez
announced [015] the first beta of PDO (PHP Data Objects).
Wez wanted people to test [016] PDO, and of course, over the
next couple of months, there were various PDO-related
concerns [017] and issues raised.
Another discussion in February was about auto
boxing [018] in PHP. Auto boxing is the encapsulation of
all primitive types as objects. Naturally, people asked
why [019] we would want to have this, and no sound
reason was given. In the end, this discussion suggested
that phpDocumentor[020] should handle type determining,
instead. Having a doc block [021] parsing extension to the
reflection API would be nice, although a bit hard.
We also had an often-recurring discussion [022] on why
the GPL[023] is a bad idea for PECL[024] extensions.
John added the first version [025] of XMLRPCi to CVS;
why he chose this silly name is still unknown. Jani
wrote about a problem with overwriting globals [026], an
issue that, later in the year, warranted a new PHP
release, and Greg introduced [027] PEAR 1.4, with channel
support.
Halfway through the month, Marcus [028] mentioned a
few things that should go into PHP 5.1; most notably the
__toString() fix, which unfortunately, did not actually
make it into the release. Type hinting with “= NULL” did
make it in [029], though.
Martin Sarsale reported [030] an issue with references
and segfaults, something which had been annoying us
at eZ systems [031] for quite some time, too. This issue got
fixed in PHP 4.4, albeit not without a little bickering
(more about that later).
March
In March, Ilia proposed [032] a patch that adds a special
token that tells PHP’s parser to abort parsing when
the token is encountered. This allows us to attach
binary data to the end of a PHP script, which is highly
useful for one-script installers, such as the one that
FUDForum [033] uses.
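The token described above eventually shipped in PHP 5.1 as __halt_compiler(), together with the constant __COMPILER_HALT_OFFSET__, which gives the byte position where the attached data begins. The following is only a sketch of the one-script-installer idea under that assumption: the payload bytes and the temporary file are made up for illustration, and this is not FUDforum’s actual installer.

```php
<?php
// Made-up binary payload standing in for an installer archive.
$payload = "\x00\x01BINARY\xff";

// PHP code that reads whatever follows the halt token in its own file.
$code = <<<'STUB'
<?php
$fh = fopen(__FILE__, 'rb');
fseek($fh, __COMPILER_HALT_OFFSET__);
$data = stream_get_contents($fh);
fclose($fh);
return $data;
__halt_compiler();
STUB;

// Write "code + raw bytes" to one file; the parser stops at the token,
// so the trailing bytes never need to be valid PHP.
$path = tempnam(sys_get_temp_dir(), 'halt');
file_put_contents($path, $code . $payload);

$recovered = include $path;   // the included script returns the attached bytes
unlink($path);

var_dump($recovered === $payload);
```

A real installer would more likely place the token at the end of the script itself and unpack the attached archive, rather than generate a second file as this demonstration does.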
On the 14th of the month, Zeev released the first
RCs [034] of both 5.0.4 and 4.3.11. We also encountered
further reference issues [035].
The same guy that mailed tons of “fixes” to the
internals list, last June [036], was back with more [037]
patches. Andrei, once again, pointed out [038] that it is
a good idea to check with an extension’s maintainer
before applying patches, and Greg published [039] the
package2.xml documentation.
Lukas, once more, pointed out [040] the weird naming
scheme that new extensions seem to be getting, and
luckily Debian’s PHP packages got rid [041] of some of the
insanity that was present in previous [042] releases by
not always building in ZTS mode. Unfortunately, their
packages still force PIC mode for the libraries.
A user brought up the idea of an upload meter
patch [043], again, and although we all seemed to
remember[044] that the original patch was rejected [045],
no one could find the original thread [046] where this was
discussed. Last year’s Look Back discussed this too, and
there, the reason was mentioned [047].
In the last week of the month, we had some fuss [048]
about “FreeBSD doing stupid things [049]” regarding their
naming of auto tools executables [050].
April
April started with a suggestion [051] by Zeev to change
the way that __autoload() works, by allowing multiple
instances of this magic function. In the end, we didn’t
end up implementing this, and as Lukas described [052],
“Frameworks should provide __autoload() helper
methods, but should never implement the function itself.
It’s up to the end user to do this.” (This is exactly how
we implemented it for the eZ components [053]).
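The division of labour Lukas describes can be sketched as follows. The class ExampleFrameworkLoader, its methods and the lib directory are hypothetical names, not the eZ components API, and the sketch registers the hook with spl_autoload_register() because a global __autoload() function is no longer available in PHP 8.

```php
<?php
// Hypothetical framework class: it only offers helpers and never
// defines the global autoload hook itself.
final class ExampleFrameworkLoader
{
    /** Map a class name such as "Example_Mail_Message" to a file path. */
    public static function pathFor(string $class, string $baseDir): string
    {
        return rtrim($baseDir, '/') . '/' . str_replace('_', '/', $class) . '.php';
    }

    /** Helper the application may call from its own autoload hook. */
    public static function load(string $class, string $baseDir): bool
    {
        $file = self::pathFor($class, $baseDir);
        if (is_file($file)) {
            require_once $file;
            return true;
        }
        return false;
    }
}

// Application code: the end user decides whether and how to hook autoloading.
spl_autoload_register(function (string $class): void {
    ExampleFrameworkLoader::load($class, __DIR__ . '/lib');
});
```

Keeping the hook in application code means two libraries can each ship a helper without fighting over a single global function.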
Andi wanted to release PHP 5.1 Beta 1[054] really soon,
but, as Jani mentioned [055], there were quite a few things
that were still not fully ready, and thus the suggestion to
call it “Alpha”[056] was made, instead. During this thread,
some pet-features [058] were brought up [059].
Kamesh, from the Netware porting team, found
another reference issue [060]. Marcus added the File [061] class
to his SPL extension, causing a small stir—the new class
clashed with any application that already defines its
own File class. Although this is a valid point, projects
defining a “File” class should know better, and would be
wise to prefix their class names. This same issue will pop
up later in the year.
A last, somewhat larger, discussion erupted when
a question [062] about whether APC could be used as a
content cache was posted to the list. Rasmus found it an
interesting idea [063], although this functionality can also
be accomplished in user space. In the last point of the
thread, Rasmus mentioned [064] that APC will soon support
PHP 5.
May
May had a slow start, and things only got interesting
at the end of the month. The first discussion that came
up was Ilia’s removal of dangling commas from enums,
something that “was in c language from the first day [065].”
Apparently, GCC 4 is “becoming worse and worse [066],” but
luckily, we can still just ignore the warnings [067].
After a small private discussion with Dmitry about
Marcus’ and my reference fix patch [068], he came to the
conclusion that this patch breaks binary compatibility
and that this problem warrants a PHP 4.4 release. As this
reference problem has been affecting many users, and
definitely eZ over the past months, I wrote an email [069]
to the list stating that it is “totally irresponsible” not to
release a fix for such a grave bug. Zeev[070] also said that
“we should probably not fix this at all in the 4.x tree”
because of the hassles that accompany “breaking module
binary compatibility.” He also seemed to think that the
bug can easily be worked around.
Other users were a bit happier[071] that we finally nailed
this bug, and Jani replied to Zeev that the magnitude [072]
of this bug is pretty high. Rasmus added that he “will
be deploying the patch and happily breaking binary
compatibility [073]” as soon as the patch is ready. Breaking
binary compatibility is only a “burden on the maintainers
of these packages” (of the various distributions). Wez
thought that “the only logical move forward is a 4.4
branch and release [074].” In the end, the Zeev almighty was
“tired of going through the reasons again and again [075]”
and noted that “everyone appears to prefer the upsides
to the downsides.” This resulted in the creation of the
PHP_4_4 branch [076] in the first week of June.
June
Wez added a new patch to our CVS server that allows
us to block access [077] to specific branches—with this,
we closed the PHP_4_3 branch for good. A week later,
I announced 4.4.0RC1[078], which features the reference
bug fix.
Andi wrote another PHP 5.1 mail [079], which spawned
a nice long discussion on adding goto [080] to PHP, and
comparing goto to exceptions. Magnus smartly added [081]
that “people are talking about hypothetical messy code
because of goto” and that they forget that you don’t
have to use a language construct simply because it is
available.
The same thread also went into a branch that
discussed [082] the ifsetor() language construct. After
Andi returned, he decided not to do anything with
goto or ifsetor()[083], and that it was now the time to
branch, so that we can merge the Unicode support that
was developed in parallel by mostly Andrei and Dmitry,
although Rasmus was “pretty sure the current discussions
will pale in comparison to the chaos that will be created
when the Unicode stuff goes into HEAD![084]”
Johannes wondered when the new date stuff[085] was
going in; it was added a week later, just before PHP 5.1
beta 2. Lukas suggested that we add [086] the public keyword
to PHP 4.4 for forward compatibility. Rasmus again
wondered about “the reasoning ... for not having var be
a synonym for public in PHP 5 [087].” Andi mentioned [088]
that this “was meant to help people find vars so that
they can be explicit about the access modifiers” when
moving to PHP 5.
A few days later, Andi read a blog posting [089] which
described how PHP 4.4 is breaking backwards compatibility
by issuing an E_STRICT in cases where developers abuse
return-by-reference. This, however, was not actually the
case [090].
Yasuo started a long thread [091] on the allow_url_fopen
setting and claimed it was dangerous [092]. The main result of
this thread seemed to be that we wanted to split the
setting into two different privileges: one that allows
remote opening of URLs and one to allow include() on
remote URLs. However, this is something we could not
yet change.
The last thread of the month was by Andi, writing
about the PHP 5.1 release process [093].
July
In July, Jessie suggested [094] a String extension that
declares only one class: String. This class is meant to
prevent copying of the string’s data for most operations
(which is currently done with PHP’s string functions).
Most of the other developers were against it, for
different reasons: “String is such a generic name for a
non-core class [095]” and “the savings gained by this will be
more than offset by OO overhead [096],” so we will not let
“this get anywhere near the core [097].”
In the same week, I made more changes to the date
extension [098] that allow users to more easily select the
timezone that they want, instead of having to rely on
the TZ environment variable. This is also needed because
the TZ environment variable [099] can most likely not
be used in a thread safe way, and it is certainly not
portable [100]. Also in the same week, I proposed an API
for new Date and Timezone functionality [101]. After some
pressure [102], I added [103] an OO API, too. Near the end of
the month, I committed the implementation of the new
date functionality [104]. It was, however, #ifdef-ed out to
facilitate discussions at a later date.
Jessie came up with Yet Another Namespace
Proposal [105], and tried to come up with a solution for all
the previous problems we had with the implementation.
He also made several patches [106] that added namespaces
to PHP.
We had some more fuss [107] about PHP 4.4 breaking BC,
where some people didn’t see [108] why we had to implement
this fix. Unfortunately, there were some quirks [109] that we
still had to sort out.
In this same month, Rasmus released APC 3.0.0 [110]
which came with PHP 5.1 support and numerous fixes.
August
August started with a discussion on instanceof[111] being
“broken,” as it raises a fatal error in the case where
the class that is being checked for doesn’t exist. Andi
declared “if you’re referencing classes/exceptions in
your code that don’t exist, then something is very bogus
with your code [112]” and “the only problem is if the class
does not exist in your code base, in which case, your
application should blow up![113]”
I raised a question about whether the new PHP with
Unicode should be called PHP 5.5 or PHP 6.0 [114]. Andi
(and the majority) wanted to go “with PHP 6 and aim to
release it before Perl 6 [115].”
After PHP_5_1 was branched, Andrei merged the
Unicode branch and gave us some instructions on how
to get started with it [116]. He also introduced the general
ideas behind the implementation [117].
PHP 5.1 RC1 was finally rolled, about half way through
the month, followed by PHP 5.0.5 RC2[118], a week later.
During the development of the eZ components [119],
we discovered various things in PHP’s OO model that we
wanted to see changed. One of those issues was described
in the Property Overloading RFC [120]. Unfortunately, not
everybody could be convinced [121], and no changes were
made. I will try again though :).
The other issue that we raised was that failed
typehints throw a fatal error[122], while that is not strictly
necessary. Instead of throwing exceptions [123] in this case,
the discussion turned towards adding a new error mode [124]
(E_RECOVERABLE[125]) that will be used for non-engine-corrupting fatal errors at the language level—this is
exactly the case with failed typehints.
The longest thread of the month was started by
Rasmus when he posted his PHP 6 [126] wish list, which
featured controversial changes such as “removing
magic_quotes” and “making identifiers case-sensitive,”
to which most developers quickly agreed [127]. Following
his initial wish list, the crowd went wild and started
suggesting all kinds of weird changes, such as “Radically
change all of the operator syntaxes [128],” adding <?php6 [129]
as a BC breaking mode, and “Named parameters [130].”
Marcus made a list of his own [131] which would later
become the first draft of the meeting agenda for a PHP
Developers Meeting.
[...] attempt detection in favour of the new date.timezone
setting [147]. After some discussion, we came up with a
solution [147], which was then implemented. It should
guess the timezone correctly in most cases, even on [...]
The filter extension, which I’ve been
developing for quite some time,
did not make it into PHP 5.1...
September
In September, Antony committed [132] an upgraded OCI8
extension which fixes a lot of bugs [133]. We also decided
to play a bit nicer with version_compare(), regarding
naming [134] release candidates.
Zeev wanted to roll [135] PHP 5.0.5 but there was an
issue [136] with the shutdown order. The reference issues
returned, too. The first one [137] turned out to be an
incorrect merge to the PHP 5.0 branch, where suddenly
some of the notices turned into errors [138]. The second
one [139] is simply a small change in behaviour, which
previously created memory corruption. Rasmus explained
the issue a bit more [140], once again.
Ilia tried to implement a clever fix [141] which turned
out to be a problem later on. Pierre started a discussion
on supporting Unicode in identifiers, something he didn’t
want to see. PHP already supports using UTF-8 encoded
characters [142] in identifiers, so removing this feature
will break BC unnecessarily. Besides breaking BC, many
people simply want to use their own language for writing
code, as Tex [143] writes.
Zeev made another attempt at PHP 5.1.0 RC2[144] with
the latest PEAR being the only thing missing. Marcus
brought up the issue of __toString() again, and finally
managed to get it into CVS, but unfortunately not in time
for PHP 5.1.
Stanislav[146] noticed some problems with detecting
time zones, as the new date/time code did not try to
14 • php|architect • Volume 5 Issue 1
Windows. I also added support for an external timezone
database [149].
October
In October, I noticed some weird notices [150] with
“make install-pear,” without a clue as to why they were
showing up. This discussion turned into a “why does
PEAR not support PHP 5.1” thread [151]. In the end, Greg
managed to nail down the weird notices, though.
I also noticed a commit by Dmitry [152] that ignores “&”
when $this is passed. I pointed out that this should
not be supported (in PHP 5), as it doesn’t really make
sense that people won’t see a warning/notice/error when
they’re doing something silly. Dmitry explained [153] that
disallowing it would break code, but he also writes that
by “using ‘=& $this’, a user can break the $this value”—
which is something we definitely should prevent. He
suggested [154] we make this an E_STRICT warning, and Andi
suggested [155] we escalate this to an E_ERROR in PHP 6, but
neither of those things happened.
A week later, Piotr[156] asked for a tarball of our CVS to
make it “possible to convert it to Subversion repository
... so browsing the repositories would be much easier.”
We wondered [157] why he needed that, as we offer our own
browser[158], already.
Matthias [159] said that we “do not want to set off yet
another discussion about the changes 4.4 brought,” but
that is exactly what he did. Again, there was something
wrong with his code, and thus the warning is legal.
After resolving the timezone issues, last month,
we were surprised by a message from Zeev. He simply
missed [161] the conclusion in the “lengthly thread.”
As a result of the negative comments on the PHP 4.4.0
release, Lukas, Ilia and I set up a routine [162] for involving
some of the better-known projects in the PHP 4 [163] and
PHP 5 [164] release processes. As part of this effort, we send
out [165] a mail to all participating projects whenever we
have a release candidate to test.
I raised [166] some concern regarding our current
Unicode implementation because of maintenance issues.
In part of my mail, I also indicated that I wanted “to
clean up PHP 6 for real, [167]” after private discussions
with Marcus and Ilia. Behind the scenes, we prepared
some material to organize a PHP Developers Meeting to
discuss the Unicode implementation and the extended
“PHP 6 Wishlist.” I also committed [168] a patch that allows
typehints for classes to work with = NULL[169].
Another guy raised the issue of “that new isset()-like
language construct, [170]” but this ended up going nowhere,
as people were suggesting very Perl-like [171] operators.
Jani replied to this thread with “How about a good ol’
beating with a large trout?[172]”
On the last day of the month, we released PHP 4.4.1[173]
which addresses some of the reference issues we’ve seen
in PHP 4.4.0.
November
In November, we prepared to finally release PHP 5.1,
and one of the efforts was to make an upgrade guide [174]
for people switching to PHP 5.1. Sean noticed [175] a
problem with the parameter parsing API’s automatic
type conversion. Like Andrei [176], many people think that
“passing ‘123abc’ and having it interpreted as 123” is still
wrong.
Dmitry implemented [177] support for “= null” as default
to array type hinting, something that I did not do [178] on
purpose because “= array()” is the logically correct way
of doing this. Andi agreed [179] with me on this.
Ilia implemented, in PHP 5.1RC5 [180], one of the items
that was on the outcome list of the PHP Developers
Meeting: adding a notice that warns people that using curly
braces [181] to address a character in a string is now
deprecated in favour of the [] operator, contrary to the
current explanation in our manual. {} and [] are exactly
the same thing [182] and “having two constructs for the same
behaviour is silly and leads to confusing, hard to read
code.” The outcome of this discussion was the removal
of the notice in PHP 5.1, and the likely conclusion is that
the {} syntax is not going to be removed.
Another change that was made in PHP 5.1RC6 was the
creation of the “Date” class, which caused quite a stir
after the release of PHP 5.1[183]. The reason to introduce it
in 5.1 was simply to make sure that no applications were
going to break if we introduced the Date class later in the
5.1.x series. Unfortunately a lot of projects, including
PEAR, never heard of “prefixing” class names, causing
class name clashes. Marcus described the problem as
“PEAR ignores coding standards, [184]” but others suggested
that we rename the internal class [185] to something silly
like php_date. Andrei [186] asked “what does renaming really
buy us? The only purpose of introducing this class in RC6,
as far as I can tell, was to reserve the ‘Date’ name for
future use.” Now that we know about this issue, it’s time
for PEAR to start prefixing its classes, so that we can
finally do the right thing and add our Date (and Timezone)
classes. That code has been around for months now,
and I’m quite tired of waiting for it to be in a release
where I can use it. We ended up reverting the change
that claimed the Date and Timezone classes, and released
5.1.1 with this change.
After the PDM I posted [187] the meeting notes [188] to the
list. Most of the outcome was well appreciated, except
the curly braces idea which has already been discussed.
With these notes, we hope to make PHP 6 a success. The
notes also spawned numerous [189] polls [190] on the symbol to
use for separating namespaces from class names/function
names. We also discussed our version of a goto: labeled [191]
breaks [192].
The filter extension [193], which I’ve been developing for
quite some time, did not make it into PHP 5.1, although
it is a good idea [194] to add it, now, with an “experimental”
status, so that this wanted extension gets more testing.
Perhaps for PHP 5.1.2…
December
December was a quiet month with little action. Ilia
proposed [195] a plan for PHP 5.1.2 and released PHP
5.1.2RC1[196], Zeev committed [197] Dmitry’s re-implementation
of the FastCGI API and some user[198] was whining about
our “official” IRC channel (which doesn’t exist).
That was it for 2005 (as far as PHP internal
development is concerned)! I hope you enjoyed reading
this, and have a happy new year. Extra thanks go to Ilia,
for being the release master, Dmitry for maintaining the
engine, Jani for hunting down bug reports, Andrei for
his work on Unicode, Mike for his enormous stream of
useless commit messages ;-), and to all others who made
PHP happen this year. 
DERICK RETHANS provides solutions for Internet related problems. He
has contributed in a number of ways to the PHP project, including
the mcrypt, date and input-filter extensions, bug fixes, additions
and leading the QA team. He now works as project leader for the
eZ components project for eZ systems A.S. In his spare time he likes
to work on xdebug, watch movies, travel and practice photography.
You can reach him at derick@derickrethans.nl.
FOOTNOTES:
001 http://beeblex.com/php.internals/14013
002 http://beeblex.com/php.cvs/29839
003 http://beeblex.com/php.internals/14329
004 http://xdebug.org
005 http://beeblex.com/php.zend-engine.cvs/3321
006 http://beeblex.com/php.pecl.dev/1847
007 http://beeblex.com/php.pecl.dev/1852
008 http://beeblex.com/php.internals/14469
009 http://beeblex.com/php.internals/14692
010 http://beeblex.com/php.internals/14474
011 http://beeblex.com/php.internals/14558
012 http://beeblex.com/php.internals/14701
013 http://beeblex.com/php.internals/14713
014 http://beeblex.com/php.internals/14717
015 http://beeblex.com/php.internals/14736
016 http://beeblex.com/php.internals/14845
017 http://beeblex.com/php.pecl.dev/2083
018 http://beeblex.com/php.internals/14741
019 http://beeblex.com/php.internals/14803
020 http://phpdoc.org
021 http://beeblex.com/php.internals/14904
022 http://beeblex.com/php.pecl.dev/1884
023 http://www.gnu.org/licenses/gpl.txt
024 http://pecl.php.net
025 http://beeblex.com/php.pecl.cvs/2559
026 http://beeblex.com/php.internals/14971
027 http://beeblex.com/php.pecl.dev/2041
028 http://beeblex.com/php.internals/15052
029 http://derickrethans.nl/typehints_and_null.php
030 http://beeblex.com/php.internals/15137
031 http://ez.no
032 http://beeblex.com/php.internals/15375
033 http://fud.prohost.org
034 http://beeblex.com/php.internals/15424
035 http://beeblex.com/php.internals/15473
036 http://derickrethans.nl/php_look_back_2004.php
037 http://beeblex.com/php.internals/15490
038 http://beeblex.com/php.internals/15452
039 http://beeblex.com/php.pecl.dev/2189
040 http://beeblex.com/php.internals/15524
041 http://beeblex.com/php.internals/15593
042 http://beeblex.com/php.internals/14712
043 http://beeblex.com/php.internals/15558
044 http://beeblex.com/php.internals/15559
045 http://beeblex.com/php.internals/15561
046 http://beeblex.com/php.internals/15567
047 http://beeblex.com/php.internals/13792
048 http://beeblex.com/php.internals/15639
049 http://beeblex.com/php.internals/15657
050 http://beeblex.com/php.internals/15655
051 http://beeblex.com/php.internals/15739
052 http://beeblex.com/php.internals/15788
053 http://ez.no/products/ez_components
054 http://beeblex.com/php.internals/15735
055 http://beeblex.com/php.internals/15748
056 http://beeblex.com/php.internals/15767
057 http://beeblex.com/php.internals/15767
058 http://beeblex.com/php.internals/15773
059 http://beeblex.com/php.internals/15813
060 http://beeblex.com/php.internals/15953
061 http://beeblex.com/php.cvs/31242
062 http://beeblex.com/php.pecl.dev/2313
063 http://beeblex.com/php.pecl.dev/2316
064 http://beeblex.com/php.pecl.dev/2324
065 http://beeblex.com/php.cvs/31895
066 http://beeblex.com/php.cvs/31898
067 http://beeblex.com/php.cvs/31924
068 http://files.derickrethans.nl/patches/ze1-return-reference-20050429.diff.txt
069 http://beeblex.com/php.internals/16312
070 http://beeblex.com/php.internals/16314
071 http://beeblex.com/php.internals/16329
072 http://beeblex.com/php.internals/16325
073 http://beeblex.com/php.internals/16328
074 http://beeblex.com/php.internals/16323
075 http://beeblex.com/php.internals/16335
076 http://beeblex.com/php.zend-engine.cvs/3716
077 http://beeblex.com/php.internals/16461
078 http://beeblex.com/php.internals/16637
079 http://beeblex.com/php.internals/16375
080 http://beeblex.com/php.internals/16398
081 http://beeblex.com/php.internals/16432
082 http://beeblex.com/php.internals/16749
083 http://beeblex.com/php.internals/16583
084 http://beeblex.com/php.internals/16588
085 http://beeblex.com/php.internals/16392
086 http://beeblex.com/php.internals/16685
087 http://beeblex.com/php.internals/16698
088 http://beeblex.com/php.internals/16703
089 http://beeblex.com/php.internals/16793
090 http://beeblex.com/php.internals/16802
091 http://beeblex.com/php.internals/16903
092 http://beeblex.com/php.internals/16923
093 http://beeblex.com/php.internals/17026
094 http://beeblex.com/php.pecl.dev/2512
095 http://beeblex.com/php.pecl.dev/2513
096 http://beeblex.com/php.pecl.dev/2522
097 http://beeblex.com/php.pecl.dev/2517
098 http://beeblex.com/php.cvs/32642
099 http://beeblex.com/php.internals/17116
100 http://beeblex.com/php.internals/17109
101 http://beeblex.com/php.internals/17169
102 http://beeblex.com/php.internals/17177
103 http://beeblex.com/php.internals/17188
104 http://beeblex.com/php.cvs/33011
105 http://beeblex.com/php.internals/17154
106 http://beeblex.com/php.internals/17332
107 http://beeblex.com/php.internals/17242
108 http://beeblex.com/php.internals/17244
109 http://beeblex.com/php.internals/17287
110 http://beeblex.com/php.pecl.dev/2543
111 http://beeblex.com/php.internals/17579
112 http://beeblex.com/php.internals/17638
113 http://beeblex.com/php.internals/17653
114 http://beeblex.com/php.internals/17668
115 http://beeblex.com/php.internals/17719
116 http://beeblex.com/php.internals/17848
117 http://beeblex.com/php.internals/17771
118 http://beeblex.com/php.internals/18340
119 http://ez.no/products/ez_components
120 http://beeblex.com/php.internals/17491
121 http://beeblex.com/php.internals/17610
122 http://beeblex.com/php.internals/17581
123 http://beeblex.com/php.internals/17588
124 http://beeblex.com/php.internals/17820
125 http://derickrethans.nl/erecoverableerror.php
126 http://beeblex.com/php.internals/17883
127 http://beeblex.com/php.internals/17887
128 http://beeblex.com/php.internals/17890
129 http://beeblex.com/php.internals/18055
130 http://beeblex.com/php.internals/17952
131 http://beeblex.com/php.internals/17930
132 http://beeblex.com/php.cvs/33824
133 http://beeblex.com/php.internals/18696
134 http://beeblex.com/php.internals/18806
135 http://beeblex.com/php.internals/18671
136 http://beeblex.com/php.internals/18691
137 http://beeblex.com/php.internals/18794
138 http://beeblex.com/php.internals/18884
139 http://beeblex.com/php.internals/19048
140 http://beeblex.com/php.internals/18884
141 http://beeblex.com/php.cvs/33934
142 http://beeblex.com/php.internals/18823
143 http://beeblex.com/php.internals/18862
144 http://beeblex.com/php.internals/18869
145 http://beeblex.com/php.internals/19155
146 http://beeblex.com/php.internals/19181
147 http://beeblex.com/php.internals/19187
148 http://beeblex.com/php.internals/19257
149 http://pecl.php.net/timezonedb
150 http://beeblex.com/php.internals/19310
151 http://beeblex.com/php.internals/19313
152 http://beeblex.com/php.internals/19336
153 http://beeblex.com/php.internals/19343
154 http://beeblex.com/php.internals/19381
155 http://beeblex.com/php.internals/19348
156 http://beeblex.com/php.internals/19465
157 http://beeblex.com/php.internals/19470
158 http://cvs.php.net
159 http://beeblex.com/php.internals/19519
160 http://beeblex.com/php.internals/19508
161 http://beeblex.com/php.internals/19512
162 http://oss.backendmedia.com/ReleaseChecklist
163 http://oss.backendmedia.com/PhP4yz
164 http://oss.backendmedia.com/PhP5yz
165 http://beeblex.com/php.qa/26069
166 http://beeblex.com/php.internals/19448
167 http://beeblex.com/php.internals/19491
168 http://beeblex.com/php.zend-engine.cvs/4248
169 http://derickrethans.nl/typehints_and_null.php
170 http://beeblex.com/php.internals/19801
171 http://beeblex.com/php.internals/19825
172 http://beeblex.com/php.internals/19851
173 http://beeblex.com/php.internals/19860
174 http://beeblex.com/php.internals/20003
175 http://beeblex.com/php.internals/20004
176 http://beeblex.com/php.internals/20041
177 http://beeblex.com/php.zend-engine.cvs/4336
178 http://beeblex.com/php.zend-engine.cvs/4335
179 http://beeblex.com/php.zend-engine.cvs/4359
180 http://beeblex.com/php.internals/20066
181 http://beeblex.com/php.internals/20102
182 http://beeblex.com/php.internals/20112
183 http://beeblex.com/php.internals/20321
184 http://beeblex.com/php.internals/20337
185 http://beeblex.com/php.internals/20466
186 http://beeblex.com/php.internals/20491
187 http://beeblex.com/php.internals/20236
188 http://php.net/~derick/meeting-notes.html
189 http://beeblex.com/php.internals/20586
190 http://beeblex.com/php.internals/20682
191 http://beeblex.com/php.internals/20863
192 http://beeblex.com/php.internals/20277
193 http://pecl.php.net/filter
194 http://beeblex.com/php.internals/21188
195 http://beeblex.com/php.internals/20929
196 http://beeblex.com/php.qa/26614
197 http://beeblex.com/php.cvs/36211
198 http://beeblex.com/php.internals/21333
FEATURE
PDFlib’s Block Tool
If you’ve been
developing for any length
of time, you’ve probably been tasked with
generating PDFs at some point. In this article, we’ll
discuss the process of combining data from many
sources into a single PDF—from installation of the
block tool, to creating the blocks in Adobe Acrobat,
and then finally working with the blocks via PDFlib.
by Ron Goff
The PDFlib Block Tool, available for use only
with PDFlib Personalization Server (PPS), helps
create PDF documents derived from large
amounts of variable data.
Before the block tool was added, it was a
difficult process to place variable data, images, and even
other PDFs into precise areas of a PDF that had been
designed previously. Now, adding variable data is very
simple and helps create great dynamic pieces for just
about any application.
Installing the Block Tool
Currently, the block tool plug-in for Adobe Acrobat is only
available on the Windows and Macintosh (both Mac OS 9
and Mac OS X) platforms. On either platform, you must
also have Version 6 or 7 of Adobe Acrobat Professional
or Adobe Acrobat Standard, or the full version of Adobe
Acrobat 5. Other versions of Adobe Acrobat (Acrobat
Reader and Acrobat Elements) and all other PDF creation
tools do not work with the block tool plug-in. (Check the
PDFlib web site for an up-to-date list of supported PDF
authoring tools.)
CODE DIRECTORY: pdflib
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/280
Windows OS Installation
If you’re using Windows, you can use the block tool
installer provided by PDFlib to get the plug-in installed
correctly into your version of Adobe Acrobat 5, 6,
or 7. The installer places the correct files into the
Acrobat plug-ins folder, which is typically found at
C:\Program Files\Adobe\Acrobat 6.0\Acrobat\plug_ins\PDFlib.
The Windows version of the block tool is
compatible only with PPS version 6.0.1.
FIGURE 1
FIGURE 2
Mac OS Installation
You can install the block tool in either Mac OS 9 or OS X.
If you own Adobe Acrobat 5, place the files that comprise
the block tool into the Acrobat plug-in directory, typically
located at /Applications/Adobe Acrobat 5.0/Plug-Ins/.
If you’re using Adobe Acrobat version 6 or version 7, save
the files that comprise the block tool into a new directory
and then locate the Acrobat program, which is usually
found at /Applications/Adobe Acrobat 6.0 Professional.
Using the Finder, click once on the Acrobat application
to select it and then choose “File > Get Info” from the
menu bar. Locate the triangle next to the words “Plug-ins.”
Expand the triangle, select “Add,” and then locate
the folder that contains the block tool plug-in files.
The New and Improved Block Tool
If you’ve used previous versions of the block tool,
you’ll notice that the new version is much more user
friendly. The export and import features have also been
updated, making it much quicker to apply blocks from
previously formatted PDFs.
FIGURE 3
Creating Blocks
After you install the block tool, you should see a new
menu called “PDFlib Blocks” in Acrobat’s main menubar.
You should also see a new icon that resembles [=])—this
is the block tool. (See the top of Figure 1.) You use the
block tool icon to create regions that you can fill with
variable data.
When you click the block tool icon and hover over the
PDF, your cursor turns into a crosshair. To create a block,
click the mouse and hold it while dragging your cursor.
As you drag your cursor, a lightly-outlined box should
appear. (See Figure 1.)
When you’re satisfied with the size of the box, release
the mouse button. A menu like the one shown in Figure
3 appears. The menu controls all of the properties of
the block, including the formatting of the data that will
be contained in the block (data that you will add via
PDFlib).
FIGURE 4
There are three types of blocks that can be created:
• The first and default type of block is text. It
handles any type of text, whether it’s a single
line of text or many lines of text.
• The second type of block is image. As its name
implies, an image block is a container for the
dynamic placement of images within the PDF.
• The third and last type is PDF, which is able to
contain other PDFs.
FIGURE 5
Each block has general properties (see Figure 2) and
type-specific properties. General properties set attributes
such as the placement of the block, its background and
border colours, and its orientation, to name just a few.
Some of the sections that follow describe the
type-specific properties.
So what do you do with blocks? As you might have
inferred, already, you use blocks to mix dynamic content
amid static content. A designer can create a PDF, include
static text and images, and then place blocks wherever
dynamic content should appear. Your application “fills
in the blanks,” so to speak, and because blocks retain
properties such as typeface, font size, color, kerning, and
other settings, the block, once filled, looks exactly like
the rest of the document, just as the designer intended.
Using blocks, the application that generates each
PDF document need not format anything. However,
if you want to customize a block on-the-fly, you can.
Pre-defined block attributes can be overwritten by your
code.
Editing Block Settings
FIGURE 6
To change a block property, select the block you want
to configure and then navigate to find the property you
want to change. For example, Figure 3 shows how to edit
the textflow property, which can be either true or false
(hence, the dropdown menu).
The purpose of most properties is obvious, but be
careful with attributes that specify font names. Unless
you’re running Acrobat on the same machine as your
PDFlib application, it’s likely that the set of fonts on
the two machines (say, your desktop and the server,
respectively) will differ. Be sure to use the names of fonts
that are installed on your server.
Text Flow Settings
If you want a block to flow (automatically wrap and
justify) arbitrary amounts of text, set the textflow
property to true. Once set to true, an additional button
named TextFlow appears next to the existing button
labeled Text. Click on TextFlow to examine and set specific
variables (such as leading and indents) that control how
text flows in the block. All other text attributes—those
for one line of text or a flow of text—remain in the same
pane as the textflow property.
Mac OS X “Tiger”
If you’re using a very recent version of Mac OS X, you
can find Acrobat’s plug-ins folder by control-clicking
the Acrobat application and selecting “Show Package
Contents”.
Image Settings
By changing the block option to image, you can use
PDFlib to place images dynamically in a PDF. There are far
fewer options for an image block than for a text block.
The options screen for an image block is shown in Figure
5.
The defaultimage attribute names a default image to
place if the image specified by PDFlib is unavailable.
The dpi setting, or the number of dots per inch, is
used to override the dpi of an image. PDFlib will use the
default dpi value of the image if it is available, or 72
dpi if this option isn’t set. If necessary, you can set the
horizontal and vertical dpi independently by supplying
two values instead of one, first horizontal dpi and then
vertical dpi.
The scale property controls the scaling of the
image. You can supply one value to scale horizontally
and vertically equally, or supply two values, one for the
horizontal and another for the vertical scale factor.
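If you would rather supply these one- or two-value settings from code than fix them in Acrobat, the following is a minimal sketch of how such values could be written as a PDFlib option list string, where multiple values are grouped in braces. The helper name pair_option() is made up for this illustration, and passing dpi or scale in the option list of PDF_fill_imageblock() is an assumption you should check against the PDFlib manual for your version.

```php
<?php
// Hypothetical helper (not part of PDFlib): formats a setting that takes
// either one value (applied to both axes) or two values (horizontal first,
// then vertical).
function pair_option($name, $horizontal, $vertical = null)
{
    if ($vertical === null) {
        return $name . '=' . $horizontal;
    }
    // PDFlib option lists group multiple values inside braces.
    return $name . '={' . $horizontal . ' ' . $vertical . '}';
}

// 300 dpi horizontally, 150 dpi vertically, scaled to half size on both axes.
$optlist = pair_option('dpi', 300, 150) . ' ' . pair_option('scale', 0.5);
echo $optlist, "\n"; // prints: dpi={300 150} scale=0.5
```

The resulting string would take the place of the empty option list in a call such as PDF_fill_imageblock(), assuming your PDFlib version accepts these properties as overrides.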
FIGURE 7
PDF Settings
The settings for a PDF block are very similar to the settings
for an image block, as shown in Figure 6. defaultpdf
specifies a default PDF to place if the PDF document that
PDFlib names cannot be found.
defaultpdfpage specifies which page of the default
PDF to place if the default PDF must be used.
scale controls the scaling of the PDF. As with an
image, you can specify one value to apply to both axes
or you can provide two values, one for horizontal scaling
and another for vertical scaling.
FIGURE 8
Custom Settings
When using any type of block, you can specify custom
attributes. Custom attributes do not affect the output
when using PDFlib, but can be retrieved by PDFlib for
interpretation by your code. Custom attributes are good
for passing information to the PDFlib program, or even
for just better record keeping.
As an example, say that you want to create a text
block that’s limited to ten characters or less. Create the
text block, add a custom property named length, set it
to 10, and then retrieve the value via PDFlib at runtime.
Your code can verify the length of a string before filling
the block and react accordingly, perhaps truncating the
string or asking the user to provide a new value.
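The check described above might look something like the sketch below. It assumes the custom length value has already been read from the block (the PDFlib call that retrieves it is not shown here), and the function name fit_to_block_limit() is invented for this example.

```php
<?php
// $limit stands in for the value of the custom "length" property; how it is
// read from the block via PDFlib is deliberately left out of this sketch.
function fit_to_block_limit($text, $limit)
{
    // strlen() counts bytes, which matches characters for single-byte
    // encodings such as winansi.
    if (strlen($text) <= $limit) {
        return $text;
    }
    // Too long: truncate. Asking the user for a new value is the alternative.
    return substr($text, 0, $limit);
}

$limit = 10;
echo fit_to_block_limit('Hello', $limit), "\n";         // prints: Hello
echo fit_to_block_limit('Hello, world!', $limit), "\n"; // prints: Hello, wor
```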
FIGURE 9
The PDFlib Blocks Menu
To make setting up blocks easier, the “PDFlib Blocks”
menu has a few handy tools. You can export and import
blocks to re-use complex blocks, you can align elements,
and more.
Exporting
The “Export” feature is a huge timesaver when dealing
with multiple PDFs that require the same types of blocks.
Once you’ve finished setting up blocks in a single “master”
PDF, you can export those blocks and then import them
over and over again into other PDFs. There are several
different settings in the “Export” dialog (see Figure 7):
• You can export blocks from all pages of the
PDF or from a subset of them.
• You can export blocks to a new PDF or to an
existing PDF. Selecting “New File on Disk”
creates a blank PDF with the blocks set in
the new file. If you want to export blocks to
a document that you already have opened
in Adobe Acrobat, select “Open Document”
and click “Choose” to see a list of all open
documents. If you choose “Replace Existing
Files”, the block tool will overwrite the target
file with blank pages with the blocks in the
proper place.
• The next option is “Export Which Blocks?” This
section allows you to control which blocks
are exported. You can export all blocks—
depending on the number of pages you choose
in the first section—or just the blocks that
you highlight before exporting. You can also
choose to delete the blocks that exist on the
target PDF.
Importing
You can import blocks from another PDF using the import
option in the “PDFlib Blocks” menu. When you choose
“Import,” you will be presented with a screen to choose
the file that contains the blocks you want to import
(Figure 8).
After you choose the appropriate file, you can
determine which pages the blocks should be applied to.
Alignment Options
The alignment option in the “PDFlib Blocks” menu allows
you to align two blocks.
To align, choose a block. It should turn pink, reflecting
that it’s your primary choice. Then choose another block;
it should turn blue, indicating that it’s your secondary
choice. When you select “Align,” the blue block should
align with the pink block. Figure 9 shows two blocks,
Block_1, the secondary block, left-aligned to the primary
block, Block_0.
The “Size” alignment option only works when more
than one block is selected. You can change all secondary
blocks (blue) to be either the same width or height as
the primary block (pink).
The “Center” alignment option aligns all blocks
selected either horizontally or vertically, and even both
horizontally and vertically.
Defining Blocks and Detecting Settings
Two other time savers are available in the “PDFlib Blocks”
menu: one creates a block from a placed object like an
image, and another creates blocks that automatically
detect the font settings and font color of the font that
the block is being created over.
Click on “Click Object to Define Block” and then click
on an object such as an image to create a block of the
same dimension in the exact same position.
Or, if you click on “Detect Underlying Font and Color”
before you create a block, the block’s font settings are
automatically set to match the style and size of the text
below the new block. This feature is especially useful
when dealing with a lot of text and specific colors. (You
may have to adjust the font name to match a font located
on the server running PDFlib.)
Using Blocks
As you might imagine, working with blocks from within
your code makes placing text, images, and PDFs into a
dynamic PDF far simpler than writing code to control the
pointer, stroke text line-by-line, and so on. With blocks,
formatting is separated from your code, leaving all of the
aesthetics to the designer creating the PDF. Better yet,
a change to the design of the page doesn’t (necessarily)
necessitate tweaking your code.
Setting up the dynamic PDF document is similar to
what’s been shown in prior chapters, except you need to
pull in the PDF that contains the blocks. First, specify
the basic information:
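// Load the PDFlib extension if it is not already available.
if (!extension_loaded('pdf')) {
    dl('libpdf_php.so');
}
$p = PDF_new();
// An empty filename generates the document in memory.
PDF_begin_document($p, "", "");
PDF_set_info($p, "Creator", "block_tool.php");
PDF_set_info($p, "Author", "Ron Goff");
PDF_set_info($p, "Title", "Block Tool");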
Next, pull in the PDF page that contains the blocks, place
it into memory, and create a new blank page:
$block_file = "block_file.pdf";
$blockcontainer = PDF_open_pdi($p, $block_file, "", 0);
// Standard US Letter page: 8.5 x 11 inches, given in points
PDF_begin_page_ext($p, 612, 792, "");
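The 612 and 792 passed to PDF_begin_page_ext() are the page width and height in points, at 72 points per inch. As a small worked example, assuming you want to derive the numbers for some other page size, the conversion is a single multiplication (inches_to_points() is a name invented here, not a PDFlib function):

```php
<?php
// PDF page dimensions are expressed in points: 72 points per inch.
function inches_to_points($inches)
{
    return $inches * 72;
}

$width  = inches_to_points(8.5); // 8.5 * 72 = 612
$height = inches_to_points(11);  // 11 * 72 = 792
echo $width, ' x ', $height, "\n"; // prints: 612 x 792
```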
Continuing, call up the actual page that you want to use.
In the line of code below, the 1 (numeral one) refers to
page one of the PDF that contains the blocks.
$page = PDF_open_pdi_page($p, $blockcontainer, 1, "");
If you want to use another page from the “template”
PDF, just specify that page number instead of 1.
Finally, the page with blocks is “copied” to the new
page in the new PDF.
PDF_fit_pdi_page($p, $page, 0.0, 0.0, "adjustpage");
The adjustpage option adjusts the size of the new
page to match the page size of the template PDF.
adjustpage overrides any page settings that have been
set previously.
From here, you are ready to use the blocks.
Text Blocks
Whether working with a line of text or a text flow, text is
easy to fill in: just specify the name of the block and the
text to render and call PDF_fill_textblock().
$block = "Block_1";
$text = "All the pie in the sky wasn't enough to fill my plate";
PDF_fill_textblock($p, $page, $block, $text, "encoding=winansi");
The block name, here Block_1, is the name that was
assigned to the block when it was created in the
template PDF. (Block names are unique and the default
name is Block_#, but a block name can be any string of
alphanumeric characters.)
Notice that there are no extra formatting options.
Whatever text you “insert” assumes the formatting of
the block.
Form Conversion
You may be familiar with the Adobe Acrobat “Form
Tool,” a great way to create fillable areas of your
PDF. So, why not just use forms to define variable
data placement? Because the form tool is limited:
it cannot specify advanced font settings, whereas
the block tool has been designed specifically to
customize all aspects of your text. However, if
you have a PDF that used the form tool to define
areas for text, there is an option within the “PDFlib
Blocks” menu to convert your pre-made forms into
blocks (Figure 5.4).
If you want to override a block’s formatting, you can.
Where encoding=winansi appears, add the options that
you want to override. For example, to override the font
size, specify encoding=winansi fontsize=12.
You should also enable embedding as needed. You
can enable embedding by adding embedding=true as in
encoding=winansi embedding=true.
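Since the option list is a single space-separated string of key=value pairs, it can be convenient to assemble it from an array. The following is only a minimal sketch: the helper name build_optlist() is hypothetical and not part of PDFlib, and it uses only the option names mentioned above.

```php
<?php
// Hypothetical helper (not part of PDFlib): joins key/value pairs into
// a space-separated option list such as "encoding=winansi fontsize=12".
function build_optlist(array $options)
{
    $parts = array();
    foreach ($options as $name => $value) {
        // Booleans are spelled out as the words true/false.
        if (is_bool($value)) {
            $value = $value ? 'true' : 'false';
        }
        $parts[] = $name . '=' . $value;
    }
    return implode(' ', $parts);
}

$optlist = build_optlist(array(
    'encoding'  => 'winansi',
    'fontsize'  => 12,
    'embedding' => true,
));
echo $optlist, "\n";
```

The resulting string would then be passed as the last argument of PDF_fill_textblock() in place of the literal option list.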
Image Blocks
The process of placing an image in an image block
resembles that of placing the image “manually”: the
image is loaded and then placed.
$block4 = "Block_4";
$image_load = "image.jpg";
$image = PDF_load_image($p, "auto", $image_load, "");
PDF_fill_imageblock($p, $page, $block4, $image, "");
PDF_close_image($p, $image);
In this example, the image image.jpg is placed in Block_4
using the function PDF_fill_imageblock().
PDF Blocks
The steps to place a PDF document within the dynamically generated PDF are similar to the steps required to set up
a page to work with blocks. You identify which block you
want to “fill,” identify the PDF and the page you want
to extract from, and then fill the named block with that
content.
$block5 = "Block_5";
$pdf_load = "basic_pdf.pdf";
$pdf = PDF_open_pdi($p, $pdf_load, "", 0);
$pdf_fill = PDF_open_pdi_page($p, $pdf, 1, "");
PDF_fill_pdfblock($p, $page, $block5, $pdf_fill, "");
PDF_close_pdi($p, $pdf);
PDF_open_pdi() opens the PDF, while PDF_open_pdi_page()
loads the correct page. The function PDF_fill_pdfblock()
puts it all together, placing the actual PDF onto the page.
Finally, close the open PDF by calling PDF_close_pdi(),
which frees the resources consumed by the open PDF.
Closing the Page
After you’ve filled all of the appropriate blocks on the
open page, you must close that page.
PDF_close_pdi_page($p, $page);
This line closes the imported template page. After it has been called,
you can start a new page or end the entire document.
Putting It All Together
A complete example using the PDF_fill_textblock()
function can be seen in Listing 1.
The PDFlib block tool is easy to use and provides
for complex layouts without extensive programming.
Using blocks, a designer can assign where dynamic text,
images, and even PDFs are to be placed, yielding a much
more professional result. 
RON GOFF is the technical director/senior programmer for Conveyor
Group (www.conveyorgroup.com), a Southern California-based
web development firm. He is the author of several articles for
PHP|Architect magazine and other online publications. Ron lives in
California with his wife Nadia and 2 children. You can contact him at
ron@conveyorgroup.com.
LISTING 1
<?php
if (!extension_loaded('pdf')) {
    dl('libpdf_php.so');
}

$p = PDF_new();
PDF_begin_document($p, "", "");
PDF_set_info($p, "Creator", "block_tool.php");
PDF_set_info($p, "Author", "Ron Goff");
PDF_set_info($p, "Title", "Block Tool");

// Open the template PDF that contains the blocks
$block_file = "block_file.pdf";
$blockcontainer = PDF_open_pdi($p, $block_file, "", 0);

PDF_begin_page_ext($p, 612, 792, "");
$page = PDF_open_pdi_page($p, $blockcontainer, 1, "");
PDF_fit_pdi_page($p, $page, 0.0, 0.0, "adjustpage");

$block = "Block_1";
$text = "All the pie in the sky wasn't enough to "
      . "fill my plate";
PDF_fill_textblock($p, $page, $block, $text, "");

// Close the imported page first, then the document it came from
PDF_close_pdi_page($p, $page);
PDF_close_pdi($p, $blockcontainer);
PDF_end_page_ext($p, "");
PDF_end_document($p, "");

// Send the finished PDF to the browser
$buf = PDF_get_buffer($p);
$len = strlen($buf);
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; "
     . "filename=block_pdf.pdf");
print $buf;
PDF_delete($p);
?>
FEATURE
FPDI in Detail
Most PHP developers know about the ability to create PDF documents on
the fly. When looking at the wide range of PHP classes or APIs, every
product has its own advantages and disadvantages: some of them
are very expensive and others are free, but don’t offer the same
functionality as the expensive ones. The main difference between
the free and commercial libraries is the ability to use external
documents. PDFlib has supported this through its PDI interface, but
the free classes didn’t support external documents, until I released FPDI for
FPDF, which gives you the same muscle, but for free!
by JAN SLABON
PDF documents, or better stated, the PDF
format, have reached widespread popularity
over the past few years, and this momentum
continues. A very strong example of this is
a recent ISO standard, which is based on
PDF 1.4, and defines a PDF derivative for the long-term
preservation of electronic documents. PDF has become
a real standard!
In fact, the dynamic generation of PDF documents is
an important issue today, and will continue to be so in
the future. While it’s quite simple to build PDF documents
on desktop PCs, their dynamic generation on a webserver,
especially when using a language like PHP, can prove
very difficult.
On the Internet, you’ll find several PDF APIs that
will allow you to create PDF documents with PHP. Some
are delivered as PHP extensions, and some are “simple”
PHP classes. Years ago, I came across a PHP class
going by the name of FPDF, written by Olivier Plathey
(http://www.fpdf.org). I was absolutely amazed by its
capabilities, its ease of use, and that the “F” in
“FPDF” stands for “Free.”
PHP: 4.2+
OTHER SOFTWARE: FPDF 1.53 and FPDI 1.1
CODE DIRECTORY: fpdi
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/279
When I was working with FPDF, I was often challenged
with a situation where I had to rebuild a whole document,
programmatically. As you can imagine, this part was
very frustrating, tedious, and time consuming. A digital
version of your document is sitting right in front of you,
and you just cannot use it.
Similarly, I ran into additional problems when dealing
with vector based graphics and FPDF. There was no real
way to import such things, except by converting them to
bitmaps and using the Image() method of FPDF. I’m sure I
don’t have to explain the drawbacks to this workaround.
When I found an article in php|architect (Vol. 3, Issue
5) where Marco Tabini described how to parse a PDF and
update it with some simple content, I got the idea to
implement this technique in FPDF, which resulted in
a library whose name also has just 4 simple characters: FPDI
(Free PDF Import).
I released my new library under the Apache
Software License 2.0, which allows you to use it in your
commercial or non-commercial projects. The project
homepage can be found at http://fpdi.setasign.de. The
article by Marco is freely available as a monthly sample,
at http://www.phparch.com/issuedata/articles/article_110.pdf.
In this article, I’ll introduce you to FPDI, explain
how it was born, and cover its internal workings. I will
assume that you have some knowledge of FPDF, and have
a bit of experience with the Portable Document Format,
itself. If not, just download FPDF, and run the tutorials
that Olivier provided in the package. This article will not
tell you how to use FPDF, but will delve deeper into the
details of the PDF structure and how FPDI extends FPDF
with the ability to import single pages of existing
PDF documents, rather than just modifying existing documents.
This distinction is not that clear to most people out there.
At this point I could tell you much about the structure
of a PDF document, but as I already mentioned, the whole
idea is based on another article, where everything you
need to know about parsing a PDF is already described.
I will cover some details about that issue later in this
article.
I want to make it clear why I chose the “import single
pages” method, instead of “really modifying/updating” a
PDF. To put it simply: “It is much easier.” You can look at
a PDF document as a collection of single objects which
are linked to each other. Pages, images, font descriptions,
and document information are all single objects and can
be identified by a unique ID.
The PDF format is more flexible than just assigning
objects by simple IDs, though—it allows one to define
named relations. For example, these relations can be
used to put an image into a content stream of a PDF
page. You have to set up a resource dictionary, where you
define the name of the image and its real object relation.
After this, you can simply refer to the image by using the
name you provided in the content stream. Because FPDF, like
any other PDF generator, uses named relations that follow its
own naming conventions, you have to watch out for name clashes
when updating a PDF.
If you’ve read Marco’s article, you’ll remember that
there’s a part in it where he searches for the next
available font name. This check has to be built into FPDF
before every piece of code where FPDF creates a named
relation.
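The kind of check described here can be sketched in a few lines. This is an illustration only, not code from Marco’s article or from FPDF: the function name next_free_name() and the layout of the $used array (resource names as keys) are assumptions made for the example.

```php
<?php
// Returns the first name of the form /<prefix><n> that is not yet a
// key in $used. $used stands in for the names already present in an
// existing resource dictionary.
function next_free_name(array $used, $prefix = 'F')
{
    $n = 1;
    while (isset($used['/' . $prefix . $n])) {
        $n++;
    }
    return '/' . $prefix . $n;
}

$fonts = array('/F1' => true, '/F2' => true, '/F4' => true);
echo next_free_name($fonts), "\n";
```

Every place that creates a named relation would need such a lookup, which is exactly the overhead that importing pages as self-contained units avoids.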
Another disadvantage of updating documents is that
you cannot remove single pages, or reuse an existing
page in an easy way. Importing pages, however, allows us
to reuse, resize, crop or rotate a page. We can also avoid
naming conflicts, because every imported page has
its own kind of namespace in the new document, as you’ll
see below.
The Basics
While I was studying the PDF reference to find a good
solution for importing pages, I came across a technique
with the spooky name of “form XObjects”. I’m sure that
everyone who stumbles upon this term thinks about
conventional “forms” like those that we use in HTML, or
on paper. In this case, “form” has another meaning: it
corresponds to the notion of forms in the PostScript
language.
A form XObject can be compared with a kind of layer.
It is a self-contained description of any sequence of
graphics objects, and its whole structure is very similar
to the structure of a single page in a PDF document.
The form XObject has its own resource dictionary, where
named relations are defined. So, it seemed to be the
perfect solution for my problem: if I could create form
XObjects, I most certainly would be able to convert pages
into them.
But, form XObjects have more advantages than simply
preparing FPDF for PDF import. For example, they can be
reused at any time in a PDF document, where the viewer
application can cache the rendered results to optimize the
execution. It sounded like a kind of template to me, so I
began extending FPDF with this feature, which resulted
in a PHP class called fpdf_tpl. This class redirects all
output made by FPDF into containers which will be used
as form XObjects, so one can reuse any output created
with FPDF, at any time.
As already stated, this class has more to offer than merely
preparing FPDF for FPDI. You can reuse a template
multiple times in a document, while it only needs to
be written once to the resulting document, which leads
to less memory usage and processing time in your script.
LISTING 1
<?php
define('FPDF_FONTPATH', 'classes/font/');
require_once('classes/fpdf_tpl.php');

class pdf extends fpdf_tpl {
    var $useTPLs = true;
    var $_startTime;
    var $_endTime;
    var $_writingTime = false;

    function Header() {
        static $tplidx = null;
        if ($this->_writingTime)
            return;
        if ($this->useTPLs) {
            // Record the background once, then reuse it on every page
            if (is_null($tplidx)) {
                $tplidx = $this->beginTemplate();
                $this->writeBackground();
                $this->endTemplate();
            }
            $this->useTemplate($tplidx);
        } else {
            $this->writeBackground();
        }
    }

    function writeBackground() {
        static $content = null;
        $this->SetFont('Arial','B', 10);
        $this->SetFillColor(255,153,0);
        $this->Rect($this->lMargin, 28, $width =
            $this->w-$this->rMargin-$this->lMargin, 3, 'F');
        $this->Rect($this->lMargin, $this->h-10,
            $width, 3, 'F');
        $this->Image('images/php-a.png', 100, 5, 100);
        $this->SetDrawColor(0);
        $this->SetLineWidth(0.3);
        $this->Rect($this->lMargin+.15, 31, $width-0.3,
            $this->h-31-10, 'D');
        $this->SetXY($this->lMargin+.15, 31+.15);
        if (is_null($content))
            $content = file_get_contents(__FILE__);
        $this->SetFont('Courier','',6);
        $this->MultiCell($width-.3, 2.5, $content);
    }

    // For debugging purpose
    function pdf($orientation='P',$unit='mm',$format='A4')
    {
        $this->_startTime = microtime();
        parent::fpdf_tpl($orientation,$unit,$format);
    }

    // For debugging purpose
    function Close() {
        $this->_endTime = microtime();
        $this->_writingTime = true;
        $this->AddPage();
        list($usec, $sec) = explode(" ", $this->_startTime);
        $start = ((float)$usec + (float)$sec);
        list($usec, $sec) = explode(" ", $this->_endTime);
        $end = ((float)$usec + (float)$sec);
        $time = $end - $start;
        $this->Cell(0, 4, 'Time: '.$time, 0, 1);
        // get the size of the buffer
        $buffersize = 0;
        for($n = 0, $c = count($this->pages); $n < $c; $n++)
            $buffersize += strlen($this->pages[$n]);
        for($n = 0, $c = count($this->tpls); $n < $c; $n++)
            $buffersize += strlen($this->tpls[$n]['buffer']);
        $this->Cell(0, 4, 'Total buffersize: '.$buffersize.
            ' bytes (uncompressed)');
        parent::Close();
    }
}

$pdf =& new pdf();
#$pdf->useTPLs = false;
for ($n = 0; $n < 200; $n++)
    $pdf->AddPage();
$pdf->Output('test.pdf','I');
?>
FIGURE 1
Examples of its use are: the generation of headers and/or
footers, table headers which could be repeated on every
page, a background grid of large tables, text in front or
behind a template, etc.
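Why reuse pays off can be shown with a little arithmetic. The sketch below is not taken from fpdf_tpl; the content string, the page count and the name /TPL1 are made up for illustration. It compares the bytes needed when a background is repeated on every page with the bytes needed when it is stored once and each page only paints it by name.

```php
<?php
// Stand-in for the drawing operators of a page background.
$background = str_repeat('0 0 m 100 100 l S ', 500);
$pageCount  = 200;

// Without templates: every page carries its own copy.
$plain = strlen($background) * $pageCount;

// With templates: one copy, plus a short "paint this XObject"
// operator sequence on each page. "/TPL1" is an arbitrary name.
$reference = "q /TPL1 Do Q\n";
$templated = strlen($background) + strlen($reference) * $pageCount;

printf("plain: %d bytes, templated: %d bytes\n", $plain, $templated);
```

The larger the repeated content and the higher the page count, the bigger the gap between the two numbers becomes.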
If you take a look at Listing 1 and Figure 1, you’ll
see a sample script which demonstrates the use of
templates. You turn templates on and off by setting
the $pdf->useTPLs property to true or false—the visual
result is the same. This demo has no real meaning, but it
shows how much the file size and processing time decrease if
you’re using templates. My tests gave me a processing time
of only 0.0766 seconds when using templates, and 3.649
seconds without them! The same was true for the buffer
size: with templates it only takes up 14.5 KB, without
templates approximately 1.2 MB.
I hope that the main advantage of fpdf_tpl is now
clear. Let’s skip ahead and take a deeper look at this
class. The class holds all created templates in an array
named $this->tpls, where each entry describes
a single template as an array with special keys. The main
entries in each template array are x, y, w, h and buffer.
All other entries are just used to save other information,
and are prefixed with o_.
A new property, with the name of $this->res is used
to assign resources like fonts, images, or other templates,
to the template or the page. The assignment of resources
to single pages is left in for testing purposes, and will be
removed in the next release of fpdf_tpl.
LISTING 2
Array
(
    [0] => 9
    [obj] => 11
    [gen] => 0
    [1] => Array
        (
            [0] => 5
            [1] => Array
                (
                    [/Type] => Array
                        (
                            [0] => 2
                            [1] => /Page
                        )

                    [/Parent] => Array
                        (
                            [0] => 8
                            [1] => 10
                            [2] => 0
                        )

                    [/MediaBox] => Array
                        (
                            [0] => 6
                            [1] => Array
                                (
                                    [0] => Array
                                        (
                                            [0] => 1
                                            [1] => 0
                                        )
So, we’ll only take a look at the tpl key in $this->res.
This array is needed to rebuild the form XObject’s
resource dictionary with the named relations that are
used in the template. To redirect the output made by
FPDF, I used a simple flag, $this->intpl, and extended
the _out() method. I had to take special care because a
form XObject cannot include internal or external links or,
more generally, any kind of annotation.
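The redirection idea can be reduced to a small sketch. The class below is a simplified stand-in written for this illustration, not the real FPDF or fpdf_tpl code; only the flag name intpl and the template key buffer are taken from the text, everything else is assumed.

```php
<?php
class TinyWriter
{
    public $page  = '';        // content of the current page
    public $tpls  = array();   // template buffers, indexed by id
    public $intpl = false;     // true while a template is recorded
    public $tpl   = 0;         // id of the template being recorded

    function beginTemplate()
    {
        $this->tpl++;
        $this->tpls[$this->tpl] = array('buffer' => '');
        $this->intpl = true;
        return $this->tpl;
    }

    function endTemplate()
    {
        $this->intpl = false;
    }

    // All drawing code funnels through this method, so a single
    // flag is enough to decide where the output ends up.
    function out($s)
    {
        if ($this->intpl) {
            $this->tpls[$this->tpl]['buffer'] .= $s . "\n";
        } else {
            $this->page .= $s . "\n";
        }
    }
}

$w  = new TinyWriter();
$id = $w->beginTemplate();
$w->out('0 0 100 100 re f');   // recorded in the template
$w->endTemplate();
$w->out("q /TPL$id Do Q");     // written to the page
```

Because every drawing method ends up in one output routine, none of them has to know whether it is currently writing to a page or to a template.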
FPDF uses a single, global resource dictionary for all
pages and creates this within the _putresources() method.
I extended this method to make it call _puttemplates(),
which will create all necessary template objects. After
the objects are created and written, the named relations
to them will be written to the main resource dictionary.
All created templates are usable on every page!
Unfortunately, using the global resource dictionary isn’t
the best solution because it’ll introduce problems when
interpreting or extracting pages of a document, as you
will see later.
With the fpdf_tpl class, I’ve built the basis for
FPDI. Now we have to convert the pages of an already
existing PDF document, but we have to parse it first to
get the desired information.
LISTING 2 (CONT’D)
                                    [1] => Array
                                        (
                                            [0] => 1
                                            [1] => 0
                                        )

                                    [2] => Array
                                        (
                                            [0] => 1
                                            [1] => 595
                                        )

                                    [3] => Array
                                        (
                                            [0] => 1
                                            [1] => 842
                                        )

                                )

                        )

                    [/Contents] => Array
                        (
                            [0] => 8
                            [1] => 1
                            [2] => 0
                        )

                )

        )

)
Parsing the Original Document
I owe a lot of credit to Marco’s article, because the
parsing of an existing document was nearly completely
covered in it.
I adapted all parsing functions into a single class,
pdf_parser, and added support for reading streams. Let’s
take a quick look at the structure and how the parsing
has to be done. The first task that the parser has to do
is to read the xref-table of the PDF document. This is
done by the pdf_parser::pdf_read_xref() method. The
xref-table is similar to a table of contents. It gives us
information about the objects used in the document, and
their byte-offset positions in the file. At the end of the
xref-table, we’ll find the file trailer dictionary; the entries
in this dictionary lead us to the catalogue dictionary of the
file. The catalogue dictionary is the root of all objects
in the document’s object hierarchy, and in it we’ll find the
reference to the first page tree node of the document’s
page tree, which leads to exactly what we’re searching for: all
single pages used in the existing document.
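To make the “table of contents” comparison concrete, here is a simplified sketch of reading one cross-reference section from a string. It is not the code of pdf_parser::pdf_read_xref(); the function name is invented, and real files may contain several subsections or updated tables that this sketch ignores.

```php
<?php
// Returns array(objectNumber => byteOffset) for all entries in use.
function read_xref_section($data)
{
    $offsets = array();
    // The subsection header holds the first object number and the
    // number of entries that follow.
    if (!preg_match('/xref\s+(\d+)\s+(\d+)\s+/', $data, $m,
                    PREG_OFFSET_CAPTURE)) {
        return $offsets;
    }
    $first = (int) $m[1][0];
    $count = (int) $m[2][0];
    $pos   = $m[0][1] + strlen($m[0][0]);

    // Each entry: 10-digit offset, 5-digit generation, "n" or "f".
    preg_match_all('/(\d{10}) (\d{5}) ([nf])/', substr($data, $pos),
                   $rows, PREG_SET_ORDER);
    for ($i = 0; $i < $count && $i < count($rows); $i++) {
        if ($rows[$i][3] === 'n') {   // "f" marks a free entry
            $offsets[$first + $i] = (int) $rows[$i][1];
        }
    }
    return $offsets;
}

$xref = "xref\n0 3\n"
      . "0000000000 65535 f \n"
      . "0000000017 00000 n \n"
      . "0000000081 00000 n \n"
      . "trailer\n";
print_r(read_xref_section($xref));
```

With the offsets in hand, a parser can seek directly to any object instead of scanning the whole file.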
The parser has to follow the whole page tree to get
the exact page count and to collect other information on
the pages, which is done by read_pages() in the extended
class, fpdi_pdf_parser, and results in an array as the
$this->pages property. The keys of $this->pages are the
desired page numbers starting at zero where each entry
holds the related page object. After this task is done,
we have enough information about the source document
for now.
While I was implementing this code, I got stuck
on some problems, and it took me several days (and
nights) to fix them. A big problem for me was the
determination of the line ending in a file. Normally,
this task is handled by the PHP configuration directive
auto_detect_line_endings, but as a PDF file can have
multiple updates by different programs (on different
operating systems), the line endings can be mixed. To
overcome this issue, I’ve written a wrapper for fgets()
which is used as a fallback function if fgets()
returns incorrect data. This wrapper function also enables
the class to be used with PHP versions earlier than 4.3,
where auto_detect_line_endings was introduced.
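A line reader that tolerates mixed line endings could look roughly like the following. This is a sketch under stated assumptions, not FPDI’s actual wrapper: the function name read_any_line() is hypothetical, and it reads byte by byte for clarity rather than speed.

```php
<?php
// Reads one line and accepts \n, \r\n and \r as line endings,
// even when they are mixed within the same stream.
function read_any_line($fh)
{
    $line = '';
    while (($c = fgetc($fh)) !== false) {
        if ($c === "\n") {
            return $line;
        }
        if ($c === "\r") {
            // Swallow the \n of a \r\n pair; otherwise step back.
            $next = fgetc($fh);
            if ($next !== false && $next !== "\n") {
                fseek($fh, -1, SEEK_CUR);
            }
            return $line;
        }
        $line .= $c;
    }
    // End of stream: return the rest, or false if nothing was read.
    return $line === '' ? false : $line;
}

$fh = fopen('php://memory', 'w+');
fwrite($fh, "%PDF-1.4\r1 0 obj\r\n<< >>\nendobj");
rewind($fh);

$lines = array();
while (($l = read_any_line($fh)) !== false) {
    $lines[] = $l;
}
fclose($fh);
print_r($lines);
```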
To make FPDI compatible with PHP versions earlier than
4.3, I also created wrapper functions for strspn()
and strcspn(), so that FPDI should run
with PHP 4.2+.
During my testing (with hundreds of PDF files), I
found several minor bugs in the parsing process. Some
are fixed, and some are so rare that they can be ignored
for now.
Let’s Convert a Page to a Form XObject
First, we’ll take a deeper look at a page object found in
$this->pages of a parser object. A PDF object is represented
internally as an array, in a specified structure, as Marco
defined in his article. For demonstration purposes, we
use the demonstration PDF shipped with FPDI:
$pdf =& new fpdi();
$pdf->setSourceFile('classes/pdfdoc.pdf');
echo "<pre>";
print_r($pdf->current_parser->pages[0]);
You can see the output in Listing 2. At first look, it
seems very odd, but everything makes sense! Every entry
at any level is built as an array with at least the keys
0 and 1, where 0 describes the type of the value in key
1. All other keys are used to define special attributes
of that value. The types are defined as constants in
pdf_parser.php. For example, the 0 key at the outermost level
is 9, which is defined as a PDF object. This object’s value
is a dictionary (5), in this case a page dictionary, with
tokens that each have their own value types.
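One way to get used to this structure is to collapse it into plain PHP values. The sketch below uses the type codes visible in Listing 2, but the constant names (EX_...) and the function simplify() are invented for this example and may differ from what pdf_parser.php defines.

```php
<?php
define('EX_NUMERIC', 1);
define('EX_TOKEN', 2);
define('EX_DICTIONARY', 5);
define('EX_ARRAY', 6);
define('EX_OBJREF', 8);
define('EX_OBJECT', 9);

// Key 0 holds the type code, key 1 the value of that type.
function simplify($value)
{
    switch ($value[0]) {
        case EX_OBJECT:       // wrapper around the object's value
            return simplify($value[1]);
        case EX_DICTIONARY:   // name => typed value
        case EX_ARRAY:        // list of typed values
            return array_map('simplify', $value[1]);
        case EX_OBJREF:       // keep references symbolic
            return $value[1] . ' ' . $value[2] . ' R';
        default:              // numbers, name tokens, ...
            return $value[1];
    }
}

$page = array(
    0 => EX_OBJECT, 'obj' => 11, 'gen' => 0,
    1 => array(EX_DICTIONARY, array(
        '/Type'     => array(EX_TOKEN, '/Page'),
        '/Parent'   => array(EX_OBJREF, 10, 0),
        '/MediaBox' => array(EX_ARRAY, array(
            array(EX_NUMERIC, 0),   array(EX_NUMERIC, 0),
            array(EX_NUMERIC, 595), array(EX_NUMERIC, 842),
        )),
        '/Contents' => array(EX_OBJREF, 1, 0),
    )),
);
print_r(simplify($page));
```

The collapsed form loses the type information, so it is useful for inspection only; writing values back to a PDF needs the typed arrays.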
To import a page, FPDI offers a method called
ImportPage() which is close to the BeginTemplate()
method of fpdf_tpl. As we’ve seen, the structure of a
template entry in $this->tpls contains main entries like
x, y, w, h and buffer.
If we take a closer look at Listing 2, we can see a
relationship between these entries. /MediaBox is an array
(6) of exactly 4 entries, whose value types are numeric
(1). FPDI takes the first entry’s value as x, the second as y,
the third as w and, not surprisingly, the last one as h. This is
actually a bug in the current release of FPDI, because the last 2
values are also coordinates. The real values for the width
and the height have to be calculated as the
distance from the first to the third and from the second to the
fourth value. This bug has been overlooked for a long
time, because it only manifests itself if the MediaBox’s
x- or y-value is something other than 0. It’ll be fixed in
the future!
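The corrected calculation is short enough to sketch. The function name box_to_rect() is illustrative only and does not exist in FPDI; it simply treats the four numbers as two corner points.

```php
<?php
// Turns the four box numbers (two corner points) into an origin
// plus width and height.
function box_to_rect(array $box)
{
    list($x1, $y1, $x2, $y2) = $box;
    return array(
        'x' => min($x1, $x2),
        'y' => min($y1, $y2),
        'w' => abs($x2 - $x1),
        'h' => abs($y2 - $y1),
    );
}

$a4      = box_to_rect(array(0, 0, 595, 842));
$shifted = box_to_rect(array(10, 20, 605, 862));
print_r($a4);
print_r($shifted);
```

For a box that starts at the origin, both readings give the same numbers, which is why the problem only shows up with a shifted box like the second one.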
To resolve the MediaBox’s data, the extended parser
for FPDI is shipped with a getPageBox() method. This
method is needed, because the MediaBox (or any other
box) can also be a reference to another PDF object, or
the value can be inherited from a parent node in the page
tree. This method makes sure that the correct values
will be resolved. Currently, FPDI supports only PDFs that
contain a MediaBox, although there are other boxes in the PDF
specification, e.g. a CropBox or a TrimBox. If your PDF
uses other boxes instead of a MediaBox, the results of
FPDI might not be as expected. Also, if another box is
used, you can ignore the bug described in the paragraph
above.
The next task is to fill the buffer of our template
with the content stream of the imported page. There’s
one important difference between a PDF page and a
form XObject: a page can have multiple content streams,
while a form XObject can only have one. Because of
this issue, we have to concatenate all content streams
of a page into one single stream. To do this, there’s a
method called getPageContent() in the extended parser
(fpdi_pdf_parser).
All of these resolved streams can be encoded with
different filters. The most commonly used filter is the
FlateDecode filter which can be decoded with the zlib
functions, if they are enabled in the PHP installation.
I’ve also written 2 more decoders for the LZWDecode- and
ASCII85Decode-filters. With these 3 filters, FPDI should
handle nearly all documents which have encoded page
content streams—until now there have been no bug
reports related to an absent filter. The decoding of the
content streams is done by the rebuildContentStream()
method, in the extended parser class. After decoding all
streams, they can be simply concatenated to a single one
and assigned to the buffer key in the desired template
array.
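Decoding and joining can be sketched as follows. This is not the code of rebuildContentStream(): the function name and the layout of the $streams array are assumptions for the example, only FlateDecode is handled, and PHP’s zlib functions must be available.

```php
<?php
// Decodes each content stream according to its filter and joins
// the results into one buffer.
function join_content_streams(array $streams)
{
    $buffer = '';
    foreach ($streams as $stream) {
        $data = $stream['data'];
        if ($stream['filter'] === '/FlateDecode') {
            $data = gzuncompress($data);
        }
        // A newline keeps the operators of adjacent streams apart.
        $buffer .= $data . "\n";
    }
    return $buffer;
}

$streams = array(
    array('filter' => '/FlateDecode',
          'data'   => gzcompress("BT /F1 12 Tf")),
    array('filter' => null,
          'data'   => "(Hello) Tj ET"),
);
$joined = join_content_streams($streams);
echo $joined;
```

The separator matters: without it, the last operator of one stream could run into the first operand of the next.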
The next step is to resolve the resources which are
used in the content streams we want to import. These
can be relations to images, fonts or other form XObjects.
The resources are normally defined as named relations
in the page dictionary, or in one parent page in the
page tree. To resolve them, the extended parser offers a
_getPageResources() method, which returns the desired
resource data of the page. The method will not resolve
the resource’s own data, but only the information like
its name, and to which objects it is referenced in the
original document. The real import of these resources
FIGURE 2
FIGURE 3
A PDF cannot be compared to a file with
a structural language like HTML.
LISTING 3
<?php
define('FPDF_FONTPATH', 'classes/font/');
require_once('classes/fpdi.php');

$pdf =& new fpdi('L','pt');

// load the origin document
$pagecount = $pdf->setSourceFile('pdfs/article_110.pdf');
#$pagecount = $pdf->setSourceFile('pdfs/thumbnails.pdf');

$pdf->AddPage();
$x = $pdf->lMargin;
$y = $pdf->tMargin;

for ($i = 1; $i <= $pagecount; $i++) {
    // import page no. $i
    $tplidx = $pdf->ImportPage($i);
    // use the imported page
    $size = $pdf->useTemplate($tplidx, $x, $y, 250);
    // draw a border around the used page
    $pdf->Rect($x, $y, $size['w'], $size['h'], 'D');

    // if it's the third page in a row do a
    // pagebreak and reset the x- and y-values.
    if ($i % 3 == 0) {
        $pdf->AddPage();
        $x = $pdf->lMargin;
        $y = $pdf->tMargin;
        continue;
    }
    $x += 270;
    $y += 100;
}

$pdf->Output('thumbnails.pdf', 'D');
$pdf->closeParsers();
?>
LISTING 4
class pdf extends fpdi {
    [...]
    var $_logoIdx = null;
    [...]

    function Header() {
        [...]
        if ($this->_writingTime)
            return;

        if (is_null($this->_logoIdx)) {
            $this->setSourceFile('pdfs/php-a.pdf');
            $this->_logoIdx = $this->ImportPage(1);
        }

        if ($this->useTPLs) {
            [...]
        } else {
            [...]
        }
    }

    function writeBackground() {
        [...]
        $this->Rect($this->lMargin, $this->h-10,
            $width, 3, 'F');

        $this->useTemplate($this->_logoIdx, 100, 5, 100);

        $this->SetDrawColor(0);
        [...]
    }

    [...]
}
into the new document will be done automatically in the
extended _puttemplates() method.
Because these resources have their own unique
identifiers in their source document, FPDI has to reassign
new identifiers to the objects at runtime. All of the data
which will be copied from the original document to the
new document will be written by the pdf_write_value()
method, which accepts an array in the same structure
that you see in Listing 2. If pdf_write_value() reaches
an object reference (8), it’ll reassign a new unique id
(if one does not exist), and push the original object
identifier onto a stack. This stack will be processed in the
_putOobjects() method, recursively. If _putOobjects()
sends data to pdf_write_value(), which also includes
object references, the stack will be filled again. FPDI
will not write duplicates of object references—it will
“remember” previously written objects of a specific file.
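The renumbering can be illustrated with a toy “document” in which each object simply lists the ids of the objects it refers to. This is a sketch of the idea only; the function name copy_objects() and the data layout are made up and do not mirror FPDI’s internal code.

```php
<?php
// Copies the object $startId and everything reachable from it,
// assigning new ids from $nextNewId upwards. Returns
// array(newId => list of new ids the object refers to).
function copy_objects(array $source, $startId, $nextNewId)
{
    $map     = array($startId => $nextNewId++);  // old id => new id
    $stack   = array($startId);                  // still to be written
    $written = array();

    while ($stack) {
        $old  = array_pop($stack);
        $refs = array();
        foreach ($source[$old] as $ref) {
            if (!isset($map[$ref])) {
                // First time this object is seen: give it a new id
                // and remember that it still has to be copied.
                $map[$ref] = $nextNewId++;
                $stack[]   = $ref;
            }
            $refs[] = $map[$ref];
        }
        $written[$map[$old]] = $refs;
    }
    return $written;
}

// Object 11 (a page) refers to 10 and 1; 10 refers back to 11.
// Object 7 is not reachable from 11.
$source = array(
    11 => array(10, 1),
    10 => array(11),
    1  => array(),
    7  => array(),
);
$result = copy_objects($source, 11, 3);
print_r($result);
```

The map of already assigned ids is what prevents both duplicates and endless loops when objects refer back to each other.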
FPDI will, however, follow every object reference it
finds. This behaviour is particularly important to the
programmer, even if you want to import only a single
page of a very large file. As I’ve already stated, the PDF
structure allows the creator to define a single, global
resource dictionary, as FPDF does, where all used resources
are defined in the document. FPDI will not recognize
which of these resources are really in use on the imported
page. Just think about the following example: we create
a 100 page PDF with FPDF, where each page shows one
unique image. Now, we want to import page number
40 into a new document with FPDI. Because FPDF uses
such a global resource dictionary, FPDI will resolve that
dictionary as the resource dictionary of the single page,
and will copy all of the images into the new document—
even if it only shows one image! So, don’t be surprised by the
size of the result if you re-import pages of PDFs made by FPDF.
Using FPDI
Now we should know how FPDI and fpdf_tpl work,
internally. It’s time to take a look at some examples.
Listing 3 shows code which creates a thumbnail
overview, similar to Marco’s original article. As you can
see, the usage is very simple. The first step is to call
setSourceFile() with the desired PDF file, which will
return the page count of the document. Next, we simply
use a for loop to import each page. As you can see, the
useTemplate() method nicely returns the dimensions of
the imported page, so we can use this data to draw a
border around it. You can see the results in Figure 2. To
demonstrate FPDI’s flexibility, you can try to re-import
this generated document by changing the filename to
thumbnails.pdf and then take a look at Figure 3.
I already suggested that FPDF normally cannot work
with vector based graphics, like a logo. But, as a PDF
FIGURE 4
FIGURE 5
FIGURE 6
document can have vector based information, we can use
FPDI to do the job. Let’s go back to the first example
of fpdf_tpl. I used a PNG image as the php|architect
logo. If we zoom in, we’ll see that the image gets a bit
distorted (see Figure 4)—it isn’t a vector image, so it
doesn’t scale. To use an imported page in a template, it is
necessary to import it before the call to beginTemplate(),
as you can see in Listing 4. This results in a much better
quality page, as you can see in Figure 5.
If you’re currently reading a PDF issue of this
magazine, you’ll see that the document is personalized
with your name and email. With FPDI and FPDF, you can
get similar results. Just import a pre-existing page, and
render personalized information on top of the imported
data. In Listing 5, you’ll find an example of how you can
personalize a PDF with FPDI—the result can be viewed
in Figure 6.
There’s something you need to know about creating
such personalized documents: you should always keep
in mind, that FPDI will not and will never manipulate
an existing document, but will create a completely new
one with its own structure. I should also mention that
all dynamic content like links, PDF form elements, or
any other annotation will get lost during the import
process—they are not part of the content stream of a
page. So, this personalization will only work with simple
PDF files.
Another point to mention is the size of the original
document. Because FPDI has to rebuild the whole
document, it must decode every content stream and
hold them in memory. It will need a lot of computing
power and memory for this task, which results in a long
process time of the script—the limits of a standard PHP
installation can be reached much faster than you think!
If you take a closer look at the PDF version of php|a,
you’ll see that it is also protected with your personal
password (the same as your phparch.com account). PDF
allows this, but it cannot be implemented with FPDI,
alone.
Some time ago, the protection extension for FPDF
was written by Klemen Vodopivec, and I was involved
as a beta tester and bug hunter—which was a long time
before I thought about FPDI. Protection is an essential
extension for FPDF—I think it’s the most commonly used
one. It gives users or programmers a secure feeling.
I’ve received several emails from users who want to mix
both extensions to create protected PDFs with FPDF and
FPDI, which in the end, resulted in a FPDI_Protection
extension, which you also can download from the FPDI
project homepage.
FPDI_Protection’s task is simple: it must encrypt
output made by FPDF’s _putstream() and _textstring()
methods, and also by FPDI’s pdf_write_value() method.
There is only one particularly tricky part that you must
pay attention to: strings which are HEX-encoded, instead
of plain strings. These values first have to be converted to
plain text, then encrypted, and finally converted back to HEX
values.
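The HEX round trip described above can be sketched roughly as follows. This is not FPDI_Protection’s actual code: the function names are hypothetical, and the cipher is a deliberately trivial XOR stand-in used only so the example runs on its own.

```php
<?php
// Stand-in cipher for illustration only: XOR with a repeating key.
// The real extension uses its own encryption routine; this is NOT it.
function toy_cipher($data, $key) {
    $out = '';
    $keyLen = strlen($key);
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        $out .= chr(ord($data[$i]) ^ ord($key[$i % $keyLen]));
    }
    return $out;
}

// Hypothetical helper: encrypt a PDF hex string such as "<48656C6C6F>".
function encrypt_hex_string($hexString, $key) {
    $hex = trim($hexString, '<>');
    // A hex string with an odd number of digits is read as if a
    // trailing 0 were present.
    if (strlen($hex) % 2 !== 0) {
        $hex .= '0';
    }
    $plain = pack('H*', $hex);             // HEX -> plain bytes
    $encrypted = toy_cipher($plain, $key); // encrypt the plain bytes
    return '<' . strtoupper(bin2hex($encrypted)) . '>'; // back to HEX
}

echo encrypt_hex_string('<48656C6C6F>', 'secret'), "\n";
```

Because the XOR stand-in is symmetric, running the result through the same function again returns the original hex string, which is a quick way to check the conversion steps.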
To use FPDI_Protection in our example, we simply have to
make our pdf class extend FPDI_Protection instead
of FPDI. We can then use the SetProtection()
method to add the protection/encryption features to our
resulting PDFs.
Future and Dreams
I’ve already mentioned some problems and bugs in FPDI,
but have you ever found software without bugs? Probably
not... I have some plans for the coming releases, which
are not only mere bug fixes, but also improvements.
At the top of my list is the handling of PDFs that
contain boxes other than the aforementioned MediaBox.
Sadly, this missing feature is FPDI’s most reported problem.
If you’ve run into the same problem, you can work around
it by simply reprinting the PDF through the Adobe PDF
printer, which is shipped with Adobe Acrobat, or (maybe)
some other PDF printer; I haven’t tested the others.
LISTING 5
<?php
// Tell FPDF where its font definition files live.
define('FPDF_FONTPATH', 'classes/font/');
require_once('classes/fpdi.php');

class pdf extends fpdi {
    var $text = '';

    function SetText($text) {
        $this->text = $text;
    }

    // Called by FPDF for every page: stamps the personalization text.
    function Footer() {
        static $w = null;
        $this->SetFont('Arial', 'B', 6);
        $this->SetFillColor(255, 0, 0);
        // The cell width only has to be calculated once.
        if (is_null($w)) {
            $w = $this->GetStringWidth($this->text)
               + $this->cMargin * 2;
        }
        $this->SetXY($this->w - $this->rMargin - $w, -3);
        $this->Cell($w, 2.8, $this->text, 0, 0, 'R', 1);
    }
}

$pdf = new pdf('P', 'mm', array(215.9, 279.4));
$pdf->SetText('This document is personalized '
    . 'for php|arch readers.');
$pagecount = $pdf->setSourceFile('pdfs/article_110.pdf');
// Import every page of the source document into a new page.
for ($i = 1; $i <= $pagecount; $i++) {
    $pdf->AddPage();
    $tplidx = $pdf->ImportPage($i);
    $pdf->useTemplate($tplidx);
}
$pdf->Output();
?>
Another missing feature that I have not yet
mentioned in this article is the handling of rotated
pages. A PDF page can be defined as rotated, whereas
the coordinate system isn’t. FPDI does not currently care
about the rotation, and will import such a page as it is:
rotated. This means that it will be shown rotated in the
resulting document, whereas it is displayed correctly in
the original document. For now, you can use the rotation
script at http://www.fpdf.org/en/script/script2.php to correct
this behaviour, but FPDI will automatically fix this for
you in the next release.
Another problem that I already described is the
copying of unused resources. In the future, FPDI
may remove the unneeded resources automatically, too.
As you can see, there are several things on my to-do
list, but I want to take the opportunity to write a little
about the question I have been asked most often since releasing
FPDI: “Can I replace placeholders in an existing PDF with
new text with FPDI?” No, you can’t—not with FPDI,
nor any other program, without preparing the original
documents. A PDF cannot be compared with a file in a
structural language like HTML, even though a PDF can be
a simple text file without any binary data.
There is a way that will work with very raw PDF files,
but it cannot be generalized. Such files must have a
decoded content stream for each object that
outputs a text string. The text string has to be plain text
(not encoded), and the font that is used has to be either a)
one of the 14 standard fonts, or b) completely embedded
in the original document. These requirements may not sound
too strict, but a PDF can be created in various ways, and
you usually don’t have much of a say in how a particular
PDF is built.
For example, the text string can be split into various
small pieces, because the program that created the PDF
used kerning pairs for layout purposes. These individual
pieces, or even the whole text string, can be written
as HEX-encoded strings. Generally, only a subset of the
font is embedded (only the characters that are actually
used in the document are included). In this case, even
the full version of Acrobat itself cannot change text
strings in the document. The only program I know of
that will produce PDFs which are suitable is FPDF—but it
will not make sense to build your templates in FPDF and
replace strings in it afterwards.
This idea is a dream, and it looks like it will
remain one forever. Don’t waste your time trying to find a
solution. If it were technically possible, someone
would have already implemented it.
JAN SLABON is the author of FPDI and lives in Helmstedt, Germany. He
focuses on developing custom PHP solutions
for end customers and other web development companies around the
world. You can contact him at jan.slabon@setasign.de
FEATURE
i18n
Internationalize Your Web Application with Less PHP Code
If you are looking to internationalize a web application, then you
should try this simple technique which uses less PHP code, and
consists mainly of easy to maintain HTML.
by Carl McDade
Making a web application support multiple
languages can be a large job. It is a job
that many do not like, and one that a lot
of open source projects have avoided until
now. It seems everyone is jumping on the
multi-lingual train and using all sorts of PHP gadgetry
to make it happen. Check a few open source projects’
to-do lists and you will likely find something to do with
internationalization listed.
In this article, I will show you one of the easier
methods of internationalizing your code using very little
PHP and ordinary HTML files. Using this method is fast,
easy to maintain and is as cross-platform as you can get.
Before we get started on that, though, we need to go
over some points that will make it easier to understand
why globalization is necessary.
Globalization explained
Globalization, abbreviated with the little-used g11n, is
the application of business practices and
processes that take a business or a software product to a
global market. If you want to know why globalization
is important then you only have to take a look at the
following statistics.
As you can see in Figure 1, the internet is outgrowing
its American roots and the default language is not
necessarily English. Language is only part of the picture.
You also have to take into account that the countries
that make up a large percentage of internet users do not
all use the same currency, and possibly not the same date
format.
PHP: 4.3+
OTHER SOFTWARE: Macromedia Dreamweaver 2004 MX
CODE DIRECTORY: i18n
TO DISCUSS THIS ARTICLE VISIT: http://forum.phparch.com/282
If your software is going to grow with the growing
internet market then globalization is the key to it being
successful. Now that you are excited about the prospect
of people all over the world using your program, let’s
take a look at the steps involved with making it useful
on a global scale.
Internationalization Explained
There are several reasons why the i18n process of
programming should be done at the beginning of the
development cycle. Doing so significantly decreases the
amount of necessary code, and it removes the need to
extend the product or make compromises later on in
development. In many cases, a little forethought will
make sure that the developer does not have to rewrite
all of the code. Instead, he will simply need to write a
few files to make the existing software adaptable to a
different market.
When there is less code to write, fewer programmers
will be needed to work on internationalization.
Good internationalization support means that your
programming resources can be used to improve the
software in other areas. The size of the end user market
increases, and the software becomes more popular globally
because it is usable by a more diverse customer base.
Using simple text, an end user can easily localize the
product to a specific region.
Internationalization is abbreviated i18n because
there are 18 letters in between the “i” and “n”
in the word. Internationalization is the process
of designing software or a web application
to handle different linguistic and cultural
conventions without rewriting the codebase.
Internationalization is only important if you
are going to be distributing your software or
web application. If you are not doing so,
or you are borrowing code from somewhere,
then localization might be more important.
Localization Explained
Localization (also known as L10n) is the process of
adapting your software to the requirements of a target
locale. A locale identifies the language and the country
or region of a particular group of users. In software
development, a locale is mostly referred to by an abbreviated
code. Examples of such codes are en_US, which stands
for “United States English”, and en_GB, which stands for
“British English.” Making sure the locale can be easily
changed is the most important part of internationalizing
software.
When you build or change an application so that it
can be localized to multiple languages and countries,
this process is called internationalization. Remember,
a web application can be localized without being
internationalized. You just have to translate all of the
interfaces and content into the language of choice.
There are two phases to the localization process of
a web application. The first part is the translation of
the user interface—the part that controls the events and
presentation of the resources. The second phase is the
translation of the text, media files or documents—the so
called content being delivered by the presentation layer
of the application. I will be talking mostly about the first
phase of the process, and will touch on the second when
necessary.
Internationalization of a program includes a few
tasks that should be planned out ahead of time. If
careful attention is paid to these items at the beginning
of development then there is less to debug later on.
Encodings & Code Pages
When building web enabled applications, you need to
encode the page using either UTF-8 or UTF-16, and
send it with the appropriate content-type headers. It is very
important to have some test content on hand, and to
test the HTML page in the web browsers of choice, to
make sure that they react to the headers and encoding.
The localized text should appear properly with very little
(or no) user configuration. The single most important
element in internationalizing a web application is the
page content-type.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
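On the PHP side, the same content type can also be sent as a real HTTP header before any output. The following is only a minimal sketch; the helper name content_type_header() is made up for this example.

```php
<?php
// Hypothetical helper that builds the header line for a given charset.
function content_type_header($charset = 'utf-8') {
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers can only be sent before the first byte of output.
if (!headers_sent()) {
    header(content_type_header());
}
```

Sending the header from PHP and keeping the meta tag in the HTML means the page declares its encoding both when served and when saved to disk.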
Plural Text
The plural format of text is the nemesis of a software
developer. Plurals, added to gender characteristics and
social hierarchy of a language, all add up to a real
challenge. The best thing to do would be to minimize
the usage of text and to design flows so that the same
phrase or text can be used multiple times.
• There are 0 Comments
• There is 1 Comment
• There are 3 posts
FIGURE 1
WORLD INTERNET USAGE AND POPULATION STATISTICS

WORLD REGIONS             POPULATION     POPULATION   INTERNET USAGE,   % POPULATION    USAGE %    USAGE GROWTH
                          (2005 EST.)    % OF WORLD   LATEST DATA       (PENETRATION)   OF WORLD   2000-2005
AFRICA                      896,721,874     14.0 %        23,917,500         2.7 %         2.5 %      429.8 %
ASIA                      3,622,994,130     56.4 %       332,590,713         9.2 %        34.2 %      191.0 %
EUROPE                      804,574,696     12.5 %       285,408,118        35.5 %        29.3 %      171.6 %
MIDDLE EAST                 187,258,006      2.9 %        16,163,500         8.6 %         1.7 %      392.1 %
NORTH AMERICA               328,387,059      5.1 %       224,103,811        68.2 %        23.0 %      107.3 %
LATIN AMERICA/CARIBBEAN     546,723,509      8.5 %        72,953,597        13.3 %         7.5 %      303.8 %
OCEANIA / AUSTRALIA          33,443,448      0.5 %        17,690,762        52.9 %         1.8 %      132.2 %
WORLD TOTAL               6,420,102,722    100.0 %       972,828,001        15.2 %       100.0 %      169.5 %

NOTES: (1) Internet Usage and World Population Statistics were updated on November 21, 2005
Plurals like these can be avoided by using different
wording, which makes localization simpler by removing
grammatical differences.
• New Comments 0
• Comments to date 1
• Number of posts 3
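In code, this wording reduces to pairing a neutral label with a number, so no singular or plural decision has to be made at all. A small sketch, with a hypothetical helper name:

```php
<?php
// Hypothetical helper: the label never changes with the count, so the
// translator only has to supply one string per label.
function count_label($label, $count) {
    return sprintf('%s %d', $label, $count);
}

echo count_label('New Comments', 0), "\n";
echo count_label('Comments to date', 1), "\n";
echo count_label('Number of posts', 3), "\n";
```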
Dates
Usually, this is where a coder has to show some talent
for business logic, or get help from a group. The
internationalization of dates is an area where software
companies start guarding their secrets. The date format
problem can be compounded by the location of the client
using the software and the location of the webserver that
the software is being run on.
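One way to limit the server-location part of the problem is to store timestamps in UTC and pick the display pattern per locale. The sketch below assumes a hand-written format table; the locale codes and patterns in it are illustrative, not a complete or authoritative list.

```php
<?php
// Illustrative format table; codes and patterns are assumptions.
$dateFormats = array(
    'en_US' => 'm/d/Y',
    'en_GB' => 'd/m/Y',
    'sv_SE' => 'Y-m-d',
);

// Format a UTC timestamp for a locale, falling back to Y-m-d.
function format_date_for($locale, $timestamp, $formats) {
    $pattern = isset($formats[$locale]) ? $formats[$locale] : 'Y-m-d';
    // gmdate() ignores the server's time zone, so the result does not
    // depend on where the web server happens to be located.
    return gmdate($pattern, $timestamp);
}

echo format_date_for('en_US', 86400, $dateFormats), "\n";
echo format_date_for('sv_SE', 86400, $dateFormats), "\n";
```

Converting to the client’s own time zone would be a further step that this sketch leaves out.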
Database Encoding
Database encoding and Unicode support are musts. Without
them, a coder can never tell when the database is going to
store or return incorrectly encoded text, seemingly at random.
The fact that MySQL, the most popular web database, now
supports Unicode makes things much easier. You now only
have to make sure that Unicode support is enabled
and ready to go.
Search
If you build search functionality into your application,
then how the data is stored is critical, since all searches
will likely be based on SQL statements influenced by the
language and calendar system being used. The sorting
and ordering of database information must also be
internationalized; otherwise the search data returned
may be invalid or irrelevant. Do not forget to code your
PHP to allow for Unicode strings. It does not do any good
to go through all the trouble of preparing a
Unicode-enabled database and flexible SQL statements when the
PHP code cannot insert or retrieve resources in Unicode
format.
The PHP Language
It is important to remember that PHP, unlike Java, does
not yet have native multi-byte character (or more simply
put: Unicode) support. In PHP, a character is the same
as a byte, so there are exactly 256 different possible
characters. Since a string is a series of characters, this
limits how a string is interpreted.
As long as the string contains only those 256
characters, things are okay. But the internet
is a very large place, and some languages contain far more
than 256 characters. Japanese, where the number
of characters is in the thousands, is a good example.
There is, however, a way to convert
strings to and from UTF-8, a Unicode encoding that allows a
much larger set of characters. The PHP utf8_encode()
and utf8_decode() functions convert between ISO-8859-1
and UTF-8, in which a single character can be stored in
multiple bytes. There are also a number of other
conversion routines for working with multi-byte
characters. Converting with routines like utf8_encode()
is not enough on its own, though: the manipulation of the
resulting strings cannot be trusted to the default
single-byte string handlers in PHP.
This is where the mbstring extension comes into play.
mbstring contains functions that are aware of multi-byte
encodings and handle splitting, splicing, searching
and other areas of string handling. As of this writing,
the mbstring extension is not enabled in a default
installation of PHP. This means that developers and end
users who want to run software that requires the mb_*
functions should check their PHP configuration. There
are still many shared hosting companies and server
administrators that are unaware of the importance of the
mbstring extension.
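The difference between the single-byte handlers and mbstring is easy to see with a short UTF-8 string. This sketch also shows one way to check whether the extension is available; the sample word is arbitrary.

```php
<?php
// "ångström" in UTF-8: the characters å and ö take two bytes each.
$text = "\xC3\xA5ngstr\xC3\xB6m";

// strlen() counts bytes, not characters.
echo strlen($text), "\n";

if (extension_loaded('mbstring')) {
    // mb_strlen() counts characters when told the encoding.
    echo mb_strlen($text, 'UTF-8'), "\n";
    // mb_substr() cuts on character boundaries, never mid-character.
    echo mb_substr($text, 0, 3, 'UTF-8'), "\n";
} else {
    echo "mbstring is not available on this server\n";
}
```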
Using Open Source to Get a
Jumpstart
If you are not creating a new PHP application from
scratch, using an open source application may take
care of most of the internationalization steps involved
in building a website. The popular content management
systems all use one of the three techniques listed below for
internationalization. Though using a content management
system’s i18n support may be transparent, knowing the
underlying techniques used by it can be a deciding factor
in choosing a pre-made application as a base for your
own projects. Knowledge of what is used in a CMS to
internationalize will also influence your choice of shared
hosting or what should be installed on your own server
to support the software.
Internationalization Techniques
There are very few techniques used in internationalizing
a PHP web application. Listed here are the three most
popular:
• Text definition files written in PHP, using constants
• Using PHP gettext to extract and do string
substitution
• Using a database to store and retrieve translated
text
The above techniques all have their place and are
useful. They also have many things in common. They are
not simple enough for lazy web developers, like me.
The storage method for the localized text or resources
is not always readily accessible. How the resources
are stored determines if they are difficult to read and
manipulate. Two of the techniques in the list do not
allow for easy visual formatting of the HTML code within
resources while they are being translated. Being able to
see the visual formatting is important, as it influences
the words and choices made when translating text for a
web application. Frequently, when doing a translation,
it is necessary to see the wording in context with a list,
line break, paragraph or the direction the text is read.
Let’s take a look at each one of these techniques
so that we get a baseline for comparison to the new
technique I will be showing later in this article. I will
also use some of the more popular open source software
as reference examples of the techniques. The reason
that I go through these alternate methods is that I feel
you have to be familiar with the other more difficult
techniques in order to see how easy it can be.
Text Definition Files
This is my personal favorite because of its simplicity
and the fact that it works in the widest range of server
environments. There is usually no need to do any
pre-investigation of the server or shared host before
installing software using this method. This is probably
the most popular technique used. The reason for this is
the reliability and ease of implementation.
Distribution and sharing of both the original text
and the translated resources is easy and fast. Some of
the more popular open source content management
systems that use this technique are Xoops, Joomla and
PHPnuke.
Disadvantages of
Text Definition Files
Duplication of defined variables can easily occur, and
these files can be hard to read, at times. Like gettext(),
this technique does not allow for easy formatting of
HTML code. Using a visual editor to edit, copy and paste
helps with this, but there is still room for improvement,
as I will explain later.
This technique also exposes the translator to the PHP
source code and the temptation to “fix” things as they
translate. If there is a duplicate constant, do they delete
it or change the name? It might seem a minor thing,
but what if the constant contains an entire page of help
text that suddenly does not show? When the application
is updated and additional strings are added, there is no
built-in way to determine which strings are new and whether
they are present in every language. What happens if a
newly added string is not yet translated into a specific
language? You have to write a script that checks for the
presence and location of each variable.
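Such a checking script can be quite small when the text is held in arrays rather than constants. The following sketch assumes each language file fills one array keyed by string name; the helper name and the sample data are invented for the example.

```php
<?php
// Return the keys that exist in the reference language but are
// missing from a translation.
function missing_translations($reference, $translation) {
    return array_keys(array_diff_key($reference, $translation));
}

// Sample data standing in for two loaded language files.
$english = array(
    'greeting' => 'Hello',
    'farewell' => 'Goodbye',
    'help'     => 'Help',
);
$swedish = array(
    'greeting' => 'Hej',
    'help'     => 'Hjälp',
);

print_r(missing_translations($english, $swedish));
```

With constants, a comparable check would have to parse the definition files or compare the output of get_defined_constants() before and after including each file.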
Text definition files suffer from a lack of readability if
not formatted properly. Formatting is critical as there are
no readers or other tools to help with the maintenance
of the files. The use of double or single quotes becomes
a factor. Choosing one or the other means that some of
the text will have to be escaped to prevent PHP parsing
errors. So, while this method is very simple in itself,
it does require a bit more code to implement properly.
Typically, a file will contain text as shown here.
define('_ERROR_1', 'You cannot use single quotes (\' \') '
    . 'in the text you are sending.');
define("_ERROR_2", "You cannot use double quotes (\" \") "
    . "in the text you are sending.");
Choices about the type of variable to be used need to be
made when writing a definition file. The PHP define()
function has the advantage of being slightly more readable.
Array elements have the advantages of better performance
and the ability to use the array index to create groups,
which increases the amount of text that is reusable over the
entire application.
$language['the_index'] = 'This is some translatable text';
A bit of advice: leave grammatical logic to the translator.
Creating or finding a localization scheme that properly
covers plurals is a difficult task, and many times, the
coder comes to a point where they will try to use PHP to
create some translation logic. Plurals can turn an elegant
and simple solution into a coding nightmare. This usually
happens when the coder decides to introduce grammar
and plurals to the application to make it “easier” to
translate. Take a look at the following code.
<?php
$messages = array(
    'en_US' => array(
        'I am X years and Y months old.' =>
            'I am %d years and %d months old.'),
    'es_US' => array(
        'I am X years and Y months old.' =>
            'Tengo %2$d meses y %1$d años.')
);
?>
This was a simple array of strings before the coder
decided to allow for word plurals and grammar. By doing
this, the translator is forced to know PHP. The legibility
of the text and the context become lost in the code.
When doing this, the coder may also introduce errors
into the text. The coder should save their energy for the
internationalization of business logic and date formats,
and try to keep program logic separated from
language-specific terms. Text definition files are not really meant
to deal with complicated language structure.
In situations like this, the better option is to allow
for variances in text by using multi-dimensional arrays to
group plurals:
$language['the_index'][0] = 'This is some translatable text';
$language['the_index'][1] = 'This is some translatable text '
    . 'in the standard plural form';
$language['the_index'][2] = 'This is some translatable text '
    . 'in the gender specific plural form';
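How the application then picks one of the grouped forms is left open here. As one possible sketch, assuming index 0 holds the singular and index 1 the standard plural, a small helper could look like this. The helper name is hypothetical and the rule shown is the English one; other languages would need their own mapping.

```php
<?php
$language = array();
$language['comments'][0] = 'Comment';
$language['comments'][1] = 'Comments';

// Hypothetical helper: choose a form by count, falling back to the
// first form when a translation does not supply the plural.
function pick_form($forms, $count) {
    $index = ($count == 1) ? 0 : 1;
    return isset($forms[$index]) ? $forms[$index] : $forms[0];
}

echo pick_form($language['comments'], 1), "\n";
echo pick_form($language['comments'], 3), "\n";
```

The text itself stays in the language file, so the translator still only edits plain strings.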
Directory Structure
The directory structure for this type of system does not
have to be elaborate, but it should have some standard
and memorable path mapping to make coding and
troubleshooting easier. A slightly modified version of the
typical gettext() hierarchy works nicely.
Whatever the choice, it should include separate
subdirectories for each language. The reason for this
is that I have found that frequently, a specialty file or
extension may be needed in the localization of a web
application. I also recommend that the directory and
file names be similar or follow some type of naming
scheme that eases the dynamic writing of paths and SQL
statements.
/languages
    /en_En
        en_En.php
    /sv_SV
        sv_SV.php
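With directory and file names that follow one scheme, the path to a language file can be built directly from the locale code. This is a minimal sketch assuming the /languages/<code>/<code>.php layout above; the function name and the validation pattern are my own choices.

```php
<?php
// Build the path to a language file from a locale code.
function language_file($locale, $baseDir = 'languages') {
    // Accept only codes like "en" or "sv_SV", so that a value taken
    // from user input cannot point outside the languages directory.
    if (!preg_match('/^[a-z]{2}(_[A-Za-z]{2})?$/', $locale)) {
        return null;
    }
    return $baseDir . '/' . $locale . '/' . $locale . '.php';
}

echo language_file('sv_SV'), "\n";
```

The result would typically be checked with file_exists() before being passed to require_once().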
Setting up definition files
Below are examples of typical definition files. As you can
see, creating one of these leaves a lot of room for error on
the part of the coder. This particular code does something
which I consider to be an internationalization mistake:
its authors have used placeholders in the strings. This is not
a developer- or translator-friendly mechanism, because
it hard-codes the context and removes any possibility
of reusing the phrase. It also makes it necessary to hunt
down the string that will be used in the place holder.
When creating translations, a non-coder may be forced to
remove or adjust what is considered to be PHP code. As
mentioned earlier, text should be as generic and simple
as possible to make this type of thing unnecessary. Doing
this is a form of string concatenation, something that
should be avoided when globalizing software.
// %s is your site name
define('_US_NEWPWDREQ', 'New Password Request at %s');
define('_US_YOURACCOUNT', 'Your account at %s');
define('_US_MAILPWDNG', 'mail_password: could not update '
    . 'user entry. Contact the Administrator');
Some other PHP software uses this format. Take note of
the use of numbered indexing, which makes matching
the strings to their location in the program easier.
$txt[342] = 'Una palabra por línea';
$txt[343] = 'Coincidir todas las palabras';
$txt[344] = 'Coincidir con cualquier palabra';
$txt[345] = 'Coincidir como frase';
$txt[346] = 'Buscar -Todo- Sólo miembros';
Advantages of Text Definition Files
Defining variables to hold text strings is the simplest
and most developer-friendly method of internationalizing
a web application. It requires no special tools for creation
and maintenance. The technique does not demand a great
amount of server resources, such as hard drive space or
memory.
PHP gettext
The PHP gettext() method of localizing a web application
is a blessing for those that have finished a web application
and want to internationalize it afterwards. Many open
source PHP applications like Drupal and Gallery2 rely on
the gettext extension.
Disadvantages of gettext
There are several problems with the use of this
function, though:
• gettext() isn’t thread-safe, so it is not advisable
in a multi-threaded environment
• gettext() relies on setlocale(), but that depends
on which languages are installed on the system,
and in this case UTF-8 is a very tricky setting to
use.
I personally dislike gettext() because once you
change the default language template you have to review
and re-compile all the secondary languages. It is very
difficult to design and program around gettext because
of this factor. The addition or modification of PHP code
pages that contain text which needs localization requires
going through a multi-step process over and over again.
This redundant process can lead to mistakes, which can
waste even more time.
In open source web applications, where things are
being changed due to security, bug fixes or regular version
upgrading, you run the risk of losing your translation in
part or entirely. There just may be no translation files for
the code that you are using, which may force you into
learning about the systems involved and trying to find a
translator on short notice.
Finally, it is difficult—if not impossible—to reuse
translated text when using gettext. The text extraction
process works on a per-file, per-occurrence basis. So, when
creating a translation, you may find yourself writing
several instances of the same text, or writing a similar
translation with only minor differences for many files.
This is costly if you are paying for a translation. “Time is
money” as they say.
In a large application, where the text is stored in a
PO file and there are similar occurrences of the same text,
it is difficult to find the text string for just that element
on the page that you are looking to change. Message
IDs are no indicator of the location of the string being
swapped via gettext(). PO files, themselves, are strange
things that require some programming knowledge and
careful usage. Although they can be altered manually in
a text editor, using a program like POEdit is the preferred
method. This is a limitation for many, because POEdit is
not a cross-platform program. POEdit has no Macintosh
version, which leaves those users out. This is
saddening, since many Mac users are writers or work in the
news media. They are the ones most likely to need,
or provide, translation services.
Computer-assisted translation (CAT) is also very
difficult to set up and use with PO files. The CAT programs
that do this well are very expensive. These shortcomings
are probably the reason that Word files are the standard
file format for translators. After the translated texts are
completed, a PO file must be compiled into an MO file for
use by PHP.
Directory Structure
gettext requires that the resource files have a specific
structure and that the information about this structure
be set into the PHP code.
/locale
    /en
        /LC_MESSAGES
            messages.po
            messages.mo
Multiple languages are set up in an identical hierarchy.
/locale
    /en
        /LC_MESSAGES
            messages.po
            messages.mo
    /sv_SV
        /LC_MESSAGES
            messages.po
            messages.mo
Setting a Locale
(and Other Requirements)
Setting a locale is a requirement for gettext(). These are the
main instructions that PHP needs if it is to find resources
for translated text.
<?php
// I18N support information here
$language = 'en';
putenv("LANG=$language");
setlocale(LC_ALL, $language);
// Set the text domain as 'messages'
$domain = 'messages';
bindtextdomain($domain, "/www/htdocs/site.com/locale");
textdomain($domain);
echo gettext("A string to be translated would go here");
?>
Designating and Extracting Strings
The PHP code needs to be set up to accommodate the
extraction of strings and so that PHP can find the strings
that are to be translated. This is done by using the
gettext() function on strings:
<?php $var = gettext('This is a translatable string'); ?>
The text string in the above code can be extracted and
set into a po file using a command-line tool that will
hunt for instances of gettext() and set the strings into
indexed messages for each occurrence:
$ xgettext -n *.php
After extraction, the po file to be translated should look
like Listing 1.
Creating the MO Files
In any case, either you or the volunteers will translate
the po file and then you will need to convert the file into
a binary file that gettext actually understands. For that,
you would use the following command:
$ msgfmt messages.po
The line above will create a messages.mo file, which you
should save in the appropriate directory.
locale/<LANG_CODE>/LC_MESSAGES/
Plurals and ngettext()
Plural form is the toughest part of text translation,
especially if you have lots of text where plurals are
needed. In this case, you will need ngettext() and not
the simpler gettext().
<?php
$n = 3;
printf(ngettext("%d comment", "%d comments", $n), $n);
?>
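Since the gettext extension is not present on every server, it can be useful to wrap the call. The following is a sketch under that assumption: comment_count() is a hypothetical wrapper that uses ngettext() when it exists and otherwise applies the plain English rule.

```php
<?php
// Hypothetical wrapper around ngettext() with an English-only fallback.
function comment_count($n) {
    if (function_exists('ngettext')) {
        // With a bound text domain, this returns the translated form.
        $format = ngettext('%d comment', '%d comments', $n);
    } else {
        // No gettext extension: fall back to the English rule.
        $format = ($n == 1) ? '%d comment' : '%d comments';
    }
    return sprintf($format, $n);
}

echo comment_count(1), "\n";
echo comment_count(3), "\n";
```

Without a bound text domain, ngettext() returns the untranslated strings, so both branches behave the same way for English.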
Advantages of gettext
The gettext method of internationalization is not as
popular as the other two methods. The reason for this is
that it poses a heavy burden on the developers and the
end users.
In most OSS projects, the developers are responsible
for providing the original translation files. After this is
done via extraction scripts, the files need to be
translated and possibly merged with previous translation
versions by the translator. The translator can be the end
user, a volunteer, or even another development team
member. The bottom line is that gettext requires a lot of
resources to maintain and support.
In a large project with lots of volunteers, or a medium
sized company, this is not really a hindrance. But, for the
lone developer or small group, the burden is large. Nor
does gettext spare the developer the job of hunting down
text strings and wrapping them in the gettext() function,
just as you would have to do if definition files
were used.
The best thing about the gettext method of
internationalization is that the developer does not
have to think up unique names for variables. In a large
application this can be a tremendous advantage over
other techniques.
Database Storage
At first look, working with the database method of
storing translated text seems like a joy. I admit I had
fun using the Mambelfish component for the Mambo
CMS when doing a translation of a website. A database
gives what the other techniques seem to lack: order.
Relational database systems were built to give power to
how information is related, and use these relationships
to organize the information into an easily accessible
source.
Disadvantages of Using a Database
When internationalizing a web application, distribution of
the resources to be translated is very important. Getting
the work to the translators is necessary, and there must
be a system in place for getting the finished translations
to the end users of the product.
So far, I have not found one commercial or open
source product that offers localization resources in the
form of SQL scripts or native database files. As a result,
translations are done repeatedly by each end user of
that product. Frequently, internationalization using a
database is mixed with the other techniques to make up
for this shortcoming.
I first came across this problem when I found that
I wanted to reuse my translation for several different
website installations, or borrow one for a language I did
not know. Even though exporting and importing
database tables was not difficult, I found the need to
maintain an archive of translations because I am the lazy
type of coder. I don’t like doing repetitive tasks that
are not part of innovation. Even if I were not so lazy,
there are no repositories of MySQL translation tables for
Mambelfish, which is used in Joomla, or for any other open
source CMS project. Asking for exports from someone
else’s database on the Mambo and Joomla CMS forums
proved to be less than successful. Even if a repository for
database tables did exist, there would still be the problem of
not being able to browse the translation beforehand to
check its quality.
There is always a bit of uncertainty associated with
storing information in a database, which is why backups
are so important. When you start moving information
from one database or database server to another, things
can rapidly start to fail or acquire bugs. In my experience,
you just never know if the encoding is going to be correct
after the move. Even when the server configurations are
identical, there may be some things that just do not
work.
You just never know until you determine which part
of the chain is responsible for an incorrect encoding bug.
Was it the PHP code, the HTML, or the data source? You
are just very happy if everything works. I use a lot of web
hosting located in the United States, but frequently, my
clients are in Sweden or another European country. There
have been times where the web host has not installed
a UTF-8 character set. The Swedish alphabet only has
three characters more than its English counterpart, so
fixing any problems was easy. But I do not envy any web
developer that has to solve this with any of the Cyrillic
alphabet languages. This technique of using a database
as a resource for translation strings works well when it
works.
Using computer assisted translation tools is obviously
difficult if not impossible with the database method. You
are reduced to using cut and paste operations within a
web-based interface or a database front-end program like
MySQL Administrator or Microsoft Enterprise Manager.
Caution must be taken when doing this as inputting text
this way may work fine and produce the right results at
first glance, but when the actual web application is used
to retrieve the text, the encoding may be different from
what you expect.
MySQL 4.1
MySQL 3.x and MySQL 4.0.x do not have Unicode support.
The default character encoding is called latin1 and is
single-byte. This may not seem like much of a problem at first
glance: although the database itself is not aware of
the actual encoding, a varchar field still
manages to output the strings in much the same way that
they were put into the database. But in some
cases, you may see incorrect characters when directly
accessing the database with code that does not take this
into account. Searching or ordering will sometimes not
work correctly.
These inconsistencies are due to the fact that even
though two, three or four bytes should actually represent
one character, MySQL interprets them as one character
per byte. I have personally had experiences with the
Swedish characters äåö being stored as varchar but being
seen differently by different versions of phpMyAdmin,
the PHP database administration tool, when exploring a
database with these characters.
Many people wondered why I got so excited that
MySQL was finally going to support Unicode with
version 4.1. This is because with Unicode support
(UTF-8), a more elegant internationalization
plan can be implemented. Different character sets
can also be set per column, table or database,
which means data from many languages can be
stored without using elaborate coding routines
to encode and decode strings. It also means
ordering, searching, indexing and similar
string-related functions in MySQL work correctly.
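The byte-versus-character mismatch described above is easy to reproduce from PHP. The sketch below is only an illustration: utf8_char_count() is a hypothetical helper written for this example (it counts every byte that is not a UTF-8 continuation byte), while strlen() is the standard byte-counting function.

```php
<?php
// "äåö" written as explicit UTF-8 bytes so the example does not
// depend on the encoding of this source file.
$swedish = "\xC3\xA4\xC3\xA5\xC3\xB6";

// Hypothetical helper: counts characters in a UTF-8 string by
// skipping continuation bytes (those of the form 10xxxxxx).
function utf8_char_count($str)
{
    $count = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo strlen($swedish), "\n";           // 6: one "character" per byte
echo utf8_char_count($swedish), "\n";  // 3: the characters a reader sees
```

Any storage layer that measures, truncates or sorts by byte makes the same mistake as strlen() does here, which is why encoding-aware columns matter.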
Advantages of Using a Database
The greatest advantage of using a database to store the
resources for localization is the convenience. An interface
can be built to group the translation tasks into a single
area. You don’t have to dig into the file system to find
the proper resource file that holds the text strings that
you want to translate. If done right, usually within a
few clicks you are presented with a user interface and
only have to make a few simple choices before entering the
LISTING 1
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Free Software Foundation, Inc.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2002-04-06 21:44-0500\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
#: gettext_example.php:12
msgid "A string to be translated would go here"
msgstr ""
LISTING 2
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Untitled Document</title>
</head>

<body>
<p>my own</p>

<!--my own-->
<p dir="LTR">Najib said "السلام عليكم"
(as-salaam alaykum] to me.</p>This is the
help text for my own idea module possible
to see the line ends in Dreamweaver because
of the syntax editor.
<!--end-->

<p>my own button</p>

<!--my own button-->
send
<!--end-->

<p>my own idea text</p>
<!--my own idea text-->
1This is some other
text in the module this to check for paragraph
and line breaks***
<p>en styke till</p>
<ul>
<li>add some more HTML here l&auml;gga till mer
HTML text h&auml;r </li>
<li>this works nicely a&nbsp;in both design and
code mode of Dreamweaver.</li>
<li>var bra med &ouml;&auml;&aring; ocks&aring;</li>
<li>possibly want to have the headers in so that
unicode can be used in the editors. They are easy
enough to remove</li>
</ul>
<!--end-->
<p>my own translation scheme</p>
<!--my own translation scheme-->
<p>This is the translation scheme for my own idea
module which does not make room for HTML yet. but
the best thing about it is that translations can
be done in a simple HTML editor or a visual editor
like Dreamweaver.</p>
<!--end-->
<p>my own reset button</p>
<!--my own reset button-->
reset
<!--end-->
<p>my own more text</p>
<!--my own more text-->
Detta ar nagra text p&aring; svenska
<!--end-->
<p>a new button</p>
<!--a new button-->
button text
<!--end-->
</body>
</html>
translation. This method also allows for making small
changes quickly.
The translator is kept totally separate from the
underlying PHP code. Though database resource storage
is suited to content translation for the most part, there
are situations where it shines when used on the user
interface. Dynamic menu systems are a good example of
where this technique is a must. In the Joomla content
management system, database translation tables are
used in coordination with text definition files to make
localization easier. The database tables feed the more
dynamic presentation layer, while the definition files deal
with the administration areas, which are not changed
often (or at all).
Searching resources stored in a database results in
much more relevant information being returned because
more relevant information can be stored. Dates, titles,
categories and strings can be searched in the localized
language. This is very hard to do when translations are
stored in other formats.
HTML Definition Files
Here is the technique you have been waiting for. It
is simple, user friendly and editable without using a
database, special tools or exposing the translator to
PHP source code. The code is short, easily modified to
suit various needs, and PHP makes using this technique
easy.
Disadvantages
Yes, there are disadvantages to using this method, but
they are the same as those of the typical definition
file described earlier. Some of the problems in using text
definition files are solved by changing the storage method
and avoiding using PHP within the resource files.
Creating Resources
First, you need to create a simply formatted HTML
document using <p> tags to show the names of the
variables to be created as separate text blocks while in a
visual HTML editor. Comment tags are used to designate
text blocks to be translated and loaded into PHP variables.
When finished and formatted, your HTML file should look
like Listing 2.
The PHP Code
Let’s look at two examples. The first uses an array to
store the translated text; the second, a set of defined
constants. Both of these methods have some minor
drawbacks.
When using an array, if the array is large with the
number of elements in the millions and it is accessed
multiple times, then a performance problem may occur.
But, in most situations, a dynamic web page will access
a single array element no more than twice to get the
needed texts. Calling an array into your code may require
you to set it as a global. Arrays have the benefits of being
organized and duplicates can be weeded out easily.
When using constants, you always run the risk of name
collisions. The plus side to using constants is that they
are easily written to a cache table and do not need to
be declared global in order to be used within your PHP code.
Rather than getting into benchmarking and other
aspects of performance I will just say that you should
weigh the pluses and minuses and choose the method
that seems best for your application.
The code to process the HTML into PHP data can be
seen in Listing 3.
Editing the Text
The best thing about this technique is that any text or
HTML editor can be used. These are available on most
popular desktop operating systems. The translator is not
bound by the restrictions of a program like POedit. The
text is also seen in a familiar format.
When using Dreamweaver in code mode, editing
the translation file is easy and straightforward. After
setting up the translation file in the code view, using
Dreamweaver in design mode makes translating and
editing the text even easier. You can also see and edit
the comment tags in design mode as shown in Figure 2.
Computer Assisted Translation
Although much CAT software does not like HTML, this
is not really a big problem when using the technique
described here. You can easily use a WYSIWYG editor then
cut and paste the translation text into the CAT program.
Benefits
Why use the technique I’ve described here? There are
many reasons but here are a few of the strong ones.
HTML is universally used and accepted with a
very shallow learning curve. Translators, developers,
programmers, webmasters and designers can easily see
the HTML text and know what is going on, thus making
it easier to maintain a good translation and share it.
HTML pages can be checked in a web browser for proper
encoding. CSS can be used to create a more visually
pleasing text at the time of translation. In cases of
right-to-left or top-to-bottom languages the technique
can show the text in the proper read direction while
editing.
There is less PHP code, fewer server resources and
reduced maintenance to worry about. I hope that many
PHP developers will start using this simple technique in
the future, as it makes everyone’s job easier.
Both commercial and open source projects can
benefit from this type of internationalization technique.
It goes very quickly, and resources previously spent on
internationalization maintenance can be used to make
improvements elsewhere in the project. The time to
get the software to market becomes shorter and more
defined.
You might think that delivering an English version
of an application is good enough, but is it really? The
software market may carry your work across national
borders—if it does, an English version is only the
beginning. 
FIGURE 2
LISTING 3
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Untitled Document</title>
</head>

<body>
<?php

// This is where the PHP code goes that translates the HTML into
// PHP variables when the file is scanned and parsed.
// Avoid calling the regular expression engine by using other,
// faster PHP functions.
// If a cached file exists then skip all steps, or clear the
// cached file and create a new one.

// Get the file contents based on language
$text = file_get_contents("language/sv_language.html");

// Split the file content string into an array, one text block per item
$preVar = explode('<!--end-->', $text);

$_lang = array();

foreach ($preVar as $preVar_1) {

    // Start separation of array item key name and array item value
    $preVar_2 = explode('-->', $preVar_1);

    // Skip chunks with no comment tag, such as the closing
    // </body></html> after the last <!--end-->
    if (count($preVar_2) < 2) {
        continue;
    }

    // Separate out the names for the array keys
    $preVar_3 = explode('<!--', $preVar_2[0]);
    if (count($preVar_3) < 2) {
        continue;
    }

    // Clean up the names to be used as array keys
    $newVar = str_replace(' ', '_', $preVar_3[1]);

    // Load text into an array item such as $_lang["my_own_idea_text"].
    // The thought of using the faster variable variables should be
    // passed over due to possible security breaches if users are
    // creating or helping with translations.
    $_lang[$newVar] = $preVar_2[1];

    // Optionally you can use define() and go with setting
    // constants, but be wary of name collisions
    define("_lang_" . $newVar, $preVar_2[1]);

}

// Check for duplicate array keys and throw an error if found.
// Speed things up after first use by caching the result
// into a PHP file for include.

// Test print the language array elements
print $_lang["my_own"] . '<br>' . "\n\n";
print $_lang["my_own_translation_scheme"] . '<br>' . "\n\n";
print $_lang["my_own_idea_text"] . "\n\n";
print $_lang["a_new_button"] . "\n\n";

print 'Test print constants';
print _lang_my_own . '<br>' . "\n\n";
print _lang_my_own_translation_scheme . '<br>' . "\n\n";
print _lang_my_own_idea_text . "\n\n";
print _lang_a_new_button . "\n\n";
?>
</body>
</html>
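The closing comments in Listing 3 mention checking for duplicate keys and caching the result in a PHP file, without showing either. One possible way to do both is sketched below; it is not the author's implementation. The function names parse_language_html() and load_language(), the idea of comparing file modification times, and the use of var_export() to write the cache are all assumptions made for this example. Unlike Listing 3, this sketch also trims whitespace around keys and values.

```php
<?php
// Hypothetical helper: parse the comment-delimited HTML format of
// Listing 2 into an array, rejecting duplicate keys.
function parse_language_html($html)
{
    $lang = array();
    foreach (explode('<!--end-->', $html) as $chunk) {
        $parts = explode('-->', $chunk, 2);
        if (count($parts) < 2) {
            continue;                       // no comment tag in this chunk
        }
        $start = strrpos($parts[0], '<!--');
        if ($start === false) {
            continue;                       // malformed chunk, skip it
        }
        $key = str_replace(' ', '_', trim(substr($parts[0], $start + 4)));
        if (isset($lang[$key])) {
            throw new Exception("Duplicate translation key: $key");
        }
        $lang[$key] = trim($parts[1]);      // assumption: surrounding whitespace is not wanted
    }
    return $lang;
}

// Hypothetical helper: reuse the cached array unless the HTML file
// is newer than the cache file.
function load_language($htmlFile, $cacheFile)
{
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($htmlFile)) {
        return include $cacheFile;
    }
    $lang = parse_language_html(file_get_contents($htmlFile));
    file_put_contents($cacheFile, '<?php return ' . var_export($lang, true) . ';');
    return $lang;
}

// A two-entry sample in the same format as Listing 2.
$sample = "<p>my own button</p>\n<!--my own button-->\nsend\n<!--end-->\n"
        . "<p>a new button</p>\n<!--a new button-->\nbutton text\n<!--end-->\n</body>";

print_r(parse_language_html($sample));
```

Because the cache file simply returns an array, later requests only pay for an include instead of re-parsing the HTML.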
Carl McDade is a freelance web developer and programmer living in
Sweden. He is a certified Microsoft database administrator and has
been doing web development since 1997. Carl spends most of his
development time working with documentation, code and studying
PHP content management systems. You can find him at his website
http://www.hivemindz.com.
Why is it Taking so Long?
TEST PATTERN
How long does it take to push out a new feature? Not the man-hours
spent coding, but the actual dates. From the point of deciding to
implement something, by what date are the users actually using it?
This is called “lead time.” It’s not just an academic measure; it’s
actually the most important measurement a development team can
make, even if it’s a little simplistic.
by MARCUS BAKER
Working out the time interval, from customer
request to actual feature delivery, is a
very powerful measure. Often, it’s quite a
shock just how long things take to deliver.
Fortunately, it can generate action as well
as dismay, so it’s a measurement worth making. It’s not a
great diagnostic: it won’t tell you what has gone wrong,
just that something is going off the rails. To analyze
long lead times, we need to look a little deeper.
Lead time is a bit simplistic for what I want to
look at next, so I’m going to choose a variant called
“Value Stream Mapping.” This was popularized by Mary
and Tom Poppendieck in their book “Lean Software
Development.”
Here’s how it works. You plot—on a timeline—all of
the periods when a feature is being worked on, and all of
the times when it’s idle. There are some genuine reasons
for something being idle: waiting for an external resource
or user feedback are good examples. Most times, though,
it’s because something more urgent has to finish first.
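As a rough illustration of the bookkeeping involved, the sketch below totals the worked and idle days on such a timeline. The periods are invented for the example and are not taken from Figure 1, and the function name summarize_value_stream() is likewise hypothetical.

```php
<?php
// Each entry is array(start date, end date, true if the feature was
// being worked on). The dates are illustrative, not real project data.
$timeline = array(
    array('2005-01-03', '2005-01-10', true),
    array('2005-01-10', '2005-02-07', false),
    array('2005-02-07', '2005-02-14', true),
);

// Hypothetical helper: total the days spent working and waiting.
function summarize_value_stream(array $timeline)
{
    $days = array('worked' => 0, 'idle' => 0);
    foreach ($timeline as $period) {
        list($start, $end, $active) = $period;
        $length = (int) date_diff(date_create($start), date_create($end))->days;
        $days[$active ? 'worked' : 'idle'] += $length;
    }
    // Lead time is everything between the first start and the last end.
    $days['lead_time'] = $days['worked'] + $days['idle'];
    return $days;
}

$summary = summarize_value_stream($timeline);
printf(
    "Lead time: %d days, of which %d worked and %d idle\n",
    $summary['lead_time'],
    $summary['worked'],
    $summary['idle']
);
```

The interesting number is usually the ratio of idle to worked time, since that is where the lead time hides.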
I’ve plotted a real life example for a persistence
library written at my current main client, although it’s
also available publicly (Figure 1). It catalogues quite a
few mistakes on our part, but also a few lessons learned.
We’ll look at it in detail.
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/278
Rapid Progress at First
First, you need a quick bit of history. Our previous library
was a bit untidy when it came to handling transactions,
and you had to call save() on every object you wanted
to persist. Something like this...
$person = &new Person();
$person->setName('Marcus');
$person->save();
That second requirement was a real pain. Often, a method
would get some dependent objects (shipping cost on an
order, for example), and change some of them. The top-level
script would no longer have references to these objects,
so it couldn’t persist the change. This meant lots of
save() calls were spread around the accounting code in
order to ensure that the calculations would not get lost
(yuck!).
function addShippingCost(&$order) {
    $shipping = &$order->getShipping();
    $shipping->setTotal(0.15 * $order->getTotal());
    $shipping->save();
}
The development team was small at that time, only two
full-time enterprise developers. Luckily, we were
self-managing, and had the freedom to do what we knew was
right. We could move quickly.
We decided to write a much sharper tool. Thanks to
the magic of CVS, I can pinpoint that day as Monday July
19, 2004. That was the day of the first design session,
and we progressed rapidly. By July 26th, we had the core
working, but were starting to run into interruptions. We
hadn’t tested the transaction side, but we could perform
CRUD operations on objects. Knocking out a crude
persistence layer in a week with only two developers isn’t
bad, but we were able to do it because we were doing
nothing else.
Then, my co-developer went on holiday. By August
14th, working mainly at home, I had got the whole
design up—collections, transactions and UnitOfWork.
UnitOfWork is a pattern for bulk saving—see “Patterns
of Enterprise Application Architecture” by Martin Fowler.
For us, it looked like this...
$ship_it = &new Change();
$order = $ship_it->create('Order');
...
addShippingCost($order);
$ship_it->commit();
The call to addShippingCost() will now work without
the save() call, making that function much cleaner. The
Change object (the UnitOfWork) knows it has to save
persistent objects and their dependents. It treats the
entire business transaction as a whole. Much nicer.
The new code was not yet battle-tested, but it was
definitely up and running. All we needed to do was add
cascading deletes and back-up and restore capabilities,
and it was ready to go into our system.
The rot was starting to set in. A direct competitor to
the company had appeared and this altered some of the
priorities towards front end stuff. Persistence refactoring
had to wait for a bit in favour of more urgent tasks.
It wasn’t until a month later that cascading deletes
were fixed. The only other changes until December were
either minor, or forced externally by other code changes.
Without a nice sized time slot, it was difficult to sink the
mind back into tricky transaction problems. This meant
exclusive time had to be scheduled, but there always
seemed more pressing concerns. It seems that under
time pressure, it’s the clever stuff that gets left by the
wayside, no matter the importance.
FIGURE 1
Multitasking is Evil
Although multitasking is bad for the lower priority tasks,
it’s actually bad for the high priority ones too. Suppose
you have two tasks of one week. If you release the work
to the team sequentially, then it takes two weeks, and
one of those tasks is done after the first week and can
be leveraged. After all, it was high priority for a reason.
If you release all the work to the developers at once, and
allow for task switching cost, things will probably take
three weeks. That is, you get both tasks completed after
three weeks. Even without the extra switching cost, it
means you are denying yourself the benefit of having one
task completed after the first week.
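The arithmetic can be sketched in a few lines. The figures follow the example in the text (two one-week tasks, and one extra week lost to task switching when they are interleaved). The function names are hypothetical, and the interleaved model is deliberately crude: it assumes nothing is finished until all of the work, plus the switching overhead, is done.

```php
<?php
// Completion times (in weeks) when tasks are done one after another.
function sequential_completion(array $durations)
{
    $finished = array();
    $elapsed = 0;
    foreach ($durations as $name => $weeks) {
        $elapsed += $weeks;
        $finished[$name] = $elapsed;
    }
    return $finished;
}

// Completion times when all tasks are worked on at once: every task
// finishes only when the total work plus switching overhead is done.
function interleaved_completion(array $durations, $switchingCost)
{
    $total = array_sum($durations) + $switchingCost;
    return array_fill_keys(array_keys($durations), $total);
}

$tasks = array('high priority' => 1, 'low priority' => 1);

print_r(sequential_completion($tasks));      // first task after 1 week, second after 2
print_r(interleaved_completion($tasks, 1));  // both tasks after 3 weeks
print_r(interleaved_completion($tasks, 0));  // both after 2 weeks, even with free switching
```

Even in the last case the high priority task arrives a week later than it would have sequentially, which is the point of the paragraph above.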
Now there are some things that you cannot put the
whole team on—you won’t have a baby in less than
nine months no matter how many mothers-to-be are
deployed—so you have to subdivide a little. If you can
though, choose one task and do it to the exclusion of
all others. If you have already started something, then
it’s best to complete it first, rather than come back to
it later.
The Public Release
December 2004 saw a short flurry of activity prior
to a public release. We often release code into
the wild that is not business-critical, and that we
would like to see tested more widely. Besides, we
liked the code. It even appeared on Sitepoint at
http://www.sitepoint.com/forums/showthread.php?t=214183.
Some minor changes resulted from this public appearance,
and usage in the wild began. Our plan was progressing,
albeit a bit slowly.
Then we moved offices. We also hired more staff
and reorganized our schedule. The persistence layer was
declared our third most important thing. That sounds
good, but that’s actually a long way down the stack. All
through 2005, the library has languished.
Forget Your Stack
Third on the stack is actually pretty useless. It means
that at least two other things must be completely and
utterly stuck before it gets a time slot. You must also
have a time slot that allows you to do serious work on it,
which means two days at least if you allow a day to pick
up the mental pieces.
This is such an important experimental result that it’s
worth repeating. If you have three items in your work
queue, the third item will never progress until another
item is done. You may as well wait until you have fewer
than two items on your list, and only then even entertain
the idea of working on it. Until then, it’s just clutter.
It’s even dangerous clutter, because if you do get
those two days free, you will do a day’s real work on a
task that someone has to spend a day understanding. In
other words, you have made absolutely no progress in the
best case. More likely, conditions have changed
since then and some of the code is wrong and now has
to be fixed. The contribution of an odd couple of days
is almost certainly negative. You’d do better to just tidy
your desk, or better yet, to spend your time clearing the
bottleneck in your main task.
Developer stacks are a bad thing. Let the project
manager build a simple queue of work. The project
manager then delivers these parcels to the development
team, not individuals, just below the rate that will
produce stacking. If you think of the developers as a
road, you want a steady flow of traffic, not gridlock.
Constant Drain
Some work was forced upon us. The library had to
have some minor edits when PHP 4.4 spewed reference
warnings, costing another day. It also got accidentally
deleted for a while in July, when someone was clearing
out unused code. That was an hour spent removing it,
and several putting it back in. That’s not progress; that’s
more cost. Meanwhile this unused library contributes
nothing to our bottom line.
What’s really annoying is the maintenance of the very
system that this new code is due to replace. We have
to apply fixes to our old persistence layer too. We also
want to make changes to other parts of the application—
changes that would be a lot easier if we had the new
code in place, as it was designed to facilitate just these
alterations. We know we have a better system waiting
in the wings, so maintaining the old code is doubly
depressing.
The good news? Persistence is the next task queued.
Its possible inclusion in the next iteration has been
discussed in every iteration meeting for the past two
months. We really, really want this work finished now.
Hopefully this story will soon have a happy ending.
The Theory of Constraints
This is a one-sentence theory that has great implications.
It says you cannot change one of throughput, inventory
or cost without affecting the others. What does that
mean?
Throughput is the amount of money software makes
once delivered and the rate at which you deliver it. Think of it as
delivered value in each iteration. You subtract fixed costs,
but not daily labour costs. For software, the fixed costs
are usually zero, as there are no materials to consume.
Inventory is the cost of all of the stuff you have lying
around that is incomplete or unused. Note that you count
this as money spent, not an asset. It’s only an asset once
you have delivered and it’s contributing to throughput.
Inventory costs money even when it’s just lying around,
as it has to be read occasionally, and maintained. Even
if neither of these things happen, it’s more files to scroll
past every day you look at the code base and it’s more
cognitive load on everybody just knowing that it’s there.
Our persistence library is currently inventory.
Cost is daily costs such as staffing, training, software
licenses and equipment. Costs are investments to improve
throughput.
This can be a big change of emphasis for some
companies, because the focus is on making money, not
saving money. For example, you often end up multitasking
so as to raise your efficiency by reducing your idle time,
but this is misguided. By creating inventory you will slow
down the rate at which things get done overall, which hurts the
company. Counterintuitively, it can be best to do nothing.
Throughput is king in this model, not efficiency.
It’s the dominance of throughput that gives this
theory its name. If you look at a manufacturing process,
you will find that throughput is limited by one element
in the workflow. Perhaps it’s some machine that has a
maximum rate at which it can work. In a software shop, it’s likely
to be a programmer with a key area of knowledge. This
limit is called the constraint, and if other parts of the
workflow go faster than the constraint, you just pile up
inventory. A chain is only as strong as its weakest link.
You can attack constraints by increasing cost. You
could train another developer in a vital skillset, or use
pair programming to spread around knowledge of the
system or simply hire more developers in key areas. All of
these raise costs, but to the greater good of making more
money. You also need a minimal amount of inventory
in front of bottlenecks so that they don’t get starved
of work, whereas non-bottlenecks can always catch
up, so their inventory is just clutter. As you attack one
bottleneck, another will appear. Removing constraints is
a process of continual improvement.
Light at the End of the Tunnel?
Right now, the constraint is me. I’m the one that
knows the persistence layer, having written most of it.
Unfortunately, I am still finishing off a previous task
and will be heavily involved in training new staff. That’s
not good, and it leads to some stark choices. Likely, the
persistence layer gets placed on hold yet again and our
story lasts a little longer. Alternatively, I am protected
from staff training for a few weeks. This will delay the
positive contribution of the new team members and may
be a greater evil, but it only has to happen long enough
for another developer to take over the persistence code.
If we were to do all of this again, we would, of course,
not start the persistence layer at all unless we were
certain of finishing it. Once started, we would either
finish it regardless, or delete the parts already done
and wait until we really had enough time. Deleting half
finished work sounds brutal, but I am more convinced
than ever that it would have been the right thing to do.
We now understand scheduling problems a lot better
and can finally make rational decisions. I didn’t even
mention the option of stopping my current task, for
instance, as it’s now unthinkable. We can just work out
the real throughput costs and make our choice by the
numbers.
I think we will make the right decision.
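Working things out by the numbers might look something like the sketch below, which applies the constraint idea from the Theory of Constraints section to a made-up workflow. The stage names and rates are invented for illustration, and find_constraint() and inventory_after() are hypothetical helpers, not part of any library.

```php
<?php
// Features per iteration that each stage can handle (made-up numbers).
$stages = array(
    'design'      => 5,
    'persistence' => 2,   // the stage only one developer understands
    'front end'   => 4,
);

// Hypothetical helper: the constraint is the slowest stage, and it
// caps the throughput of the whole workflow.
function find_constraint(array $stages)
{
    $slowest = min($stages);
    return array(
        'stage'      => array_search($slowest, $stages),
        'throughput' => $slowest,
    );
}

// Hypothetical helper: whatever is fed to the constraint faster than
// it can absorb piles up in front of it as inventory.
function inventory_after($feedRate, $constraintRate, $iterations)
{
    return max(0, $feedRate - $constraintRate) * $iterations;
}

$constraint = find_constraint($stages);
printf(
    "Constraint: %s at %d features per iteration\n",
    $constraint['stage'],
    $constraint['throughput']
);
printf(
    "Unfinished work queued after 4 iterations: %d\n",
    inventory_after($stages['design'], $constraint['throughput'], 4)
);
```

Raising the rate of any stage other than the constraint changes only the second number, never the first.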
Things to do While Waiting
While you are waiting for your bottleneck to clear, you
could do worse than read these resources...
The Goal by Eliyahu Goldratt: it’s actually a novel and
I recommended it last month. It explains the theory of
constraints.
http://www.poppendieck.com has many papers on
applying lean manufacturing to software.
Out of the Crisis by W. Edwards Deming: it was he who
started it all when he visited Japan in 1950. His 14
principles of management seem as modern as ever, such
as “foster all chances for pride of workmanship and
sharing in the improvement process.” He had vision. 
MARCUS BAKER works at Wordtracker as a Technical Consultant, where
his responsibilities include the development of applications for mining
Internet search engine data (www.wordtracker.com). Based in London,
he is a regular contributor to Sitepoint forums (www.sitepoint.com).
His previous work includes telephony and robotics. Marcus is the lead
developer of the SimpleTest project, which is available on Sourceforge.
He’s also a big fan of eXtreme programming, which he has been
practising for about two years.
Email Injection
SECURITY CORNER
This edition marks a milestone: two full years of Security Corner.
Thanks to everyone at php|architect for the opportunity to write
about PHP security and especially to my loyal readers. I hope this
column continues to help the PHP community develop more secure
applications. Thanks for reading!
by CHRIS SHIFLETT
I must admit that when I first heard about email
injection a few years ago, I wasn’t very impressed.
After all, it’s just another case of developers
making the mistake of blindly trusting user input.
If you let users manipulate the arguments passed
to the mail() function, they can send email from your
server. No big surprise there.
There are an alarming number of email injection
vulnerabilities in PHP applications, and this has prompted
me to focus on email injection in this month’s column.
The prevalence of this type of vulnerability has made it
a beloved treasure trove for spammers around the world,
but why is it so common?
I think the root cause of email injection’s popularity
is that developers don’t understand the attack or the
necessity of filtering input prior to passing it to the
mail() function. It is as if there is an assumption being
made about how PHP sends email.
Sending Email
Most veteran PHP developers know all about the mail()
function, and they realize that it provides a rather raw
interface for sending email. Like many PHP functions, it
is very flexible—mail() provides enough functionality to
send almost any type of email you can imagine, provided
you know the proper format.
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/276
As with most security vulnerabilities, it isn’t the
experts who are making the most mistakes. Using mail()
is very simple, and the possibilities aren’t immediately
obvious to a novice developer. For example, the following
demonstrates a basic use:
<?php
mail('to@example.org', 'My Subject', 'My Message');
?>
A simple email injection vulnerability is to let a user
provide the first argument:
<?php
mail($_POST['email'], 'My Subject', 'My Message');
?>
This is similar to any other injection vulnerability, but
the context is different. If you try this yourself and
provide your own email address (you@example.org), you’ll
receive an email similar to the following (some headers
removed for clarity):
To: you@example.org
Subject: My Subject
From: nobody@localhost
My Message
The value of the To header is provided by the user, so
someone wanting to exploit this situation might try to
send spam from your server by simply providing a list of
addresses:
chris@example.org, rasmus@example.org, andi@example.org
You can mimic this situation with a simple test:
<?php
mail('chris@example.org, rasmus@example.org, andi@example.org',
     'My Subject',
     'My Message');
?>
Each of these addresses will receive the email, because
the To header can handle multiple addresses in this way.
Many developers mistakenly assume that this
problem represents the extent of the concern, so it’s not
a concern if the recipient is static. This isn’t true. The
larger problem is when the fourth argument to the mail()
function is provided in part by the user.
Injecting Headers
A common use of the mail() function is to provide a
contact or feedback form. An example of such a form is
as follows:
<form action="sendmail.php" method="POST">
<p>Your Email:<br />
<input type="text" name="email" /></p>
<p>Your Subject:<br />
<input type="text" name="subject" /></p>
<p>Your Message:<br />
<textarea name="message"></textarea></p>
<p><input type="submit" value="Send Message" /></p>
</form>
The trouble with using the mail() function as
demonstrated earlier, is that the email appears to be sent
from your web server—the From header is generated by
PHP, automatically. If you want to let customers send
you email through a page on your web site, you need to
be able to set this, so that the email appears to be from
them. This is where using the mail() function’s fourth
argument helps:
<?php
mail('to@example.org',
     'My Subject',
     'My Message',
     'From: from@example.org');
?>
By letting you specify additional headers, PHP gives you
the flexibility you need to specify the From header. This
is great, but it is also where the real danger of email
injection lurks. Consider the following example for the
sendmail.php script referenced in the previous contact
form:
<?php
mail('contact@example.org',
     $_POST['subject'],
     $_POST['message'],
     "From: {$_POST['email']}");
?>
If you test this yourself, you’ll see that it works as
expected. You receive an email at your contact address
(contact@example.org) just as if the user had emailed you
directly. Unfortunately, users now have almost absolute
control over the email. The most common tactic used by
spammers is to provide the spam message in the contact
form and attempt to provide additional headers in the
email field. For example, they can provide an email such
as the following to send the message to an additional
recipient:
from@example.org
To: victim@example.org
The trouble with this approach, from a spammer’s
perspective, is that you’re more likely to notice the
vulnerability. The contact@example.org address will
receive an email similar to the following:
To: contact@example.org
Subject: My Subject
From: from@example.org
To: victim@example.org
My Message
Although the email is sent to both contact@example.org and
victim@example.org as desired, the exploit is quite
obvious. Spammers don’t succeed by exploiting a script
once—they want to find a vulnerable script and exploit
it for a long time. This makes the Bcc header a favorite
injection:
from@example.org
Bcc: victim@example.org
As many Bcc headers as desired can be provided, and the
resulting email will be much less conspicuous:
To: contact@example.org
Subject: My Subject
From: from@example.org
My Message
Because the Bcc header is not present in the message,
you’re more likely to think that a spammer has simply
tried to spam you personally using your online form.
After all, you’re allowing anyone to send you a message,
and this doesn’t appear to break the rules in any way.
However, an unknown number of other people might
have received the same spam message, and your script’s
URL will become a favorite spammer destination.
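To see why a single line break in the email field is enough, it can help to look at the header string the vulnerable script ends up building. The following is only a sketch: the $email value stands in for hypothetical form input, and nothing is actually mailed.

```php
<?php
// Hypothetical value an attacker submits in the "email" form field.
$email = "from@example.org\r\nBcc: victim@example.org";

// The same interpolation the vulnerable sendmail.php performs.
$headers = "From: {$email}";

// Each CRLF in the string begins a new header line in the outgoing message.
$header_lines = explode("\r\n", $headers);

foreach ($header_lines as $line) {
    echo $line, "\n";
}
```

Run from the command line, this prints two separate header lines, the second being the injected Bcc header.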
Exploiting Vulnerable Scripts
In order to test your own scripts, you want to be able to
provide more than one line for the email, as demonstrated
in a previous example:
from@example.org
Bcc: victim@example.org
There are a few ways to accomplish this. The simplest
is to type it out in a text editor, copy it, and paste it
into the form. Of course, spammers opt for something
a bit more automatic. The following PHP script
exploits a contact form hypothetically located at
http://example.org/sendmail.php:
<?php
$fp = fsockopen('example.org', 80);
fputs($fp, "POST /sendmail.php HTTP/1.1\r\n");
fputs($fp, "Host: example.org\r\n");
fputs($fp, "Content-Type: application/"
         . "x-www-form-urlencoded\r\n");
fputs($fp, "Content-Length: 95\r\n\r\n");
fputs($fp, 'email=from%40example.org'
         . '%0D%0ABcc%3A+victim%40example.org'
         . '&subject=My+Subject&message=My+Message');
fclose($fp);
?>
The format of POST data is exactly the same as the
format of GET data, and the URL-encoded CRLFs appear
as %0D%0A.
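If you would rather not encode the request body or count its bytes by hand when testing your own form, PHP can do both. This sketch assumes the same form fields as the contact form shown earlier; it only builds the body and does not open a connection.

```php
<?php
// Field values for a test submission; the CRLF in the email field
// is what smuggles in the extra Bcc header.
$fields = array(
    'email'   => "from@example.org\r\nBcc: victim@example.org",
    'subject' => 'My Subject',
    'message' => 'My Message',
);

// http_build_query() URL-encodes each value and joins the pairs.
// The separator is passed explicitly so the result does not depend
// on the arg_separator.output setting.
$body = http_build_query($fields, '', '&');

// The Content-Length header must match the byte length of the body.
$content_length = strlen($body);

echo $body, "\n", $content_length, "\n";
```

Computing the length this way avoids a mismatch between the Content-Length header and the body whenever the test values change.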
Note: Although I do not wish to detract from the
focus of the article, keep in mind that more sophisticated
attacks can be used to send HTML email, attachments,
and the like. An attacker, given complete control over
the arguments to the mail() function, can do anything
mail() itself is capable of.
Preventing Email Injection
As I hope is already clear, preventing email injection is
a simple matter of filtering input. In this case, filtering
with a whitelist approach isn’t easy. You can probably
restrict the subject to a whitelist of valid characters, but
you might need to be more lenient in the message. Email
addresses have proven difficult to filter for a number of
reasons, including the fact that the specification isn’t
very restrictive. (Did you know an email address can have
comments in it?)
My advice is to do the best you can and consider some
Defense in Depth approaches to strengthen your filtering.
There are numerous regular expressions that can help
you filter an email address, and even the more lenient
examples prevent common email injection attacks:
<?php
$clean = array();
/* The D modifier stops $ from matching before a trailing newline. */
$email_pattern = '/^[^@\s<&>]+@([-a-z0-9]+\.)+[a-z]{2,}$/iD';
if (preg_match($email_pattern, $_POST['email']))
{
    $clean['email'] = $_POST['email'];
}
?>
A good Defense in Depth approach is to inspect the data
specifically for newlines and carriage returns, and the
ctype_print() function can help:
<?php
if (ctype_print($clean['email']))
{
    /* The email contains no newlines or carriage returns. */
}
?>
This technique can save the day in the event that your
filtering logic has a flaw.
Until Next Time...
I hope this article helps you appreciate the need to consider
security in every aspect of your PHP development, even
those simple contact and feedback forms. By inspecting
input to be sure that it’s the format and size that you
expect, you can prevent many types of vulnerabilities,
email injection included.
Defense in Depth measures such as checking for
carriage returns and newlines are very useful, but try to
resist the urge to rely on these techniques as primary
safeguards.
Until next month, be safe.
CHRIS SHIFLETT is an internationally recognized expert in the field
of PHP security and the founder and President of Brain Bulb, a PHP
consultancy that offers a variety of services to clients around the world.
Chris is a leader in the PHP community, and his involvement includes
being the founder of the PHP Security Consortium, the founder of
PHPCommunity.org, a member of the Zend PHP Advisory Board, and
an author of the Zend PHP Certification. A prolific writer, Chris has
regular columns in both PHP Magazine and php|architect. He is also
the author of HTTP Developer’s Handbook (Sams) as well as the highly
acclaimed Essential PHP Security (O’Reilly). You can contact him at
shiflett@php.net or visit his web site and blog at http://shiflett.org/.
TIPS & TRICKS
Output Buffering
Output is generally sent from calls to echo or print, or from outside
PHP code blocks, and once it’s sent, it’s gone. However, using PHP’s
output buffering functionality, it is possible to capture this output
and further manipulate it before sending to the client. In this
month’s Tips & Tricks, I’ll show you why and how to control output
with output buffering.
by BEN RAMSEY
Portable Document Markup Language (PDML) is
a language used for creating PDF documents.
What’s best: it’s implemented entirely in PHP,
and it’s extremely simple to use. All a user must
do is create a document with markup similar to
HTML, include one line of PHP at the top of the file, and
then the file will magically render a PDF document when
called from a Web browser.
PDML is a remarkably lightweight package. It only
requires that the user create a PDF using a simple markup
language. After glancing at PDML, other PDF-creation
packages written in PHP seem to introduce needless
complexity to the process of creating a PDF on the fly.
For example, Listing 1 shows a very simple “Hello, World”
document using PDML. So, what makes it work?
The magic behind PDML: output buffering.
What is Output Buffering?
Normally, when output is echoed or printed, it is sent
immediately to PHP’s output buffer. It cannot be retrieved
or changed once this occurs, and all document headers
must be set before echoing or printing output. This is
not the case when using output buffering.
CODE DIRECTORY: output
TO DISCUSS THIS ARTICLE VISIT:
http://forum.phparch.com/277
Output buffering, put simply, is the process of
delaying the transmission of output to the client. During
this delay, the script may access or modify the contents
of the buffer before it is sent. What’s more, the script
can send the buffer all at once or in chunks, which I’ll
explain later on.
In the PDML example, the markup is never sent to the
client. Instead, PDML uses ob_start() to start buffering
output. Meanwhile, it passes a callback function to
ob_start()—the custom function ob_pdml(). Now, when
the output is flushed to the client—in this case, when the
script is finished processing—it will first pass through
ob_pdml(). What comes out is a PDF document.
I hope it is evident how this technique can be useful
for any number of applications.
Start Buffering
As mentioned, to start buffering content, one must
place a call to ob_start(). Any output echoed or printed
previous to the ob_start() call will not be stored in the
internal buffer. That is, it has already been sent to the
client, even though the sending of output will actually
be delayed until the script has finished running (or the
buffer is full). All output after the ob_start() call will be
in the script’s local output buffer.
Aside from starting the output buffer, ob_start() also
accepts a callback function parameter, also mentioned
earlier. Using a custom callback function, one can use
output buffering to create one’s own markup language
(as is the case with PDML), perform customized content
rewriting before sending the output to a client (e.g.
URL rewriting, output escaping), or implement a custom
templating engine.
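As a small illustration of the callback parameter, the sketch below rewrites a made-up [b]...[/b] tag into HTML before the output is sent. The function name and the tag syntax are hypothetical; they are not part of PHP or PDML.

```php
<?php
// Hypothetical callback: receives the buffered output as a string and
// returns the string that should actually be sent to the client.
function rewrite_bold_tags($buffer)
{
    return str_replace(
        array('[b]', '[/b]'),
        array('<strong>', '</strong>'),
        $buffer
    );
}

// Everything echoed from here on passes through the callback when flushed.
ob_start('rewrite_bold_tags');
echo 'Hello, [b]World[/b]!';
ob_end_flush();
```

The same shape should work for any string-to-string transformation, which is what makes the callback a convenient hook for rewriting or templating.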
Sending Compressed Content
PHP comes with a built-in output buffering callback
function that can be used along with ob_start() to
send gzip-compressed data to browsers. In fact, it
will even detect whether the browser accepts compressed
content and send the data compressed or uncompressed
accordingly.
For example, my browser (Mozilla Firefox) sends an
Accept-Encoding header with most requests, the value
of which is “gzip,deflate”. This tells the Web server
that it can compress content before sending it to the
browser, which saves on bandwidth and cuts down load
times. Placing the following at the top of a script will
force PHP to handle the compression, which can be
helpful, especially if your Web server doesn’t compress
responses:
<?php
ob_start('ob_gzhandler');
?>
Now, the response will include a Content-Encoding header
with a value of “gzip”. Please note, however, that this
works only for browsers that request (and can read)
compressed content. All other browsers will receive
uncompressed content.
Accessing the Buffer
All data stored in the buffer may be easily accessed,
provided the buffer has not yet been flushed. To get
the contents of the buffer at any given time, simply
use ob_get_contents() or ob_get_flush(). Both of
these functions return a string representing all current
output in the buffer. However, ob_get_flush() returns
the buffer string and then flushes the buffer, while
ob_get_contents() leaves the buffer unchanged.
Take, for example, the following:
<?php
ob_start();
echo 'Hello, World!';
$output = ob_get_contents();
ob_end_clean();
?>
This code, when run, will not output anything. Since
I have turned on output buffering with ob_start()
and cleared the buffer, turning off buffering, with
ob_end_clean(), the echo doesn’t send anything to the
client. Instead, the variable $output contains the value
“Hello, World!” I simply captured the contents of the
buffer with ob_get_contents().
Had I used ob_get_flush(), the contents of the
buffer would have also been sent to the client. While
“Hello, World!” would have displayed in the client
output, the script would still have a chance to take
action on all of the data stored in $output, which, in this
case, is only “Hello, World!”
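For comparison, here is a sketch of the same example using ob_get_flush() instead. The $length variable is only there to show that the captured string remains available after the output has been sent.

```php
<?php
ob_start();
echo 'Hello, World!';

// Sends the buffer contents onward, turns this buffer off,
// and returns a copy of what was sent.
$output = ob_get_flush();

// The script can still work with the captured copy.
$length = strlen($output);
```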
Using this technique, it is possible to control all
output from an application, running it through any
number of functions and processing routines. At the top
of the script, use ob_start(), and at the bottom, get
the contents with ob_get_contents(), clear and close
the buffer with ob_end_clean(). Now, we can modify
everything the script intended to output.
For example, regular expression matching with the
Perl-Compatible Regular Expression (PCRE) library is
often used on buffered data to replace certain content,
such as HTML or Javascript in output. For that matter,
the full content of $output may be passed through
htmlentities() or htmlspecialchars().
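A minimal sketch of this capture-then-process pattern follows. The regular expression, which strips script blocks, is only an illustrative choice and should not be mistaken for a complete sanitizer.

```php
<?php
ob_start();

// Stand-in for whatever the rest of the script would print.
echo '<p>Hello</p><script>alert(1);</script><p>World</p>';

// Capture the output, then clear and close the buffer.
$output = ob_get_contents();
ob_end_clean();

// Post-process the captured output with PCRE before sending it.
$output = preg_replace('#<script\b[^>]*>.*?</script>#is', '', $output);

echo $output;
```

Because nothing reaches the client until the final echo, the script is free to apply as many processing steps to $output as it needs.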
It is also important to note that, when using this
technique, document headers may be sent at any time
until the buffer is flushed, which, depending on the
methods used, may not be until the very end of the
script. In the example above, it is possible to place a call
to header() after the echo. However, it is not possible to
buffer headers sent with header(): they never enter the
output buffer, so the buffering functions cannot capture
or change them. As in all cases, headers must be sent
before any output. With output buffering, though, output
is being delayed. This is why it is possible to set headers
after calls to echo or print.
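The sketch below shows the ordering this makes possible. The X-Example header is a made-up name used purely for illustration, and the headers_sent() guard covers the case where something was already sent before buffering began.

```php
<?php
ob_start();

// With buffering on, this output is held back rather than sent.
echo 'Hello, World!';

// Headers can still be set, provided nothing was sent before ob_start().
if (!headers_sent()) {
    header('X-Example: set-after-echo');
}

$body = ob_get_contents();

// Flushing the buffer lets the headers go out ahead of the body.
ob_end_flush();
```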
Sending Chunked Responses
A chunked response is one that is broken up into smaller
pieces and sent separately rather than all at once. In a
typical process, all output is sent to PHP’s output buffer,
which usually waits to send the data to the client until
the script finishes. Then, when it is sent to the client,
it includes a Content-Length header specifying the exact
length of the content.
Sometimes, however, it is necessary to send data to
the client before the script finishes. This is especially the
case when processing large amounts of data could lead
to very long page load times. Output buffering can solve
this problem by providing the means to immediately
flush the contents of the buffer to the Web server itself,
encouraging it to send the contents immediately. I say
that it “encourages” the Web server because the Web
server may not always do this, as is the case when Apache
is using mod_gzip or when using certain Web servers on
the Microsoft Windows platform.
Nevertheless, when using a standard Apache
installation without mod_gzip, the ability to send chunked
responses can greatly improve usability and decrease
load times. Listing 2 shows an example that might be
used in a real-world scenario.
For the sake of argument, let’s say the fictional
table “foo” contains 20,000 records. Iterating over
these records may take some time. Meanwhile, without
a chunked response, the user waits on this data with
no real feedback that the request is being processed.
However, the example in Listing 2 uses output buffering
to send a chunked response using flush().
According to the PHP manual, flush() flushes “the
output buffers of PHP and whatever backend PHP is
using (CGI, a web server, etc).” Thus, it “effectively tries
to push all the output so far to the user’s browser.” As
mentioned earlier, this is not always the case, however.
So, in Listing 2, the buffer is being explicitly flushed to
the client after every 100 records. Thus, the user receives
some feedback that the request is being processed and
can begin viewing records while the remainder of the
script continues to process and send more data to the
client.
Note that the response now contains a
Transfer-Encoding header with a value of “chunked” in
lieu of the Content-Length header.
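The same idea can be sketched without a database. One assumption worth stating plainly: flush() pushes out PHP's system-level write buffers but does not empty a buffer opened with ob_start(), so the sketch calls ob_flush() first. The function name and the row data are hypothetical.

```php
<?php
// Hypothetical helper: prints each row and flushes after every
// $chunk_size rows. Returns the number of flushes performed.
function send_in_chunks($rows, $chunk_size)
{
    $flushes = 0;
    $count = 0;

    foreach ($rows as $row) {
        echo $row, "\n";
        $count++;

        if ($count % $chunk_size == 0) {
            // Empty the user-level buffer (if any), then the system buffers.
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
            $flushes++;
        }
    }

    // Final flush for any rows left over after the last full chunk.
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();

    return $flushes + 1;
}

ob_start();
$flushes = send_in_chunks(range(1, 5), 2);
ob_end_flush();
```

Whether the client actually sees each chunk as it is flushed still depends on the Web server, as described above.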
URL Rewriting
Not to be confused with Apache’s mod_rewrite, PHP’s
output buffering functionality allows users to “rewrite”
URLs by dynamically appending querystring values to
URLs and adding hidden form fields in output. This
works in much the same way as the session ID with
session.use_trans_sid set in php.ini.
For example, consider the following HTML:
<a href="foo.php">Link</a>
<form action="bar.php" method="POST">
<input type="text" name="baz" />
</form>
Now, consider that a persistent variable of some
sort—perhaps an authentication token—needs to exist
throughout the script in all links and forms. Simply add
the following at the top of the script (or above the
content where the variable should be appended):
<?php
output_add_rewrite_var('token', 'abc123');
?>
Now, the link and form will be rewritten as such:
<a href="foo.php?token=abc123">Link</a>
<form action="bar.php" method="POST">
<input type="hidden" name="token" value="abc123" />
<input type="text" name="baz" />
</form>
To clear the variable(s) set with output_add_rewrite_var()
from being appended in later parts of the script, use
output_reset_rewrite_vars().
The behavior of this functionality is controlled by
url_rewriter.tags in php.ini.
Content Length and Fin
Finally, it is possible to get the length of the content
in the buffer with ob_get_length() for times when it is
necessary to explicitly set the Content-Length header at
the script level, among other things.
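A brief sketch of that use follows. It assumes nothing has been sent before ob_start() and that no compression handler will later change the size of the response.

```php
<?php
ob_start();
echo 'Hello, World!';

// Number of bytes currently held in the buffer.
$length = ob_get_length();

// Headers are still allowed because the output has not left the buffer.
if (!headers_sent()) {
    header('Content-Length: ' . $length);
}

ob_end_flush();
```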
Output buffering is a surefire way to take control
of your output. Implementing these techniques in
your scripts can help improve the performance and, in
some cases, usability of your applications. Still, there
are myriad ways to use output buffering; I’d like to
hear yours. If you have a tip or trick that you’d like to
see published here, send it to tnt@benramsey.com, and,
if I use it, you’ll receive a free digital subscription to
php|architect.
Until next time, happy coding! 
LISTING 1
<?php require_once 'pdml.php'; ?>
<pdml>
  <body>
    <font face="Arial" size="16pt">Hello, World!</font>
  </body>
</pdml>
LISTING 2
<?php
ob_start();
try
{
    $i = 0;
    $dbh = new PDO('mysql:host=localhost;dbname=test',
                   $user, $pass);
    foreach ($dbh->query('SELECT * FROM foo') as $row) {
        print_r($row);
        $i++;
        /* flush() alone does not empty a buffer opened with
           ob_start(), so push it out with ob_flush() first. */
        if ($i % 100 == 0) {
            ob_flush();
            flush();
        }
    }
    ob_flush();
    flush();
    $dbh = NULL;
}
catch (PDOException $e)
{
    print 'Error: ' . $e->getMessage();
}
?>
BEN RAMSEY is a Technology Manager for Hands On Network
in Atlanta, Georgia. He is an author, Principal member of the PHP
Security Consortium, and Zend Certified Engineer. Ben lives just north
of Atlanta with his wife Liz and dog Ashley. You may contact him at
ramsey@php.net or read his blog at http://benramsey.com/.
Product Review: Komodo
PRODUCT REVIEW
Komodo
The Web Development IDE for all platforms?
by PETER B. MacINTYRE
I first heard of the Komodo development
environment by seeing one of their advertisements
in our magazine. This got me interested in the
tool, as I had been using another competing IDE
for some time. What interested me most was that
Komodo claimed to be a development environment for
many other languages. The focus of this review, however,
will be on the PHP portion. I usually give the product the
first say in summarizing its claims, so let’s see what they
have to say about themselves:
PHP: 4+, 5+
PRODUCT VERSION: 3.5.2
O/S: Linux/Unix, Windows, MacOS, Solaris
PRICING:
Personal - $29.95 US
Professional - $295.00 US
LINK: http://www.activestate.org
ActiveState Komodo is the award-winning, professional
integrated development environment (IDE) for dynamic
languages, providing a powerful workspace for editing,
debugging, and testing your programs.
Getting Started
So, let’s take a look at what this tool is supposed to
do. The installation process on Windows was quite uneventful. I simply downloaded the license key and the
installation file and ran the install wizard. Once this
was completed I started up the IDE and started to poke
around. The layout itself takes a little getting used to,
especially if you are quite accustomed to using another
interface, but once I got oriented, I generally liked what
I saw. Figure 1 shows the Komodo IDE at first start-up.
As is evident here, there are two major sections to
the IDE: the project management area on the left side,
and the main editing window on the right. Of course,
there are also toolbars at the ready along the top and
debugging panes along the bottom. One nice thing about
how Komodo starts up is that there is a page full of
your most recent work (projects and files) and helpful
tutorials. This certainly aids the developer in getting
right back to the work that they were doing the last time
they were using this IDE. Much time is often lost in reopening the number of files in a related project just to
get back to what you were most recently doing.
Digging Deeper
Let’s look at some of the features that make Komodo
stand out from the crowd. Apart from the toolbars along
the top of the IDE that appear on startup, there are
also a few other toolbars that can be added to the mix.
One thing that I have been looking for in a PHP toolbar
(and haven’t really found, elsewhere) is the ability to
customize and size the toolbars. I am one developer
who likes to set up their own environment and have the
ability to add commonly used menu items (common for
me) to the tool bar.
The code editor is also a place where I have certain
things that I am looking for. As Figure 2 shows,
this editor is color sensitive to the
differences between raw HTML and PHP code (or the web
development language that you are currently using). I
did find, though, that the default colors were not as stark
in their differences as I would like so that they stand
out more clearly, but this is a personal matter, and the
colors can be adjusted in the preferences section. Figure
3 shows the “Dark” code editing setting as a contrast
to the default settings. As well, this editor has a code
folding feature which is quite useful in getting some
code that you know is fine out of scope while looking for
a coding problem (as one example).
What I liked
What I really liked about this product was that the built-in
browser would automatically refresh itself if the file
being served was changed within Komodo. This only saves
a mouse press or two, generally, but it is also something
that is very often repeated; just this little feature alone
is quite a time-saver.
Another feature that I liked was in the project pane
on the left of the display. There is a little toolbar there as
well, with 3 top-level options all related to the project at
hand. This is another time-saving feature that puts the
more-often-used menu items at the ready.
FIGURE 1
I also really liked the fact that a developer could have
a few projects open at the same time. This is valuable
in that you can access code that is similar in another
project and bring it into play in a different project
without having to close one and open the other—yet
another time saver.
But it’s not all about saving development time on
menu items and keystrokes; there is also the overall
functionality of the product. This seems to be where
Komodo didn’t quite live up to my expectations.
What I Didn’t Like
In other IDEs that I have looked at, there
were some great features that, once a
person gets comfortable in using them, are
definitely missed when they are absent.
Code completion (AKA code intelligence)
and syntax checking are the first example.
Komodo has support for these features, but
I was completely unable to get them to function
in the PHP context (they did, however, seem to
work fine with some of the other supported
languages), even after an email exchange
with Komodo’s tech support team.
The connection to database servers is
another feature that is common to a few
other PHP IDEs; this is not present in
Komodo. Although it is not necessarily
directly related to a language’s development
environment, it is almost a must-have these
days, since there is so much use of database connectivity
in web sites.
FIGURE 2
FIGURE 3
FIGURE 4
The last item that I wanted to touch on as frustrating
was my inability to get the PHP debugger operational.
Now, to be fair, this may be a basic PEBCAK (problem
exists between chair and keyboard) error, but I did try
a number of times to get it operational, and failed. I
expected this feature to be quite a bit simpler to set
up.
Summary
The Komodo IDE project has won awards; you can see
them listed on their web site. In my opinion, however,
there are better PHP IDEs on the market.
I think that part of the issue that I have with Komodo
is that they are trying to be all editors to many languages
and that seems to be too large of a task for them.
There was lots of supporting information, and I did
get some good pointers from some of the staff members
at ActiveState, so it’s not all bad. Since I only tested
this product in the context of PHP development, I cannot
speak of its abilities with the other languages that it
claims to support, like Perl, Python, and Ruby.
I give this product 3.5 stars out of a possible 5.
PETER MACINTYRE lives and works in Prince Edward Island,
Canada. He has been an editor with php|architect since September
2003. Peter is a Zend Certified Engineer. Peter’s web site is at
http://www.paladin-bs.com.
exit(0);
2006: A Look Forward
by MARCO TABINI
I’ve got to hand it to Derick Rethans: he’s launched
a new fashion. Derick’s Look Back, which has been
running for three years now and is featured for
the first time in this issue of php|architect, has
spawned a veritable industry of blog posts and
articles that provide some interesting insight into what
happened to PHP in the past year.
At the risk of alienating some of my other friends,
however, I must say that I still like Derick’s Look Back best,
because its only goal is really that of fondly reminiscing
about how his life—and by extension the many lives that
PHP touches every day—has been affected by the events
surrounding the internals mailing list.
Since so much has been said and done about what
has happened in 2005, I figured I’d take a look at what
I think will happen in 2006. Predictions are nothing
new, of course, and there’s nothing quite as potentially
devastating to one’s reputation as playing prophet only
to be proven completely wrong in the end. Oh well, it’s
worked for Nostradamus… so, as long as I can manage to
be vague enough to give the idea that I know what I’m
talking about without actually saying anything, I should
be perfectly fine.
In a recent blog post (http://blogs.phparch.com/mt/?p=106),
I claimed that 2006 would be the “year
of confusion” for PHP. The short version of my thesis
there was that PHP has reached such a level of maturity
that any further innovation comes at a steep price for
everyone involved—those who develop the language,
who must carry an ever-growing baggage of backwards-compatibility
needs, and those who develop with the
language, who will be faced with the non-trivial task of
migrating the next cycle of their applications to PHP 5.
I still think that confusion will be the defining factor
of 2006. The problem with PHP is best described by
an aphorism that I picked up from one of my very first
business partners way back when I got started: “turning
a hundred thousand dollars into a million is a heck of a
lot easier than turning a million into ten.” I remember
laughing at the time—probably because a hundred
thousand dollars seemed such a ridiculous amount of
money—but I have found out just how right he was. With
monetary growth also comes a growth in the complexity
of a company, which, in turn, increases your overhead.
The same is true of a mature language like PHP, and
in more ways than meet the eye. As I mentioned, changes
to the language itself are getting increasingly difficult
to implement because of compatibility considerations.
The real challenge, however, is going to be in the hands
of the thousands (or hundreds of thousands, if you
believe some research firms) of developers who in 2006 are
likely to find themselves faced with an end-of-lifetime decision
regarding their current applications.
On one hand, porting software to PHP 5 seems to
be the logical thing to do: after all, what’s the point of
rewriting your applications if you don’t take advantage
of a version of the language that provides you with
the best possible facilities and the highest longevity?
Maintaining PHP 4 is going to make less and less sense
to developers for a number of reasons—primarily the
fact that it provides very limited support for some of the
technologies that are emerging as the must-haves of web
development, like good XML handling, SOAP, and so on.
On the flip side, the average PHP developer is, in
my opinion, thoroughly confused by PHP 5—and
several well-publicized recent “discussions,” such as the
reference hoopla, have done nothing to make things
better. While the PHP development team is busy scoping
out and developing PHP 6, the community will have a lot
of catching up to do trying to educate itself into proper
PHP 5 development. 