Enhance performance with Zend_Cache

1/7/10 Update:  I submitted this post to Zend Developer Zone and they published it! Short Link: http://dz.zend.com/a/11582

So I finally took the Zend Framework plunge a few weeks ago.  No, I didn’t start building an application with the framework, but I did start investigating some of the ZF classes for stand-alone implementation in my existing projects.

The first ZF class to catch my eye was Zend_Cache, for its obvious performance implications.  The applications I develop and manage are very heavy with database transactions.  Hitting the db every time you need an object is a serious performance bottleneck, and on a shared environment can be troubling to other applications living in the same environment.

I’m going to describe my experiences with Zend_Cache here, but I am not going to bore you with lots of code detail and specifics. The above-linked reference page on the ZF site has more than adequate documentation.

Zend_Cache is incredibly flexible, and allows you to use any one of a number of back end caching methods…i.e. the place you want to store your cached data.  Each of these methods is implemented with an adapter class.  Your choices include file-based caching, sqlite, APC, memcached, Xcache, and ZendPlatform.  There are additional adapters to permit caching in two different methods (TwoLevels) and for methods specific to Zend Server.

The enterprise-shared hosting environment my apps live on includes APC, so I went with that adapter.  My dev environment runs Zend Server CE, which includes an APC-compatible Zend caching extension, so I enabled it through the Zend Server config pane.

There are also a few front end adapters to use, which operate the caching machinery.  Zend_Cache_Core is the base adapter and is, in Zend’s words, generic and flexible.  More specialized front ends are also available: Output, Function, Class, File, and Page.

You can cache just about any kind of data that you can store in a variable:  integers, strings, arrays, serialized stuff, and even objects.  Think about the possible implementations and their performance benefits for just a moment: object persistence across sessions, and recordset caching for speedier searches and paged results.  I suppose one could even use this for an alternative session manager.

I already use a vastly simpler file-based caching method for storing serialized arrays for use in various select boxes that occur frequently across my apps (e.g. “select your department” type stuff).  I may port these over to Zend_Cache later.  Instead, I have implemented Zend_Cache to store objects built from database queries. I am able to manipulate objects across multiple page requests and reduce database requests to only when the object is being modified and saved.

The implementation is extremely simple.  Let me demonstrate how to create a cached object. I will assume we already have an instance of the object we want to cache ($this), and I am not providing a complete class below (so don’t try to cut and paste the code literally).

// we need a method to create a unique ID for the object
// this will be used to identify the cached object when created or when retrieving
// I would suggest using a hash that includes your db unique id
public function create_cache_id($id) {
    $cache_id = md5('myRecord_' . $id);
    return $cache_id;
}
 
// this method will retrieve the object
// $id should be unique id of your database record
public function fetch($id) {
 
    $cache_id = $this->create_cache_id($id);
 
    $frontendOptions = array(
        'lifetime' => 300 ,                 // cached object will expire after this many seconds
        /* this ROCKS because you don't spend time writing code to check if the cache is expired */
        'automatic_serialization' => true   // we'll need this on to store the object, also for an array
    );
 
    // note that for the APC adapter there are no back end options to configure
    // instantiate instance of Zend_Cache
 
    $cache = Zend_Cache::factory('Core', 'APC', $frontendOptions);
 
    if(!($obj = $cache->load($cache_id)) ) {
 
        /* if a cache with this ID doesn't exist, then execute your db query and build your object, then we'll add it to cache: */
 
        $this->myQuery($id); // this is just a generic method meant to represent your db transaction
 
        $cache->save($this, $cache_id); // adds the object to cache under our unique ID
 
        echo 'this is not from cache'; // diagnostic, comment out later
 
    } else {
 
        /* if the cached object does exist, then we need to re-populate our object properties */
 
        // loop through each element of $obj and add as a property to $this
        foreach($obj as $key => $val) {
            $this->$key = $val;
        }
 
        echo 'this is from cache';  // diagnostic, comment out later
 
    }
 
}

I’ve grossly oversimplified the external parts of the process such as the parent object class and the db interactions, but this should still give you the idea. Any time you generate your object, you’ll automatically check the cache first before querying the database.
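To see the check-the-cache-first flow run end to end without Zend Framework installed, here is a minimal sketch. ArrayCache and fetch_record() are hypothetical stand-ins I made up for illustration (an in-memory array instead of APC, a closure instead of a real query); only the load/save/remove method names mirror the Zend_Cache calls used above.

```php
<?php
// Hypothetical in-memory stand-in for a cache back end (not Zend_Cache itself).
class ArrayCache
{
    private $items = array();

    // Returns the cached value, or false if it is missing or expired.
    public function load($id)
    {
        if (!isset($this->items[$id])) {
            return false;
        }
        list($expires, $value) = $this->items[$id];
        if ($expires < time()) {
            unset($this->items[$id]);
            return false;
        }
        return $value;
    }

    public function save($value, $id, $lifetime = 300)
    {
        $this->items[$id] = array(time() + $lifetime, $value);
    }

    public function remove($id)
    {
        unset($this->items[$id]);
    }
}

// Check the cache first; only call $loader (the "database query") on a miss.
function fetch_record(ArrayCache $cache, $id, $loader)
{
    $cacheId = md5('myRecord_' . $id);
    $record = $cache->load($cacheId);
    if ($record === false) {
        $record = call_user_func($loader, $id);
        $cache->save($record, $cacheId);
    }
    return $record;
}

$queries = 0;
$loader = function ($id) use (&$queries) {
    $queries++; // stands in for a real db query
    return array('id' => $id, 'name' => 'Record ' . $id);
};

$cache  = new ArrayCache();
$first  = fetch_record($cache, 42, $loader); // miss: runs the loader
$second = fetch_record($cache, 42, $loader); // hit: served from cache
$cache->remove(md5('myRecord_42'));          // e.g. after the record is edited
$third  = fetch_record($cache, 42, $loader); // miss again: reloads
```

Three fetches cost only two "queries" here, and the second miss only happens because the entry was explicitly removed.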

You’re also going to want a method to clear the cache if your object changes. For example, if your user edits a record using an input form, you’re going to want to update the db and the cache. Rather than immediately re-loading the object into cache, I prefer to simply clear it out of cache and wait for the next fetch request to add back into cache. This way you don’t have a bunch of objects stored in cache that aren’t being used.

Anyway, the approach to clearing the cache is extremely simple. Let’s assume you already have a method for processing an update to a database record that underlies your object. To clear the old version of the object from cache, simply call a method like this:

public function clear_cached($id) {
 
       $cache_id = $this->create_cache_id($id);
       $cache = Zend_Cache::factory('Core', 'APC');
       $cache->remove($cache_id);
 
}

It is truly not much more complicated than the above examples, particularly if you already have an existing application structure you want to add Zend_Cache to. I’ve been pleased with my first experiments into Zend Framework and intend to explore some other components for use in my projects. Hopefully this tutorial will help someone else out there who’s been hesitant to buy into the framework craze.

UPDATE: Just thought I’d link to a couple of other great resources on Zend_Cache:

Zend Framework Hidden Gems: Zend_Cache (Zend Developer Zone)

Joey Rivera: Caching using PHP/Zend_Cache and MySQL

Lifehacker: Teach Yourself How to Code

Lifehacker continues to move up my list of favorite sites to visit on a daily basis.  They recently ran a list of top how-to guides from 2009, and included among them is Programming 101: Teach Yourself How to Code.
PHP gets plenty of attention in the section about server-side scripting languages, although I will note they chose to post the cover of a Python book in the paragraph.

Server-side scripting: Once you’re good at making things happen inside a web page, you’re going to need to put some dynamic server action behind it—and for that, you’ll need to move into a server-side scripting language, like PHP, Python, Perl, or Ruby. For example, to make a web-based contact form that sends an email somewhere based on what a user entered, a server-side script is required. Scripting languages like PHP can talk to a database on your web server as well, so if you want to make a site where users can log in and store information, that’s the way to go. Excellent web development site Webmonkey is full of tutorials for various web programming languages. See their PHP Tutorial for Beginners. When you’re ready, check out how to use PHP to talk to a database in WebMonkey’s PHP and MySQL tutorial. PHP’s online documentation and function reference is the best on the web. Each entry (like this one on the strlen function) includes user comments at the bottom which are often as helpful as the documentation itself. (I happen to be partial to PHP, but there are plenty of other server-side scripting languages you might decide to go with instead.)

The author, Gina Trapani, is Lifehacker’s founder and, like me, is a self-taught programmer:

Good coders are a special breed of persistent problem-solvers who are addicted to the small victories that come along a long path of trial and error. Learning how to program is very rewarding, but it can also be a frustrating and solitary experience. If you can, get a buddy to work with you along the way. Getting really good at programming, like anything else, is a matter of sticking with it, trying things out, and getting experience as you go.

New theme for my blog…

I’ve been paying more attention to web design lately.  Why?  Well I’m definitely not morphing into a designer–I don’t have the DNA for it–but rather because design is relevant, even in an enterprise environment.  As I have mentioned before, the interface matters to the end user–perhaps more so than all the backend code you’ve spent hours building and that only other coders can truly appreciate.

With that said, I think this blog was getting stale, and although I continue to adore everything Apple, the “Mac” themed blog thing is just a little dated.  After searching WordPress.org, I stumbled upon MacPress, a very clean and modern Web 2.0-ish theme by the folks at Sizlopedia.  I hope you enjoy the cleaner look, which I happen to think is vastly more readable.  I’ll probably tweak here and there in the next few weeks, but I am largely happy with it as-is.

You can download MacPress from WordPress.org or from Sizlopedia.com.

Smashing Mag: How to Support IE and Still be Cutting Edge

Smashing Magazine is more for the web “design” and Photoshop crowd, but any developers working with GUI/front ends at all (which is most of us) will find tons of great information there.  Chris Blatnik says that it is the GUI that makes or breaks an app (after all, the users never see the code, no matter how great the developer thinks it is).

Their latest post is about supporting IE on your websites while still utilizing the latest web technologies, such as CSS3.

The payoff:

Remember that the purpose of this post is not to teach you how to hack IE or deal with its quirks or even how to achieve effects by resorting to JavaScript. Rather, it is to explain how we can design and build websites knowing that differences will arise between browsers.

You won’t see people rioting over the lack of rounded corners on Twitter or WordPress; they aren’t even upset by it, because those differences don’t fundamentally break the websites. People can still use the websites and have a good experience. In most cases, they won’t even notice it!

Brandon Savage: He knows what he’s talking about…

Via PHPdeveloper.org, I’ve been following Brandon Savage’s blog for a few months.  Most recently, he’s posted a pair of excellent articles on productive and useless micro-optimizations.  These are code “enhancements” such as “change all print() statements to echo statements because echo is faster”.  Brandon takes a look at several popular micro-enhancements and offers his professional opinion on whether they are truly worth your time or not.

I find Brandon’s knowledge of PHP a great resource, and I appreciate his practical approach. It is very evident he develops in the real world rather than in abstract-land. Many of his recent posts have been focused on the beginning PHP developer.  So if you are a newbie or an intermediate (I consider myself the latter), Brandon’s stuff is very much worth your time.

Fabien Potencier: PHP does need a template engine

Fabien Potencier’s latest blog post is yet another entry in the popular debate in the PHP community over templating. Recently, conventional wisdom has swung back to the use of PHP itself for templating, rather than one of the popular but resource-intensive template engines like Smarty.

I discussed this myself here. At the time Brian Lozier wrote his article, his approach was probably in the minority. Other than the use of short tags, however, PHP for templating has seen a renaissance.

Fabien’s angle on this, however, is anti-popular sentiment. He argues that PHP is too verbose to be a good template language, its syntax is not concise enough for templating, and it lacks sufficient reusability in a template context. I find the first two of these positions pretty subjective, and you’ll find from the blog comments there are plenty of opposing opinions on this.

I posted my own comments over there, and re-print them here for your consideration. I think I made some valid points, but then again I have no perspective but my own, so I could be way off.

I’ve been templating directly with PHP for some time now using Brian Lozier’s approach. I’ve since ditched the short tags, but for me, working in a solo environment, I’d rather have simplicity (don’t have to learn a template engine syntax), control, and save the performance to spend later on things like actual application features.

Further, I’m not sure what all the heartburn over escaping variables in the template is about? IMO you should never be doing real work in the template anyway, any variables should be prepared in your controller class/script BEFORE being sent to the template. You’re violating your own rule of separation of concerns by doing that much PHP work in the template. Likewise, making a function call in the template would violate the same principle. I’ve yet to encounter anything in two large scale enterprise apps that I couldn’t template with PHP.

I also agree with the comments above that it’s not going to hurt the designer to learn a bit of PHP. It’s really not that hard and it is a translatable skill. The en-vogue template engine of the day may or may not be in use in 5 years, but PHP will be.

I would argue that a very small percentage of total PHP developers are working in an environment where there is a separate templater that ONLY does the template work and has no knowledge of or role in application coding.

In the end, Fabien announces he’s built upon some prior work by another developer to create yet another “magic bullet” template language. I had to re-think the intent of the article in its entirety once I realized at least part of his motivation must be to encourage folks to try out his new template engine. I do not fault Fabien for this (we all want to promote and share our work), but it causes me to reach the conclusion that his critique of PHP for templating is perhaps not 100% impartial.

Update: Eli White has also posted a response to Fabien.

Update 2: Pádraic Brady has now posted a rather brutal response to Fabien:

Fabien’s article triggered the urge once again to challenge the status quo, the continued view of something in PHP being necessary when in truth it simply isn’t. The article takes that view to extremes, going to some effort arguing against the recent slide towards templating with PHP with arguments which are so biased as to misinform readers. [trackback]

And, now Fabien has posted a follow up to his original post:

Before I try to answer some questions, I’d like to reinstate that I like PHP templates. And you should remember that symfony has only used plain old PHP templates since the beginning. As a matter of fact, I’m been advocating about using PHP templates since my first PHP project, and I have never used any other PHP template engines. So, I’m not against PHP templates; I just find that some PHP limitations as a template language are more and more irritating for me.

Basic AJAX with PHP and jQuery

I gave a live tutorial/demo at work today for a room full of folks. I’ve posted the source code and PowerPoint below. The title is pretty self explanatory…just a very simple demonstration of AJAX techniques using jQuery JavaScript and PHP on the server side in a very crude Twitter-like mini-posting app.

Basic AJAX with PHP and jQuery – Source Code

Powerpoint Presentation

The problem with Database Abstraction Layers…

Let me preface this post by saying I reserve the right to be entirely mistaken, and I invite comments with opposing opinions…hey, maybe I’ll learn something by mouthing off! I know I’m painting with very broad brush-strokes and I expect to be corrected where my statements may be overly generalized.

Now, as to the topic at hand:  The problem with DALs, PDOs, ORMs, data access objects, etc. (such as Propel) is that they are only “clean” solutions for relatively simple, single table queries and/or small result sets.

Larger databases will see huge hits on performance if you try to return only primary keys for a result set, and then loop through the keys, creating an object for each (thereby firing a query for each). For displaying result sets, a single query is vastly superior.
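To make the cost of the keys-then-objects pattern concrete, here is a small sketch that counts queries both ways. The users table, its three rows, and the in-memory SQLite database (via PDO) are assumptions for illustration only; the same arithmetic applies to any database.

```php
<?php
// Hypothetical schema and data, using in-memory SQLite via PDO for illustration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy')");

// Approach 1: fetch the keys, then fire one query per key (N + 1 queries in total).
$queryCount = 0;
$ids = $db->query('SELECT id FROM users ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
$queryCount++;

$stmt = $db->prepare('SELECT name FROM users WHERE id = ?');
$namesPerKey = array();
foreach ($ids as $id) {
    $stmt->execute(array($id));
    $namesPerKey[] = $stmt->fetchColumn();
    $stmt->closeCursor();
    $queryCount++;
}

// Approach 2: a single query returns the whole result set.
$namesSingle = $db->query('SELECT name FROM users ORDER BY id')->fetchAll(PDO::FETCH_COLUMN);
```

With three rows the first approach already needs four round trips for the same data the second gets in one; at 15 rows per page that becomes 16 queries per page view.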

Further, the DAL pattern appears to break down when your query includes multiple joins and aliases. Properly normalized relational databases require joins in order to return relevant data about a record to the user, i.e. you don’t store the user’s full name in every sales order record, only the id reference to his record in the customer table. I’ve yet to see a DAL that can effectively deal with multiple joins and aliasing without asking the coder to write explicit SQL.

Let’s look at an example of a “simple” query to demonstrate the power of DALs.  Here’s the standard way to write a simple MySQL query with PHP.

$sql = "SELECT name FROM users WHERE id = '100' LIMIT 1 ";
 
$result = mysql_query($sql);
 
$row = mysql_fetch_object($result);
 
$name = $row->name;

Now, let’s see how a theoretical DAL class might handle the same query:

$name = $db->Fetch_Var('users', 'name', 'id', '100');

Wow.  You can see what took us four lines of code before now takes only one.  The second example uses an actual method I’ve written for a simple DAL class I use at work.  If I can ever get permission from the licensing dept, I’ll share it here.
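Since I can’t share the real class, here is a rough sketch of how a method like Fetch_Var could be built on top of PDO. This is not the actual implementation: the SimpleDal class name, the identifier check, and the in-memory SQLite test data are all assumptions for illustration.

```php
<?php
// Hypothetical DAL class; a sketch only, not the actual class mentioned above.
class SimpleDal
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    // Return one column value from the first row where $whereCol equals $value.
    public function Fetch_Var($table, $column, $whereCol, $value)
    {
        // Identifiers cannot be bound as parameters, so restrict them to safe characters.
        foreach (array($table, $column, $whereCol) as $identifier) {
            if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $identifier)) {
                throw new InvalidArgumentException('Invalid identifier: ' . $identifier);
            }
        }

        $sql  = "SELECT $column FROM $table WHERE $whereCol = ? LIMIT 1";
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute(array($value)); // the value itself is bound, never interpolated

        return $stmt->fetchColumn();   // false when no row matches
    }
}

// Assumed test data so the sketch runs on its own.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (id, name) VALUES (100, 'Jane')");

$db   = new SimpleDal($pdo);
$name = $db->Fetch_Var('users', 'name', 'id', '100');
```

The one-liner at the call site is only possible because the query shape is fixed: one table, one column, one equality condition.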

For retrieving a single result or even an entire row (or rows) from a single table, such DALs can be effective and incredibly powerful. However, for large scale paged result sets requiring joins, etc, I find it is most efficient to write custom SQL. In my opinion, these DALs are wishful thinking if the goal is to remove the need for the PHP developer to be able to write effective SQL. These tools should, rather, be treated exactly as such. Each is a tool to speed your coding and abstract away routine query building. But for any complex application, you are never going to get fully away from writing your own SQL queries. And due to the nature of these DALs/ORMs, the only SQL queries left to write are always going to be the most complex.

Propel actually does manage to provide some support for joins. I do not like Propel’s reliance on the “*” wildcard in the linked example, but perhaps that is just for simplicity in the documentation. Good SQL explicitly names the columns to retrieve, and except for in-development testing, you should rarely if ever use “*” …it’s lazy coding and wastes resources (see Rudy Limeback’s Simply SQL for a more detailed argument against “*”). I’d like to see how Propel, or any other ORM/DAL class, would deal with a query such as this:

SELECT SQL_CALC_FOUND_ROWS
          projects.id
          , faculty
          , author
          , author2
          , author3
          , date_created
          , users.last_name     as faculty_last
          , users.first_name    as faculty_first
          , users2.last_name    as author_last
          , users2.first_name    as author_first
          , dept.dept_name
          , project_title
          , project_type
          , staff_no
          , project_status
          , organization.org_name
          , project_date
FROM projects
LEFT OUTER JOIN users ON projects.faculty = users.id
LEFT OUTER JOIN users as users2  ON projects.author = users2.id
LEFT OUTER JOIN dept ON projects.dept = dept.id
LEFT OUTER JOIN organization ON projects.org = organization.id
LEFT OUTER JOIN staff ON projects.staff_no = staff.id
WHERE (users.last_name LIKE '%smith%' OR users.first_name LIKE '%smith%')
AND project_title LIKE '%Material Composition of Unobtanium%'
ORDER BY date_created DESC 
LIMIT 0, 15

The above example is a sanitized (names anonymized to protect the innocent) version of a real query I use to return paginated results in one of our web apps. Some of the joins are included dynamically based on search criteria, others are always included. The WHERE clause is likewise generated dynamically, as well as the ORDER BY, which is used for column-based sorting.
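For a sense of what "generated dynamically" means in practice, here is a trimmed-down sketch of that kind of query builder. The function name, the criteria keys, and the sortable column map are all hypothetical; the real code handles many more joins and filters.

```php
<?php
// Hypothetical helper: builds a paged search query from optional criteria.
function build_project_search(array $criteria, $sortKey, $page, $perPage = 15)
{
    // This join is always included; others are added only when needed.
    $joins  = array('LEFT OUTER JOIN users ON projects.faculty = users.id');
    $where  = array();
    $params = array();

    if (!empty($criteria['faculty_name'])) {
        $where[]  = '(users.last_name LIKE ? OR users.first_name LIKE ?)';
        $params[] = '%' . $criteria['faculty_name'] . '%';
        $params[] = '%' . $criteria['faculty_name'] . '%';
    }
    if (!empty($criteria['dept_name'])) {
        // Only join dept when the search actually filters on it.
        $joins[]  = 'LEFT OUTER JOIN dept ON projects.dept = dept.id';
        $where[]  = 'dept.dept_name = ?';
        $params[] = $criteria['dept_name'];
    }

    // Map user-facing sort keys to known columns; never interpolate raw input.
    $sortable = array(
        'created' => 'projects.date_created',
        'title'   => 'projects.project_title',
    );
    $orderBy = isset($sortable[$sortKey]) ? $sortable[$sortKey] : $sortable['created'];

    $offset = max(0, ((int) $page - 1) * (int) $perPage);

    $sql = 'SELECT projects.id, projects.project_title FROM projects '
         . implode(' ', $joins)
         . ($where ? ' WHERE ' . implode(' AND ', $where) : '')
         . ' ORDER BY ' . $orderBy . ' DESC'
         . ' LIMIT ' . $offset . ', ' . (int) $perPage;

    return array($sql, $params);
}

list($sql, $params) = build_project_search(array('faculty_name' => 'smith'), 'title', 1);
```

The returned $sql and $params are meant to be handed to a prepared statement, so the search terms are bound rather than concatenated into the query.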

Show me a DAL/ORM class that can deal with this example and I’ll be your best friend forever.

Let’s now look, for example, at Propel’s approach to a multi-join query, taken from the above link.

Here’s the standard SQL way:

SELECT * 
FROM author 
  INNER JOIN book ON book.author_id = author.id 
  INNER JOIN publisher ON publisher.id = book.publisher_id
WHERE publisher.name = 'Some Name'

Here’s the Propel way:

$c = new Criteria(AuthorPeer::DATABASE_NAME);
 
$c->addJoin(AuthorPeer::ID, BookPeer::AUTHOR_ID, Criteria::INNER_JOIN);
$c->addJoin(BookPeer::PUBLISHER_ID, PublisherPeer::ID, Criteria::INNER_JOIN);
 
$c->add(PublisherPeer::NAME, 'Some Name');
 
$authors = AuthorPeer::doSelect($c);

If you are absolutely hell bent on returning your result set as objects, then I suppose the Propel approach is appealing, but it is more code in the end and there is an unavoidable learning curve (though admittedly not that daunting) to writing it.

If you’re considering yourself a professional PHP application developer and not just a “coder” or “website designer”, rather than avoiding SQL at all costs, you should be learning how to write good SQL.  I’m not sure where we got the idea that we needed a more succinct way to write queries without using SQL, since Structured Query Language is already succinct by design.

This is much the same argument as voiced against the myriad template engines out there…why build a template engine with its own simple syntax when PHP already is essentially a template engine?  Further, a similar argument has been made by Rasmus Lerdorf (also here), the father of PHP, with respect to the proliferation of frameworks.

Rather than use these drawbacks as an excuse to reject DALs (and template engines and frameworks for that matter), I am merely suggesting that the professional PHP developer recognize the limitations of these tools and know when it is more efficient to bang out an SQL statement than to contort your code and DAL class to deal with very complex queries.

One caveat I’d like to insert here at the end is ADOdb, perhaps the most popular DAL for PHP. I’ve used ADOdb in the past and like it. Rather than trying to abstract away all your SQL worries, ADOdb deals with the differences in connections and the PHP query functions across different database types, so that instead of obsessing about “mysql_query($sql)” versus “oci_execute($sql)”, you focus only on your SQL. The advantage is that your apps become more portable/scalable across various relational db products.

Further reading:
Propel Object-Relational Mapping Framework
ADOdb Database Abstraction Layer
Wikipedia entry on Object-Relational Mapping

Mac OS X 10.6 Snow Leopard: My upgrade experience

My Snow Leopard disk arrived Friday, and I promptly began upgrading my three Macs (A 2008 Mac Pro and early 2008 MacBook Pro at work, and a 2008 Mac Pro at home).

I’ve read very few horror stories so far about folks and their upgrades.  I had no issues whatsoever.  Install took about 1 hour for each of these three machines.

I likewise had virtually no issues with any of my apps.  Preference pane add-ons iStat Menus and Blueharvest didn’t work.  Fortunately, Blueharvest’s developer already had a new 10.6 compatible version available. I’m still waiting on an update for iStat Menus, but it is not a big operational loss for me to not know the exact load of my CPUs at any given moment.

10.6 boots by default into 32-bit kernel mode. This is done to maximize compatibility with dozens of apps that haven’t been updated to work in 64-bit mode.  10.6 is great in 32-bit mode, though if you’re like me and cannot use the new Exchange features (We’re in the midst of migrating from Exch 03 to 07 and my account hasn’t been moved yet), you were left Friday a little bit disappointed by just how few visible perks Snow Leopard gives you.

So this morning I booted into 64-bit mode on my Mac Pro at the office (done by holding down the “6” and “4” keys during boot; hold “3” and “2” during boot to go back to 32-bit).  All I have to say is “WOW.”  This machine flies now…start up was multiple times faster (or at least seemed so…I didn’t time it) than before, and all my login items fired up at least twice as fast as before.

64-bit mode, as you might suspect, produced more software incompatibilities.  My Parallels 4.0 (mission critical for me to run a few Windows apps, as well as IE6 and IE7), would not load a VM.  The problem was a driver incompatible with 64-bit mode.  Big kudos to the folks at Parallels, however, because they released an update over the weekend that resolves the issues.

1Password 2 also encountered issues with 64-bit mode, but a few minutes spent cruising the Agile Software website got me into the 1Password 3 BETA program and a 64-bit compatible version. Thanks to Brett Terpstra of TUAW for the tip.

Everything else seems to work, and work fantastically well.  I’ve read reports of Office for Mac 2008 not running well in 64-bit mode, but it’s been fine for me, if not much quicker to load than in 32-bit mode.

Other Apps I’ve tested in 64-bit mode so far:

Netbeans 6.7.1 – works just fine

The Gimp (in X11) – seems to work fine

LittleSnapper – hung on my first try

Panic Transmit – works fine