MySQL Views just aren’t “there yet”….

I am a big fan of Jason Gilmore. His book Beginning PHP and MySQL (link is to latest version, not my actual copy) actually launched my developer career four years ago.

His latest post over on PHPbuilder.com, Refactor Your PHP Site Using MySQL Procedures and Views, has some good information about using views and stored procedures to simplify application logic and improve code maintainability.

I tried to comment to the article on PHPBuilder, but their comment deal is broken, so I thought this would make a decent blog post.

There is no question views and stored procedures are both powerful tools that MySQL went without for quite some time. However, I disagree with Jason to the extent his article fails to mention there is a cost to using these tools.

MySQL’s implementation of views has some well documented performance problems. I won’t go into them in detail here, you can read an excellent summary on the MySQL Performance Blog in a post by Peter called MySQL view as a performance troublemaker.  The urge to save yourself from writing a complex query can end up costing you and your users down the road when performance issues crop up.

In a vacuum, I completely agree that views and stored procedures should be an essential tool in the developer’s toolbox.
However, in reality, MySQL’s implementation of views leaves a lot to be desired.  MySQL views have a tremendous amount of overhead associated with them and using a PHP-side join query, no matter how complex, will result in significantly better performance than a MySQL view.
This article has much more to say on View performance in MySQL: http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/
I like stored procedures but as with views, PHP can often do many of these transformations and calculations more quickly than the MySQL server can.  I also think you have to be wary of putting too much stuff into stored procedures and creating a confusing architecture if not careful to  heed separation of concerns.

I like stored procedures, but SQL is slow when compared to PHP, and stored procedures will hit the database CPU more than a corresponding function in PHP.  I also think you have to be wary of putting too much stuff into stored procedures and creating a confusing architecture. A good developer always heeds the rule of separation of concerns.  Stuffing so much stuff into the database eventually will defeat the original purpose of simplifying code maintainability, particularly in a team environment.

If we’re talking Oracle or another of the more mature RDBMS products out there, there is no question that views should be leveraged as much as possible. While stored procedures are not as big of a performance concern, mis- or over-use can lead to architectural issues. With respect to views,  MySQL just isn’t there yet.

The problem with Database Abstraction Layers…

Let me preface this post by saying I reserve the right to be entirely mistaken, and I invite comments with opposing opinions…hey, maybe I’ll learn something by mouthing off! I know I’m painting with very broad brush-strokes and I expect to be corrected where my statements may be overly generalized.

Now, as to the topic at hand:  The problem with DALs, PDOs, ORMs, data access objects, etc. (such as Propel) is that they are only “clean” solutions for relatively simple, single table queries and/or small result sets.

Larger databases will see huge hits on performance if you try to return only primary keys for a result set, and then loop through the keys, creating an object for each (thereby firing a query for each). For displaying result sets, a single query is vastly superior.

Further, the DAL pattern appears to break down when your query includes multiple joins and aliases. Properly normalized relational databases require joins in order to return relevant data about a record to the user, i.e. you don’t store the user’s full name in every sales order record, only the id reference to his record in the customer table. I’ve yet to see a DAL that can effectively deal with multiple joins and aliasing without asking the coder to write explicit SQL.

Let’s look at an example of a “simple” query to demonstrate the power of DALs.  Here’s the standard way to write a simple MySQL query with PHP.

$sql = "SELECT name FROM users WHERE id = '100' LIMIT 1 ";
 
$result = mysql_query($sql);
 
$row = mysql_fetch_object($result);
 
$name = $row->name;

Now, lets see how a theoretical DAL class might handle the same query:

$name = $db->Fetch_Var('users', 'name', 'id', '100');

Wow.  You can see what took us four lines of code before now takes only one.  The second example uses an actual method I’ve written for a simple DAL class I use at work.  If I can ever get permission from the licensing dept, I’ll share it here.

For retrieving a single result or even an entire row (or rows) from a single table, such DALs can be effective and incredible powerful. However, for large scale paged result sets requiring joins, etc, I find it is most efficient to write custom SQL. In my opinion, these DALs are wishful thinking if the goal is to remove the need for the PHP developer to be able to write effective SQL. These tools should, rather, be treated exactly as such. Each is a tool to speed your coding and abstract away routine query building. But for any complex application, you are never going to get fully away from writing your own SQL queries. And due to the nature of these DALs/ORMs, the only SQL queries left to write are always going to be the most complex.

Propel actually does manage to provide some support for joins. I do not like Propel’s reliance on the “*” wildcard in the linked example, but perhaps that is just for simplicity in the documentation. Good SQL explicity names the columns to retrieve, and except for in-development testing, you should rarely if ever use “*” …its lazy coding and wastes resources (see Rudy Limeback’s Simply SQL for a more detailed argument against “*”). I’d like to see Propel, or any other ORM/DAL class deal, however, with a query such as this:

SELECT SQL_CALC_FOUND_ROWS
          projects.id
          , faculty
          , author
          , author2
          , author3
          , date_created
          , users.last_name     as faculty_last
          , users.first_name    as faculty_first
          , users2.last_name    as author_last
          , users2.first_name    as author_first
          , dept.dept_name
          , project_title
          , project_type
          , staff_no
          , project_status
          , organization.org_name
          , project_date
FROM projects
LEFT OUTER JOIN users as faculty  ON projects.faculty = users.id
LEFT OUTER JOIN users as users2  ON projects.author = users2.id
LEFT OUTER JOIN dept ON projects.dept = dept.id
LEFT OUTER JOIN organization ON project.org = organization.id
LEFT OUTER JOIN staff ON project.staff_no = staff.id
WHERE (users.last_name LIKE '%smith%' OR users.first_name LIKE '%smith%')
AND project.title LIKE '%Material Composition of Unobtanium%'
ORDER BY Items.Entry_Date DESC 
LIMIT 0, 15

The above example is a sanitized (names anonymized to protect the innocent) version of a real query I use to return paginated results in one of our web apps. Some of the joins are included dynamically based on search criteria, others are always included. The WHERE clause is likewise generated dynamically, as well as the ORDER BY, which is used for column-based sorting.

Show me a DAL/ORM class that can deal with this example and I’ll be your best friend forever.

Let’s now look, for example, at Propel’s approach to a multi-join query, taken from the above link.

Here’s the standard SQL way:

SELECT * 
FROM author 
  INNER JOIN book ON book.author_id = author.id 
  INNER JOIN publisher ON publisher.id = book.publisher_id
WHERE publisher.name = 'Some Name'

Here’s the Propel way:

$c = new Criteria(AuthorPeer::DATABASE_NAME);
 
$c->addJoin(AuthorPeer::ID, BookPeer::AUTHOR_ID, Criteria::INNER_JOIN);
$c->addJoin(BookPeer::PUBLISHER_ID, PublisherPeer::ID, Criteria::INNER_JOIN);
 
$c->add(PublisherPeer::NAME, 'Some Name');
 
$authors = AuthorPeer::doSelect($c);

If you are absolutely hell bent on returning your result set as objects, then I suppose the Propel approach is appealing, but it is more code in the end and there is an unavoidable learning curve (though admittedly not that daunting) to writing it.

If you’re considering yourself a professional PHP application developer and not just a “coder” or “website designer”, rather than avoiding SQL at all costs, you should be learning how to write good SQL.  I’m not sure where we got the idea that we needed a more succinct way to write queries without using SQL, since Structured Query Language is already succinct by design.

This is much the same argument as voiced against the myraid template engines out there…why build a template engine with its own simple syntax when PHP already is essentially a template engine?  Further, a similar argument has been made by Rasmus Lerdorf (also here), the father of PHP, with respect to the proliferation of frameworks.

Rather than use these drawbacks as an excuse to reject DALs (and template engines and frameworks for that matter), I am merely suggesting that the professional PHP developer recognize the limitations of these tools, and to know when it is more efficient to bang out an SQL statement than trying to contort your code and DAL class to deal with very complex queries.

One caveat I’d like to insert here at the end is ADOdb, perhaps the most popular DAL for PHP. I’ve used ADOdb in the past and like it. Rather than trying to abstract away all your SQL worries, ADOdb deals with the differences in connections and the PHP query functions across different database types, so that instead of obsessing about “mysql_query($sql)” versus “oci_execute($sql)”, you focus only on your SQL. The advantage is that your apps become more portable/scalable across various relational db products.

Further reading:
Propel Object-Relational Mapping Framework
ADOdb Database Abstraction Layer
Wikipedia entry on Object-Relational Mapping

MySQL forks reveal uncertainty about Oracle’s plans

Ran across this article on Zend.com about Drizzle, a lightweight fork of MySQL intended for “cloud” applications.  As we see more fork projects and groups take MySQL in their own directions, I continue to wonder what Oracle’s ultimate intentions are for the open source db now that it owns MySQL as part of its acquisision of Sun Microsystems.

Maybe tech and business industry types pointed out the painfully obvious fact that MySQL creates serious overlap with Oracle’s existing relational db product lineup.  However, it has not been emphasised that the two products really cater to different markets.  Yes, there’s some overlap, particularly in the low-cost rapid-application-development arena.  Are there low-cost small scale apps using Oracle? Yes, but I’m sure most are using the free “Express” version.  Are there huge enterprise scale apps using MySQL? Sure.  But, for the most part, I would wager that this overlap is negligible in comparison to the leverage that the gigantic MySQL installed base could give Oracle in terms of a gateway to new classes of potential customers for Oracle’s other products and services.