The problem with Database Abstraction Layers…

Let me preface this post by saying I reserve the right to be entirely mistaken, and I invite comments with opposing opinions…hey, maybe I’ll learn something by mouthing off! I know I’m painting with very broad brush-strokes and I expect to be corrected where my statements may be overly generalized.

Now, as to the topic at hand:  The problem with DALs, PDOs, ORMs, data access objects, etc. (such as Propel) is that they are only “clean” solutions for relatively simple, single table queries and/or small result sets.

Larger databases will see huge hits on performance if you try to return only primary keys for a result set, and then loop through the keys, creating an object for each (thereby firing a query for each). For displaying result sets, a single query is vastly superior.

Further, the DAL pattern appears to break down when your query includes multiple joins and aliases. Properly normalized relational databases require joins in order to return relevant data about a record to the user, i.e. you don’t store the user’s full name in every sales order record, only the id reference to his record in the customer table. I’ve yet to see a DAL that can effectively deal with multiple joins and aliasing without asking the coder to write explicit SQL.

Let’s look at an example of a “simple” query to demonstrate the power of DALs.  Here’s the standard way to write a simple MySQL query with PHP.

$sql = "SELECT name FROM users WHERE id = '100' LIMIT 1 ";
 
$result = mysql_query($sql);
 
$row = mysql_fetch_object($result);
 
$name = $row->name;

Now, lets see how a theoretical DAL class might handle the same query:

$name = $db->Fetch_Var('users', 'name', 'id', '100');

Wow.  You can see what took us four lines of code before now takes only one.  The second example uses an actual method I’ve written for a simple DAL class I use at work.  If I can ever get permission from the licensing dept, I’ll share it here.

For retrieving a single result or even an entire row (or rows) from a single table, such DALs can be effective and incredible powerful. However, for large scale paged result sets requiring joins, etc, I find it is most efficient to write custom SQL. In my opinion, these DALs are wishful thinking if the goal is to remove the need for the PHP developer to be able to write effective SQL. These tools should, rather, be treated exactly as such. Each is a tool to speed your coding and abstract away routine query building. But for any complex application, you are never going to get fully away from writing your own SQL queries. And due to the nature of these DALs/ORMs, the only SQL queries left to write are always going to be the most complex.

Propel actually does manage to provide some support for joins. I do not like Propel’s reliance on the “*” wildcard in the linked example, but perhaps that is just for simplicity in the documentation. Good SQL explicity names the columns to retrieve, and except for in-development testing, you should rarely if ever use “*” …its lazy coding and wastes resources (see Rudy Limeback’s Simply SQL for a more detailed argument against “*”). I’d like to see Propel, or any other ORM/DAL class deal, however, with a query such as this:

SELECT SQL_CALC_FOUND_ROWS
          projects.id
          , faculty
          , author
          , author2
          , author3
          , date_created
          , users.last_name     as faculty_last
          , users.first_name    as faculty_first
          , users2.last_name    as author_last
          , users2.first_name    as author_first
          , dept.dept_name
          , project_title
          , project_type
          , staff_no
          , project_status
          , organization.org_name
          , project_date
FROM projects
LEFT OUTER JOIN users as faculty  ON projects.faculty = users.id
LEFT OUTER JOIN users as users2  ON projects.author = users2.id
LEFT OUTER JOIN dept ON projects.dept = dept.id
LEFT OUTER JOIN organization ON project.org = organization.id
LEFT OUTER JOIN staff ON project.staff_no = staff.id
WHERE (users.last_name LIKE '%smith%' OR users.first_name LIKE '%smith%')
AND project.title LIKE '%Material Composition of Unobtanium%'
ORDER BY Items.Entry_Date DESC 
LIMIT 0, 15

The above example is a sanitized (names anonymized to protect the innocent) version of a real query I use to return paginated results in one of our web apps. Some of the joins are included dynamically based on search criteria, others are always included. The WHERE clause is likewise generated dynamically, as well as the ORDER BY, which is used for column-based sorting.

Show me a DAL/ORM class that can deal with this example and I’ll be your best friend forever.

Let’s now look, for example, at Propel’s approach to a multi-join query, taken from the above link.

Here’s the standard SQL way:

SELECT * 
FROM author 
  INNER JOIN book ON book.author_id = author.id 
  INNER JOIN publisher ON publisher.id = book.publisher_id
WHERE publisher.name = 'Some Name'

Here’s the Propel way:

$c = new Criteria(AuthorPeer::DATABASE_NAME);
 
$c->addJoin(AuthorPeer::ID, BookPeer::AUTHOR_ID, Criteria::INNER_JOIN);
$c->addJoin(BookPeer::PUBLISHER_ID, PublisherPeer::ID, Criteria::INNER_JOIN);
 
$c->add(PublisherPeer::NAME, 'Some Name');
 
$authors = AuthorPeer::doSelect($c);

If you are absolutely hell bent on returning your result set as objects, then I suppose the Propel approach is appealing, but it is more code in the end and there is an unavoidable learning curve (though admittedly not that daunting) to writing it.

If you’re considering yourself a professional PHP application developer and not just a “coder” or “website designer”, rather than avoiding SQL at all costs, you should be learning how to write good SQL.  I’m not sure where we got the idea that we needed a more succinct way to write queries without using SQL, since Structured Query Language is already succinct by design.

This is much the same argument as voiced against the myraid template engines out there…why build a template engine with its own simple syntax when PHP already is essentially a template engine?  Further, a similar argument has been made by Rasmus Lerdorf (also here), the father of PHP, with respect to the proliferation of frameworks.

Rather than use these drawbacks as an excuse to reject DALs (and template engines and frameworks for that matter), I am merely suggesting that the professional PHP developer recognize the limitations of these tools, and to know when it is more efficient to bang out an SQL statement than trying to contort your code and DAL class to deal with very complex queries.

One caveat I’d like to insert here at the end is ADOdb, perhaps the most popular DAL for PHP. I’ve used ADOdb in the past and like it. Rather than trying to abstract away all your SQL worries, ADOdb deals with the differences in connections and the PHP query functions across different database types, so that instead of obsessing about “mysql_query($sql)” versus “oci_execute($sql)”, you focus only on your SQL. The advantage is that your apps become more portable/scalable across various relational db products.

Further reading:
Propel Object-Relational Mapping Framework
ADOdb Database Abstraction Layer
Wikipedia entry on Object-Relational Mapping