The problem with Database Abstraction Layers…

Let me preface this post by saying I reserve the right to be entirely mistaken, and I invite comments with opposing opinions…hey, maybe I’ll learn something by mouthing off! I know I’m painting with very broad brush-strokes and I expect to be corrected where my statements may be overly generalized.

Now, as to the topic at hand:  The problem with DALs, PDOs, ORMs, data access objects, etc. (such as Propel) is that they are only “clean” solutions for relatively simple, single table queries and/or small result sets.

Larger databases will see huge hits on performance if you try to return only primary keys for a result set, and then loop through the keys, creating an object for each (thereby firing a query for each). For displaying result sets, a single query is vastly superior.

Further, the DAL pattern appears to break down when your query includes multiple joins and aliases. Properly normalized relational databases require joins in order to return relevant data about a record to the user, i.e. you don’t store the user’s full name in every sales order record, only the id reference to his record in the customer table. I’ve yet to see a DAL that can effectively deal with multiple joins and aliasing without asking the coder to write explicit SQL.

Let’s look at an example of a “simple” query to demonstrate the power of DALs.  Here’s the standard way to write a simple MySQL query with PHP.

$sql = "SELECT name FROM users WHERE id = '100' LIMIT 1 ";
 
$result = mysql_query($sql);
 
$row = mysql_fetch_object($result);
 
$name = $row->name;

Now, lets see how a theoretical DAL class might handle the same query:

$name = $db->Fetch_Var('users', 'name', 'id', '100');

Wow.  You can see what took us four lines of code before now takes only one.  The second example uses an actual method I’ve written for a simple DAL class I use at work.  If I can ever get permission from the licensing dept, I’ll share it here.

For retrieving a single result or even an entire row (or rows) from a single table, such DALs can be effective and incredible powerful. However, for large scale paged result sets requiring joins, etc, I find it is most efficient to write custom SQL. In my opinion, these DALs are wishful thinking if the goal is to remove the need for the PHP developer to be able to write effective SQL. These tools should, rather, be treated exactly as such. Each is a tool to speed your coding and abstract away routine query building. But for any complex application, you are never going to get fully away from writing your own SQL queries. And due to the nature of these DALs/ORMs, the only SQL queries left to write are always going to be the most complex.

Propel actually does manage to provide some support for joins. I do not like Propel’s reliance on the “*” wildcard in the linked example, but perhaps that is just for simplicity in the documentation. Good SQL explicity names the columns to retrieve, and except for in-development testing, you should rarely if ever use “*” …its lazy coding and wastes resources (see Rudy Limeback’s Simply SQL for a more detailed argument against “*”). I’d like to see Propel, or any other ORM/DAL class deal, however, with a query such as this:

SELECT SQL_CALC_FOUND_ROWS
          projects.id
          , faculty
          , author
          , author2
          , author3
          , date_created
          , users.last_name     as faculty_last
          , users.first_name    as faculty_first
          , users2.last_name    as author_last
          , users2.first_name    as author_first
          , dept.dept_name
          , project_title
          , project_type
          , staff_no
          , project_status
          , organization.org_name
          , project_date
FROM projects
LEFT OUTER JOIN users as faculty  ON projects.faculty = users.id
LEFT OUTER JOIN users as users2  ON projects.author = users2.id
LEFT OUTER JOIN dept ON projects.dept = dept.id
LEFT OUTER JOIN organization ON project.org = organization.id
LEFT OUTER JOIN staff ON project.staff_no = staff.id
WHERE (users.last_name LIKE '%smith%' OR users.first_name LIKE '%smith%')
AND project.title LIKE '%Material Composition of Unobtanium%'
ORDER BY Items.Entry_Date DESC 
LIMIT 0, 15

The above example is a sanitized (names anonymized to protect the innocent) version of a real query I use to return paginated results in one of our web apps. Some of the joins are included dynamically based on search criteria, others are always included. The WHERE clause is likewise generated dynamically, as well as the ORDER BY, which is used for column-based sorting.

Show me a DAL/ORM class that can deal with this example and I’ll be your best friend forever.

Let’s now look, for example, at Propel’s approach to a multi-join query, taken from the above link.

Here’s the standard SQL way:

SELECT * 
FROM author 
  INNER JOIN book ON book.author_id = author.id 
  INNER JOIN publisher ON publisher.id = book.publisher_id
WHERE publisher.name = 'Some Name'

Here’s the Propel way:

$c = new Criteria(AuthorPeer::DATABASE_NAME);
 
$c->addJoin(AuthorPeer::ID, BookPeer::AUTHOR_ID, Criteria::INNER_JOIN);
$c->addJoin(BookPeer::PUBLISHER_ID, PublisherPeer::ID, Criteria::INNER_JOIN);
 
$c->add(PublisherPeer::NAME, 'Some Name');
 
$authors = AuthorPeer::doSelect($c);

If you are absolutely hell bent on returning your result set as objects, then I suppose the Propel approach is appealing, but it is more code in the end and there is an unavoidable learning curve (though admittedly not that daunting) to writing it.

If you’re considering yourself a professional PHP application developer and not just a “coder” or “website designer”, rather than avoiding SQL at all costs, you should be learning how to write good SQL.  I’m not sure where we got the idea that we needed a more succinct way to write queries without using SQL, since Structured Query Language is already succinct by design.

This is much the same argument as voiced against the myraid template engines out there…why build a template engine with its own simple syntax when PHP already is essentially a template engine?  Further, a similar argument has been made by Rasmus Lerdorf (also here), the father of PHP, with respect to the proliferation of frameworks.

Rather than use these drawbacks as an excuse to reject DALs (and template engines and frameworks for that matter), I am merely suggesting that the professional PHP developer recognize the limitations of these tools, and to know when it is more efficient to bang out an SQL statement than trying to contort your code and DAL class to deal with very complex queries.

One caveat I’d like to insert here at the end is ADOdb, perhaps the most popular DAL for PHP. I’ve used ADOdb in the past and like it. Rather than trying to abstract away all your SQL worries, ADOdb deals with the differences in connections and the PHP query functions across different database types, so that instead of obsessing about “mysql_query($sql)” versus “oci_execute($sql)”, you focus only on your SQL. The advantage is that your apps become more portable/scalable across various relational db products.

Further reading:
Propel Object-Relational Mapping Framework
ADOdb Database Abstraction Layer
Wikipedia entry on Object-Relational Mapping

delicious | digg | reddit | facebook | technorati | stumbleupon | chatintamil

10 thoughts on “The problem with Database Abstraction Layers…

  1. I’d be interested to get your opinion on Zend Framework (ZF) and its implementation of the Table Data Gateway and Row Data Gateway patterns. http://framework.zend.com/manual/en/zend.db.table.html. I tend to consider anything from Zend as the standard for PHP.

    In general, I would agree that a multi-join query is significantly easier to read and write in straight SQL. I have used Zend_Db_Table_Abstract (http://framework.zend.com/manual/en/zend.db.table.relationships.html) for abstract multi-join queries, and while it works, it just isn’t fun to write. I do use Zend_Db_Table_Abstract for all of my new development, but mainly for simple CRUD operations and sql escaping. Single-line inserts, updates, and deletes are very nice.

  2. Hey Jonathan. Thanks for reading my rantings. I think this post is a reflection of my growing pains as a developer. We’ll see if I think the same things about DALs a year from now. The Zend links are very interesting and I plan to spend some time exploring those more thoroughly. An upcoming project is to completely re-work an existing application that is 100% procedural. I am considering using ZF for this since I’ve not worked with it yet.

    I like to build my own classes for “framework” tasks, mostly for self-education. But I recognize I’m to the point I probably need to be leveraging some of the existing stuff out there.

  3. Thanks Jozef, but I don’t think I can justify switching to an entirely different language just to get a good mapper layer! Its not a problem for me in PHP, because I am comfortable writing SQL and I am not quite so hellbent on an OOP-or-nothing approach.

  4. I don’t want to convert you to Java 😉 , I only wanted to show you that there is a tool to satisfy your needs for PHP DAL. IMHO there are 3 kinds of DAL (or DAO pattern http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html). 1. classic procedural table oriented approach based on unified database API (ADOdb or PDO) 2. simple data mappers (e.g. iBatis) 3. full featured ORM (Hibernate for Java, Propel or Doctrine for PHP). My winner is the choice 2 where i can build an OO application with a DAL under my complete control.

  5. “Larger databases will see huge hits on performance if you try to return only primary keys for a result set, and then loop through the keys, creating an object for each”

    I don’t know about other DALs, but Zend DB at least does not do that. If you do a fetchAll query for example, each row object is created from the data in the result set.

    Also, not sure why you’ve mentioned PDO as a DAL – it has more or less exactly the same functionality as the php mysql_ functions, but in an object orientated manner.

  6. Thanks for the comments Dan. I’ve actually begun using Zend_Db in the past couple of months and its growing on me rapidly. Once you get comfortable with the syntax it is not much more trouble than writing raw SQL, and I like the prepared statement features you get when using the PDO_MySql adapter. The inclusion of PDO was more or less from ignorance, and if I wrote the article fresh today there’s quite a bit I could write differently.

  7. Hi, took me some time to read all the comments, but I really enjoyed the article. It proved to be very helpful to me and I am quite sure to all the commenters here! It’s always cool when you can not only be informed, but also entertained! As consequence: thanks a lot and please please keep up the good work. I’ll surely come back and look for new stories! Tschau

  8. Glad you got something out of the article. My position on abstraction has evolved since writing the article. I’ve been using Zend_Db extensively for the past 18 months or so and its grown on me.

    For simple queries and/or complex queries that you know are going to occur only in one place, I prefer to just pass in an SQL statement as opposed to using the Zend_Db_Statement syntax, just for simplicity. But, for complex dynamic queries I’ve used it a lot and it really helps cut down on the amount of custom stuff I have to write to pull such a query off effectively.

Leave a Reply

Your email address will not be published. Required fields are marked *