PHP bad practice: associative arrays for complex data

July 3rd, 2009 by Robert Enyedi

Because software migration is our business, at Numiton we’ve had our fair share of PHP code reviews. The commercial PHP code we have dealt with so far was of much better quality than the open-source code. This is probably due to the apparent ease of the PHP language and the lower entry barrier for contributors. These factors make open-source projects more dynamic and surround them with enthusiastic users and developers. Software development is full of trade-offs, so this is nothing new.

The problem

While migrating various open-source PHP projects to Java we’ve come across several usages of PHP that are worrisome and that we consider bad practice. One that keeps recurring is the use of associative arrays to represent complex data structures.

Let’s assume that we want to represent a blog entry and its comments. Most of the time this is done with arrays:

$entryArr =
  array("content"=>"Sample content",
        "comments"=>
          array(
            array(
              "author"=>array(
                         "name"=>"John Doe",
                         "email"=>"john.doe@***.org"),
              "content"=>"Comment content")));

To display the email of the first commenter, we have:

echo $entryArr["comments"][0]["author"]["email"];

The problem is that we have several levels of data nesting and not much information on how to use the overall structure. We can supply some documentation in the code, but where exactly? In this example we can put it at the construction point of $entryArr, but in a larger code base there will be many such places.

Of course, we can create a document describing the structure. But the document will be disconnected from the code and prone to becoming inaccurate over time, as we make changes to the data structure.
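To make the problem concrete, here is a minimal sketch of how little the array itself tells us. The misspelled keys "e-mail" and "autor" are hypothetical, chosen only for illustration; nothing in the structure declares which keys are valid.

```php
<?php
// A blog entry as nested arrays; "comments" is a list of comment arrays.
$entryArr = array(
  "content"  => "Sample content",
  "comments" => array(
    array(
      "author"  => array("name" => "John Doe", "email" => "john.doe@***.org"),
      "content" => "Comment content",
    ),
  ),
);

// The correct path works, but only if we already know the layout by heart.
$email = $entryArr["comments"][0]["author"]["email"];

// A misspelled key is not rejected anywhere; isset() simply reports it as absent.
$hasTypoKey = isset($entryArr["comments"][0]["author"]["e-mail"]);

// Nothing prevents other code from silently adding a misspelled key either.
$entryArr["comments"][0]["autor"] = "Jane Doe";
```

After the last line the first comment has three keys, "author", "content" and "autor", and no part of the program objects to it.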

The solution

PHP has no simple mechanism for defining data structures, such as C structs. However, PHP 4 and later support object-oriented constructs and thus data encapsulation. (The examples below use PHP 5 syntax, such as __construct and type hints.)

The fundamental idea is to enforce a stricter organization of data with specialized classes. Based on domain knowledge, the following classes would hold the blog entry data:

class BlogEntry {
  var $content;
  var $publishdate;

  /** @var array An array of BlogComment objects. */
  var $comments;
}

/** A comment to a blog entry. */
class BlogComment {
  /** @var User */
  var $author;
  var $content;

  function __construct(User $author, $content) {
    $this->author = $author;
    $this->content = $content;
  }
}

class User {
  var $name;
  var $email;

  function __construct($name, $email) {
    $this->name = $name;
    $this->email = $email;
  }
}

This makes it much easier for everyone to understand how a blog entry is represented. It is verbose for sure, but we don’t need a separate document to tell us how to construct and navigate blog entry data structures.

Note the usage of phpDocumentor comments to specify meta information. They are a standardized way of describing your code, and they also come in handy for generating code overview documentation.
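The same phpDocumentor tags can also describe methods. The sketch below repeats the classes in compact form and adds an addComment() helper; that method is hypothetical and not part of the classes above. It only shows how @param and @return sit alongside a type hint.

```php
<?php
class User {
  /** @var string */
  var $name;
  /** @var string */
  var $email;

  function __construct($name, $email) {
    $this->name = $name;
    $this->email = $email;
  }
}

class BlogComment {
  /** @var User */
  var $author;
  /** @var string */
  var $content;

  function __construct(User $author, $content) {
    $this->author = $author;
    $this->content = $content;
  }
}

class BlogEntry {
  /** @var string */
  var $content;

  /** @var array An array of BlogComment objects. */
  var $comments = array();

  /**
   * Appends a comment to this entry (hypothetical helper).
   *
   * @param BlogComment $comment The comment to add.
   * @return int The number of comments after the addition.
   */
  function addComment(BlogComment $comment) {
    $this->comments[] = $comment;
    return count($this->comments);
  }
}

$entry = new BlogEntry();
$count = $entry->addComment(
  new BlogComment(new User("John Doe", "john.doe@***.org"), "Comment content"));
```

Because of the type hint, PHP rejects a call to addComment() with anything other than a BlogComment, so the documentation and the enforced structure stay in the same place.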

Let’s now use this object-oriented data structure:

$entry = new BlogEntry();
$entry->content = "Sample content";
$entry->comments[] = new BlogComment(new User("John Doe", "john.doe@***.org"), "Comment content");

Looks cleaner already. The code to display the email of the first commenter becomes:

echo $entry->comments[0]->author->email;

As an added bonus, if you are using an IDE for development (such as PDT) you will also get auto-completion of class fields.

Keep one thing in mind, though, if you plan to run your code on both PHP 4 and PHP 5: the way objects are transferred in assignments and function calls differs between the two versions. In PHP 4 objects behave just like arrays and are always copied by value. In PHP 5, however, a variable holds a handle to the object, so assignments and function calls share the same object, much as if it were passed by reference (see the PHP manual entry on objects and references for details). But since PHP 4 has been retired, this should not be a problem.
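The difference is easy to see in a short sketch, assuming PHP 5 or later. The User class is reduced to a single field here, and the names "Jane Doe" and "Jim Doe" are made up for the example.

```php
<?php
class User {
  var $name;

  function __construct($name) {
    $this->name = $name;
  }
}

// Arrays are copied on assignment: changing $b leaves $a untouched.
$a = array("name" => "John Doe");
$b = $a;
$b["name"] = "Jane Doe";

// Objects are shared on assignment: $u1 and $u2 refer to the same User,
// so the change made through $u2 is visible through $u1.
$u1 = new User("John Doe");
$u2 = $u1;
$u2->name = "Jane Doe";

// An independent copy has to be requested explicitly with clone.
$u3 = clone $u1;
$u3->name = "Jim Doe";
```

At the end $a["name"] is still "John Doe", while $u1->name has become "Jane Doe"; only the cloned $u3 carries "Jim Doe". Code that relied on array-style copying when it was converted to classes needs a clone wherever a separate copy is expected.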

Conclusion

PHP associative arrays are powerful, but often overused - especially when creating complex data structures. If you are no longer using PHP 4, then classes are your best bet. They provide a meaningful way to encapsulate data and to bring order to large and complex structures.
