Tuesday, October 19, 2010

Static Analysis with PHP CodeSniffer

PHP CodeSniffer is a great tool to have in your development environment. Originally developed to ensure that your code matches your coding standards, I think its strength is somewhere else...

If you take the time to read through some of the coding standards that ship with PHP CodeSniffer, you will find that some of them don't really relate to coding standards exactly. Some are more related to code analysis, and after a recent update to a project I was working on, I think that this is PHP CodeSniffer's real strength...

You may recall from an earlier post that one of the projects I was working on used an in house DB abstraction layer. For reasons that are probably obvious, we began porting our code to MDB2. It was a large codebase, with queries sprinkled throughout. Quickly we realised that search and replace was not going to cut it, as we needed to begin using a result set variable.

The original class used a single variable for the connection and the result set. That is to say, with the single variable you would connect to the database, make the query and then iterate over the result set. MDB2 is different (and in this regard, more like the basic MySQL functions in PHP) in that you have one variable for the connection, call the query on that connection and then store the return value into another variable (the result set). So in effect, you use one variable for some things (calling queries, getting error messages and so on) and another for iterating over the results. Our problem with porting was that our class used one variable for all of that, so we couldn't easily search and replace.

 So what to do? Well naturally we could have looked everywhere for any instance where we made a connection, then ensured that the variable was used correctly, as were any variables returned from function calls. Or, we could have automated it.

So automate it I did.

I had been playing with PHP CodeSniffer anyway, and so I saw a great use for it here. Of course, it's really the tokeniser that we're taking advantage of, but PHPCS does a great job of providing us with extra features related to moving through the token list.
A couple of hours later, I had written the sniff. But what does it do? Well basically it looks for function calls that make a connection to the database. When it finds that, it looks back a little bit and determines the variable name that was being assigned to. We now know a variable that holds a DB connection, and with a little bit more magic, we can determine the scope that this variable is used in. Now look around in that scope, and ensure that when that variable gets used, it gets used in the right context. Let's go a bit deeper though... if the connection gets used in the right function, we know that the return value is a result set. Surely we can then iterate through it, doing the same work, testing that it gets used correctly?

And that's what I did. Because we were using MDB2, I even had more information... using reflection, I was able to determine return types of all sorts of functions and functions that existed on all sorts of classes. I could make sure that those variables were used correctly and that any functions called on those variables were valid for their class.

It was great, even if I do say so myself! In a couple of hours, I was able to test our entire codebase and ensure, recursively, that all of the variables that we were assigning to, passing as parameters, and calling functions on were being used in the right context.

And we found a heap of places where our attempts at porting went wrong. Which is what it's all about.

After that, I went nuts with static analysis. I started looking even at the queries that were being called and checking that when doing a SELECT COUNT, for example, that we used the queryOne function to just grab the value instead of returning it into an array to simply pull it out from $row[0].

I also wrote another sniff that ensured that when using references in a foreach loop, that we unset the by-reference variable before reusing it (so as to make sure we didn't accidentally overwrite part of the initial array).

I think using PHP CodeSniffer this way is awesome. It takes care of so much stuff and when adding in reflection on classes it's amazing what can be achieved in just a few lines of code.

I'd love to hear if you have ever solved a static analysis problem with PHP CodeSniffer!