Continuous Integration in PHP; Duplicate code detection

Image taken from http://www.osnews.com/comics

After describing the building blocks of a PHP continuous build environment, I went into more detail on the features of PHP_CodeSniffer for validating code against coding standards. When describing the building blocks I focused on the minimum assets such an environment needs.

In today's article I want to describe other PHP tools that can be used to increase the quality of your code. Today I will take a closer look at a Copy/Paste Detector (CPD) for PHP code, phpcpd. The goal of phpcpd is not to replace more sophisticated tools such as PHP_CodeSniffer, but rather to provide an alternative to them when you just need a quick overview of duplicated code in a project. Tools in this category can be integrated into the continuous build environment, but personally I would suggest using them as an additional aid to improve your code quality, not as part of the main logic that determines whether a build succeeds. It is of course up to you whether to include it in your environment.

Code duplication is generally considered a mark of poor or lazy programming style, while good coding style is generally associated with reusing code. In some situations it may be slightly faster to develop by duplicating code, because the developer need not be concerned with how the code is already used or how it may be used in the future. The difficulty is that original development is only a small fraction of a product's life cycle. With duplicated code the maintenance costs are much higher, not to mention the extra work when a bug sits in duplicated code and every copy has to be found and fixed. That is a good reason to analyse your code if you care about code quality in general and maintenance costs in particular.

The PHP Copy/Paste Detector is a tool written by Sebastian Bergmann, who is also the author of PHPUnit. Once phpcpd is installed you can run it from the command line. The syntax is straightforward; running it without arguments prints a short overview:


willebil@willebil-desktop:/var/www/phpmd$ phpcpd
phpcpd 1.1.1 by Sebastian Bergmann.

Usage: phpcpd [switches] <directory>
       phpcpd [switches] <file>

--log-pmd <file>         Write report in PMD-CPD XML format to file.
                         (report is also written to screen)

--min-lines <N>          Minimum number of identical lines (default: 5).
--min-tokens <N>         Minimum number of identical tokens (default: 70).

--suffixes <suffix,...>  A comma-separated list of file suffixes to check.

--help                   Prints this usage information.
--version                Prints the version and exits.
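Because --log-pmd writes the report as PMD-CPD XML, a build script can read the results back, for example to show a short summary next to the other build reports. The sketch below is a minimal example under assumptions: it assumes the usual PMD-CPD layout (a pmd-cpd root element with one duplication element per clone carrying a "lines" attribute, each listing the file elements with "path" and "line" attributes involved), it uses PHP's SimpleXML extension, and the function name summarizeCpdReport and the report path are hypothetical. The lines it adds up are counted per clone, so the total is not guaranteed to match the duplicated-lines figure phpcpd prints on screen.

```php
<?php
// Hypothetical helper: summarises a PMD-CPD XML report such as the file
// written by "phpcpd --log-pmd <file>". Assumes a <pmd-cpd> root element with
// one <duplication lines="N"> child per clone, each containing
// <file path="..." line="..."/> elements for the copies that were found.
function summarizeCpdReport($xmlString)
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new InvalidArgumentException('The report is not well-formed XML');
    }

    $clones = 0;
    $lines  = 0;
    $files  = array();

    foreach ($xml->duplication as $duplication) {
        $clones++;
        // Length of this clone, as given by its "lines" attribute
        $lines += (int) $duplication['lines'];
        foreach ($duplication->file as $file) {
            // Keyed by path, so a file involved in several clones counts once
            $files[(string) $file['path']] = true;
        }
    }

    return array(
        'clones' => $clones,
        'lines'  => $lines,
        'files'  => count($files),
    );
}

// Example: read the report written by a previous phpcpd run (path is hypothetical)
// $summary = summarizeCpdReport(file_get_contents('build/logs/pmd-cpd.xml'));
```

Keying the files array by path is a simple way to count distinct files, since one file can take part in several clones (client/ftp.php appears twice in the first entry of the report further on).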

I will use the same Joomla 1.6 codebase (trunk/libraries/joomla/) as in the coding standards article for the initial analysis. Let us fire up the analysis:


willebil@willebil-desktop:/var/www$ phpcpd j16trunk/libraries/joomla |more
phpcpd 1.1.1 by Sebastian Bergmann.

Found 24 exact clones with 484 duplicated lines in 17 files:

  - client/ftp.php:1189-1195
    client/ftp.php:1218-1224

  - access/permission/simplerule.php:297-305
    access/permission/accesslevel.php:349-357
.
.
.
  - installer/adapters/component.php:68-76
    installer/adapters/component.php:1429-1437

  - installer/adapters/module.php:172-183
    installer/adapters/component.php:1455-1466

0.82% duplicated lines out of 59292 total lines of code.
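The percentage on the summary line is the number of duplicated lines divided by the total number of lines analysed. The minimal sketch below recomputes it from the two figures in the report and adds an advisory check that only produces a warning, in line with using phpcpd as an extra rather than as a build-breaking step. The function names and the threshold parameter are hypothetical; this is plain arithmetic on the printed numbers, not phpcpd's own code.

```php
<?php
// Share of duplicated lines in the total number of lines analysed, in percent.
function duplicatedPercentage($duplicatedLines, $totalLines)
{
    if ($totalLines <= 0) {
        throw new InvalidArgumentException('Total line count must be positive');
    }
    return 100 * $duplicatedLines / $totalLines;
}

// Hypothetical advisory check: returns a warning message when the share of
// duplicated code exceeds $threshold percent, or null otherwise. It only
// reports; deciding whether a build fails is left to the caller.
function duplicationWarning($percentage, $threshold)
{
    if ($percentage > $threshold) {
        return sprintf('Warning: %.2f%% duplicated lines (threshold %.2f%%)', $percentage, $threshold);
    }
    return null;
}

// Figures from the Joomla 1.6 run above: 484 duplicated lines, 59292 lines in total.
printf("%.2f%% duplicated lines\n", duplicatedPercentage(484, 59292)); // prints "0.82% duplicated lines"
```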

The report tells us that 484 lines of code in the Joomla application framework are marked as duplicates, in 24 clones spread over 17 files. Ideally 0% of your code is marked as duplicate, although in some situations a duplicate is not a problem. Let us take an example out of the generated report: the "client/ftp.php" file, which is the first entry. When you look at that code (lines 1185-1245) you instantly see why the tool marked it as duplicate code.


/*
 * Here is where it is going to get dirty....
 */
if ($osType == 'UNIX') {
    foreach ($contents as $file) {
        $tmp_array = null;
        if (ereg($regexp, $file, $regs)) {
            $fType = (int) strpos("-dl", $regs[1] { 0 });
            //$tmp_array['line'] = $regs[0];
            $tmp_array['type'] = $fType;
            $tmp_array['rights'] = $regs[1];
            //$tmp_array['number'] = $regs[2];
            $tmp_array['user'] = $regs[3];
            $tmp_array['group'] = $regs[4];
            $tmp_array['size'] = $regs[5];
            $tmp_array['date'] = date("m-d", strtotime($regs[6]));
            $tmp_array['time'] = $regs[7];
            $tmp_array['name'] = $regs[9];
        }
        // If we just want files, do not add a folder
        if ($type == 'files' && $tmp_array['type'] == 1) {
            continue;
        }
        // If we just want folders, do not add a file
        if ($type == 'folders' && $tmp_array['type'] == 0) {
            continue;
        }
        if (is_array($tmp_array) && $tmp_array['name'] != '.' && $tmp_array['name'] != '..') {
            $dir_list[] = $tmp_array;
        }
    }
}
elseif ($osType == 'MAC') {
    foreach ($contents as $file) {
        $tmp_array = null;
        if (ereg($regexp, $file, $regs)) {
            $fType = (int) strpos("-dl", $regs[1] { 0 });
            //$tmp_array['line'] = $regs[0];
            $tmp_array['type'] = $fType;
            $tmp_array['rights'] = $regs[1];
            //$tmp_array['number'] = $regs[2];
            $tmp_array['user'] = $regs[3];
            $tmp_array['group'] = $regs[4];
            $tmp_array['size'] = $regs[5];
            $tmp_array['date'] = date("m-d", strtotime($regs[6]));
            $tmp_array['time'] = $regs[7];
            $tmp_array['name'] = $regs[9];
        }
        // If we just want files, do not add a folder
        if ($type == 'files' && $tmp_array['type'] == 1) {
            continue;
        }
        // If we just want folders, do not add a file
        if ($type == 'folders' && $tmp_array['type'] == 0) {
            continue;
        }
        if (is_array($tmp_array) && $tmp_array['name'] != '.' && $tmp_array['name'] != '..') {
            $dir_list[] = $tmp_array;
        }
    }
}

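The code itself is not very efficient and can be rewritten quite simply: the UNIX and MAC branches are identical, so they can be merged into one branch with a combined condition. Let us rewrite this section and then re-run the analyzer. This is an example of how this code could be optimized: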


/*
 * Here is where it is going to get dirty....
 */
if ($osType == 'UNIX' || $osType == 'MAC') {
    foreach ($contents as $file) {
        $tmp_array = null;
        if (ereg($regexp, $file, $regs)) {
            $fType = (int) strpos("-dl", $regs[1] { 0 });
            //$tmp_array['line'] = $regs[0];
            $tmp_array['type'] = $fType;
            $tmp_array['rights'] = $regs[1];
            //$tmp_array['number'] = $regs[2];
            $tmp_array['user'] = $regs[3];
            $tmp_array['group'] = $regs[4];
            $tmp_array['size'] = $regs[5];
            $tmp_array['date'] = date("m-d", strtotime($regs[6]));
            $tmp_array['time'] = $regs[7];
            $tmp_array['name'] = $regs[9];
        }
        // If we just want files, do not add a folder
        if ($type == 'files' && $tmp_array['type'] == 1) {
            continue;
        }
        // If we just want folders, do not add a file
        if ($type == 'folders' && $tmp_array['type'] == 0) {
            continue;
        }
        if (is_array($tmp_array) && $tmp_array['name'] != '.' && $tmp_array['name'] != '..') {
            $dir_list[] = $tmp_array;
        }
    }
}
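A possible further step, which the article itself does not take, is to move the duplicated parsing and filtering into functions of their own, so that any other branch needing the same logic calls them instead of copying it. The sketch below is a minimal, hypothetical version: the names buildListEntry and keepListEntry are made up, and because the regular expression that fills $regs is not shown in the Joomla snippet, the helpers start from its capture groups using the same numbering. Separately, ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so a modern version of this code would match each line with preg_match() and a PCRE pattern instead.

```php
<?php
// Hypothetical helpers sketching an "extract function" refactoring of the
// duplicated FTP listing code. $regs is assumed to hold the capture groups of
// one listing line, numbered as in the Joomla snippet:
// 1 = rights, 3 = user, 4 = group, 5 = size, 6 = date, 7 = time, 9 = name.
function buildListEntry(array $regs)
{
    return array(
        // Position of the first rights character in "-dl": 0 = file,
        // 1 = directory, 2 = link (any other character also yields 0)
        'type'   => (int) strpos('-dl', $regs[1][0]),
        'rights' => $regs[1],
        'user'   => $regs[3],
        'group'  => $regs[4],
        'size'   => $regs[5],
        'date'   => date('m-d', strtotime($regs[6])),
        'time'   => $regs[7],
        'name'   => $regs[9],
    );
}

// Returns true when an entry should be added to the directory list, applying
// the same 'files' / 'folders' filter and '.' / '..' check as the original loop.
// $entry is null when the listing line did not match the pattern.
function keepListEntry($entry, $type)
{
    if (!is_array($entry)) {
        return false;
    }
    if ($type == 'files' && $entry['type'] == 1) {
        return false; // a folder, but only files were requested
    }
    if ($type == 'folders' && $entry['type'] == 0) {
        return false; // a file, but only folders were requested
    }
    return $entry['name'] != '.' && $entry['name'] != '..';
}
```

Inside the loop, each matching line would then become one call to buildListEntry($regs), and the entry would be appended to $dir_list only when keepListEntry($entry, $type) returns true; a line that does not match can be passed as null and is skipped.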

The power of continuous integration would be proven here if unit tests had been available for this class. You could then verify that the class's behavior has not changed, and with this improvement you would indeed have simplified the code base so that maintenance becomes less work. I am aware this is just a simple example of how you can put this tool to use. I have found that these tools give you a new and very interesting perspective on your own code when you run them. In the end I am convinced that code quality improves and the number of bugs goes down, and with that you will have more time to work on the next state-of-the-art program instead of hunting bugs you created ;-)

In the next article of this series I will look at other tools that can help you improve code quality, and I will give an overview of all the tools covered so far in this blog series on continuous integration.

About the author Wilco Jansen

Wilco was born in 1967 in the Netherlands, where he still lives. After years as a programmer, Wilco worked as a project manager and IT manager. He discovered Joomla! while creating his own content management system and has never lost focus since. He joined the core team as development coordinator in May 2006 to help make Joomla! even better than it already is. Wilco has been deeply involved in the Joomla project as Google Summer of Code program manager for the 2006, 2007 and 2008 editions, co-organizer of the Google Highly Open Participation Contest in 2008, the first-ever development coordinator, creator of the Joomla Bug Squad, member of the board of Open Source Matters, a regular speaker at conferences worldwide advocating Joomla, and much, much more. Wilco holds a bachelor's degree in business and information engineering and studied for a Master of Science in knowledge and information engineering at Middlesex University in London.
