Written by Wilco Jansen Monday, 14 September 2009 12:00

After describing the building blocks of a PHP continuous build environment, I went into more detail on the PHP_CodeSniffer features for validating code against coding standards. When I described the building blocks I focused on the minimum assets such an environment needs.
In today's article I want to describe other PHP tools that can help increase the quality of your code. Today I will take a closer look at phpcpd, a Copy/Paste Detector (CPD) for PHP code. The goal of phpcpd is not to replace more sophisticated tools such as PHP_CodeSniffer, but rather to provide an alternative when you just need a quick overview of duplicated code in a project. Tools in this category can be integrated into the continuous build environment, but personally I would suggest using them as an additional aid to improve your code quality, not as part of the logic that decides whether a build succeeds. In the end it is of course up to you whether to include them in your environment.
Code duplication is generally considered a mark of poor or lazy programming, while good coding style is generally associated with the reuse of code. In some situations it may be slightly faster to develop by duplicating code, because the developer need not worry about how the code is already used or how it may be used in the future. The difficulty is that original development is only a small fraction of a product's life cycle: with duplicated code the maintenance costs are much higher, not to mention the extra work when a bug sits in code that has been copied to several places. That is a good reason to analyse your code if you care about code quality in general and maintenance costs in particular.
The PHP Copy/Paste Detector is another tool written by Sebastian Bergmann, the author of PHPUnit. After installing phpcpd you can run it from the command line. The syntax is pretty straightforward; a short overview:
willebil@willebil-desktop:/var/www/phpmd$ phpcpd
phpcpd 1.1.1 by Sebastian Bergmann.

Usage: phpcpd [switches] <directory>
       phpcpd [switches] <file>

  --log-pmd <file>         Write report in PMD-CPD XML format to file.

  --min-lines <N>          Minimum number of identical lines (default: 5).
  --min-tokens <N>         Minimum number of identical tokens (default: 70).

  --suffixes <suffix,...>  A comma-separated list of file suffixes to check.

  --help                   Prints this usage information.
  --version                Prints the version and exits.

(report is also written to screen)
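The --min-tokens and --min-lines switches hint at how a detector of this kind works: the source is split into tokens, and a long enough run of identical tokens that also spans enough lines counts as a clone. The sketch below illustrates that general idea with PHP's own tokenizer. It is a simplified illustration under my own assumptions (the function name findClones is hypothetical), not phpcpd's actual implementation: it ignores whitespace and comments, hashes every window of $minTokens consecutive tokens, and reports a clone when a hash has been seen before and the window spans at least $minLines lines.

```php
<?php
/**
 * Simplified sketch of token-based copy/paste detection (not phpcpd's code).
 * $sources maps a file name to its PHP source. Returns pairs of
 * [file, start line] for windows of $minTokens identical tokens.
 */
function findClones(array $sources, int $minTokens = 70, int $minLines = 5): array
{
    $seen = [];   // window hash => [file, start line] of the first occurrence
    $clones = [];

    foreach ($sources as $file => $code) {
        // Collect [text, line] for every token except whitespace and comments.
        $tokens = [];
        $line = 1;
        foreach (token_get_all($code) as $token) {
            if (is_array($token)) {
                [$id, $text, $line] = $token;
                if ($id === T_WHITESPACE || $id === T_COMMENT || $id === T_DOC_COMMENT) {
                    continue;
                }
            } else {
                // Single-character tokens such as ';' carry no line number, so
                // they reuse the last line reported by the tokenizer, which can
                // be off by one right after a line break.
                $text = $token;
            }
            $tokens[] = [$text, $line];
        }

        // Slide a window of $minTokens tokens over the file and hash it.
        for ($i = 0; $i + $minTokens <= count($tokens); $i++) {
            $window = array_slice($tokens, $i, $minTokens);
            $hash = md5(implode("\0", array_column($window, 0)));
            $start = $window[0][1];
            $end = $window[$minTokens - 1][1];

            if (!isset($seen[$hash])) {
                $seen[$hash] = [$file, $start];
            } elseif ($end - $start + 1 >= $minLines) {
                // Overlapping windows are not merged, so one long clone
                // shows up as several consecutive matches.
                $clones[] = [$seen[$hash], [$file, $start]];
            }
        }
    }
    return $clones;
}
```

With phpcpd's defaults you would call findClones($sources, 70, 5). A real tool also extends and merges overlapping matches so that each clone is reported once.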
I will use the same Joomla 1.6 codebase (trunk/libraries/joomla/) as in the coding standards article for the initial analysis. Let us fire up the analysis:
willebil@willebil-desktop:/var/www$ phpcpd j16trunk/libraries/joomla | more
phpcpd 1.1.1 by Sebastian Bergmann.

Found 24 exact clones with 484 duplicated lines in 17 files:

  - client/ftp.php:1189-1195
    client/ftp.php:1218-1224

  - access/permission/simplerule.php:297-305
    access/permission/accesslevel.php:349-357

  ...

  - installer/adapters/component.php:68-76
    installer/adapters/component.php:1429-1437

  - installer/adapters/module.php:172-183
    installer/adapters/component.php:1455-1466

0.82% duplicated lines out of 59292 total lines of code.
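If you pass --log-pmd, the same findings are also written to a file in PMD-CPD XML format, which a build server can pick up. As a small example of post-processing that file, the sketch below counts the clones, the duplicated lines and the distinct files in such a report using SimpleXML. It assumes the usual PMD-CPD layout (a pmd-cpd root element containing duplication elements with a lines attribute, each listing file elements with path and line attributes). The function name summariseCpdReport is hypothetical, and phpcpd's own way of counting duplicated lines may differ from this simple sum.

```php
<?php
/**
 * Hypothetical helper: summarise a PMD-CPD XML report, e.g. the file
 * written by "phpcpd --log-pmd <file>". Assumes the layout
 * <pmd-cpd><duplication lines=".."><file path=".." line=".."/>...</duplication></pmd-cpd>.
 */
function summariseCpdReport(string $xml): array
{
    $report = new SimpleXMLElement($xml);
    $clones = 0;
    $duplicatedLines = 0;
    $files = [];   // used as a set of distinct file paths

    foreach ($report->duplication as $duplication) {
        $clones++;
        $duplicatedLines += (int) $duplication['lines'];
        foreach ($duplication->file as $file) {
            $files[(string) $file['path']] = true;
        }
    }

    return [
        'clones'           => $clones,
        'duplicated_lines' => $duplicatedLines,
        'files'            => count($files),
    ];
}
```

In a build script you would pass it the contents of the log file, for example summariseCpdReport(file_get_contents('pmd-cpd.xml')), where the file name is just an example.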
The report tells us that phpcpd found 24 exact clones, covering 484 duplicated lines in 17 files of the Joomla application framework code, or 0.82% of its 59292 lines. Ideally none of your code is marked as duplicate, although not every reported clone is a real problem. Let us take the first entry in the report, client/ftp.php, as an example. When you look at that code (lines 1185-1245) you immediately see why the tool marked it as duplicate:
/*
 * Here is where it is going to get dirty....
 */
if ($osType == 'UNIX') {
    foreach ($contents as $file) {
        $tmp_array = null;
        if (ereg($regexp, $file, $regs)) {
            $fType = (int) strpos("-dl", $regs[1] { 0 });
            //$tmp_array['line'] = $regs[0];
            $tmp_array['type'] = $fType;
            $tmp_array['rights'] = $regs[1];
            //$tmp_array['number'] = $regs[2];
            $tmp_array['user'] = $regs[3];
            $tmp_array['group'] = $regs[4];
            $tmp_array['size'] = $regs[5];
            $tmp_array['date'] = date("m-d", strtotime($regs[6]));
            $tmp_array['time'] = $regs[7];
            $tmp_array['name'] = $regs[9];
        }
        // If we just want files, do not add a folder
        if ($type == 'files' && $tmp_array['type'] == 1) {
            continue;
        }
        // If we just want folders, do not add a file
        if ($type == 'folders' && $tmp_array['type'] == 0) {
            continue;
        }
        if (is_array($tmp_array) && $tmp_array['name'] != '.' && $tmp_array['name'] != '..') {
            $dir_list[] = $tmp_array;
        }
    }
}
elseif ($osType == 'MAC') {
    foreach ($contents as $file) {
        $tmp_array = null;
        if (ereg($regexp, $file, $regs)) {
            $fType = (int) strpos("-dl", $regs[1] { 0 });
            //$tmp_array['line'] = $regs[0];
            $tmp_array['type'] = $fType;
            $tmp_array['rights'] = $regs[1];
            //$tmp_array['number'] = $regs[2];
            $tmp_array['user'] = $regs[3];
            $tmp_array['group'] = $regs[4];
            $tmp_array['size'] = $regs[5];
            $tmp_array['date'] = date("m-d", strtotime($regs[6]));
            $tmp_array['time'] = $regs[7];
            $tmp_array['name'] = $regs[9];
        }
        // If we just want files, do not add a folder
        if ($type == 'files' && $tmp_array['type'] == 1) {
            continue;
        }
        // If we just want folders, do not add a file
        if ($type == 'folders' && $tmp_array['type'] == 0) {
            continue;
        }
        if (is_array($tmp_array) && $tmp_array['name'] != '.' && $tmp_array['name'] != '..') {
            $dir_list[] = $tmp_array;
        }
    }
}
The two branches are identical apart from the operating-system check, so the code can be simplified quite easily. Let us rewrite this section by merging both branches into a single condition, and then re-run the analyzer. This is an example of how the code could be rewritten:
/*
 * Here is where it is going to get dirty....
 */
if ($osType == 'UNIX' || $osType == 'MAC') {
    foreach ($contents as $file) {
        $tmp_array = null;
        if (ereg($regexp, $file, $regs)) {
            $fType = (int) strpos("-dl", $regs[1] { 0 });
            //$tmp_array['line'] = $regs[0];
            $tmp_array['type'] = $fType;
            $tmp_array['rights'] = $regs[1];
            //$tmp_array['number'] = $regs[2];
            $tmp_array['user'] = $regs[3];
            $tmp_array['group'] = $regs[4];
            $tmp_array['size'] = $regs[5];
            $tmp_array['date'] = date("m-d", strtotime($regs[6]));
            $tmp_array['time'] = $regs[7];
            $tmp_array['name'] = $regs[9];
        }
        // If we just want files, do not add a folder
        if ($type == 'files' && $tmp_array['type'] == 1) {
            continue;
        }
        // If we just want folders, do not add a file
        if ($type == 'folders' && $tmp_array['type'] == 0) {
            continue;
        }
        if (is_array($tmp_array) && $tmp_array['name'] != '.' && $tmp_array['name'] != '..') {
            $dir_list[] = $tmp_array;
        }
    }
}
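Merging the branches removes the clone that phpcpd reported. A possible further step, the common "Extract Function" refactoring rather than anything the Joomla code itself does, is to move the per-line parsing into its own function so it can be reused and tested in isolation. The sketch below is hypothetical: the names parseListLine and buildDirList and the regular expression are my own assumptions, not the pattern used in ftp.php. Because ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, the sketch uses preg_match() instead.

```php
<?php
/**
 * Hypothetical helper: parse one line of a UNIX-style "ls -l" listing into
 * the array shape used by the ftp.php excerpt. Returns null for lines that
 * do not match (for example the "total 12" header).
 */
function parseListLine(string $line): ?array
{
    // Assumed pattern: rights, links, user, group, size, date, time or year, name.
    // A trailing @, + or . after the rights (extended attributes) is skipped.
    $pattern = '/^([-dl][rwxstST-]{9})[@+.]?\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+'
             . '([A-Za-z]{3}\s+\d{1,2})\s+(\d{1,2}:\d{2}|\d{4})\s+(.+)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    return [
        // Position of the first character in "-dl": 0 = file, 1 = folder, 2 = link.
        'type'   => (int) strpos('-dl', $m[1][0]),
        'rights' => $m[1],
        'user'   => $m[3],
        'group'  => $m[4],
        'size'   => $m[5],
        'date'   => date('m-d', strtotime($m[6])),
        'time'   => $m[7],
        'name'   => $m[8],
    ];
}

/**
 * Hypothetical replacement for the loop shared by the UNIX and MAC branches.
 * $type is 'files', 'folders', or anything else to keep both.
 */
function buildDirList(array $contents, string $type): array
{
    $dirList = [];
    foreach ($contents as $line) {
        $entry = parseListLine($line);
        if ($entry === null || $entry['name'] === '.' || $entry['name'] === '..') {
            continue;
        }
        // Same filters as the original: 'files' drops folders, 'folders' drops files.
        if (($type === 'files' && $entry['type'] === 1)
            || ($type === 'folders' && $entry['type'] === 0)) {
            continue;
        }
        $dirList[] = $entry;
    }
    return $dirList;
}
```

With helpers like these, the body of the merged branch would shrink to a single call to buildDirList(), and parseListLine() could get its own unit tests.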
This is where continuous integration would really prove its value, if unit tests were available for this class: you could then verify that the behavior of the class has not changed, and confirm that this improvement has indeed simplified the code base so that maintenance becomes less work. I am aware this is just a simple example of how you can put this tool to use, but I have found that these tools give you a new and very interesting perspective on your own code when you run them. In the end I am convinced that code quality improves and the number of bugs goes down, which leaves you more time to work on the next state-of-the-art program instead of hunting bugs you created ;-)
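Even without a full test suite, a quick characterization check can show that a refactoring like this one did not change behavior: run the old and the new version on the same inputs and require identical results. The sketch below is a hypothetical and heavily simplified stand-in for the ftp.php code (the per-entry work is reduced to skipping '.' and '..', and all function names are my own). In a real project the same comparison would become a PHPUnit test using assertSame().

```php
<?php
// Hypothetical, simplified stand-ins for the code before and after the
// refactoring; the per-entry work is reduced to skipping '.' and '..'.

function keepEntry(string $name): bool
{
    return $name !== '.' && $name !== '..';
}

// Before: two identical branches, one per operating system.
function listBefore(string $osType, array $names): array
{
    $list = [];
    if ($osType == 'UNIX') {
        foreach ($names as $name) {
            if (keepEntry($name)) {
                $list[] = $name;
            }
        }
    } elseif ($osType == 'MAC') {
        foreach ($names as $name) {
            if (keepEntry($name)) {
                $list[] = $name;
            }
        }
    }
    return $list;
}

// After: one merged branch.
function listAfter(string $osType, array $names): array
{
    $list = [];
    if ($osType == 'UNIX' || $osType == 'MAC') {
        foreach ($names as $name) {
            if (keepEntry($name)) {
                $list[] = $name;
            }
        }
    }
    return $list;
}

// Characterization check: both versions must agree for every OS type,
// including one that neither branch handles.
$samples = ['.', '..', 'images', 'index.php'];
foreach (['UNIX', 'MAC', 'WIN'] as $osType) {
    if (listBefore($osType, $samples) !== listAfter($osType, $samples)) {
        throw new RuntimeException("Behavior changed for $osType");
    }
}
echo "Refactoring preserved behavior\n";
```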
In the next article of this series I will look at other tools that can help you improve code quality, and give an overview of all the tools covered so far in this blog series about continuous integration.
Wilco was born in 1967 in the Netherlands, where he still lives. After years of working as a programmer, Wilco worked as a project manager and IT manager. He discovered Joomla! while creating his own content management system and has never lost focus since. He joined the core team as development coordinator in May 2006 to help make Joomla! even better than it already is. Wilco has been deeply involved in the Joomla project as Google Summer of Code program manager for the 2006, 2007 and 2008 editions, co-organizer of the Google Highly Open Participation Contest in 2008, first ever development coordinator, creator of the Joomla Bug Squad, member of the board of Open Source Matters, regular speaker at conferences worldwide advocating Joomla, and much, much more. Wilco has a bachelor's degree in business and information engineering and studied for a Master of Science in Knowledge and Information Engineering at Middlesex University in London.
Copyright © 2008 jfoobar - All Rights Reserved - Joomla! is a registered trademark of Open Source Matters, Inc - Disclaimer