Of PHP, Java, hammers and nails
First of all, this is neither a PHP-bashing topic nor a Java evangelizing one. Or vice-versa :) Even if there seem to be more people that say PHP sucks than people that say Java stinks. But after all, stubborn preference for a programming language is a matter of opinion or maybe a side-effect of the law of the instrument (often called Maslow's hammer) - “When all you have is a hammer, everything starts to look like a nail.”
Secondly, I use Java as the example because I can speak from experience. But I strongly suspect that the same arguments could apply to platforms like .NET or Ruby on Rails.
That being said, let's take a look at ye olde HelloWorld program with a twist. Let's use an array for storing the user's name and age:
<?php
$user[]=$argv[1];
$user[]=$argv[2];
echo "hello ".$user[0].", you are ".$user[1]." years old";
?>
Now to translate this into Java. The command-line arguments spare us the trouble of writing a servlet. First attempt:
public class HelloWorld {
    public static void main(String[] args) {
        // Fixed-size array: slot 0 holds the name, slot 1 the age
        String[] user = new String[2];
        user[0] = args[0];
        user[1] = args[1];
        System.out.println("hello " + user[0] + ", you are " + user[1] + " years old");
    }
}
But there's already a problem. The age should be a number, not a string. So do we do this?
Object[] user=new Object[2];
user[0]=args[0];
user[1]=Integer.parseInt(args[1]);
Rather ugly.
What if the PHP program extends the array by adding more names and ages, e.g. in a user-controlled loop? We'd have to use a List in Java. What if we have keys in PHP, like this:
$user['name']=$argv[1];
$user['age']=$argv[2];
We'd have to use a Map!
At this point, the versatility of PHP arrays smacks us in the face, together with the realization that Java is object-oriented so we'd better create a User class:
class User{
String name;
int age;
}
...
User user=new User();
user.name=args[0];
user.age=Integer.parseInt(args[1]);
System.out.println("hello "+user.name+", you are "+user.age+" years old");
and then have a List<User> or Map<String, User> if necessary.
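To make the List and Map variants concrete, here is a minimal sketch. The class name UserCollections, the helpers parseUsers and indexByName, and the choice of the user's name as the map key are all assumptions made for illustration; they are not part of the original HelloWorld.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UserCollections {

    // Same shape as the User class above, plus a constructor for brevity.
    static class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Reads "name age name age ..." pairs; a trailing unpaired name is ignored.
    static List<User> parseUsers(String[] args) {
        List<User> users = new ArrayList<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            users.add(new User(args[i], Integer.parseInt(args[i + 1])));
        }
        return users;
    }

    // Keyed access, the closest analogue of a PHP array with string keys.
    // Assumption: names are unique; a later duplicate overwrites an earlier one.
    static Map<String, User> indexByName(List<User> users) {
        Map<String, User> byName = new LinkedHashMap<>();
        for (User u : users) {
            byName.put(u.name, u);
        }
        return byName;
    }

    public static void main(String[] args) {
        for (User u : parseUsers(args)) {
            System.out.println("hello " + u.name + ", you are " + u.age + " years old");
        }
    }
}
```

Run with the arguments Ann 31 Bob 27, this prints one greeting per name/age pair. The list grows as needed, which is what the PHP $user[] syntax gives you for free.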
But all we wanted is a simple HelloWorld program and this starts to look complicated. The types of the variables, the size of the array, we have to bother with all these things. And if, God forbid, the name and age were taken from an HTTP request and we had to write a Java servlet, then all those objects and methods (request, response, writer, doGet, doPost) would come into play. All this to accomplish the same thing as 4-5 lines of PHP code??
This is the main reason why PHP has gained such huge popularity. It's SIMPLE. It does so much behind the scenes that the developer has very few worries. No types for variables and no explicit request/response operations – just regular arrays, like this:
$user[]=$_REQUEST['name'];
Cool! Add to this a lot of convenient runtime functions/classes, and you can write a whole Web site in a few days!
But then maybe the site grows, the arrays collect so many things you can't remember which is which, and you start having ugly runtime errors – appearing only when somebody strays off the regular usage scenario, so they were not detected beforehand. Your newer code accidentally reuses (rewrites) some global variables and again, you might only see this when the site is up and running.
Maybe those strings, integers, classes and collections of classes would help you be more organized. Maybe those free Java/Java EE IDEs, with their hordes of features and plug-ins, would also help you be more organized. Maybe having compile-time errors would prevent some of the boo-boos.
Of course, PHP4 came with its own object-oriented model and PHP5 enhanced it. This model is very similar to the Java one (well, to the Smalltalk one if you want to be historically accurate). Even so, many PHP developers prefer the versatile arrays because they are much easier to use. In my opinion arrays are one of PHP's greatest strengths and weaknesses at the same time. Together with the inherent characteristics of a weakly-typed/interpreted/mostly procedural language, arrays contribute to the speed of development - and facilitate poorly structured applications.
There comes a time, though, when maintainability and reliability must be taken into account, at the expense of coding speed. After all, the time gained when developing will be lost later on – debugging and fixing problems. And the cost will be way higher.
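The compile-time point can be sketched in Java itself by imitating a PHP array with a Map<String, Object>. The names TypedVsLoose, greetLoose and greetTyped, and the deliberately misspelled key "nmae", are invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class TypedVsLoose {

    static class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // PHP-array style: keys are plain strings, values are untyped.
    // The misspelled key "nmae" compiles fine and only shows up at runtime.
    static String greetLoose(Map<String, Object> user) {
        return "hello " + user.get("nmae") + ", you are " + user.get("age") + " years old";
    }

    // Typed style: writing user.nmae here would be rejected by the compiler.
    static String greetTyped(User user) {
        return "hello " + user.name + ", you are " + user.age + " years old";
    }

    public static void main(String[] args) {
        Map<String, Object> loose = new HashMap<>();
        loose.put("name", "Ann");
        loose.put("age", 31);
        System.out.println(greetLoose(loose));               // hello null, you are 31 years old
        System.out.println(greetTyped(new User("Ann", 31))); // hello Ann, you are 31 years old
    }
}
```

The loose version fails silently: nothing complains until somebody actually looks at the output. With the typed version the same slip never gets past the compiler.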
For small to medium applications, PHP may be a very good choice. No point in bringing in the heavy artillery. Go past a certain code size though (or past a certain size of the development team) and you might need some shiny howitzers. Or just a bigger hammer for those pesky nails :)
What is automated software translation and why should I care?
Your code is your asset
Software development is not easy. And not because programming in itself is rocket science. As an application grows in size, things get increasingly complex. The software industry is still young and we have a low level of standardization in the software development process. Let's face it, each company does things differently and takes pride in doing so. And we could go on.
Yet have you ever found yourself asking: why did we code product X in that programming language? Why didn't we choose the other language instead? But, traditionally, the wisest choice is to stick with what you already have and avoid a rewrite like hell. Everyone is saying this.
An alternative to manual rewrite
A software migrator is a complex tool that automatically translates applications written in a source language into semantically equivalent applications that use a different (target) language. The goal is to maintain the original look and behavior to the largest possible extent.
Once the codebase has been successfully migrated, no one says that it has to stay frozen that way. In fact one of the reasons to migrate is to take advantage of the target language's ecosystem - features, available tools, specialists, community. So you can now refine, refactor, develop new features while resting assured that you are not ruining what you already had working.
A clean sheet, but without a catch
Automated software translation is a fast way to start over with a clean sheet, but without losing what you have already worked on. It's the fastest way to rewrite your application in a better-suited language while retaining all existing functionality.
Just think of it: we have this insanely fast hardware that no one could have imagined two decades ago. Huge storage space, lots of RAM too. Yet we continue to rewrite software by hand, as if we were still coding in assembly because we didn't know any better. Is this really acceptable?
We don't think so, and we have started to do something about it - at least if you're stuck with some huge PHP codebase that probably should have been written in Java in the first place.
A time to migrate
Requirements? Joe, the salesman at ABC Ltd., needs a report that correlates the volume of tea sales to the amount of rainfall reported in the county. Design? Just a sketch on half a sheet (complete with tea stain). Then Mike, the engineer kid who hacked computers in high-school, crafts a simple script that does the job. Testing? Er, just tell Mike if you have a problem, he'll fix it in five minutes - smart kid!
Time passes and ABC Ltd. has grown into ABC Inc. Joe is now leading the sales department and Mike has an MBA. The script has grown as well, with various contributors adding up to a veritable jungle (Dan liked FoxPro, Louise only knew Oracle Forms). The reports still get the job done – they correlate tea with weather, galoshes with elections and so on. But the inner workings are undocumented, sometimes poorly written, virtually impossible to understand for the uninitiated. Something should be done, but nobody dares voice the thought – if it ain't broken why fix it, right?
This slightly cartoonish scenario is not that uncommon. Most applications live somewhere in-between total control and total chaos. Most of them will grow over time – the better they perform, the more features get added inside. They will be all things to all people, and the new things will have to be put in quickly. There's no time to plan refactorings, redesigns – the users want the stuff yesterday, the managers are pressuring, the developers do not risk their necks when deadlines are looming.
Technology changes too. From mainframes to dot-coms, in not so many years. Applications need to interface with everything except the kitchen sink (“in fact, if we use JINI.....”). They need to run fast, to scale well – maintainability is inevitably pushed into the background. Until the whole thing is like a house of cards, ready to collapse when adding this one more little feature.
Finally, after the new SOAP module causes havoc, someone decides enough is enough. Resources are budgeted for a major re-engineering. Or better yet, a complete rewrite with a change of programming language(s), since Mike's scripting language of choice was not meant for ERPs and nobody knows FoxPro anymore – not even Dan, who became a wandering monk.
I'd hate to be in the shoes of a team that attempts to accomplish so many things at once. And is burdened with high expectations as well – from all stakeholders, including the users, who are familiar with the old application even if they keep groaning about it. “Software migration” is not a phrase to be spoken lightly; it makes veterans shudder with unpleasant memories. If only the process could be more controlled, to keep the managers from going gray. If only the transition to another programming language/technology could be separated from the structural/conceptual changes, to keep the developers from going gray. If only the resulting application would look and behave at least passably similar to the original, to keep the users from going gray.
But that's an idea for another post. Joe wants to correlate UFO sightings with chocolate sales, so I'd better run and write him Yet Another Ruby on Rails Tool ;-)
Automated translations from PHP to Java
The shortest route from PHP to Java
Some of the risks inherent to any software migration are avoided by using an automated translation tool. Our PHPtoJava product performs variable type inference, objectualization and other operations in a uniform manner, the resulting appearance and behavior being identical to what the users already know.
Of course, the human factor still plays an important role in the post-translation phases: application fine-tuning and functional testing. Even so, the speed and accuracy of the entire process surpass those of a manual translation.
One of the applications we have migrated this way is the well-known forum engine phpBB. The translation result, nBB2, powers our own forum and was recently donated to the open-source community as a SourceForge project.
There are many technical and business aspects related to software migrations in general and from PHP to Java in particular. The purpose of this blog is to discuss them.