Open-source Web applications, PHP vs. Java (Part 1 of 2)
It is common knowledge that PHP does well in the open-source Web applications space. PHP has numerous representatives for most application categories, and for some it provides a clear leader, like WordPress. On the other hand, most Java counterparts have apparently failed to reach the same popularity.
Below is an overview of the existing open-source PHP and Java implementations for the following categories of applications: forum, blog, wiki and content management systems (CMS). These are among the most commonly used types of on-line software.
Forum Engines
There are lots of PHP forum engines in the open-source software arena, including:
- phpBB which has 700+ mods and is the winner of the SourceForge.net 2007 Community Choice Awards for the Best Project for Communications category,
- vBulletin with 1000+ mods,
- punBB having 300+ projects and 150+ styles.
Notice the large number of plugins for each of these projects. This denotes a large and healthy user base, devoted to both using and extending the core application.
Java has the JForum and JavaBB projects, but they seem to have very few users by comparison, not to mention plugin contributors.
Blog Engines
In the area of blogging, the PHP-based WordPress is extremely widespread. Chances are that virtually any blog you read is powered by WordPress. The project has 2000+ plugins and at least 5 dedicated printed books. No other PHP blog engine comes close to the popularity of WordPress.
For Java, there are two relevant open-source blog applications: Apache Roller and Pebble. Roller is more feature-rich, but that comes at a considerable price: large footprint and difficult configuration. It is an enterprise application focused on very large blogging sites (e.g. the Sun blogs), but it seemingly fails to satisfy small-scale needs. I have yet to find a plugin developer community around it.
Pebble on the other hand is focused on the other end of the spectrum, providing a much simpler configuration and lower footprint. But I still couldn't find plugins.
Wiki Engines
In this category, PHP seems to have an overwhelming advantage. The MediaWiki project powers Wikipedia, the largest wiki by far. The project has lots of available extensions, of which 300+ are stable. There are lots of other PHP wiki engines one can choose from, but I just wanted to point out the most prominent one.
Java does not have too many production-ready wiki projects. In my opinion, JSPWiki (with 50+ plugins) and XWiki are the most relevant. I wanted to mention SnipSnap as well, but its development has officially stopped.
Content Management Systems
Content management systems are a handy way of building dynamic sites instead of starting from scratch. PHP seems to provide everything one needs, including a healthy competition among its foremost projects. These are Joomla (2900+ extensions, 14+ books published) and Drupal (3600+ modules, 9+ books). There are also other CMS projects e.g. Mambo and the ancient PHP-Nuke.
Because in most cases CMSs are deployed on a dedicated server, Java should not be at a disadvantage in this category. There are several open-source Java CMSs to choose from: Apache Jackrabbit, Apache Lenya, Alfresco, Liferay, OpenCms, Nuxeo, Magnolia, Jahia etc. Among these, Alfresco (2 printed books and 20+ stable extensions) and Liferay (portlet-based, 1 printed book, 25+ portlet plugins) seem to be the most popular.
The Java Content Repository (JCR) API defined by JSR-170 deserves a special note. Both Jackrabbit and Magnolia implement it. This means that third-party tools can access their repositories in a standardized way. It is a very good step towards ensuring that information stored inside a JCR-compliant repository can outlive a particular JCR implementation.
For now, however, the JCR API does not seem to be enough. The PHP CMS projects are continuously gaining ground, probably because PHP hosting is cheap and easy to set up, and because the projects themselves are highly usable. Take into account that DZone and implicitly JavaZone (the former JavaLobby) are running on Drupal. And I'm sure that Rick and Matt tried to choose the best option while not easily dismissing the Java CMSs.
In the second part of this post, I will try to explore the reasons for the current state and to figure out a way to change it. Meanwhile, feel free to add your input to the above; the list of projects is far from exhaustive.
JavaOne & open-source voting
We shall be showcasing our nTile PtoJ automated migrator - presenting features & technical benefits as well as the economical feasibility of automated migration.
Apart from that, we want to raise awareness about our open-source initiatives. nBB2 is just a first step in this direction. In the future we want to bring more popular PHP projects into the Java™ world.
We need your opinion regarding these open-source projects. Which of them would you like to see translated into Java™, either from the point of view of regular usage or even in order to contribute to its further development? Cast your vote at http://www.numiton.com/vote.
Hosting Java Web applications
The benchmark
Our company has recently launched the Java translation of phpBB 2, branded as nBB2. It's open-source and you can get it from nbb2.sourceforge.net. So what I had at hand was the Java equivalent of a popular PHP forum engine. Same look, same behavior, only backed by a different programming language. If one wants to compare the hosting options for small-to-medium PHP and Java Web applications, the phpBB/nBB2 pair looked to me like good test material.
My reasoning was to use a Java application that is the very close counterpart of something that runs on even the cheapest PHP hosting options. By "close counterpart" I mean they both carry out the same operations on system resources (e.g. writing files, using sockets, accessing databases).
Another concern I had was performance. I wanted to find out not only whether the Java application can run on a certain hosting plan, but also how fast. Again, the most reasonable benchmark was to compare performance with the original PHP application running in the same environment.
I started by setting up a testing scenario using Apache JMeter which performs the following workflow: main page → login page → main page (automatic redirect) → view forum → view topic → post reply → log out → main page (automatic redirect).
This scenario is executed by 10 concurrent threads (users), 30 times each. This produces 300 new topic posts at the end of the performance test. All JMeter project files are available on the CVS here.
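The arithmetic of the scenario can be sketched in plain Java. This is not the JMeter test plan itself, only an illustration of the thread and iteration structure. The class name ForumLoadSketch is hypothetical, and each step is a placeholder where a real test would issue an HTTP request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the load scenario; it performs no real HTTP traffic.
final class ForumLoadSketch {
    // The workflow steps, in the order described in the post.
    static final List<String> STEPS = Arrays.asList(
            "main page", "login page", "main page (redirect)", "view forum",
            "view topic", "post reply", "log out", "main page (redirect)");

    /** Runs the workflow for each simulated user and returns the number of replies posted. */
    static int run(int users, int iterationsPerUser) {
        AtomicInteger replies = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int u = 0; u < users; u++) {
            pool.submit(() -> {
                for (int i = 0; i < iterationsPerUser; i++) {
                    for (String step : STEPS) {
                        // Placeholder: a real test would send the request for this step.
                        if (step.equals("post reply")) {
                            replies.incrementAndGet();
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("load run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the load run", e);
        }
        return replies.get();
    }

    public static void main(String[] args) {
        // 10 users x 30 iterations x 1 reply per iteration
        System.out.println("Replies posted: " + run(10, 30)); // prints "Replies posted: 300"
    }
}
```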
Reference tests
Before using the hosting providers, I conducted reference testing on two of our machines (one Linux, one Windows). The results were consistent: phpBB 2 was 10-15% faster than nBB2.
I ran PHP 5.2.3 through the Apache 2.x PHP module, a pretty common setup. For nBB2, I used Apache Tomcat 5.5 (on Java SE 6) and I imposed a heap limit of 32 MB, close to what one can expect in a remote hosting environment. More performance tuning could produce better nBB2 results, but this was enough to prove that the performance of the two applications is comparable.
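For readers who want to reproduce the heap limit: the usual way to cap the heap is the -Xmx JVM option (for example -Xmx32m), which Tomcat's startup scripts pick up from the JAVA_OPTS or CATALINA_OPTS environment variables. Whether that is exactly how the limit was set in these tests is an assumption. The small sketch below, with the hypothetical class name HeapReport, shows how an application can check what limit it actually received. The reported maximum can be slightly lower than the -Xmx value.

```java
// Hypothetical helper: reports the heap limits the running JVM actually sees.
final class HeapReport {
    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the JVM will try to use for the heap (roughly -Xmx).
        System.out.println("Max heap (MB):       " + toMegabytes(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("Committed heap (MB): " + toMegabytes(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("Free in heap (MB):   " + toMegabytes(rt.freeMemory()));
    }
}
```

Running it as `java -Xmx32m HeapReport` on a local machine gives a quick impression of how little room a 32 MB hosting plan leaves.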
The JMeter aggregate reports are also on the CVS. In case you wonder why Java doesn't blow PHP away on speed (or vice-versa), do not forget that both applications share the same architecture.
Shared hosting
For Java Web applications, the basic hosting options I have encountered feature a setup in which several users share the same instance of Tomcat 5.x. Perhaps Caucho Resin would be a better choice, but that's another discussion.
After many attempts, I gave up on GoDaddy's cheap Linux-based Java hosting account (the Deluxe Plan at $6.99 per month). I can only say what many others do, that it isn't suitable for anything but the simplest servlets or JSP pages.
Their JVM is set up to be very restrictive. As a rule of thumb, any part of the Java API that requires security permissions will probably not work. Think of write access to your own account (or /tmp), reading of system properties, creation of URL stream handlers, FTP connections. Before dismissing the importance of write access, think about logging. You'll be forced to log to a database. Log4j JDBCAppender is your friend, but it can't replace a simple log file. But I digress.
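Before committing to a plan, it can help to deploy a tiny probe that tries a few of these operations and reports which ones the host allows. The sketch below is a minimal version of that idea and not something taken from any provider's documentation. The class name PermissionProbe and the choice of checks are illustrative assumptions. On a restricted JVM, denied operations surface as java.lang.SecurityException.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe: tries a few restricted operations and records the outcome.
final class PermissionProbe {
    /** A single operation that may be blocked by the host's security policy. */
    interface Check {
        void run() throws Exception;
    }

    /** Runs one check and describes whether it was allowed, denied, or failed otherwise. */
    static String attempt(Check check) {
        try {
            check.run();
            return "allowed";
        } catch (SecurityException e) {
            return "denied: " + e.getMessage();
        } catch (Exception e) {
            return "failed: " + e;
        }
    }

    /** Runs all checks and returns the results in a stable order. */
    static Map<String, String> probe() {
        Map<String, String> results = new LinkedHashMap<>();
        results.put("read system property user.home",
                attempt(() -> System.getProperty("user.home")));
        results.put("write to java.io.tmpdir", attempt(() -> {
            File f = File.createTempFile("probe", ".tmp");
            if (!f.delete()) {
                f.deleteOnExit();
            }
        }));
        return results;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : probe().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In a servlet container the same checks would be called from a servlet or JSP rather than from main, with the results written to the response, since a log file may be exactly what is not available.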
Other basic hosting accounts forbid the use of third-party JARs, like the case of HostIgnition's StarterFire plan.
Deployment/redeployment is another issue, since you don't have direct access to the JVM and so you cannot restart it at your convenience. According to GoDaddy's documentation, they have scheduled restarts each night at 1 AM Arizona time. In practice, they seem to do restarts more often than that. Other providers have on demand restarts. Each user can request a certain number of restarts each day. The problem is that other users you are sharing the JVM with can temporarily take down your application.
Shared hosting using a private JVM was the next logical step in my quest. This took the monthly charge well over $10 (e.g. $14.95 for HostIgnition's LiteFire with the Level 2 JSP/Servlets option). At this level one needs to get used to 32 MB of heap, at best. Above that, prices increase steeply (e.g. $59.95 for 64 MB of heap at HostIgnition). Having your own JVM, you may restart it at your convenience and can see even the Tomcat logs directory through FTP.
But even using a private JVM doesn't mean that all will go well. For instance, on my test HostIgnition LiteFire account, I occasionally encountered the java.lang.OutOfMemoryError: GC overhead limit exceeded error during performance testing. This is documented to occur when the -XX:+UseGCOverheadLimit JVM option is enabled (the default in HotSpot; -XX:-UseGCOverheadLimit turns the check off), more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered. Now this could happen if a certain part of the heap resided in swap and could not be restored to RAM in a time-effective manner.
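One way to see how close an application gets to that 98% threshold is to ask the JVM how much time it has spent collecting garbage. The sketch below uses the standard java.lang.management beans. The class name GcOverhead is hypothetical, and the figure it computes is a cumulative average since JVM start, which is only a rough indicator and not the windowed measurement the JVM uses internally for the limit.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates the share of JVM uptime spent in garbage collection.
final class GcOverhead {
    /** Returns gcMillis / uptimeMillis clamped to [0, 1]; 0 if the uptime is not positive. */
    static double fraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / uptimeMillis));
    }

    /** Sums the approximate accumulated collection time of all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.println("GC time (ms): " + gc + " of " + uptime + " ms uptime");
        System.out.println("GC share:     " + fraction(gc, uptime));
    }
}
```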
On a positive note, the performance of nBB2 slightly exceeded that of phpBB on my HostIgnition account - configuration options do help.
After all this, I see the shared hosting with private JVM as the cheapest viable Java hosting alternative. While the basic hosting services just won't do, this one is reasonable but starts to be a little pricey.
Virtual Private Servers (VPS)
After finding a reasonable Java Web hosting solution, I thought about going upwards in the price range. The next step is represented by the Virtual Private Servers. You get your own machine (Linux or Windows) and have full root/admin access through SSH/Remote Desktop. The catch is that all this is virtualized and you are sharing a physical machine with others. Depending on how many users are allocated to the same physical machine and on what they are doing at a given time, performance varies. The downside is that, besides the JVM, you are in charge of the whole operating system. Setting up and maintaining the servers for DNS, email, database, Web etc. is no small task.
My Linux-based GoDaddy VPS (Economy Plan, $30+) had enough RAM (256 MB RAM + 1 GB of swap), but experienced occasional performance drops. This was really bad. I thought I'd try a different VPS provider.
On a LunarPages VPS also running on Linux, things were much better and more consistent on the performance side. The problem was that I had only 512 MB of memory available, including swap space. I'd say that on a real machine this would be the equivalent of 128 MB RAM and 384 MB of swap. This memory was eaten up pretty fast by the running services, and I had difficulties getting a JVM to fully use its 32 MB heap space (in client mode, since the server mode requires additional non-heap memory). /proc/meminfo was consistently reporting low to critical memory resources. Under these circumstances, even simple cron jobs were failing due to out of memory errors. Naturally, the JVM also required daily restarts, otherwise I'd get OutOfMemory errors there as well.
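Keeping an eye on /proc/meminfo can be automated from Java itself. The sketch below parses the file's "Name: value kB" lines and adds up free RAM and free swap. The class name MemInfo is hypothetical. The field names MemFree and SwapFree are standard on Linux, but how a particular virtualization layer fills them in is something I would verify on the actual VPS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads memory figures from /proc/meminfo on Linux.
final class MemInfo {
    /** Parses lines such as "MemFree:  1234 kB" into a field name -> value map. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> values = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Name: value" line
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                values.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
            } catch (NumberFormatException ignored) {
                // value is missing or not numeric; skip the line
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines =
                Files.readAllLines(Paths.get("/proc/meminfo"), StandardCharsets.UTF_8);
        Map<String, Long> kb = parse(lines);
        long freeKb = kb.getOrDefault("MemFree", 0L) + kb.getOrDefault("SwapFree", 0L);
        System.out.println("Free RAM + swap (MB): " + freeKb / 1024);
    }
}
```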
Had I skipped the Plesk administrative interface and disabled some non-essential services, I might have been able to fully use the 32 MB of heap space and have some memory to spare. But it seemed pointless to pay twice the price of a private JVM on a shared host to get the same level of performance at best.
Performance-wise, when the Java application DID work, the results were consistent with what I had obtained on our local machines.
Dedicated Hosting
It surely works great, but it is in a completely different price range. I find a $100+ monthly charge to be rarely justified for low to medium traffic sites running small sized Web applications. The JVM is snappy, while experiencing the same comparative performance as on my local tests. Memory is no issue since all dedicated plans come with at least 512 MB of RAM and a swap space for which you have exclusive use. It's a physical machine for your use only.
The conclusion
With all this effort, I managed to find out nothing new :-) Java runs well in an enterprise environment (good up-scaling) and lags outside of it (bad down-scaling). The memory overhead of the JVM translates into memory problems for the applications in a budget-friendly hosting environment, making the comparable performance (when/while things do run) a small consolation.
But PHP works well outside the enterprise. Its simple development and deployment life-cycles are ideal for shared hosting environments, just as if the language had been built with this in mind – well, actually, it was.
While Java comes out a little bruised from this comparison, future evolutions might change the balance of power. If the MVM (multi-tasking virtual machine) from JSR 121, which promises to run multiple isolated virtual machines inside the same JVM process, is ever released, this could make it easy and cost-effective for hosting providers to give full access to a JVM per account even in their basic hosting plans. Another possibility is for the embedded J2SE platform to be used by hosting providers, if it catches on.