The DOM traversal code is about four times faster than the XPath code. I'm going to use the XPath code, though.
That 4x multiple translates into about half a second of execution time for the XPath code versus about 0.13 seconds for the DOM code when each is run 10,000 times -- in other words, a few dozen microseconds per call. Since a typical use of this code will involve it running maybe 10 or 20 times during a request, I'm happy to sacrifice a fraction of a millisecond of processor time per request in exchange for simpler code. To me, the XPath expression is simpler, and it has the handy property of scooping up all the nodes that match; the DOM code would have to be modified to loop through multiple nodes in $firstNode to get that behavior. If XPath gives you the willies and you prefer using the DOM functions, then you're in great shape.
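For reference, here's a minimal sketch of the two styles being compared, against a made-up <feed>/<entry> document (the element names and variables are hypothetical, not the actual benchmark code):

    <?php
    // Hypothetical document: a <feed> root containing <entry> elements.
    $xmlString = '<feed><entry><title>One</title></entry>'
               . '<entry><title>Two</title></entry></feed>';
    $doc = new DOMDocument();
    $doc->loadXML($xmlString);

    // XPath version: one expression scoops up every matching node.
    $xpath = new DOMXPath($doc);
    $titles = $xpath->query('/feed/entry/title');

    // DOM traversal version: walk the children by hand. Collecting
    // every match (not just the first) takes an explicit loop -- the
    // extra work mentioned above.
    $titles = array();
    foreach ($doc->documentElement->childNodes as $child) {
        if ($child->nodeName == 'entry') {
            $titles[] = $child->getElementsByTagName('title')->item(0);
        }
    }
    ?>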
I have used SimpleXML on two versions of a topic map viewer application that reads directly from an XML file, with the added computational burden of faceted classification. I've found the XPath approach to be about 6x faster overall than the foreach() approach.
More precisely, I use a mix of the two.
If your function needs to scan the whole file, XPath will work well. If not, use foreach() and stop when you're done; it will be faster, since by design XPath always scans the whole file.
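Here's a sketch of that trade-off with SimpleXML, using a made-up <topics>/<topic> structure (all names hypothetical):

    <?php
    $xml = simplexml_load_string(
        '<topics><topic id="1"/><topic id="42"/><topic id="99"/></topics>'
    );

    // XPath can't bail out early: the expression is evaluated against
    // the whole tree before anything comes back.
    $matches = $xml->xpath('//topic[@id="42"]');

    // foreach() can break as soon as the target turns up, so on
    // average it touches only part of the document.
    $match = null;
    foreach ($xml->topic as $topic) {
        if ((string) $topic['id'] == '42') {
            $match = $topic;
            break;  // done -- no need to scan the rest of the file
        }
    }
    ?>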
When Apache has to examine static HTML files for things like PHP opening tags, it isn't simply a case of "oh well, we didn't find any, therefore there's little performance loss". When Apache doesn't have to parse the file for things like that, it can use the sendfile() syscall, which is not a negligible performance boost.
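For the concrete case: a configuration along these lines (a hypothetical httpd.conf fragment, not anyone's actual setup) is what pushes every .html file through mod_php and off the sendfile() fast path, whether or not the file contains any PHP:

    # Hypothetical mod_php mapping. The second extension means every
    # static .html file gets handed to the PHP parser, so Apache can
    # no longer serve it straight off disk via sendfile().
    AddType application/x-httpd-php .php .html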
He's making the mistake of only looking at the performance of the PHP parser, and missing the ramifications of involving it at all.
Careful, Jim, or you'll fall into the same trap of only looking at the performance of one part of the system.
Let's say that the sendfile() optimization you mention lets your $500 server send 20 requests per second instead of 10. As your traffic grows from 10rps to 20rps, you've saved $500 since you don't have to buy a new server.
But simultaneously, perhaps a few employees spend, collectively, 5 extra hours over the same time period fixing bugs, writing extra code, or otherwise focusing on issues that arise because pages have the wrong extensions or their extensions changed. Oops, there went the $500 you saved on the server. (Along with the opportunity cost of whatever else those 5 hours could have been spent on.)
Of course, I've just made these numbers up, and YMMV, and you also need to sink time into measuring the performance difference and blah blah blah, but I think Adam's basic point -- "efficiency" needs to be a holistic measure incorporating not only raw computer performance, but the time/money/functionality balance across your entire organization/project -- is a good one.
> Adam's basic point -- "efficiency" needs to be a holistic measure incorporating not only raw computer performance, but the time/money/functionality balance across your entire organization/project -- is a good one.
Oh definitely, I was just quibbling about his claim that it's "a very small performance hit because there's no complex logic". The fact is that it's not the logic that matters when it comes to the performance hit; it's having to examine the file in the first place.
I certainly agree that CPU efficiency is often overrated at the expense of programmer efficiency, and didn't mean to imply otherwise (hence the "well-meaning" bit, which I suppose I could have made clearer).