sklar.com/blog
...composed of an indefinite, perhaps infinite number of hexagonal galleries...
Wednesday, April 25. 2007
So I've got this string (in PHP) and I need to scan through it character by character. I can't scan byte by byte because it's 2007, our users write in all sorts of languages, and the string is UTF-8.
The PHP 5 solution uses mb_strlen() to find the length and then mb_substr() to grab each character:
$j = mb_strlen($theString);
for ($k = 0; $k < $j; $k++) {
    $char = mb_substr($theString, $k, 1);
    // do stuff with $char
}
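Another PHP 5 option, not part of the benchmarks in this post, is to split the string into characters once and then loop over the resulting array, which avoids calling mb_substr() for every position. This is a minimal sketch that assumes the string is valid UTF-8 and that PCRE was built with UTF-8 support; the sample string and the $count variable are made up for illustration.

```php
<?php
// Sample UTF-8 input, written with byte escapes so the file encoding does not matter.
$theString = "na\xC3\xAFve <b>caf\xC3\xA9</b>";

// The empty pattern with the /u modifier splits between every UTF-8 character.
$chars = preg_split('//u', $theString, -1, PREG_SPLIT_NO_EMPTY);
if ($chars === false) {
    // preg_split() returns false when the input is not valid UTF-8.
    $chars = array();
}

$count = 0;
foreach ($chars as $char) {
    // do stuff with $char; here, count the '<' characters
    if ($char === '<') {
        $count++;
    }
}
```

The trade-off is memory: the whole array of characters exists at once, which is fine for a 2900-byte string but worth thinking about for very large ones.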
In PHP 6, one would do:
foreach (new TextIterator($theString, TextIterator::CHARACTER) as $char) {
    // do stuff with $char
}
Some rough benchmarks on a 1500-character (2900-byte) string (Linux, whatever processor is inside this Thinkpad T43 here, your mileage may vary, etc.) give me about 61 scans/sec with PHP 5.2.1, where a "scan" is one pass through the loop above with mb_substr() and one if() test comparing the char to '<'.
Under PHP 6.0.0-dev with unicode.semantics=on, switching from mb_strlen() and mb_substr() to regular strlen() and substr() produces about the same result. And indexing with $theString[$k] is the same speed as substr().
However, the TextIterator case is much faster, about 450 scans/sec!
Nicely done!
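The timing harness isn't shown here, so the following is only a sketch of how a scans/sec number like the ones above could be measured for the mb_substr() loop. It assumes the mbstring extension is loaded; the function name scans_per_second() and its parameters are hypothetical, and the numbers it produces will depend entirely on the machine and PHP build.

```php
<?php
// Hypothetical helper: repeat the mb_substr() scan for roughly $seconds
// of wall-clock time and report how many full scans completed per second.
function scans_per_second($theString, $seconds = 1.0)
{
    $scans = 0;
    $start = microtime(true);
    do {
        $j = mb_strlen($theString, 'UTF-8');
        for ($k = 0; $k < $j; $k++) {
            $char = mb_substr($theString, $k, 1, 'UTF-8');
            if ($char === '<') {
                // the single if() test that makes up the "work" of a scan
            }
        }
        $scans++;
        $elapsed = microtime(true) - $start;
    } while ($elapsed < $seconds);

    return $scans / $elapsed;
}

// Example: a short made-up string, timed for a tenth of a second.
$rate = scans_per_second("caf\xC3\xA9 <b>bold</b>", 0.1);
printf("%.1f scans/sec\n", $rate);
```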
I upgraded a machine to Ubuntu Feisty Fawn (7.04) today and was having trouble recompiling the Cisco VPN client. Thanks to this thread, I went to this blog post and applied this patch and everything was fine.
I must admit that applying some random patch to one's VPN client inspires a moderate amount of queasiness, but fortunately the patch is simple enough to understand so one can be confident no mischief is involved.
Friday, November 3. 2006
This story from George via Andrei is simultaneously hilarious and scary.
I'm not sure, if I were a PHP function, which function I'd be. Although figuring that out would be a lot easier if my name were, for example, Max Levenshtein.
Thursday, September 28. 2006
It's been a lot of hard work, so I'm quite excited that we've just released three great new Ning sites: Ning Videos, Ning Photos, and Ning Group.
I particularly like the embeddable slideshow that Ning Photos has, and its companion in Ning Videos, the embeddable player -- so you can put photos or videos on your blog or wherever. Both apps let you e-mail in content from your phone, too. Ning Group has some spiffy HTML parsing and file upload features so you can share documents with folks and incorporate music, pics, or anything else in the forums.
Plus, all three sites have the juicy bits that every site on the Ning platform gets -- things such as cloneability, complete customization, and built-in REST APIs. I've been watching the feeds for clones of photos and videos -- I suppose seeing who's cloned sites you care about is the Web 2.0 version of ego surfing.
More on the Ning Blog and from Kyle.
Tuesday, August 29. 2006
The new edition of PHP Cookbook is on the way! I got one copy yesterday, so it should be making its way into bookstores and online-bookstore-warehouses any day now.
There is lots of new material in this edition -- completely revamped XML and OOP sections, new stuff on PDO, Ajax, testing, performance tuning, regular expressions, and lots of other goodies.
Thursday, August 24. 2006
Some neat Ning + PHP related stuff recently: Ben and Elizabeth set up a Group clone for PHPCommunity -- http://phpcommunity.ning.com.
Ben also set up an app -- http://zendfw.ning.com -- where he installed the Zend Framework and made a few tweaks so it's running happily on the Ning Playground. I was pleased to see that our URL mapping support can handle everything that Zend Framework needs.
Thursday, July 27. 2006
The slides from my OSCON 2006 presentation, "I'm 200, You're 200: Codependency in the Age of the Mashup," are available at http://www.sklar.com/files/I'm-200-You're-200.pdf.
Tuesday, July 25. 2006
I'm heading to OSCON today. Things I'm looking forward to (in no particular order): Portland, giving a new talk, seeing friends, learning lots of new things. If you're at (or going to) OSCON, say hi and tell me how wonderful (or stinky) you think my blog is.
Monday, July 24. 2006
XMLHttpRequest Quirks and PHP has some tips on making HTTP requests from Javascript (and is not really PHP-focused, despite the title). The brief discussion of the benefits and drawbacks of using XML, HTML, or JSON for data exchange is worthwhile, and I suppose there's nothing wrong with any of the info about the XMLHttpRequest object. But I think if you're doing any moderately serious Javascript stuff that requires HTTP requests (see what wonderful contortions I undergo to avoid saying "Ajax"! Oops.), you've got problems if you're interacting with XMLHttpRequest (or the IE equivalents via "new ActiveXObject()") directly.
Instead, use a library such as Dojo or Prototype. There are a lot of subtleties to making requests work properly -- things such as cross-browser support and using different request transports based on the kind of data that needs to be sent to the server. If you start down the road of trying to do all that yourself, you'll go nuts (or at a minimum, waste your time reinventing). If you ignore all of those subtleties, your app won't work correctly.
So take advantage of the hard work someone else has already done to solve these problems. We may argue over whether real programmers use emacs or vi or Microsoft Visual InterDev 2006 for .NET Enterprise Architect Edition, but there's not much support for the "I wave a tiny magnet back and forth just-so next to the hard drive" method of editing these days. Avoid waving the tiny magnet to make HTTP requests from Javascript.
Friday, July 21. 2006

I just received a copy of PHP5. Wprowadzenie, the Polish translation of Learning PHP 5. I don't know any Polish, but if my calculations are correct (from Listing 7.13), "Kurczak generała O'Tso" is a spicy dish with chicken, dried pepper pods, maybe some peanuts, green pepper, and other goodies.
Tuesday, July 18. 2006
I live in New York City, but I'm gearing up for some extended travel. From September 1 to the end of the year, I'll be in Palo Alto, CA. I'm looking forward to working at Ning HQ for a few months. Then, from the beginning of January 2007 to the end of May, I'll be in Paris, resuming my working-remotely existence.
I have some packing to do!
Thursday, June 29. 2006
The XML spec says that in XML documents, "Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646." That is:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
This means that the control characters under 0x20 (with the exception of the Three Wise Whitespace) are not allowed.
This restriction goes all the way back to the definition of a "character" in the W3C Working Draft of November 14, 1996.
Plenty of brilliance went into crafting that document, so I must be missing something extremely obvious here: why are those control characters outlawed? What is the reasoning behind the spec preventing me from including in an XML document:
<data></data>
Preventing terminal beeps from errant Ctrl-Gs?
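Whatever the reasoning, the practical consequence is that data has to be filtered before it goes into a document. The Char production quoted above maps fairly directly onto a PCRE character class, so a filter can be sketched as follows. The function name strip_invalid_xml_chars() is made up for this example, and the sketch assumes the input is UTF-8 and that silently dropping the offending characters is acceptable (escaping or rejecting them may be the better choice for real data).

```php
<?php
// Hypothetical helper: remove every character that falls outside the
// XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function strip_invalid_xml_chars($utf8)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null if the subject is not valid UTF-8.
    return preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $utf8
    );
}

// A Ctrl-G (0x07) is removed; the tab (0x09) is legal and stays.
$withBell = "<data>ding\x07dong\tok</data>";
echo strip_invalid_xml_chars($withBell), "\n";
```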
Monday, June 26. 2006
Over the weekend we launched the "Ningbar" -- what Diego, Brian, and the rest of the crew at Ning have been cranking away on for a while.
The Ningbar is not just a replacement for the sidebar that used to accompany all Ning apps. It's the control center for getting the most out of Ning, whether you're using an app, cloning an app (which now takes just exactly 2 clicks), or writing code behind an app. In the Ningbar, you can get stats about the app you're using (who else uses it, what do your friends do on the app), check your messages or send messages to others, customize your apps, or clone the app you're looking at.
Of course (we wouldn't have it any other way!) the Ningbar is completely customizable and programmable. You can tweak every aspect of its appearance and behavior. And because the Ningbar sits on top of our open Javascript and REST APIs, what the Ningbar can do is only limited by what cool stuff you can think of to build. (This panel showing the current weather was a fun quick hack.)
Gina's post on the Ning blog has lots of screen shots and gives some more background on Ningbar goodies. As always, http://documentation.ning.com has the programming and API details for how to make the Ningbar do your bidding. In particular, check out the sections on our new Javascript APIs and interface customization.
Monday, June 19. 2006
What would regular expressions for non-text look like? I think many of the quantifiers would be similar, but the notions of "character classes" and what goes in an atom would be totally different.
Two ways of specifying bits to match for audio could be score-oriented or sample-oriented. Score-oriented "classes" could match patterns consisting of particular notes, notes in particular keys, with particular duration, particular chords, parts of particular chords, and so on. These could be built up into multi-note patterns. Other notation would look for other parts of the score -- tempo, all those Italian words that describe how to play the notes, etc.
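To make the score-oriented idea a little more concrete, here is a toy sketch that cheats by flattening a score into text and reusing an ordinary regex engine. Everything about the encoding is an assumption invented for the example: notes are space-separated tokens made of a pitch letter and an octave number, with no durations, accidentals, or chords. The "class" is any note in the C major triad, and the usual {3,} quantifier finds runs of three or more such notes.

```php
<?php
// Toy score: space-separated tokens, pitch letter plus octave number.
// This encoding is made up for the example.
$score = 'C4 E4 G4 A4 F4 C5 E5 G5 C6 D4';

// A score-oriented "class": any note belonging to the C major triad.
$triadNote = '[CEG]\d';

// The quantifier works just as it does for text: three or more
// triad notes in a row, each followed by a space or the end of the score.
$pattern = '/(?:' . $triadNote . '(?: |$)){3,}/';

preg_match_all($pattern, $score, $matches);

// Trim the trailing separator from each matched run.
$runs = array_map('trim', $matches[0]);
print_r($runs);
```

A real notation would need its own atoms for duration, key, and chords rather than borrowing character classes, which is exactly where the analogy with text regexes starts to strain.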
Sample-oriented classes could match particular samples (or subsets of samples), certain rhythms, melodies, etc. With fancy enough signal processing, the classes could match against particular words being sung, a particular singer doing it, or slices that "sound like" some other slice (where "sound like" is implemented by some pluggable algorithm).
A text-based notation might be doable for the score-oriented classes and some of the sample-oriented classes, but to fully extend the analogy, the audio-regex itself could (or should) be expressed with audio, or with visual representations of the audio, as well -- "Find the five bars on either side of any music that sounds like **this**", where **this** is some scoring or sample that's either taken literally or has been appropriately "expression-ified" (with audio processing filters?) or, alternatively, is a visual representation of a score that has been similarly transformed.
Extending this to video is an expansion of the "sample-oriented" audio expression language, but with visual idioms in addition to the auditory ones -- some possible classes to match against are things such as frames with human faces in them, frames with a particular face, indoor frames, outdoor frames, frames that are predominantly a particular color, frames from the work identified on IMDB by ID XXX, and so on.
Thursday, June 15. 2006
Here are slides and sample code from my talk at NYPHPCon 2006.