At work, we added a language filter to Ning Pro last month. It lets Network Creators have naughty words (for the Network Creator's definition of "naughty") replaced with * characters.
A straightforward way to do this in PHP is to pass an array of words to look for and their replacements to a function like str_replace() or str_ireplace(). Or, similarly, use a regular expression that gloms the search terms together (and potentially checks word boundaries). There are assorted WordPress plugins that work like this.
The problem with this approach is that it's really slow. Especially if you have a lot of words you're looking for. The amount of time it takes to do the search and replace grows in proportion to the number of words you're looking for. This is particularly unfortunate because usually, none of the words are ever found!
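To make the cost concrete, here is a minimal sketch of the word-at-a-time approach. It is written in Python rather than PHP purely for illustration; the function name `naive_filter` and the sample words are made up for this example and are not taken from any plugin.

```python
import re

def naive_filter(text, words):
    """Replace each listed word with asterisks, scanning the text once per word."""
    for word in words:
        # Every entry in `words` triggers another full pass over `text`,
        # so the total work grows with the length of the word list.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda m: "*" * len(m.group(0)), text)
    return text

print(naive_filter("Darn it, DARN!", ["darn", "heck"]))
```

Note that the loop still scans the whole text for "heck" even though it never appears, which is exactly the wasted effort described above.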
For our language filter, we took a different approach. We've packaged it up into a PHP extension called Boxwood and are releasing it today as open source. (Find it on github: http://github.com/ning/boxwood.)
With Boxwood, you can have your list of search terms be as long as you like -- the search and replace algorithm doesn't get slower with more words on the list of words to look for. It works by building a trie of all the search terms and then scanning your subject text just once, walking down elements of the trie and comparing them to characters in your text. It supports US-ASCII and UTF-8, case-sensitive or insensitive matching, and has some English-centric word boundary checking logic.
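The trie idea can be sketched in a few lines. This is an illustrative Python sketch, not Boxwood's actual code or API: the names `build_trie` and `trie_filter` are hypothetical, matching is always case-insensitive, every matched character is replaced with an asterisk, and the word boundary checks are left out.

```python
END = object()  # marker key: a search term ends at this trie node

def build_trie(words):
    """Build a nested-dict trie, one level per character of each word."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def trie_filter(text, trie):
    """Scan left to right, replacing the longest term found at each position."""
    out = []
    i, n = 0, len(text)
    while i < n:
        node, j, longest = trie, i, 0
        # Walk down the trie for as long as the text keeps matching.
        while j < n:
            node = node.get(text[j].lower())
            if node is None:
                break
            j += 1
            if END in node:
                longest = j - i  # remember the longest complete term so far
        if longest:
            out.append("*" * longest)
            i += longest
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

trie = build_trie(["darn", "heck"])
print(trie_filter("Darn it, what the HECK", trie))
```

In this sketch the work at each text position is bounded by the length of the longest search term, not by how many terms there are, so adding words to the list only makes the trie bigger. Because the boundary checks are omitted, it would also star out the first four letters of "darnit".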
Take it for a drive and let us know what you think!
I would say something like "Of course, this entire exercise is just for fun and in practice is totally useless," but whenever I start out thinking that, something actually productive eventually emerges. So perhaps this is totally useless, perhaps not!
I upgraded a machine to Ubuntu Feisty Fawn (7.04) today and was having trouble recompiling the Cisco VPN client. Thanks to this thread, I went to this blog post and applied this patch and everything was fine.
I must admit that applying some random patch to one's VPN client inspires a moderate amount of queasiness, but fortunately the patch is simple enough to understand so one can be confident no mischief is involved.
I particularly like the embeddable slideshow that Ning Photos has, and its companion in Ning Videos, the embeddable player -- so you can put photos or videos on your blog or wherever. Both apps let you e-mail in content from your phone, too. Ning Group has some spiffy HTML parsing and file upload features so you can share documents with folks and incorporate music, pics, or anything else in the forums.
Plus, all three sites have the juicy bits that every site on the Ning platform gets -- things such as cloneability, complete customization, and built-in REST APIs. I've been watching the feeds for clones of photos and videos -- I suppose seeing who's cloned sites you care about is the Web 2.0 version of ego surfing.
Plenty of brilliance went into crafting that document, so I must be missing something extremely obvious here: why are those control characters outlawed? What is the reasoning behind the spec preventing me from including in an XML document:
Over the weekend we launched the "Ningbar" -- what Diego, Brian, and the rest of the crew at Ning have been cranking away on for a while.
The Ningbar is not just a replacement for the sidebar that used to accompany all Ning apps. It's the control center for getting the most out of Ning, whether you're using an app, cloning an app (which now takes exactly 2 clicks), or writing code behind an app. In the Ningbar, you can get stats about the app you're using (who else uses it, what your friends do on the app), check your messages or send messages to others, customize your apps, or clone the app you're looking at.
What would regular expressions for non-text look like? I think many of the quantifiers would be similar, but the notions of "character classes" and what goes in an atom would be totally different.
Two ways of specifying bits to match for audio could be score-oriented or sample-oriented. Score-oriented "classes" could match patterns consisting of particular notes, notes in particular keys, with particular duration, particular chords, parts of particular chords, and so on. These could be built up into multi-note patterns. Other notation would look for other parts of the score -- tempo, all those Italian words that describe how to play the notes, etc.
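As a rough illustration of a score-oriented "class", here is a Python sketch that cheats by flattening a score into text and leaning on ordinary regular expressions. The note encoding (pitch letter, octave digit, then a duration code of w/h/q/e), the sample score, and the name `find_arpeggios` are all invented for this example.

```python
import re

# Hypothetical encoding: pitch letter, octave digit, duration code (w, h, q, e).
score = "C4q E4q G4q C5h A4q F4q A4q C5h"

# A "class" matching any tone of the C major triad, in any octave, of any duration.
C_MAJOR_TONE = r"[CEG]\d[whqe]"

# The usual quantifier carries over: three or more triad tones in a row.
ARPEGGIO = re.compile(rf"(?:{C_MAJOR_TONE}\s?){{3,}}")

def find_arpeggios(score):
    """Return each run of three or more consecutive C major triad tones."""
    return [m.group(0).strip() for m in ARPEGGIO.finditer(score)]

print(find_arpeggios(score))
```

The quantifier works unchanged, while the class itself had to be redefined in musical terms. A real notation would want classes for keys, chords, and durations rather than spelled-out letters.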
Sample-oriented classes could match particular samples (or subsets of samples), certain rhythms, melodies, etc. With fancy enough signal processing, the classes could match against particular words being sung, a particular singer doing it, or slices that "sound like" some other slice (where "sound like" is implemented by some pluggable algorithm).
A text-based notation might be doable for the score-oriented classes and some of the sample-oriented classes, but to fully extend the analogy, the expression of the audio-regex could/should be with audio (or visual representations of the audio) as well -- "Find the five bars on either side of any music that sounds like **this**" where **this** is some scoring or sample that's either taken literally or has been appropriately "expression-ified" (with audio processing filters?), or, alternatively, is a visual representation of a score that has been similarly transformed.
Extending this to video is an expansion of the "sample-oriented" audio expression language, but with visual idioms in addition to the auditory ones -- some possible classes to match against are things such as frames with human faces in them, frames with a particular face, indoor frames, outdoor frames, frames that are predominantly a particular color, frames from the work identified on IMDB by ID XXX, and so on.
At the risk of getting beat up by elisp hoodlums the next time I find myself alone in a dark digital alley at night, I must admit that I am spending less time with XEmacs these days and more time with jEdit.
Jon turned me on to it at work* after it became his preferred editor for writing Ning apps. The initial thing that made me switch was that jEdit's SFTP support is excellent, while XEmacs's (via tramp) is nonexistent.
You come for the SFTP, but you stay for the rest of the features. jEdit really has been an almost perfect balance of providing all the things I want but otherwise not getting in my way. There has been a bit of a learning curve as I figure out what key combos do what or which plugin is appropriate for a task, but it's not bad. Each time I think that jEdit doesn't have some feature I need, it turns out I'm wrong.
When I complained about missing isearch-forward, Jon pointed out that C-, does incremental search (and C-g moves to the next match). When I thought, "a File-Open dialog box is so clunky compared to C-x C-f and then typing a path with tab completion", I discovered that the File-Open dialog box that C-o brings up in jEdit supports tab completion and auto-navigates to subdirectories as you type them.
That said, I am not completely gaga on the jEdit kool-aid yet. XEmacs has better source control system integration and its php-mode does better "indent to the right place (with spaces) when I hit tab". I also have a juicy set of elisp macros for writing Docbook XML and haven't tried any XML writing in jEdit yet. jEdit has somewhat better PHP intelligence than XEmacs -- its PHPPlugin can find various syntax errors, while XEmacs has nothing like that. However, jEdit's PHPPlugin isn't up to the code-completion of, e.g. Zend Studio, which is really nice.
* Does saying "at work" really make sense when I'm in NYC, he's in BC, and "the office" is in Palo Alto? I suppose "at" is virtual now, too.
We launched Ning yesterday -- a playground for building and using social applications.
I'm very excited to have this out and about now for experimentation and use. It's been incredibly fun to build, and to noodle on the consequences as we built it, but I suspect the feedback we now get from users and developers as well as the apps that new developers build will create an even bigger wave of neat ideas.
Working with such a spectacular team of folks has been (and continues to be!) a thrill. (As well as with all the spectacular folks who don't have blogs to link to.)
If you don't know PHP, dive in and clone existing apps to get your own social apps up and running in seconds. If you do know PHP, use our PHP API and components to build a cool new app in (slightly more) seconds.
There's plenty of post-launch cleanup and coordination to do (as well as absorbing all of the nice traffic from /., del.icio.us, digg, MeFi, boingboing, etc.), but I'll have plenty more to say in the coming days, weeks, and months.
As much as I wanted my Windows Re-Education Camp efforts to succeed completely, my brain and my fingers have Unix idioms too deeply ingrained in them to make the increasing amount of adjustment effort worth it.
Instead of firing up my long-dormant VMWare installation, I thought I'd give CoLinux a try.