Category Archives: misc

Perl Makes You Cry Harder

Slashdot is linking to an article, Why Corporates Hate Perl. I think it’s interesting that while zealots exist for every language, you rarely see a language draw the kind of vehement hatred that Perl does. Which probably just fuels the antagonism in both directions.

Many Slashdot comments point out that the requirements specific to corporate environments are rather idiosyncratic. Some stress the “right tool for the job” philosophy, which I totally agree with.
And here’s one that makes a wonderful characterization about what perl is and is not:

The problem is, Perl is just a programming language, not a conceptual system. Arguably it is the antithesis of a conceptual system. Many teams then create their own application frameworks atop it (e.g. Mason, POE), and it’s rare for these frameworks to be compatible since Perl offers so many variations in the construction of even standard programming artifacts like classes & objects.

In addition, the level of expression (i.e. TMTOWTDI) means in practice that highly varying programming styles occur throughout large, long-lived bodies of code.

As a result, significant Perl-based business applications tend to become hard-to-maintain hairballs of divergent style and subtly variegated concept.

The root cause, as I said at the start: the absence of a standard conceptual framework for Perl means that during the early phases of a project, it’s much harder to reason meaningfully about the eventual form of the system than it is with, say, Java or .NET, where many of the design patterns are explicitly standardised.

I wouldn’t say that “Corporates Hate Perl”. It’s just that Perl as an application language doesn’t suit the formal design & architecture process we’re seeing increasingly as IT departments start to grow up and realise that they’re not the most important people in the company.

That doesn’t disqualify Perl from being a useful tool, and it’ll always have a place in data transformation, but it does mean that Perl isn’t going to be one of the general-purpose application programming languages of the future.

Bravo. I’d add that what the author identifies as a “problem” is also Perl’s strength. There’s more than one way to do it, so do it as you please. That definitely has allure for many programmers. As a project scales up, though, I think this does in fact become a detriment, and not only for corporate projects.

In response to someone who wrote, “chomp is not ambiguous. RTFM and stop crying,” here’s another awesome comment:
This safer version of “chop” removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module). It returns the total number of characters removed from all its arguments. It’s often used to remove the newline from the end of an input record when you’re worried that the final record may be missing its newline. When in paragraph mode ($/ = “” ), it removes all trailing newlines from the string. When in slurp mode ($/ = undef ) or fixed-length record mode ($/ is a reference to an integer or the like, see perlvar) chomp() won’t remove anything. If VARIABLE is omitted, it chomps $_ .

If anything I’m crying harder after reading that.

Amen, brother.

Comparing Documentation Methods

I’ve always thought Javadoc was one of the best features of Java. The Javadoc pages for the core API are invaluable for finding what I need very quickly. The utility can be run on any Java source files to generate a nice set of HTML pages that gives you a thousand-foot view of packages, classes, members, and method signatures. Nothing special or extra is required. Of course, you’ll often want to add comments and descriptions and that’s done by following commenting conventions that Javadoc can recognize and insert automatically into its HTML output, but you don’t need to do this for Javadoc to work.

Python’s docstring conventions are not quite as elegant, in my opinion, but they work just as well. Documentation is so much more important in a dynamic language like Python because unlike Javadoc, the pydoc utility can’t determine types. So if a parameter for a function or object method is “user,” one needs to know whether to pass in a User object, a username string, an integer id… or whether any of those will work.
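To make that concrete, here’s a minimal sketch of the kind of docstring that fills the gap (the User class and deactivate function are made up for illustration, not from any real codebase):

```python
class User:
    """Minimal stand-in for an application user record."""
    def __init__(self, name, active=True):
        self.name = name
        self.active = active

def deactivate(user):
    """Deactivate a user's account.

    Because pydoc can't infer types, the docstring has to say what
    "user" is: pass a User object, not a username string or an
    integer id.

    Returns True if the account was active before this call.
    """
    was_active = user.active
    user.active = False
    return was_active
```

Running pydoc on the module (or help(deactivate) in the interpreter) renders that description alongside the signature, which is exactly the type information the signature alone can’t convey.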

Both Python’s docstrings and Javadoc let you document as you go, eliminating or reducing the need for documentation as a separate task. If you change something, the documentation is right there for you to update.

Perl’s POD format isn’t nearly as convenient. The markup is oriented more toward layout and formatting than toward the structure of the code. You can write section headers, indent, and list items in the documentation, but you don’t really attach them to subroutines or methods. Well, you can, sort of, with “=item”, but each item must be nested inside other markup, and it feels kludgey and weird. The consequence is that the documentation feels quite separate from the code, even if it resides in the same file. It doesn’t encourage documenting as you go.
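For illustration, here’s roughly what documenting a single subroutine with “=item” looks like (the subroutine name is made up). Note that =item is only valid inside an =over/=back block, which is the nesting I mean:

```
=head2 METHODS

=over 4

=item add_user( $name, $email )

Creates a user record and returns its id.

=back
```

Nothing ties that =item to the actual sub add_user in the code; the association exists only in the reader’s head.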

In the Perl project I worked on, I wrote some POD comments at the very beginning, but it fell by the wayside. I should have kept up with it, but it felt like an extra thing to do. My client’s taken over the code, and he’s spending time reading a lot of it to figure out what the parameters should be for various calls. In a dynamic language, there’s no easy way around this if there’s no documentation. Plus, Perl’s subroutine syntax can make it very difficult to decipher parameter lists quickly. It’s frustrating. I can’t really blame Perl for my own failure to write extensive documentation, but I must say, the idiosyncrasies of POD don’t exactly make it easy.

“It Works”

This blog post, “The Worst Thing You Can Say About Software Is That It Works,” written by one Kenny Tilton, is pretty hilarious. This is the most beautiful thing I’ve read in a while:

if a pile of code does not work it is not software, we’ll talk about its merit when it works, OK? Therefore to say software works is to say nothing. Therefore anything substantive one can say about software is better than to say it works.

Reading this triggered flashbacks and PTSD. I’d mentioned to a manager recently that I wanted some time to do some badly needed refactoring. My explanation of why was met with a pause, then, “Let me get this straight. You want time to take something that already works, reorganize it, possibly break things, and we wouldn’t have anything new to even show for it?”

That last part was wrong: the value added comes from maintainability and extensibility. But I couldn’t get him to really grasp those ideas. He’s not a technology person. For all he knew, maybe this was an elaborate ruse on my part to be left undisturbed while I surfed porn at my desk for a few weeks.

I work in a very small shop with all non-technology people, so this sort of thing happens a lot. It’s frustrating. It’s sort of nice to know I’m not alone in encountering this mindset. But man… if even the fellow programmer in Kenny’s story doesn’t get it, I’m not sure there’s much hope for the rest of the world.

From Content to Community

Since the beginnings of the commercial web, people learned quickly that “content is king.” Appealing and unique content is a guarantee of raw traffic, and that hasn’t changed with Web 2.0.

What HAS changed is that traffic from content won’t necessarily result in return visitors and loyalty. Syndication feeds have made it increasingly easy to filter exposure to websites, so that users see only what they want to see. I won’t bother browsing around a website that’s got interesting content 75% of the time, when I can grab its feed and use my newsreader to view interesting content from many sources nearly 100% of the time.

Quality content needs vibrant community interaction around it to ensure that a website gets loyal return visitors. A lot of old media still hasn’t figured this out. They try to fool users with fancy-looking websites, attempting to mask the fact that they’re still, well, old media.

One example is The San Francisco Chronicle’s upcoming redesign. While the visual feel is fairly clean and consistent, the page is horribly cluttered. The flawed rationale is pretty obvious: let’s put tons of crap on the screen and maybe someone will click something!

User feedback on the redesign is very mixed. I suspect that the positive responses are coming from non-tech savvy readers, people who are evaluating the layout based on its resemblance to a print newspaper. (They’ll soon change their minds when they can’t easily find anything.) That audience isn’t very large and it’s slowly dying out over time.

Interestingly, the negative responses aren’t just about layout clutter, but the lack of interactivity. Intelligent, web-savvy users aren’t interested in being passive readers. They want to be part of the news, to help shape it and to comment on it; they want their voices featured prominently on the site, and not ghettoized in tiny comments sections, sidebar polls, or letters to the editor. Being a truly integral part of a community makes engaging people feel appreciated, gives them a reason to come back, and makes them want to spread the word.

If Web 2.0 means anything at all, it means that people are realizing the web isn’t yet another publishing medium; it’s an interface for social interaction. And this means successful websites are increasingly distinguished by the kinds of community they foster, not just their content. In the world of technology news, for example, there are plenty of sites that publish decent, timely content, original or aggregated. Sure, they each have their own editorial styles, but in my mind, what truly separates them are the unique communities: Slashdot is mostly full of snarky, pro-Linux and anti-Microsoft ideologues; ars technica is a bit more neutral with a strong gamer and “power user” demographic; reddit tends to have good conversations about submitted links in their programming subsections.

There will always be a place for online newspapers and their model of publishing, but I think their core readership and audience will continue to decline, unless they’re willing to give up their monopoly on content production and focus on fostering distinctive communities.

Mashup: Google Maps + Amazon Reviews

Saturday morning I woke up with a rather odd idea: to create a mashup of Google Maps and Amazon book reviews. I was mildly surprised to discover it hadn’t been done yet. Here’s the result of spending a chunk of the weekend writing it in Python (11/15 update: should now work in Safari, Firefox 2, and IE6):

It didn’t take long to write, since I’d recently been investigating the Google Maps API for a friend, and I’d also been doing bits of Amazon integration for a project. The biggest pain was dealing with the rate limiting on Amazon’s ECS API, which allows one call per second per IP address: the service returns error codes if you exceed the limit. So the application is pretty slow, especially if others are also using it.
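The client-side throttle amounts to something like this sketch (the Throttle class is my illustration here, not the mashup’s actual code): before each ECS request, block until at least a second has passed since the previous one.

```python
import time

class Throttle:
    """Allow at most one call per `interval` seconds.

    The default of 1.0 matches Amazon ECS's per-IP limit.
    """

    def __init__(self, interval=1.0):
        self.interval = interval
        self.last_call = None  # monotonic timestamp of the previous call

    def wait(self):
        """Sleep just long enough to honor the interval, then record the call."""
        if self.last_call is not None:
            remaining = self.interval - (time.monotonic() - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()

# Usage: create one Throttle per ECS client and call throttle.wait()
# immediately before every request.
```

With several users hitting the app from one server IP, every request still funnels through the same one-per-second budget, which is why the markers trickle onto the map instead of appearing at once.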

But if you’re patient, it’s fun to watch the map markers gradually pop up. It’s also fun to look up a book with strong political leanings, and see how the ratings are distributed geographically. For example, you can look at The Communist Manifesto by Marx, and Bill O’Reilly’s Culture Warrior (*shudder*). Data for both are cached, so they should load very fast.

Who, me? The problem with a “do not call” list

Should there be a federally regulated “do not track” list for the internet, similar to the existing “do not call” lists? There’s an angle to this issue that I think proponents are missing.

As at least one person has already pointed out, the internet doesn’t work like a telephone system. It makes sense to say “do not call me”, since the “me” is the phone number. But how do you identify the “me” who’s using the web? Schemes using IP addresses and browser cookies aren’t adequate, since they can often be shared by several people.

Contextual advertising tries to make smart guesses about what might interest the user, but it’s only as good as its assumptions about whether it’s the same individual who generated the browsing patterns. The fact that advertisers are constantly extending their networks to probe more data and perpetually improving their algorithms speaks to how difficult this problem of identification is.

This is not simply a technical problem, but one that has broader social ramifications. The crux of it is this: in order to say “do not track me,” there needs to be a “me.” Supporters of this initiative are, in effect, implicitly also supporting the creation of a strong identification mechanism. Any federal regulation would need such an id in order to sign people up. Otherwise, how would you? Who’s the “me” that advertisers shouldn’t track?

A “do not track” list might successfully limit advertisers’ collection of web usage data, but it would certainly also improve government’s ability to do so. Would privacy really be improved then? The more practical solution is to encourage people to make use of ad-blockers and secure channels, and to educate them on how to be more savvy web users.

Is “race” a valid scientific construct?

James Watson, of Watson and Crick fame, has resigned this morning (Race row DNA scientist quits lab). No doubt it was due to the controversy over his incredibly offensive comments about people of African descent:

He was quoted as saying he was “inherently gloomy about the prospect of Africa” because “all our social policies are based on the fact that their intelligence is the same as ours – whereas all the testing says not really”.

He later issued a pretty feeble apology; actually more of a statement that he’d been misunderstood. On the subject of whether Africans were genetically inferior, he retracted his earlier statement: “there is no scientific basis for such a belief.”

The most interesting discussions so far about this incident revolve around the question: Is “race” a valid scientific construct for genetic research, or is it merely a social construct? It’s a deceptive question whose answer probably isn’t either/or: for example, I think it’s been shown that certain populations correlated with “race” are more genetically prone to particular medical conditions.

I’m a skeptic when it comes to science. For me, the deeper question is: when does race get used in research and when doesn’t it? For what purposes? Science rarely takes place (if ever) in a vacuum of objectivity. Research gets funded, often in order to support human decisions about something. Note that Watson’s original comments reference “social policies.” He also mentioned what employers tend to think about the intelligence of people of African descent.

This is what’s truly scary about genetics in this day and age. It definitely can be a tool for helping people. But I suspect the greater likelihood is that it’ll be a tool for deciding who’s smart/able, who’s more deserving of opportunity, even who’s more deserving of a chance to live (health insurance companies love this stuff!).