Friday, February 25

search, research, and stealth advertising

If you already read the previous post -- two good reasons to kill yourself (but you haven't yet) -- then the first couple grafs below will be recap. However, I've subtly modified them to test if you're really paying attention. Now, class, #2 pencils at the ready? All right then, let us proceed...
There are two fundamental ways of measuring the usefulness of search results. One is technically termed precision, the other recall. In a search that could be characterized as having optimum precision, every document returned is a document you want. However, not all the relevant documents you want are returned. It's so simple, it's a bit hard to grasp at first. Example: let's say you search for...
narcissism "new age" race
...as I just did (we'll get to why soon), and you get six hits, every one of which is relevant. Wow! You're happy. I'm happy (we'll get to why soon). However, the thing of it is, there are actually 50 documents in the database (let's pretend) that you would have liked to see, so you're missing 44 relevant documents that you may never find -- without a lot more effort. But OK, you're tough, you're determined, you're rarin' ta go. Say you're willing to put in that extra effort.

dig deeper

What you want is something like Total Recall. You want to make sure that you're going to get all 50 of those relevant documents. There's a way to do this, theoretically, if you come up with the right search terms and expand them using a very well organized thesaurus of semantically related terms and don't screw up the Boolean logic of your query. Such queries begin to look a lot like LISP -- the AI programming language of choice -- lots of deeply nested parentheses and ANDs and ORs and NOTs. Usually, these queries take considerable skill to build, and even then, it's something of a crapshoot.
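For the curious, here's a rough sketch of that thesaurus-expansion idea in Python. Everything in it is an assumption for illustration: the tiny THESAURUS, the synonyms in it, and the expand_query helper are all made up, and no real search engine's query syntax is implied beyond plain AND/OR and quoted phrases.

```python
# Hypothetical mini-thesaurus; a real one would be far larger and better organized.
THESAURUS = {
    "narcissism": ["self-absorption", "vanity"],
    "race": ["ethnicity"],
}


def expand_query(terms, thesaurus):
    """AND the terms together, OR-ing each term with its thesaurus synonyms."""
    groups = []
    for term in terms:
        variants = [term] + thesaurus.get(term, [])
        # Quote multi-word phrases so they are matched as a unit.
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


query = expand_query(["narcissism", "new age", "race"], THESAURUS)
print(query)
# (narcissism OR self-absorption OR vanity) AND "new age" AND (race OR ethnicity)
```

Three terms and already the parentheses are piling up. Add a NOT or two and a deeper thesaurus and you can see where the LISP comparison comes from.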

But let's say you're able to construct the perfect query and get that total recall -- meaning that you do in fact retrieve all 50 of those relevant docs. Great! Except not so great. Because along with them you're going to get maybe 500 documents that are just noise. They are not what you want. So now you do "have" those 50 gems, but you're going to have to laboriously sift through all of those 550 things to determine which things are the gems.
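If you like to see the arithmetic, the two scenarios above work out as follows. This is a minimal sketch using the standard definitions (precision is relevant hits over everything returned; recall is relevant hits over everything relevant in the database). The function and variable names are mine, and the numbers are the pretend ones from this post.

```python
def precision(relevant_returned, total_returned):
    """Fraction of the returned documents that are actually relevant."""
    return relevant_returned / total_returned


def recall(relevant_returned, total_relevant):
    """Fraction of all relevant documents that the search returned."""
    return relevant_returned / total_relevant


TOTAL_RELEVANT = 50  # the pretend number of documents you'd have liked to see

# Narrow search: six hits, every one relevant.
narrow = (precision(6, 6), recall(6, TOTAL_RELEVANT))

# "Total recall" search: all 50 gems plus 500 documents of noise.
broad = (precision(50, 550), recall(50, TOTAL_RELEVANT))

print(f"narrow: precision={narrow[0]:.2f} recall={narrow[1]:.2f}")
print(f"broad:  precision={broad[0]:.2f} recall={broad[1]:.2f}")
# narrow: precision=1.00 recall=0.12
# broad:  precision=0.09 recall=1.00
```

Perfect precision with 12 percent recall, or perfect recall with 9 percent precision. Pick your poison.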

This tradeoff is essentially reciprocal: greater precision means lower recall; greater recall means lower precision. Although search researchers come up with all kinds of slick algorithms to defeat this frustrating tradeoff -- and many schemes are quite slick; Google's probably got a ton of 'em -- the basic law of recall v. precision stands almost as firmly as the Second Law of Thermodynamics. (We'll skip over whether entropy can, in fact, be reversed; that's grist for another blog somewhere else.)

Enough theory. Those terms above -- narcissism "new age" race -- are what I just used to search the HighBeam database. Here, you can try it yourself -- though you won't be able to read the full articles unless you've subscribed. Ah well, console yourself with this: there are no stupid queries.


I feel like a total idiot for continuing with this, but OK (if you're still tracking here), my query returns 46 documents, and the first hit is -- for my purposes anyway -- pure gold! Titled "God, the future of American politics, and dieting" by Margaret Talbot in the December 8, 1997, issue of The New Republic, it's about the babe in the previous post, Marianne Faithful Williamson. I'll be quoting from it in a minute. Or two. Or next week. As David Weinberger says at the top of his blog: "Let's just see how it goes."

But before we get to that, now let me run the exact same search -- narcissism "new age" race -- on google. That returns (as of today) 947 documents. Let's see if I can figure out if the hit I got from Highbeam is among them. I can use google's "search within results" feature to do this. I'll query for...

god future politics dieting Margaret Talbot New Republic
Bingo! Here it is -- and you didn't even need to subscribe to HighBeam to find it! Now, Patrick Spain, CEO of that illustrious organization, may want to shoot me for sharing that freebie workaround with this slice of HighBeam's intended "target market" (that would be you). But I doubt it. And here are three reasons why...
  1. You wouldn't have known what to query google for in the first place without the free HighBeam precis. And that might not be -- probably isn't -- enough to tell you whether the article is a gemstone or a dirt clod.
  2. Yes, you will find some of the HighBeam articles free on the web -- especially news stories in large city dailies. But a) you likely won't find stories going back years unless you pay for them on a one-time or subscription basis, and b) many you won't find at all.
  3. Whether or not you're successful in finding what you go looking for on google this way, it's a lot of extra work -- with no assurance it'll pay off at all.

Consider. When I first did this test last Friday night, I was a bit um confused about what the article title was. This is understandable given the way it's called out in The New Republic, which is:

AMERICA IMAGE DISORDER

By Margaret Talbot

God, the future of American politics, and dieting.

The Healing of America by Marianne Williamson
(Simon & Schuster, 366 pp., $24)

So, I used other search terms (which I now can't remember), and this is a record of what happened in the rat hole I went down -- a far cry from that "Bingo!" I got above.

Begin narrative transcript...

That [meaning whatever search terms I fed to google] ought to find it if it's there. Nope. Well, OK, let's try just the author's name, Talbot. That leaves 16 of the original 947 standing. I look through them for the New Republic piece or anything that looks related. Nah. I try "Margaret Talbot" as a qualifier to whittle down the 947, and this time I get one hit. Aha! This could be it. But nothing doing. Damn.

OK, let's get more specific. Let's search directly into the New Republic site. I try this...

http://google.com/search?q=site:tnr.com+Margaret+Talbot

...and it gives me all kindsa stuff Margaret Talbot has written, but not the article I'm looking for. Double damn!

End narrative transcript.

Notice that in those 947 docs my original query returned on google, there may have been a lot more articles I would have found useful. Better recall. However, to find them, I'd have to hunt through a lot of junk I'm not even marginally interested in.

Obviously, the better recall has a lot to do with the fact that google has about a jillion more documents than does the HighBeam Research database. But HighBeam's better precision -- recall that the "pure gold" article I initially found there was hit #1 -- has a lot to do with where its documents came from: published articles.

Look, there is no bigger fan of the wild and wooly web than myself. Or damn few. But there is a hurdle to getting published, and sometimes (granted, not always, but sometimes) clearing that hurdle suggests a degree of quality that is not automatically conferred on x-random blogger with x-random opinion on x-random topic. Granted moreover that there's much of wonder and high quality on the web that will never make it into print -- and I search that stuff too. I'm not trying to be elitist here (perhaps it just comes naturally), but there is high value in finding just the thing that speaks to concerns you are -- or in my case, I am -- researching.


I could go on with this. In fact, I did. For days and days. But I'm a search geek. To me it's a perverse form of fun. Most people would far rather pay the 99 bucks for a year and get what they're looking for the first time.

As to the "stealth advertising" in the title of this post, it should be obvious by now that the whole thing is a pitch to subscribe to HighBeam. Not only will you be glad you did, but even more important, I'll be glad you did. As HighBeam underwrites this blog (think $$$), it will mean that the cat and I will continue to eat and not have to go squat in the dumpster compound.

But there's another, even more stealthy, form of advertising going on here. I've been on the lookout for images I can use here on CBO without the Kopyright Kops coming after me.

<Nixon>I am not a pirate!</Nixon>

So I became an affiliate at allposters.com. Therefore, the cool images you see here -- I'm particularly fond of the Borax miners -- are all straight-up street legal. Click through on a few. They're ads. Who woulda guessed, huh?

As to the Cluelessness poster, well... me and despair go way back.

Oh yeah, and next time I'll tell you why I was so excited about finding the review of that book by Marianne Faithful Williamson. Because sometimes, to find real treasure, you just have to dig deeper.