Google search engine sucks

I recently wrote about NCSU adding a search engine to its online catalog. But after talking to librarians who asked me, "So what did they get for doing that?" I realized I needed to back-pedal and explain how a search engine makes an online catalog easier to use (or, as Andrew Pace puts it, "Why OPACs Suck").

At MPOW (My Place Of Work), as we say on the blogs, we're evaluating new search engines. I'll start today with relevance ranking, the building block of search, found in any search engine, from Google to Amazon to Internet Movie Database to little old Librarians' Internet Index.

By default, when a user conducts a search in a search engine (say, a search for the term million), the search engine should return the best matches first. That's relevance ranking: the cream of the search results rising to the top. Every product I've looked at offers relevance ranking, and every search-engine vendor tells me that, bells and whistles aside, relevance ranking works pretty much the same everywhere. We're so used to this that we don't even think twice when Google's first page of hits for the term million returns satisfying results.

But compare that same search in your typical online catalog. Today I picked two dozen online catalogs from around the country and conducted keyword searches for the term million. Call me picky, but the first page of hits (often the first or second hits) for those catalog searches should not include:

Hog heaven: the story of the Harley-Davidson empire.

The rock from Mars: a detective story on two planets / Kathy Sawyer.

Without a search engine to provide relevance ranking, most catalog search results are simply ranked last in/first out, which is the way databases think: whatever was most recently cataloged shows up first. (Sometimes a catalog will lamely explain that the results are sorted by "date." That means "last in/first out," and it's no substitute for relevance ranking.) You don't have to be a rocket scientist to see these catalogs aren't using relevance ranking. But you shouldn't have to be a rocket scientist to use a library catalog in the first place.

Compare those results with the same search for million in the NCSU library catalog, powered by the Endeca search engine:

Black religion after the Million Man March.

Green groatsvvorth of vvit, bought with a million of repentance.

For a search-engine aficionado, those NCSU search results are mmm-mmm good. I love that last hit: it's an ebook "published 1629." (NCSU has always been ahead of the game! Or perhaps they just have a lot of vvit.)

Relevance ranking is actually fairly simple technology. It's primarily determined by the magic of something every search-engine vendor will talk your ears off about: TF/IDF.

TF, for term frequency, measures the importance of the term in the item you're retrieving, whether you're searching a full-text book in Google or a catalog record. The more times the term million shows up in the document (think of a catalog record for the book A Million Little Pieces), the more important the term million is to that document.

IDF, for inverse document frequency, measures the importance of the term across the database you're searching. The fewer records in the entire database that contain the term million, the more important, or unique, it is.

Put TF and IDF together (the importance of a term in a document, and the uniqueness of the same term in an entire database) and you have basic relevance ranking. If the word million shows up several times in a catalog record, and it's not that common in the database, that record should rise to the top, as such records do in the Endeca-powered NCSU catalog.

Relevance ranking is just one of many basic search-engine functions missing from online catalogs. The users who complain that your online catalog is hard to search aren't stupid; they are simply pointing out the obvious.
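For readers who want to see the TF/IDF idea in action, here is a minimal Python sketch. It assumes a tiny hypothetical catalog (`CATALOG`, its records, and every function name are invented for illustration) and contrasts last in/first out ordering with a plain TF × IDF score, score(record, query) = Σ tf(term, record) × log(N / df(term)), where N is the number of records and df is the number of records containing the term. It also assumes that a record matches if it contains any query term. Production search engines typically layer refinements such as length normalization and field weighting on top of this, so treat it as an illustration of the principle, not as how Endeca or any other product actually scores.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalog. Record IDs follow cataloging order,
# so a higher ID means the record was added more recently.
CATALOG = {
    1: "A million stars: a million years of sky watching",
    2: "A march of a million voices",
    3: "Rocks from space: a detective story",
    4: "Motorcycle empires: a business history",
    5: "Planets and moons: a million miles from home",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matching_ids(catalog, query):
    """IDs of records containing at least one query term (assumed OR matching)."""
    terms = set(tokenize(query))
    return [rid for rid, text in catalog.items() if terms & set(tokenize(text))]


def last_in_first_out(catalog, query):
    """No relevance at all: the most recently cataloged match comes first."""
    return sorted(matching_ids(catalog, query), reverse=True)


def tf_idf_rank(catalog, query):
    """Order matching records by the sum of TF * IDF over the query terms."""
    docs = {rid: Counter(tokenize(text)) for rid, text in catalog.items()}
    n = len(docs)
    scores = {}
    for rid in matching_ids(catalog, query):
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for counts in docs.values() if term in counts)
            if df == 0:
                continue  # the term appears nowhere, so it adds nothing
            idf = math.log(n / df)  # rarer across the database -> larger IDF
            score += docs[rid][term] * idf  # TF: occurrences in this record
        scores[rid] = score
    # Highest score first; ties are broken by record ID so output is stable.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


if __name__ == "__main__":
    for query in ("million", "million march"):
        print(query, last_in_first_out(CATALOG, query), tf_idf_rank(CATALOG, query))
```

Run as a script, last in/first out puts record 5 first for both queries simply because it was cataloged last. TF/IDF instead puts record 1 first for million (the term appears twice in it), and record 2 first for million march, because march appears in only one record and therefore carries a high IDF. A word that appears in every record, such as "a", gets an IDF of log(1) = 0 and contributes nothing to any score.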