I often wonder if anything I say or think or write is original. Sometimes I will think of an allegedly great idea, but before I get too excited, I google a few keywords nervously — to see who beat me to it. Google is the ultimate humbler of humanity.
Today, though, I think I will blog about a subject, and I sincerely believe I am the first to do so. (Feel free to prove me wrong!) And yet what I am about to write about is so familiar and prosaic to each of us that no one would bother to.
This morning I was taking a quick shower — and not thinking about chocolate — and when I stepped onto the bathroom floor, I dried myself thoroughly and began to assemble the necessary tools for shaving. But I happened to brush my hand against my head — only for a millisecond, mind you — but long enough to surmise that I hadn’t completely rinsed the shampoo suds from my hair. I immediately felt the gooey mess and heard the wrinkly sound of lather. Yes, it was true, my hair was only half-rinsed; so now I would need to return to the shower to finish the job.
It was only somewhat annoying, a slight detour in my day. It meant that drying myself would no longer be as satisfying as the first time, and the cool sensation of leaving a shower refreshed would be tempered by the paranoia that maybe my head of hair is not completely rinsed. (It has happened a few times; I return to the shower a second time and think that I rinsed everything out, and then to my horror discover that I had omitted one of the sides from this second cautionary rinse).
As I started to shave, I began calculating. I probably commit this kind of washing miscalculation once every 10 or 15 days (that’s about 25 times a year!). Over a good 70 years, that adds up to 1,750 washing miscalculations in my lifetime (assuming one shower a day). Out of the 7.1 billion people on this earth, let’s guess conservatively that each commits this same washing miscalculation 20 times a year. Let me see: that equals
roughly 140 billion times in a year that people are stepping out of the shower without realizing that they have forgotten to rinse their hair.
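For anyone who wants to check the arithmetic, here is a minimal Python sketch. The figures are the guesses from the paragraph above; the function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the shower arithmetic.
# All inputs are the essay's rough guesses, not measured data.

MY_MISTAKES_PER_YEAR = 25          # "once every 10 or 15 days", rounded
YEARS_OF_SHOWERING = 70            # "a good 70 years"
WORLD_POPULATION = 7_100_000_000   # "7.1 billion people"
AVERAGE_MISTAKES_PER_YEAR = 20     # the "conservative" guess per person


def times_per_year(interval_days):
    """How often something happens per year if it recurs every interval_days."""
    return 365 / interval_days


def lifetime_mistakes(per_year, years):
    """Total mistakes for one person over a lifetime."""
    return per_year * years


def worldwide_mistakes_per_year(population, per_person):
    """Total mistakes per year across everyone."""
    return population * per_person


if __name__ == "__main__":
    # Every 15 days is about 24 times a year; every 10 days is about 36.
    print(round(times_per_year(15)), round(times_per_year(10)))
    print(lifetime_mistakes(MY_MISTAKES_PER_YEAR, YEARS_OF_SHOWERING))
    print(worldwide_mistakes_per_year(WORLD_POPULATION, AVERAGE_MISTAKES_PER_YEAR))
```

The lifetime figure comes out to exactly 1,750, and the worldwide product is 142 billion, which rounds to the 140 billion quoted above. Note that 25 times a year sits at the low end of the "every 10 or 15 days" range.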
That’s a really big number. Think about it: despite the fact that everyone is doing it, there don’t appear to be any web pages by or about people who have made this mistake. Don’t believe me? Try here or here. (Actually, here or here does produce something relevant, though not particularly meaningful or lasting.) The event is so mundane that it has never occurred to anyone to write a separate article about this phenomenon.
One way to look at the thing is to say that this experience is something so vague that a search engine couldn’t possibly help you to find people’s descriptions of it. A friend and I were remarking on how useful search engines are for looking up and verifying facts. Sure. But that doesn’t imply that Google is actually useful. It’s like the paradox of not being able to verify the spelling of a word because you need to spell it correctly to look it up. Many of life’s questions are so vague and imprecise that search algorithms are practically useless. Even our proper names are no longer unique enough to find what we need. I’m sure AI and natural language processing will improve, but the amount of random garbage on the Internet will grow too, and so will the challenge of sifting through things. Many are so alarmed by the NSA and the Echelon System that it might not occur to them to ask whether the NSA is actually equipped to sort through all the noise.
Decline of the Search Paradigm
In 2009 I attended an education panel hosted by 4 undergraduates attending elite institutions. It was ironic, because the audience was packed with probably about 100 teachers or geeks. Most of the audience members felt that we had a good grasp of reality and Internet reality, but we still were curious about how college students were learning in this Internet-addled age. The students on the panel talked about collaboration, how they used social networking tools and how the Internet changed the way they learned. It was fascinating; several talked about how it was changing the study of literature; another talked about the awesomeness of getting help from someone thousands of miles away.
During the talk, when one student mentioned how useful the Internet was in giving them suggestions about books to read and references to consult, I called out rather indecorously, “How do each of you find new authors to read?”
The student, slightly annoyed at my interruption but willing to answer, said, “I just google it.” The other three students chimed in with the same answer, “Just google it,” and then they continued their prepared remarks.
Ten minutes later it was question time, and I jumped up to the front of the line to ask a follow-up. “You said that google helps you to find out about new authors or musicians or artists. Can you explain?”
All four of them looked at me as though I were a crank. “Well, it’s not too hard really. Just go to the search box and type something. Then follow the links.”
“Excuse me, but what exactly do you type in the search box?”
“The name of the author.”
“And how do you know what name to type?”
The panelists shrugged. “Just follow any web page.”
I understand that the Internet can help you locate more information about a topic, but only when you know what you want. But how do you know what you want?
Being adept at devising a search term (such as “best American author” or “recommend a 20th century novel” + American) will get you only so far. But lately I’ve noticed that even google’s sturdy algorithm is being weakened by dictionary sites, spam sites and commercial interests. When every company is trying to optimize for search results, then it is possible to programmatically manipulate the results. Some kinds of inquiries don’t yield anything meaningful; it’s not always easy to think of a unique combination of words and phrases to get the results you need. With facebook, stackexchange, quora and other social media, you can receive lots of tips; but then again, people are responding to your questions; you are not finding these things out on your own, but relying on a certain number of people hanging out at these places who would be willing to provide some scaffolding for the edifice of your education.
Perhaps it’s an obvious point, but the topics which occupy most people’s attention are not necessarily the most helpful. Several of my conservative friends link to superficially optimistic articles about climate change, but who would seriously think that the URL most likely to appear on top of search results (presumably from search-optimized CNN or NYT) would also be the most authoritative or accurate? Even if we discount outright propaganda, the things displayed by search engines may be neither relevant nor important. Remember: there are probably more websites about the Gilligan’s Island TV series than the movies of Ingmar Bergman.
It’s ironic that the things pressing for our attention at any given moment can also be the most transitory. Let’s see, current events today talk about the LAX shooting, the final day of the Virginia governor’s race, the abortion lawsuits, the new Hyundai, refinancing with lending tree, What does the fox say?, the Obamacare website, the new Netflix titles. All of them are screaming for your attention today, and 10 years from now all of them will have disappeared. Perhaps this is a loss for us all, but the lesson to be learned here is that the things which appear to us so urgent today can easily disappear without a trace.
It’s commonly assumed that search engines are good at looking up names and titles and dates. But suppose I wanted to find the name of a novel whose title and author escaped me. One of my favorite novels was Nicholson Baker’s “The Mezzanine.” But what if I forgot the name of the author and title and tried to google it using some keywords? I remember the novel used a lot of footnotes, wasn’t particularly long, was clever, had a scene about drying one’s hands in the bathroom, had another scene about shoelaces and had a long series of digressions and ruminations about mundane things. If I typed “novella literary hand dryer clever mundane digress American bathroom shoelace ruminate footnotes” into google and bing and wolfram alpha, I might feel confident that someone somewhere has used many of these words to describe Nicholson Baker’s novel. My search query may be overlong, but it contains lots of distinctive words; even if a single web page is unlikely to contain ALL of these words, a good search engine should be able to compute the web page which is likely to be most relevant to these words.
So here are the search results for that query.
Bing results are similar.
I don’t expect Google or Bing to get it exactly right, but we’re not even coming close. The search engines just provide awful and misleading results (and I’m not even including the ads). Although shortening the list of keywords does bring more interesting results, it is still nowhere close to the answer.
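To make concrete what "compute the web page most relevant to these words" could mean, here is a minimal Python sketch of the crudest possible approach: score each page by how many distinct query words it contains. This is an illustration only, not how Google or Bing actually rank pages; the three toy pages are invented for the example, and the function names are hypothetical.

```python
import re

# The query from the essay.
QUERY = ("novella literary hand dryer clever mundane digress "
         "American bathroom shoelace ruminate footnotes")

# Invented toy pages, standing in for the web. Not real search results.
TOY_PAGES = {
    "Review of The Mezzanine": (
        "A clever novella full of footnotes about shoelaces, "
        "a bathroom hand dryer and other mundane things."
    ),
    "Hand dryer buying guide": (
        "Compare the best bathroom hand dryer models for your office."
    ),
    "Shoelace store": "Buy a shoelace online.",
}


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_pages(query, pages):
    """Return (score, title) pairs, best match first.

    The score is the number of distinct query words found on the page.
    Matching is exact, so "shoelaces" does not count for "shoelace".
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(text)), title)
              for title, text in pages.items()]
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


if __name__ == "__main__":
    for score, title in rank_pages(QUERY, TOY_PAGES):
        print(score, title)
```

Even this toy version puts the book review first, because no single shopping page shares more than a few of the query words. It also shows how brittle word matching is: the review says "shoelaces" and the query says "shoelace," so that word earns nothing.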
I will admit that my search term isn’t exactly the best. To vary my approach a bit, I chose a more generic search term, “Best American novel in the 1980s,” and received decent relevance in results (although not THAT good).
When I recited those same keywords for the Nicholson Baker book over the phone to a literary friend, he correctly guessed the author (though not the book itself). If you were in a classroom with 20 well-read people, I suspect you would get better answers. If you asked people to name the book on a site like Goodreads, where a lot of people would see the question, I suspect you’d get the right answer. This question seems esoteric, but for a moderately erudite audience, it is not esoteric at all.
But search engines are not particularly good at these fuzzy kinds of questions. Even in cases where a search engine can match a fuzzy question with an answer, the ordering and prominence are determined by how well the site was optimized for search engines, and also by whether the company paid for ad placement. If anything, Google can find pages where the wording of your question appears prominently, like a forum or a stack exchange site. But if the way you phrase the question doesn’t parrot the way other people do, you are out of luck. In other words, in 2014 the ability to get useful search results depends mainly on how good a Family Feud contestant you are.
We used to believe Google was so amazing because 1) back then there were significantly fewer web pages and 2) Google presented lots of results. Do you remember when you could set Google to display 100 results on a single page? Even if Google didn’t bring the answer to your query, it nonetheless provided up to 100 different paths you could explore to find it. Perhaps at one time those Ivy League students on the panel could pick a random link in search results and follow things. But whenever I start from a search result, I have this uneasy feeling that it’s all one huge conspiracy to trap you inside a gigantic and self-contained network of advertising and promotion. On mobile devices it’s even worse: it becomes harder to tell the difference between ads and organic search results. Human laziness will make you choose whatever pops up in the first three results, no matter how commercial it seems.
Comparatively speaking, searching for proper nouns is easier than searching for concepts or abstract phrases. (That is why I end up going to wikipedia more often than not… I want to find some neutral site that doesn’t have a secret agenda to destroy someone’s reputation or laud him as a captain of industry. But wikipedia waters down everything. It almost seems proud of the fact that nothing on the site is original or insightful.)
I remember once talking to a translator in Albania. We had a delightful conversation, but he playfully scolded me for simplifying my language when talking to him. “Why is that bad?” I asked him. “Isn’t accurate communication the goal of teaching?”
“Not really,” he replied. “The things which most interest the translator are those hard-to-translate or untranslatable expressions. These ‘untranslatables’ are the most valuable part of the language and often the key to the cultural peculiarities of the people who speak it.”
I’m not sure I agree. But surely whatever is hard to express inside a language has value, and certainly those linguistic qualities which make a web page easy for a search engine to parse also make it less interesting. It’s clear to me that search engines fail to provide relevant results fairly often, for various commercial and linguistic reasons. Perhaps human vanity fools us into thinking that our experiences are unique, when the more likely fact is that Google isn’t providing an accurate picture of the world’s experiences and thoughts. Instead of expressing wonder at the ability of Google to turn up interesting results, we should be lamenting the fact that Google continues to lead us down well-known paths of stupidity.
I am less excited by the fact that search engines have given special prominence to Wikipedia because of its commitment to the “neutral point of view” (NPOV). Enshrining the NPOV means that a Wikipedia page will exclude a lot of analyses and points of view; it shudders towards the obvious and noncontroversial. Even if that is better than commercial search engines, I can’t help but wonder if Wikipedia just helps us flee from one watered-down path to another.
2022 Update. I am happy to report that my original search query about accidentally stepping out of the shower with soapy hair produces more meaningful results. Eight years later, what seems to have changed is that search results are dominated by major media sites, with few to no articles on indie sites.