04 February 2011

Google Claims Bing Copies Its Search Results

After noticing curious search results at Bing, then running a sting operation to investigate further, Google has concluded that Microsoft is copying Google search results into its own search engine.
That's the report from Search Engine Land's Danny Sullivan today, who talked to both companies about it and presented Google's evidence. According to the report, a mechanism could be the Suggested Sites feature of Internet Explorer and the Bing Toolbar for browsers, both of which can gather data about what links people click when running searches.
[Image: A Bing search result based on one of Google's hand-coded honeypot search results that previously appeared only in Google. Credit: Google]

The story began with Google's team for correcting typographical errors in search terms, which monitors its own and rivals' performance closely. Typos that Google could correct would lead to search results based on the correction, but the team noticed Bing would also lead to those search results without saying it had corrected the typo.

[Image: The original hand-coded honeypot search result on Google. Credit: Google]

Next came the sting, setting up a "honeypot" to catch the operation in action. Google created "one-time code that would allow it to manually rank a page for a certain term," then wired those results for particular, highly obscure search terms such as "hiybbprqag" and "ndoswiftjobinproduction," Sullivan said. With the hand coding, typing those search terms would produce recognizable Web pages in Google results that wouldn't show in search results otherwise.

Next, Google had employees type in those search terms from home using Internet Explorer with both Suggested Sites and the Bing Toolbar enabled, clicking the top results as they went. Before the experiment, neither Bing nor Google returned the hand-coded results, but two weeks later, Bing showed the Google results that had been hand-coded.

Microsoft didn't say today whether it plans to continue the practice, but evidently it doesn't consider it "cheating," as Google does.

In a comment to ZDNet blogger Mary Jo Foley, Microsoft said, flatly, "We do not copy Google's results." However, that denial turns out to be more a matter of interpretation.
A blog post by Harry Shum, Microsoft's corporate vice president of Bing, offered some detail on what Microsoft did. He acknowledged monitoring what links users clicked but essentially described it as letting humans help gather data through crowdsourcing:

"We use over 1,000 different signals and features in our ranking algorithm. A small piece of that is clickstream data we get from some of our customers, who opt-in to sharing anonymous data as they navigate the web in order to help us improve the experience for all users.

"To be clear, we learn from all of our customers. What we saw in today's story was a spy-novelesque stunt to generate extreme outliers in tail query [rare search query] ranking. It was a creative tactic by a competitor, and we'll take it as a back-handed compliment. But it doesn't accurately portray how we use opt-in customer data as one of many inputs to help improve our user experience.

"The history of the web and the improvement of a broad array of consumer and business experiences is actually the story of collective intelligence, from sharing HTML documents to hypertext links to click data and beyond. Many companies across the Internet use this collective intelligence to make their products better every day."

Google made it clear it isn't happy about it.

"I've got no problem with a competitor developing an innovative algorithm. But copying is not innovation, in my book," Sullivan quotes Google Fellow and search expert Amit Singhal as saying. "It's cheating to me because we work incredibly hard and have done so for years but they just get there based on our hard work...Another analogy is that it's like running a marathon and carrying someone else on your back, who jumps off just before the finish line." 

And in a statement to CNET News, Singhal added that Google disagrees with Microsoft's position, speaking just as flatly as Microsoft did in denying the copying:

"Our testing has concluded that Bing is copying Google Web search results.

"At Google we strongly believe in innovation and are proud of our search quality. We look forward to competing with genuinely new search algorithms out there, from Bing and others--algorithms built on core innovation and not on recycled search results copied from a competitor."

Google didn't respond to CNET questions about whether it plans any actions beyond publicizing the honeypot.

Google brought its concerns to Sullivan shortly before a Bing search event today.
Coincidentally or not, Google just shifted that event's agenda significantly. Indeed, the search-copying issue became the focus of a debate between Microsoft and Google representatives at the conference.

Stefan Weitz, director of Microsoft's Bing search engine, shared this response with Sullivan: "Opt-in programs like the [Bing] toolbar help us with clickstream data [information that shows Microsoft what links people click on], one of many input signals we and other search engines use to help rank sites. This 'Google experiment' seems like a hack to confuse and manipulate some of these signals." 

Hack, experiment, or honeypot, it's very revealing. Google created about 100 such hand-coded results, Sullivan said, so it's hard to imagine the act distorting search results in any significant way. The next relevant question is whether Microsoft concludes it's time to update its own search algorithm so that a Bing search for "hiybbprqag" won't lead to ticket information for the Wiltern theater anymore.

Updated 4:20 p.m. PST: Google has officially commented on the matter via a blog post attributed to Singhal. In it, he writes, "However you define copying, the bottom line is, these Bing results came directly from Google." He adds, "And to those who have asked what we want out of all this, the answer is simple: we'd like for this practice to stop."
 
Updated several times with comment from Google and Microsoft, most recently at 4:10 p.m. PT.

 