Tuesday, March 2, 2010
Recently the European Commission opened a preliminary inquiry into competition complaints. Part of the complaint alleges that Google operates without sufficient transparency into how and why web sites rank in our search results. The notion that Google isn't transparent is tough for me to swallow. Google has set the standard in how we communicate with web site publishers. Let me tell you about some of the ways we explain to sites how we rank them and why.
One of the most widely discussed parts of Google's scoring has always been PageRank. That "secret ingredient" is hardly a secret: the formula was published in an early research paper. That paper not only gave the formula for PageRank, but mentioned many of the other signals in Google's ranking, including anchor text, the location of words within documents, the relative proximity of query words in a document, the size and type of fonts used, the raw HTML of each page, and capitalization of words. Google has continued to publish literally hundreds of research papers over the years. Those papers reveal many of the "secret formulas" for how Google works and document essential infrastructure that Google uses. Some of these papers have spurred not only open-source projects but entire companies in their own right.
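For readers who want to see how little mystery there is, the published formula is usually written as PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 through Tn are the pages linking to A, C(T) is the number of links going out of page T, and d is a damping factor (the paper suggests 0.85). Below is a minimal sketch of that formula applied repeatedly to a tiny, made-up link graph. The graph, the function name, and the iteration count are assumptions for illustration only, and this is not a description of how ranking works in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the published PageRank formula on a small link graph.

    `links` maps each page to the list of pages it links to. Every page
    in this toy example must have at least one outgoing link.
    """
    pages = list(links)
    ranks = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Sum PR(T) / C(T) over every page T that links to `page`.
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - damping) + damping * incoming
        ranks = new_ranks
    return ranks


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page c ends up with the highest score because it receives links from both other pages, and page b ends up lowest because its only inbound link is shared with another target.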
Academic papers are one thing, but Google also aims to engage and educate in many other ways. In 1999, Sergey Brin participated in the first Search Engine Strategies conference for webmasters. In 2001, Google became one of the first search engines to engage online at a publisher forum called WebmasterWorld. One representative (GoogleGuy) has posted over 2800 times, while another (AdWordsAdvisor) has posted almost 5000 times.
Google's efforts at transparency and communication have evolved with the web. We started blogging in May 2004 and have written thousands of posts on our official blog. Google now has over 70 official blogs, including an official webmaster blog specifically to help site owners understand how Google works and help them rank appropriately in our search results. Google publishes more blog posts than almost any other large company. We also provide extensive public documentation on our web site with advice for publishers, in dozens of different languages.
As the head of Google's webspam team (which tries to stop attempts to violate our clearly documented, public webmaster guidelines), I'm often asked questions about how Google works. That's why I started my own personal blog in 2005 and have written hundreds of posts about Google. The topics range from common web site mistakes to advice for new bloggers. I've had the pleasure of speaking to web site owners or doing public web site reviews at over 30 different search conferences. In fact, I'll be answering questions at another search conference this week - along with a dozen or so Google colleagues.
We've tried all sorts of experiments to help site owners understand how Google's search ranking works. We've done multiple live webmaster chats online with hundreds of simultaneous participants. We've experimented with tweeting. We've participated in podcasts. And here's one of my favorite ways we've helped to break out of the black box and give advice to publishers: in the past year, we've taken questions from the public and posted hundreds of video answers on a webmaster video channel. Those videos have been watched over 1.5 million times (!). We also engage online across the blogosphere to answer questions about Google's practices.
The list goes on and on. Google has reached out to other search engines on methods to make life easier for website owners. The resulting standards include a way to specify preferred web site URL formats as well as Sitemaps, an easy way for webmasters to tell search engines about the pages on their site. Google provides a webmaster forum where both Google employees and helpful outside "superusers" hang out and answer questions about specific sites. We've run in-person website clinics to provide specific one-on-one feedback and advice in locations from San Francisco to India to Russia, as well as virtual site clinics in Spanish. We've even confirmed ranking signals that Google doesn't use in our algorithms, such as the keywords meta tag, which saves site owners from doing needless work and helps avoid frivolous lawsuits.
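To give a sense of how simple a Sitemap is, here is a minimal sketch that builds one with Python's standard library. The XML namespace is the one defined by the public Sitemaps protocol. The function name and the example.com URLs are placeholders invented for this illustration, and a real Sitemap may carry optional fields that this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML string listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs for illustration only.
sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/about",
])
print(sitemap_xml)
```

The resulting file is typically saved at the root of the site so that search engines can find the pages it lists.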
The frustrating thing is that even if all 20,000 employees at Google worked full-time on answering questions from website publishers, we still couldn't talk to every site owner. Why not? Because the web has over 192 million domain names registered. That's why we introduced Google Webmaster Tools, a one-stop location to provide scalable, self-service information and to let webmasters provide us with data. Describing the powerful tools we provide to site owners for free would take a whole separate blog post, but here are a few of the offerings:
- Site owners can get recommendations about issues like duplicate meta descriptions or missing title tags.
- Site owners whose sites we believe have violated our webmaster guidelines, and against which Google has taken corresponding action in our index, can submit a request for reconsideration.
- Site owners who have been hacked can get details about malware on their site. After they remove the hacked content, they can fetch pages from their site as Googlebot to make sure the malicious content is really gone.
- Site owners can find out about errors that Google encountered while crawling their site.
A Google employee recently blogged about using these free, public tools to diagnose an issue with his webhost where he had exceeded his bandwidth quota. Millions of webmasters have taken similar advantage of Google's free tools for site owners to get helpful information about their site.
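Site owners who like to script things can approximate the first kind of recommendation in the list above on their own pages. The sketch below scans a handful of HTML documents for missing title tags and duplicate meta descriptions. It is an illustration under stated assumptions, not how Webmaster Tools works internally: the class name, the function name, and the sample pages are all hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Collect the <title> text and meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit(pages):
    """Return (URLs with no title, {description: URLs that share it})."""
    missing_titles = []
    by_description = defaultdict(list)
    for url, html in pages.items():
        scanner = HeadScanner()
        scanner.feed(html)
        if not scanner.title.strip():
            missing_titles.append(url)
        if scanner.description:
            by_description[scanner.description].append(url)
    duplicates = {d: urls for d, urls in by_description.items() if len(urls) > 1}
    return missing_titles, duplicates


# Hypothetical pages: /b has no title, and both pages share one description.
sample_pages = {
    "/a": '<html><head><title>A</title><meta name="description" content="Widgets"></head></html>',
    "/b": '<html><head><meta name="description" content="Widgets"></head></html>',
}
missing, duplicates = audit(sample_pages)
print("Missing titles:", missing)
print("Duplicate descriptions:", duplicates)
```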
At Google, we try to be as open as we can, even to the point of helping users export their data out of Google's products. At the same time, we don't think it's unreasonable for any business to have some trade secrets, not least because we don’t want to help spammers and crackers game our system. If people who are trying to game search rankings knew every single detail about how we rank sites, it would be easier for them to 'spam' our results with pages that are not relevant and are frustrating to users -- including porn and malware sites.
Ultimately, criticizing Google for its "secret formula" is an easy claim to make, but it just isn't true. Google has worked day after day for years to be open, to educate publishers about how we rank sites, and to answer questions from both publishers and our users. So if that's how people choose to define "secret," then ours must be the worst kept secret in the world of search.
Posted by Matt Cutts, Principal Engineer, Search Quality Team