MaximumPC 2004 04


To our delight, it turns out Google’s
original hardware comes from
one of our favorite Silicon Valley
institutions: Fry’s Electronics. We’re
talking old-school Fry’s, complete
with hitching post and wagon wheels,
which bears little more than a passing
resemblance to the prettied-up chain
competing with Best Buy and CompUSA
in strip malls across North America.
According to our source, Google was
literally built upon the shoulders of
PCs: “We drove to Fry’s, picked out the
cheapest Linux desktops we could find,
and built it all ourselves.”
To this day, more than 10,000 PCs
running Red Hat Linux operate in
what our Google friend calls a “highly
replicated environment.” This means
that each computer is more like an
adjoining cell in a massive hive. The
engine is powered collectively yet remains
invulnerable to the failure of a single unit.
When a box does fail—and many can
and do throughout the day—the overall
system doesn’t miss a beat. New machines
can be added to the mix without a second
of downtime.
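The "highly replicated environment" idea can be illustrated with a minimal sketch. Everything named here (`Replica`, `Cluster`, the toy index) is hypothetical and is not Google's software; the sketch only assumes that every box holds a full copy of the data, so a query can be retried on another box when one fails.

```python
import random


class Replica:
    """One cheap box holding a full copy of a toy index (hypothetical)."""

    def __init__(self, name, index):
        self.name = name
        self.index = index
        self.alive = True

    def search(self, term):
        # A dead box simply refuses the request.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.index.get(term, [])


class Cluster:
    """Sends each query to any live replica, skipping failed ones."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def add(self, replica):
        # New machines join without stopping the cluster.
        self.replicas.append(replica)

    def search(self, term):
        # Try the replicas in random order until one answers.
        for replica in random.sample(self.replicas, len(self.replicas)):
            try:
                return replica.search(term)
            except ConnectionError:
                continue
        raise RuntimeError("no replica available")
```

In this sketch a search fails only when every replica is down, which mirrors the article's point that no single unit can take the engine offline.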
In fact, according to our source,
Google has the single largest installation
of Linux machines in the world. But
getting perfect searches isn’t as simple
as buying your own miniature server
farm—Google’s proprietary software
is the key to the company’s successful
distributed-computing model. After
purchase, every system in Google’s server
farm is optimized and connected to the
network via software designed specifically
to run the company’s search architecture
as efficiently as possible.
Don’t expect to drop by your local
branch and take a tour of the server
farms anytime soon, either. Aside from
saying that Google servers are spread
liberally around the world, our inside
source refused to tell us where any of the
clandestine installations are located. He
also told us these installations are housed
in state-of-the-art facilities using highly
sophisticated custom cooling solutions.
“Our data centers are as big as they get,”
he explained, “but most companies are
secretive about their business intelligence
and obviously so are we.” That these
machines are hidden around the globe
not only aids performance, but also
avoids risk. Should the east or west coast
experience a massive power blackout or
some other catastrophic event, Google
will still be there for the rest of the world.

Four Steps for Google...
Inexpensive hardware is merely the
outer casing of Google’s real magic.
The search engine’s inner workings
are the brainchild of Sergey Brin and
Lawrence Page, who co-founded Google
in 1998. In the course of their doctorate
programs at Stanford, the two computer
scientists developed a highly scalable
search solution at a time when other
sites’ search results were either useful or
complete, but not very timely.
Determined to pair quantity with
quality and timeliness, Brin and Page
devised an ingenious approach that broke
web searching into four key components:
crawling, indexing, ranking, and results.
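The four components can be sketched as a toy pipeline. This is a rough illustration and not Google's implementation: the in-memory `WEB` table, the example URLs, and the word-count scoring in `rank` are all assumptions made for the example, and the real ranking method is not described here.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("cheap linux desktops", ["b.example", "c.example"]),
    "b.example": ("linux server farm linux", ["a.example"]),
    "c.example": ("custom cooling solutions", []),
}


def crawl(start):
    """Step 1, crawling: follow links from a start page, collecting text."""
    seen, queue, pages = set(), [start], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        queue.extend(links)
    return pages


def build_index(pages):
    """Step 2, indexing: map each word to {url: occurrence count}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def rank(index, word):
    """Step 3, ranking: order matches by word count (placeholder scoring)."""
    hits = index.get(word, {})
    return sorted(hits, key=lambda url: (-hits[url], url))


def results(index, word, limit=10):
    """Step 4, results: return the top matches for display."""
    return rank(index, word)[:limit]
```

Calling `results(build_index(crawl("a.example")), "linux")` lists the page that mentions the word most often first.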

To pore over the 10 billion web pages
on the Internet, Brin and Page developed
an innovative and futuristic web crawler
called the Googlebot. The Googlebot,
which is actually many thousands of bots
running in parallel, visits web pages in
much the same way we do. It specifically
checks to make sure each page is still
valid, “reads” the current version of the
page, and visits all the links that appear
on the page. Because Google strives
to offer the newest and most relevant
data to its users, regularly updated sites

Inside Out
How Google searches the world in two-tenths of a second


Google by the Numbers

Average number of servers: 10,000
Operating system: Red Hat Linux
Company headquarters: Santa Clara, California
Number of employees: About 1,000
Average query time: Less than half a second
Number of queries Alta Vista handled per day in 1997: 20 million
Number of Google searches per day in 1998: 10,000
Current number of Google searches per day: More than 200 million
Peak time: Between 6:00 am and noon Pacific Time
Total web pages indexed by the original Google prototype: 24 million
Current number of web pages searched daily: More than 4 billion
Number of native language versions: 89 and growing




We’ve read hundreds of articles that prattle on endlessly
praising Google’s virtues. Google as a cultural phenomenon.
Google as a verb. Google as a way of life. Blah. Blah. Blah.
But Maximum PC isn’t The Today Show. We prefer hard facts to
cultural context any day of the week. For example, how does a simple
search box rip through 200 million queries a day? How does it sort all
that information in a way that is truly relevant? And finally, how does
Google display those results in less than half a second?
And then there are the tough, behind-the-scenes questions. Are
the folks at Google running some sort of hopped-up supercomputer?
(Short answer: not exactly.) Do clandestine server farms operate
across the globe in unmarked data centers? (Yes.) How does virtually
every question or key word posed to this simple-looking search page
manage to locate information and news that is truly useful? (Um, that’s
complicated.)
To get deep inside the amazing technologies behind Google, we
found an employee willing to share some facts, on the condition he
remain anonymous.
Which means that, as usual, we’ve got answers.

We found an anonymous insider at Google to help us
explain the world’s best search engine. Our mission:
to uncover what software and hardware make the
world’s best search engine tick.


BY ALICE HILL
