Tuesday, February 24, 2009

Exploring a ‘Deep Web’ That Google Can’t Grasp

One day last summer, Google’s search engine trundled quietly past a milestone. It added the one trillionth address to the list of Web pages it knows about. But as impossibly big as that number may seem, it represents only a fraction of the entire Web.
Beyond those trillion pages lies an even vaster Web of hidden data: financial information, shopping catalogs, flight schedules, medical research and all kinds of other material stored in databases that remain largely invisible to search engines.

Now a new breed of technologies is taking shape that will extend the reach of search engines into the Web’s hidden corners. When that happens, it will do more than just improve the quality of search results — it may ultimately reshape the way many companies do business online.

Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together. While that approach works well for the pages that make up the surface Web, these programs have a harder time penetrating databases that are set up to respond to typed queries.

To extract meaningful data from the Deep Web, search engines have to analyze users’ search terms and figure out how to broker those queries to particular databases. Google’s Deep Web search strategy involves sending out a program to analyze the contents of every database it encounters.

As the major search engines start to experiment with incorporating Deep Web content into their search results, they must figure out how to present different kinds of data without overcomplicating their pages. This poses a particular quandary for Google, which has long resisted the temptation to make significant changes to its tried-and-true search results format.
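The crawling limitation described above can be sketched in a few lines. This is a toy model, not any real search engine's crawler: the pages, links, and the query-backed results page are all invented for illustration. A crawler that only follows hyperlinks never discovers a page that exists solely as the response to a submitted form.

```python
from collections import deque

# A toy "web": each page maps to the hyperlinks it contains.
# The results page exists only as a response to a query form,
# so no static page links to it (an assumed example URL).
WEB = {
    "home": ["about", "search-form"],
    "about": ["home"],
    "search-form": [],            # the form page carries no outgoing links
    "results?flight=BA123": [],   # hidden behind the form: never linked
}

def crawl(start):
    """Breadth-first crawl that discovers pages only via hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in WEB.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

surface = crawl("home")
# The query-backed results page stays invisible to the crawler.
print("results?flight=BA123" in surface)  # prints False
```

Reaching that hidden page requires the extra step the article describes: recognizing the form and brokering a typed query to the database behind it, rather than waiting for a hyperlink that will never appear.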

Beyond the realm of consumer searches, Deep Web technologies may eventually let businesses use data in new ways. This level of data integration could eventually point the way toward something like the Semantic Web, the much-promoted — but so far unrealized — vision of a Web of interconnected data. Deep Web technologies hold the promise of achieving similar benefits at a much lower cost, by automating the process of analyzing database structures and cross-referencing the results.
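The cross-referencing idea above can be illustrated with a minimal sketch. The two record sets and their field names are entirely hypothetical; the point is only that two databases describing the same entities under different schemas can be joined automatically once their matching fields are identified.

```python
# Hypothetical data from two independent sources that describe the
# same items under different field names (invented for illustration).
trials = [
    {"compound": "aspirin", "outcome": "positive"},
    {"compound": "ibuprofen", "outcome": "mixed"},
]
prices = [
    {"drug_name": "aspirin", "price": 3.50},
    {"drug_name": "naproxen", "price": 5.00},
]

def cross_reference(a, a_key, b, b_key):
    """Join two record lists on fields inferred to mean the same thing."""
    index = {rec[b_key]: rec for rec in b}
    return [
        {**rec, **index[rec[a_key]]}  # merge matching records
        for rec in a
        if rec[a_key] in index
    ]

# Only "aspirin" appears in both sources, so one merged record results.
merged = cross_reference(trials, "compound", prices, "drug_name")
```

In practice, the hard part the article alludes to is the first step, deciding that `compound` and `drug_name` refer to the same thing; the join itself is straightforward once that mapping is known.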

1 comment:

  1. This was an interesting article, although Deep Web technologies have been around for years. My company has been surfacing the deep web with sites like Science.gov, WorldWideScience.org and Mednar.com since 2002. Our twist: we don't index the web but search all the sources in real time, so the data is not stale.

    ReplyDelete