From 20d538007085643a78bdcab6899b45326483fe1b Mon Sep 17 00:00:00 2001
From: Jidong Xiao
Date: Tue, 17 Oct 2023 16:38:32 -0400
Subject: [PATCH] question mark to comma

---
 hws/06_search_engine/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hws/06_search_engine/README.md b/hws/06_search_engine/README.md
index 328750c..0560d2e 100644
--- a/hws/06_search_engine/README.md
+++ b/hws/06_search_engine/README.md
@@ -9,7 +9,7 @@ In this assignment you will develop a simple search engine called New York Searc
 
 ## Background
 
-When talking about Google Search Engine? what words come to your mind? Page Ranking? Inverted Indexing? Web Crawler?
+When talking about Google Search Engine, what words come to your mind? Page Ranking? Inverted Indexing? Web Crawler?
 
 When developing a search engine, the first question we want to ask is, where to start? When you type "Selena Gomez" or "Tom Brady" in the search box in Google, where does Google start? Does Google start searching from one specific website? The answer is Google does not start from one specific website, rather they maintain a list of URLs which are called Seed URLs. These Seed URLs are manually chosen which represent a diverse range of high-quality, reputable websites. Search engines usually have a component called web crawler, which crawls these URLs and then follow links from these web pages to other web pages. As the web crawler crawls these other web pages, it collects links from these other web pages to more web pages, and then follow these links to crawl more web pages. This process continues, ultimately, the goal is to discover as many web pages as possible. Once all pages are visited, the search engine will build a map, which is known as the inverted index, which maps terms (i.e., individual words) to web pages (also known as Documents). Below is an example:
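
The crawl-then-index process described in the Background paragraph can be sketched roughly as follows. This is a minimal illustration, not the assignment's required design: the `PAGES` dictionary stands in for the real web, and the `*.example` URLs, page texts, and the function names `crawl` and `build_inverted_index` are all hypothetical. The crawler here is assumed to be breadth-first, and terms are assumed to be lowercase, whitespace-separated words.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# These URLs and texts are made up purely for illustration.
PAGES = {
    "a.example": ("tom brady football", ["b.example", "c.example"]),
    "b.example": ("selena gomez music", ["c.example"]),
    "c.example": ("football music news", ["a.example"]),
    "d.example": ("unreachable page", []),
}


def crawl(seed_urls, pages):
    """Breadth-first crawl: visit the seeds, then follow links to unseen pages."""
    queue = deque(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    seen = set(queue)
    visited = []
    while queue:
        url = queue.popleft()
        if url not in pages:  # dead link: nothing to fetch
            continue
        visited.append(url)
        _text, links = pages[url]
        for link in links:
            if link not in seen:  # never enqueue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


def build_inverted_index(urls, pages):
    """Map each term to the set of crawled pages (documents) that contain it."""
    index = {}
    for url in urls:
        text, _links = pages[url]
        for term in text.lower().split():
            index.setdefault(term, set()).add(url)
    return index


visited = crawl(["a.example"], PAGES)
index = build_inverted_index(visited, PAGES)
```

Starting from the single seed `a.example`, the crawl reaches `b.example` and `c.example` by following links, while `d.example` is never discovered because no crawled page links to it. Looking up a term such as `index["football"]` then returns the pages containing that word without rescanning every document.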