adding description on input and output

Jidong Xiao
2023-10-24 00:22:32 -04:00
parent aeed157ff0
commit 05d2196ea7


@@ -184,7 +184,7 @@ But it is not a valid document if the user does a phrase search - "*Tom Cruise*"
All the input files are HTML files, and they are provided under the [html_files](html_files) directory. Exactly one of these HTML files will be provided via the command line. This file is the Seed file, and its path (i.e. html_files/index.html) will be used as the Seed URL. Your web crawler should search this HTML file for the links it contains, follow these links to crawl other HTML files, and repeat this process until no more files can be reached. Keep in mind that links which take you to an HTML file you have already crawled should be skipped; otherwise your crawler will get stuck in an infinite loop.
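The crawl described above is essentially a graph traversal: each HTML file is a node and each link is an edge. The Python sketch below illustrates the idea under assumptions that the assignment does not state. It assumes links appear as `href` attributes of `<a>` tags, that they are relative file paths resolved against the directory of the file containing them, and that files are visited in breadth-first order. The names `crawl` and `LinkExtractor` are hypothetical, and the required language and visiting order for the actual assignment may differ.

```python
import os
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_path):
    """Breadth-first crawl from seed_path; returns reachable files in visit order.

    Assumption: links are relative paths to local .html files.
    """
    seed = os.path.normpath(seed_path)
    visited = {seed}      # every file already discovered; prevents infinite loops
    order = []
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        order.append(current)
        parser = LinkExtractor()
        with open(current, encoding="utf-8") as f:
            parser.feed(f.read())
        base_dir = os.path.dirname(current)
        for link in parser.links:
            # Resolve the link relative to the directory of the current file.
            target = os.path.normpath(os.path.join(base_dir, link))
            if (target not in visited and target.endswith(".html")
                    and os.path.isfile(target)):
                visited.add(target)
                queue.append(target)
    return order
```

For example, `crawl("html_files/index.html")` would return the seed file followed by every file reachable from it, each listed once. The `visited` set is what enforces the skip rule: a file is marked when it is first discovered, so a link back to an already-seen page never re-enters the queue.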
### Output File Format and Order
## Output File Format
The output of your program should go to the output file.