update the formula, to make it clear

Jidong Xiao
2023-11-03 14:38:24 -04:00
parent e1ba3caf6b
commit 033ef43743

@@ -100,30 +100,30 @@ Keyword Density Score = (Number of Times Keyword Appears) / (Total Content Lengt
Here, we treat the content of each document as a single string. "Total Content Length" means the length of the whole document, not just the length of the <body> section, and "Number of Times Keyword Appears" means the number of times the keyword appears in the whole document, not just in the <body> section. Likewise, when calculating the "Keyword Density Across All Documents", you should consider the whole document, not just the <body> section.
-Let's explain this formula with an example: let's say we have 3 documents in total, and the user wants to search *Tom Cruise*. Assume the first document has 50 characters (i.e., the document length of the first document is 50), and the second document has 40 characters, and the third document has 100 characters. The keyword *Tom* appears in the first document 2 times, appears in the second document 3 times, appears in the third document 4 times. Then for this keyword *Tom*, the density across all documents would be:
+Let's explain this formula with an example: let's say we have 4 documents in total, and the user wants to search *Tom Cruise*. Assume the first document has 50 characters (i.e., the document length of the first document is 50), the second document has 40 characters, the third document has 100 characters, and the fourth document has 200 characters. The keyword *Tom* appears in the first document 2 times, appears in the second document 3 times, appears in the third document 4 times, and appears in the fourth document 0 times. Then for this keyword *Tom*, the density across all documents would be:
```console
-(2 + 3 + 4) / (50 + 40 + 100) = 0.047
+(2 + 3 + 4 + 0) / (50 + 40 + 100 + 200) = 0.023
```
and the keyword density score for this keyword *Tom* in the first document, would be:
```console
-2 / (50 * 0.047) = 0.851
+2 / (50 * 0.023) = 1.739
```
and the keyword density score for this keyword *Tom* in the second document, would be:
```console
-3 / (40 * 0.047) = 1.596
+3 / (40 * 0.023) = 3.261
```
and the keyword density score for this keyword *Tom* in the third document, would be:
```console
-4 / (100 * 0.047) = 0.851
+4 / (100 * 0.023) = 1.739
```
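
The four-document example for the keyword *Tom* can also be checked with a short script. This is only a minimal sketch in Python, not the assignment's reference implementation: the function names `density_across_documents` and `keyword_density_score` are hypothetical, and returning 0.0 when the keyword appears in no document is an assumption the text above does not specify.

```python
def density_across_documents(counts, lengths):
    """Total keyword occurrences divided by total length of all documents."""
    return sum(counts) / sum(lengths)


def keyword_density_score(count, length, density_all):
    """Occurrences in one document / (document length * density across all documents)."""
    if density_all == 0:
        # Assumption: a keyword that never appears anywhere scores 0.
        return 0.0
    return count / (length * density_all)


# Values from the example: document lengths and occurrences of "Tom".
lengths = [50, 40, 100, 200]
tom_counts = [2, 3, 4, 0]

density_all = density_across_documents(tom_counts, lengths)
scores = [
    keyword_density_score(count, length, density_all)
    for count, length in zip(tom_counts, lengths)
]

print(round(density_all, 3))
print([round(score, 3) for score in scores])
```

Because the script keeps the unrounded density (9 / 390), its scores come out as roughly 1.733, 3.25, 1.733 and 0, slightly different from the hand calculation, which plugs in the rounded value 0.023 and gets 1.739 and 3.261. The fourth document scores 0 because *Tom* does not appear in it.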