Compare commits

25 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Jidong Xiao | 9d1335bc18 | renaming | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 0f006e5511 | remove some questions | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | a02407b35d | making notes about the function pointer usage | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 7a7c44ed25 | deterministic must be there | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 92a4145c43 | making the program name visible | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 0c9dd2e191 | adding the hash test code | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | b33e6b0376 | move const to before type | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 51d388be7c | adjust indentation | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 861382884c | 22 to 23 | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | a6aef228a6 | renaming 23 to 22 | 2025-04-02 12:43:20 -04:00 |
| Corbin | 2b01a1987d | Improve wording of C1Q3 in Lab8 | 2025-04-02 12:43:20 -04:00 |
| Corbin | 28f18a41a3 | Improve consistency of quotes vs. italics for command names and functions | 2025-04-02 12:43:20 -04:00 |
| Samson Kempiak | dad525687d | Capitalize company names and pluralize some words. Some company names were lower cased. Also fixed minor pluralization issues. | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 8aa0663bcb | not shianne anymore | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 7a5376102f | adding test info | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 900469a9ec | pre -> post | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | f5c43c914e | adding the iterative approach using stacks | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | f634f39d05 | indentation | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | eb0d7868dc | time complexity for all 3 | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 4bf8776b54 | adding post order code and notes | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 7bcbffc344 | adding time complexity | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 2e1604f3bd | adding preorder notes and code | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 4536070bcd | adding binary tree test case | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | ab5f6417ec | adding in order code | 2025-04-02 12:43:20 -04:00 |
| Jidong Xiao | 38bacc2860 | adding rb trees | 2025-04-02 12:43:20 -04:00 |
20 changed files with 1163 additions and 119 deletions

View File

@@ -1,6 +1,6 @@
# Homework 8 — Managing Youtube Comments # Homework 8 — Managing Youtube Comments
In this assignment you will develop a program to manage youtube comments, let's call this program New York Comments. Please read the entire handout before starting to code the assignment. In this assignment you will develop a program to manage YouTube comments, let's call this program New York Comments. Please read the entire handout before starting to code the assignment.
## Learning Objectives ## Learning Objectives
@@ -9,9 +9,9 @@ In this assignment you will develop a program to manage youtube comments, let's
## Background ## Background
A reddit user complained about this: [Why are YouTube comments not threaded like reddit comments? Why is there only one level of nestedness?](https://www.reddit.com/r/youtube/comments/8uei3n/why_are_youtube_comments_not_threaded_like_reddit/). A Reddit user complained about this: [Why are YouTube comments not threaded like Reddit comments? Why is there only one level of nestedness?](https://www.reddit.com/r/youtube/comments/8uei3n/why_are_youtube_comments_not_threaded_like_reddit/).
The complaint is saying that on reddit you will get a nested comment chain like this: The complaint is saying that on Reddit you will get a nested comment chain like this:
``` ```
A: This video is fake A: This video is fake
@@ -29,9 +29,9 @@ A: This video is fake
C: How can you be so dumb? C: How can you be so dumb?
``` ```
Now, is C replying to B or to A? In fact, on YouTube, even if C relies to B, you will still get something like this. The problem is, YouTube does manage their comments in trees, but they only allow the trees to have two levels: parent and children, but there are no grandchildren, and that's what this user refers to as "only one level of nestedness". In order to support multiple levels of nestedness, we need to create trees with more than two levels, and that is what you do in this assignment, your goal is to write a program to make youtube display comments the better way, so users can see which comment is a reply to which comment. Now, is C replying to B or to A? In fact, on YouTube, even if C replies to B, you will still get something like this. The problem is, YouTube does manage their comments in trees, but they only allow the trees to have two levels: parent and children, but there are no grandchildren, and that's what this user refers to as "only one level of nestedness". In order to support multiple levels of nestedness, we need to create trees with more than two levels, and that is what you do in this assignment, your goal is to write a program to make YouTube display comments the better way, so users can see which comment is a reply to which comment.
If you are still not clear about this problem, try to reply to a comment on youtube, and make sure you reply to a comment which is already a reply to another comment. If you are still not clear about this problem, try to reply to a comment on YouTube, and make sure you reply to a comment which is already a reply to another comment.
## Supported Commands ## Supported Commands
@@ -44,11 +44,11 @@ nycomments.exe input1.json input2.txt output.txt
Here: Here:
- *nycomments.exe* is the executable file name. - *nycomments.exe* is the executable file name.
- input1.json contains existing comments to a youtube video. In this README we will refer to this file as **the json file**. - input1.json contains existing comments to a YouTube video. In this README we will refer to this file as **the json file**.
- input2.txt contains operations we want to perform. In this README we will refer to this file as **the second input file**, or just the **input2** file. - input2.txt contains operations we want to perform. In this README we will refer to this file as **the second input file**, or just the **input2** file.
- output.txt is where to print your output to. - output.txt is where to print your output to.
To summerize what your program does: your program reads all existing comments from **the json file**, stores them in trees, reads the operations from **the second input file**, and then performs these operations, and every time there is a "display_comment" operation in **the second input file**, you program display the specified comment into output.txt. If there are multiple *display_comment* operations in **the second input file**, then your program will display all of them in *output.txt*, one by one. To summarize what your program does: your program reads all existing comments from **the json file**, stores them in trees, reads the operations from **the second input file**, and then performs these operations, and every time there is a "display_comment" operation in **the second input file**, your program displays the specified comment into output.txt. If there are multiple *display_comment* operations in **the second input file**, then your program will display all of them in *output.txt*, one by one.
## Format of input1.json ## Format of input1.json
@@ -60,9 +60,9 @@ input1.json represents the json files, it stores all existing comments. Each lin
The line is enclosed with a pair of curly braces. And every line has these same fields: The line is enclosed with a pair of curly braces. And every line has these same fields:
- *video_id*: youtube assign each video an id. - *video_id*: YouTube assigns each video an id.
- author: username of the author. - author: username of the author.
- *comment_id*: youtube assign each comment an id. - *comment_id*: YouTube assigns each comment an id.
- *like_count*: how many likes this comment gets. - *like_count*: how many likes this comment gets.
- *reply_count*: how many comments are a reply to this comment. - *reply_count*: how many comments are a reply to this comment.
- *is_reply*: is this a reply to an existing comment? If not, then it's a comment to the video; in other words, every comment, is either a reply to an existing comment (*is_reply* will be true), or is a comment to the original video (*is_reply* will be false). - *is_reply*: is this a reply to an existing comment? If not, then it's a comment to the video; in other words, every comment, is either a reply to an existing comment (*is_reply* will be true), or is a comment to the original video (*is_reply* will be false).
@@ -88,7 +88,7 @@ As can be seen from this above example, a comment which is a direct response to
see the *is_reply* field is true here. see the *is_reply* field is true here.
Our data set includes 6 json files, just to satisfy your curiosity, they include comments corresponding to the following 6 youtube videos: Our data set includes 6 json files, just to satisfy your curiosity, they include comments corresponding to the following 6 YouTube videos:
- hold_me_closer.json is corresponding to this video titled [Elton John, Britney Spears - Hold Me Closer (Official Video)](https://www.youtube.com/watch?v=qExVlz3zb0k). - hold_me_closer.json is corresponding to this video titled [Elton John, Britney Spears - Hold Me Closer (Official Video)](https://www.youtube.com/watch?v=qExVlz3zb0k).
@@ -123,7 +123,7 @@ Here:
2. reply to a comment 2. reply to a comment
A line which starts with the string *reply_to_comment" means this line describes the operation of "reply to a comment". Here is an example: A line which starts with the string "reply_to_comment" means this line describes the operation of *reply to a comment*. Here is an example:
```console ```console
reply_to_comment Ugzsyj0jivPUQdfy_Y94AaABAg Ugzsyj0jivPUQdfy_Y94AaABAg.0 @user1 "Britney is back!" reply_to_comment Ugzsyj0jivPUQdfy_Y94AaABAg Ugzsyj0jivPUQdfy_Y94AaABAg.0 @user1 "Britney is back!"
@@ -141,7 +141,7 @@ This whole lines means that this user *user1* is making a comment with the conte
3. like a comment 3. like a comment
A line which starts with the string *like_comment" means this line describes the operation of "like a comment". Here is an example: A line which starts with the string "like_comment" means this line describes the operation of *like a comment*. Here is an example:
```console ```console
like_comment Ugzsyj0jivPUQdfy_Y94AaABAg.0.1.5.8.888 like_comment Ugzsyj0jivPUQdfy_Y94AaABAg.0.1.5.8.888
@@ -167,9 +167,9 @@ Here:
**Definition of deleting a comment**: in this assignment, the definition of "deleting a comment" means delete this current comment, as well as all its descendants. For example, if A is a comment, B is a reply to A, C is a reply to B, D is also a reply to B, E is a reply to D, F is a reply to E, then the operation of "deleting A" means deleting A, B, C, D, E, and F, i.e., deleting A, and all of its descendants. **Definition of deleting a comment**: in this assignment, the definition of "deleting a comment" means delete this current comment, as well as all its descendants. For example, if A is a comment, B is a reply to A, C is a reply to B, D is also a reply to B, E is a reply to D, F is a reply to E, then the operation of "deleting A" means deleting A, B, C, D, E, and F, i.e., deleting A, and all of its descendants.
The following three pictures from youtube demonstrate the visual effect of this delete process. The following three pictures from YouTube demonstrate the visual effect of this delete process.
- Before delete, we have four comments: "test", "test2", "test3", "test4". "test" and "test4" are both comments to the video, thus they are siblings, and have no parents. "test2" is a reply comment to "test", thus "test2" is the child of "test", and "test" is the parent of "test2". "test3" is a reply comment to "test2", thus "test3" is the child of "test2", and "test2" is the parent of "test3". (Question: if "test2" is the parent of "test3", then why do "test2" and "test3" have the same indentation? Well, this is exactly the problem youtube has and it is exactly what we want you to solve in this assignment.) - Before delete, we have four comments: "test", "test2", "test3", "test4". "test" and "test4" are both comments to the video, thus they are siblings, and have no parents. "test2" is a reply comment to "test", thus "test2" is the child of "test", and "test" is the parent of "test2". "test3" is a reply comment to "test2", thus "test3" is the child of "test2", and "test2" is the parent of "test3". (Question: if "test2" is the parent of "test3", then why do "test2" and "test3" have the same indentation? Well, this is exactly the problem YouTube has and it is exactly what we want you to solve in this assignment.)
![alt text](before_delete.png "before delete") ![alt text](before_delete.png "before delete")
- Now we want to delete "test". Based on our definition of delete, this should cause the deletion of "test", "test2", and "test3". - Now we want to delete "test". Based on our definition of delete, this should cause the deletion of "test", "test2", and "test3".
@@ -180,7 +180,7 @@ The following three pictures from youtube demonstrate the visual effect of this
5. display comment 5. display comment
A line which starts with the string "display_comment" means this line describes the operation of "display a comment". Here is an example: A line which starts with the string "display_comment" means this line describes the operation of *display a comment*. Here is an example:
```console ```console
display_comment Ugw2rL586Lv-OZNS6E94AaABAH display_comment Ugw2rL586Lv-OZNS6E94AaABAH
@@ -209,7 +209,7 @@ To summarize the rules, in this homework, no sorting is needed, but you need to
### Indentation ### Indentation
Just like youtube, we use indentations to display the tree structure of the comments. The following image is an example from youtube: Just like YouTube, we use indentations to display the tree structure of the comments. The following image is an example from YouTube:
![alt text](comments_indentation.png "comments indentation") ![alt text](comments_indentation.png "comments indentation")
@@ -299,7 +299,7 @@ You can test (but not view) the instructor's code here: [instructor code](http:/
Q1: Why does the reply count sometimes not match the number of replies displayed? Q1: Why does the reply count sometimes not match the number of replies displayed?
A1: On youtube, some inappropriate comments are not displayed but they still contribute to the reply count. And such comments are not included in our json files. A1: On YouTube, some inappropriate comments are not displayed but they still contribute to the reply count. And such comments are not included in our json files.
Q2: Can I use the <nlohmann/json.hpp> to parse the json file? Q2: Can I use the <nlohmann/json.hpp> to parse the json file?

View File

@@ -1,4 +1,4 @@
# Lab 12 — Hash Tables # Lab 9 — Hash Tables
<!--In this lab, you will first experiment with our hash table implementation of a set. The key differences between the ds_set class (based on a binary search tree) and the ds_hashset class (based on a hash table, of course), are the performance of insert/find/erase: O(log n) vs. O(1), and the order that the elements are traversed using iterators: the set was in order, while the hashset is in no apparent order.--> <!--In this lab, you will first experiment with our hash table implementation of a set. The key differences between the ds_set class (based on a binary search tree) and the ds_hashset class (based on a hash table, of course), are the performance of insert/find/erase: O(log n) vs. O(1), and the order that the elements are traversed using iterators: the set was in order, while the hashset is in no apparent order.-->

View File

@@ -8,15 +8,15 @@ Pair up with one other student in your lab section and complete the exercises be
problem 1: Draw a binary tree with 4 levels with the integers 1-7 such that the sum of elements on every level of the tree is the same. problem 1: Draw a binary tree with 4 levels with the integers 1-7 such that the sum of elements on every level of the tree is the same.
problem 2: Create a exactly balanced binary search tree with 7 color words (order the colors alphabetically). problem 2: Create an exactly balanced binary search tree with 7 color words (order the colors alphabetically).
problem 3: Arrange the following items of clothing in a tree with 3 levels such that the parent of every node is generally donned before the child when dressing in the morning: jacket, pants, shoes, shirt, undergarments, socks, and belt. problem 3: Arrange the following items of clothing in a tree with 3 levels such that the parent of every node is generally donned before the child when dressing in the morning and nodes at the same level could be donned in any order: jacket, pants, shoes, shirt, undergarments, socks, and belt.
problem 4: Draw a binary search tree with the integers 1-7, where 3 has no parent and 5 has no children, and there are no other elements at the same level as 5. problem 4: Draw a binary search tree with the integers 1-7, where 3 has no parent and 5 has no children, and there are no other elements at the same level as 5.
problem 5: What is the sum of the leaf nodes in a perfectly balanced binary search tree containing the powers of 2 less than 128? problem 5: What is the sum of the leaf nodes in a perfectly balanced binary search tree containing the powers of 2 less than 128?
problem 6: Draw a exactly-balanced binary search tree containing the letters of the word: uncopyrightable problem 6: Draw an exactly-balanced binary search tree containing the letters of the word: uncopyrightable
&nbsp; &nbsp;
&nbsp; &nbsp;
@@ -28,7 +28,7 @@ problem 6: Draw a exactly-balanced binary search tree containing the letters of
What is the pre-order traversal of the tree above? What is the pre-order traversal of the tree above?
problem 7: Now draw a exactly-balanced binary tree of characters such that a post-order traversal spells the word: uncopyrightable problem 7: Now draw an exactly-balanced binary tree of characters such that a post-order traversal spells the word: uncopyrightable
&nbsp; &nbsp;

View File

@@ -1,129 +1,243 @@
# Lecture 21 --- Trees, Part IV # Lecture 21 --- Trees, Part IV
Review from Lecture 20 ## Test 3 Information
- Breadth-first and depth-first tree search
- Increment/decrement operator - Test 3 will be held Thursday, April 3rd, 2025 from 6-7:50pm.
- Tree height, longest-shortest paths, breadth-first search - Students' assigned test rooms, rows, and seats will be re-randomized. If on Tuesday evening you still don't have a seating assignment when you log onto Submitty, let us know via the ds_instructors list.
- Last piece of ds_set: removing an item, erase - Coverage: Maps, Sets, Trees, as well as concepts learned prior to test 2. Please remember Recursion and Big O notation go hand in hand with all the topics taught after Test 2.
- Erase with parent pointers, increment operation on iterators - OPTIONAL: you are allowed to bring one physical piece of 8.5x11” paper, that's two “sides”. We will check at the start of the exam that you do not have more than one piece of paper for your notes!
- Limitations of our ds_set implementation - All students must bring their Rensselaer photo ID card.
- Bring pencil(s) & eraser (pens are ok, but not recommended).
- Practice problems from previous tests are available on the [course materials](https://submitty.cs.rpi.edu/courses/s25/csci1200/course_materials) page on Submitty.
## Today's Lecture ## Today's Lecture
- Red Black Trees - Binary Tree In-order, Pre-order, Post-order Iterative Traversal
- B+ Trees - Binary Tree Morris Traversal
## 21.1 Red-Black Trees ## 21.1 Binary Tree In-order, Pre-order, Post-order Iterative Traversal
In addition to the binary search tree properties, the following
red-black tree properties are maintained throughout all
modifications to the data structure:
- Each node is either red or black. A common way to traverse a binary tree without recursion is to use an extra container, such as a stack or a queue. Here we will use a single stack to perform an in-order traversal of a binary tree, and a single stack to perform a pre-order traversal, but we will use two stacks to perform a post-order traversal.
- The root node is always black.
- The NULL child pointers are black.
- Both children of every red node are black.
- Thus, the parent of a red node must also be black.
- All paths from a particular node to a NULL child pointer contain the same
number of black nodes.
![alt text](Red_Black.png "Red_Black Tree example") We will use this binary tree as a test case:
What tree does our **ds_set** implementation produce if we insert the ![alt text](binaryTree.png "Binary Tree Test Case")
numbers 1-14 **in order**?
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
The tree at the top is the result using a red-black tree. Notice how the tree is still quite balanced. ## 21.1.1 In-order Iteratively
Visit these links for an animation of the sequential insertion and re-balancing: An in-order traversal program is provided here: [inorder_iterative.cpp](inorder_iterative.cpp).
http://babbage.clarku.edu/~achou/cs160fall03/examples/bst_animation/RedBlackTree-Example.html ## 21.1.2 Pre-order Iteratively
https://www.cs.usfca.edu/~galles/visualization/RedBlack.html A pre-order traversal program is provided here: [preorder_iterative.cpp](preorder_iterative.cpp).
http://www.youtube.com/watch?v=vDHFF4wjWYU&noredirect=1 ## 21.1.3 Post-order Iteratively
What is the best/average/worst case height of a red-black tree with $n$ nodes? A post-order traversal program is provided here: [postorder_iterative.cpp](postorder_iterative.cpp).
&nbsp; The best way to understand these programs is to walk through the code using the test case.
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
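For readers who do not have the linked files at hand, a single-stack pre-order traversal might look roughly like the sketch below. It is not the course's `preorder_iterative.cpp`; the names `preorderIterative` and `buildSampleTree`, the vector-backed node storage, and returning a `std::vector<int>` instead of printing are all assumptions made for illustration. The tree is the same nine-node tree that the test programs build in `main()`.

```cpp
#include <cassert>
#include <stack>
#include <vector>

struct TreeNode {
    int val = 0;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Hypothetical helper: wires up the nine-node test tree inside caller-owned
// storage (indices 1..9 are used), so nothing has to be deleted afterwards.
TreeNode* buildSampleTree(std::vector<TreeNode>& n) {
    assert(n.size() >= 10 && "storage must hold indices 1..9");
    for (int i = 1; i <= 9; ++i) n[i] = TreeNode{i, nullptr, nullptr};
    n[1].left = &n[2];  n[1].right = &n[3];
    n[2].left = &n[4];  n[2].right = &n[5];
    n[5].left = &n[6];  n[5].right = &n[7];
    n[3].right = &n[8];
    n[8].left = &n[9];
    return &n[1];
}

// Pre-order (node, left, right) using one explicit stack.
std::vector<int> preorderIterative(TreeNode* root) {
    std::vector<int> out;
    if (root == nullptr) return out;
    std::stack<TreeNode*> st;
    st.push(root);
    while (!st.empty()) {
        TreeNode* node = st.top();
        st.pop();
        out.push_back(node->val);                 // visit the node first
        if (node->right) st.push(node->right);    // pushed first, popped last
        if (node->left) st.push(node->left);      // popped next, so left goes first
    }
    return out;
}
```

Pushing the right child before the left child is the design choice that makes the left subtree come off the stack first.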
What is the best/average/worst case shortest-path from root to leaf node in a red-black tree with $n$ nodes? ## 21.2 Existing Binary Tree Traversals
&nbsp; In-order, Pre-order, Post-Order recursively: Time Complexity: O(n), Space Complexity: best case: O(log n), balanced tree; worst case: O(n), completely skewed tree. The space consumption is mostly because of the recursive call stack usage.
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
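As a point of comparison for the recursive case, the three depth-first orders differ only in where the node is visited relative to the two recursive calls. The sketch below is illustrative only: the function names, the `buildSampleTree` helper, and collecting values into a vector rather than printing are assumptions, not code from the lecture files.

```cpp
#include <cassert>
#include <vector>

struct TreeNode {
    int val = 0;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Hypothetical helper: the nine-node test tree in caller-owned storage.
TreeNode* buildSampleTree(std::vector<TreeNode>& n) {
    assert(n.size() >= 10 && "storage must hold indices 1..9");
    for (int i = 1; i <= 9; ++i) n[i] = TreeNode{i, nullptr, nullptr};
    n[1].left = &n[2];  n[1].right = &n[3];
    n[2].left = &n[4];  n[2].right = &n[5];
    n[5].left = &n[6];  n[5].right = &n[7];
    n[3].right = &n[8];
    n[8].left = &n[9];
    return &n[1];
}

// Each call frame lives on the call stack, so the extra space is
// proportional to the height of the tree.
void inorder(const TreeNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    inorder(node->left, out);
    out.push_back(node->val);   // visit between the two subtrees
    inorder(node->right, out);
}

void preorder(const TreeNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    out.push_back(node->val);   // visit before both subtrees
    preorder(node->left, out);
    preorder(node->right, out);
}

void postorder(const TreeNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    postorder(node->left, out);
    postorder(node->right, out);
    out.push_back(node->val);   // visit after both subtrees
}
```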
## Exercise 21.2 In-order, Pre-order, Post-Order iteratively: Time Complexity: O(n); Space Complexity: best case: O(log n), worst case: O(n). The space consumption is mostly because of the stack usage.
Fill in the tree on the right with the integers 1-7 to make a binary search tree. Also, color each node "red" or "black" so that the tree also fulfills the requirements of a Red-Black tree.
![alt text](Red_Black_fillin.png "Red_Black Tree Fill In example") Level-order, iteratively: Time Complexity: O(n), Space Complexity: best case: O(1), for a skewed tree; worst case: O(n), if every node either has zero children, or exactly two children.
Which nodes are red? Can we traverse a binary tree with an O(n) time complexity and O(1) space complexity?
**Note:** Red-Black Trees are just one algorithm for **self-balancing binary search trees**. We have many more, including the AVL trees that we discussed last week. **Note**: when considering the space complexity of a traversal, we do not count the memory used by the tree itself; the space refers only to the extra space introduced by the traversal function.
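The level-order space bounds can be made concrete with a queue-based sketch that also records the largest queue size it ever reaches. Everything here is an assumption for illustration (the `LevelOrderResult` struct, the `levelOrder` and `buildSampleTree` names, and the vector-backed storage); the lecture files referenced in this section do not include a level-order program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct TreeNode {
    int val = 0;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Hypothetical helper: the nine-node test tree in caller-owned storage.
TreeNode* buildSampleTree(std::vector<TreeNode>& n) {
    assert(n.size() >= 10 && "storage must hold indices 1..9");
    for (int i = 1; i <= 9; ++i) n[i] = TreeNode{i, nullptr, nullptr};
    n[1].left = &n[2];  n[1].right = &n[3];
    n[2].left = &n[4];  n[2].right = &n[5];
    n[5].left = &n[6];  n[5].right = &n[7];
    n[3].right = &n[8];
    n[8].left = &n[9];
    return &n[1];
}

struct LevelOrderResult {
    std::vector<int> order;          // values in level order
    std::size_t peakQueueSize = 0;   // extra space actually used by the queue
};

LevelOrderResult levelOrder(TreeNode* root) {
    LevelOrderResult result;
    if (root == nullptr) return result;
    std::queue<TreeNode*> q;
    q.push(root);
    result.peakQueueSize = 1;
    while (!q.empty()) {
        TreeNode* node = q.front();
        q.pop();
        result.order.push_back(node->val);
        if (node->left) q.push(node->left);
        if (node->right) q.push(node->right);
        result.peakQueueSize = std::max(result.peakQueueSize, q.size());
    }
    return result;
}
```

On a completely skewed tree the queue never holds more than one node at a time, which is the O(1) best case; on a tree whose levels are wide, the queue grows with the widest level.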
## 21.3 Trinary Tree ## 21.3 Morris Traversal
A **trinary tree** is similar to a binary tree except that each node has at most 3 children. Morris Traversal is a tree traversal algorithm that allows in-order, pre-order, and post-order traversal of a binary tree without using recursion or a stack/queue, achieving O(1) space complexity. It modifies the tree temporarily but restores it afterward.
Write a **recursive** function named **EqualsChildrenSum** that takes one argument, a pointer to the root of a trinary tree, and returns true if the value at each non-leaf node is the sum of the values of all of its children and false otherwise. In Instead of using extra memory (like recursion stack or an explicit stack), Morris Traversal utilizes threaded binary trees by:
the examples below, the tree on the left will return true and the tree on the right will return false.
- Finding the inorder predecessor of the current node.
- Temporarily modifying the tree structure by creating threads (links) to the current node.
- Using these links to traverse back instead of a recursive call.
- In Morris Traversal:
- for each node which has no left subtree, we visit this node once;
- for each node which has a left subtree, we visit this node twice; and in between the first visit and the second visit of this node, the traverse of the entire left subtree occurs.
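The once-versus-twice visiting rule can be checked with a small instrumented sketch that runs the Morris loop and counts how often `current` lands on each node. The function name `morrisVisitCounts`, the `buildSampleTree` helper, and keying the counts by node value (which assumes all values are distinct) are assumptions made for this illustration.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct TreeNode {
    int val = 0;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Hypothetical helper: the nine-node test tree in caller-owned storage.
TreeNode* buildSampleTree(std::vector<TreeNode>& n) {
    assert(n.size() >= 10 && "storage must hold indices 1..9");
    for (int i = 1; i <= 9; ++i) n[i] = TreeNode{i, nullptr, nullptr};
    n[1].left = &n[2];  n[1].right = &n[3];
    n[2].left = &n[4];  n[2].right = &n[5];
    n[5].left = &n[6];  n[5].right = &n[7];
    n[3].right = &n[8];
    n[8].left = &n[9];
    return &n[1];
}

// Runs the Morris loop and counts how many times each node becomes `current`.
// Assumes node values are distinct, since they are used as map keys.
std::map<int, int> morrisVisitCounts(TreeNode* root) {
    std::map<int, int> visits;
    TreeNode* current = root;
    while (current != nullptr) {
        ++visits[current->val];
        if (current->left == nullptr) {
            current = current->right;            // no left subtree: single visit
            continue;
        }
        TreeNode* pred = current->left;          // find the in-order predecessor
        while (pred->right != nullptr && pred->right != current) {
            pred = pred->right;
        }
        if (pred->right == nullptr) {            // first visit: create the thread
            pred->right = current;
            current = current->left;
        } else {                                 // second visit: remove the thread
            pred->right = nullptr;
            current = current->right;
        }
    }
    return visits;
}
```

In the test tree, the nodes with a left subtree are 1, 2, 5, and 8, so those are the ones counted twice.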
## 21.4 Morris Traversal - In Order
- Start from the root.
- If the left subtree is NULL, print the node and move to the right.
- If the left subtree exists, find the inorder predecessor (rightmost node in the left subtree):
- If the predecessor's right child is NULL, set it to the current node (threading) and move left.
- If the predecessor's right child points to the current node (thread already exists), remove the thread, print the current node, and move right.
- Repeat until you traverse the entire tree.
```cpp ```cpp
class Node { void inorderTraversal(TreeNode* root) {
public: TreeNode *current=root;
int value; TreeNode *rightmost;
Node* left; while(current!=NULL){
Node* middle; if(current->left!=NULL){
Node* right; rightmost=current->left;
}; while(rightmost->right!=NULL && rightmost->right!=current){
rightmost=rightmost->right;
}
if(rightmost->right==NULL){ /* first time */
rightmost->right=current;
current=current->left;
}else{ /* second time */
std::cout << current->val << " ";
rightmost->right=NULL;
current=current->right;
}
}else{ /* nodes which do not have left child */
std::cout << current->val << " ";
current=current->right;
}
}
return;
}
``` ```
![alt text](Trinary_trees.png "Trinary Trees example")
## 21.4 B+ Trees You can test the above function using this program: [inorder_main.cpp](inorder_main.cpp).
Unlike binary search trees, nodes in B+ trees (and their predecessor, the B tree) have up to b children. Thus For this test case,
B+ trees are very flat and very wide. This is good when it is very expensive to move from one node to another.
- B+ trees are supposed to be associative (i.e. they have key-value pairs), but we will just focus on the keys.
- Just like STL map and STL set, these keys and values can be any type, but keys must have an operator<
defined.
- In a B tree key-value pairs can show up anywhere in the tree, in a B+ tree all the key-value pairs are in the
leaves and the non-leaf nodes contain duplicates of some keys.
- In either type of tree, all leaves are the same distance from the root.
- The keys are always sorted in a B/B+ tree node, and there are up to $b-1$ of them. They act like $b-1$ binary
search tree nodes mashed together.
- In fact, with the exception of the root, nodes will always have between roughly $b/2$ and $b-1$ keys (in our
implementation).
- If a B+ tree node has $k$ keys $key_0, key_1, key_2, \ldots, key_{k-1}$, it will have $k+1$ children. The keys in the leftmost
child must be $< key_0$, the next child must have keys such that they are $\geq key_0$ and $< key_1$, and so on up to
the rightmost child which has only keys $\geq key_{k-1}$.
A B+ tree visualization can be seen at: https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
![alt text](BplusTrees.png "Bplus_Trees example")
Considerations in a full implementation:
- What happens when we want to add a key to a node that's already full?
- How do we remove values from a node?
- How do we ensure the tree stays balanced?
- How to keep leaves linked together? Why would we want this?
- How to represent key-value pairs?
## 21.5 Exercise
Draw a B+ tree with b=3 with values inserted in the order 1,2,3,4,5,6. Now draw a B+ tree with b=3 and values inserted in the order 6,5,4,3,2,1. The two trees have a different number of levels.
The testing program prints:
```console
$ g++ inorder_main.cpp
$ ./a.out
Inorder Traversal using Morris Traversal:
4 2 6 5 7 1 3 9 8
```
## 21.5 Morris Traversal - Pre Order
To perform preorder traversal:
Print the node before going left instead of after restoring links.
```cpp
void preorderTraversal(TreeNode* root) {
TreeNode *current=root;
TreeNode *rightmost;
while(current != nullptr){
if(current->left != nullptr){
rightmost=current->left;
while(rightmost->right!=nullptr && rightmost->right!=current){
rightmost=rightmost->right;
}
if(rightmost->right==nullptr){ /* visiting the right most node for the first time */
std::cout << current->val << " ";
rightmost->right=current;
current=current->left;
}else{ /* visiting the right most node for the second time */
rightmost->right=nullptr;
current=current->right;
}
}else{ /* nodes which do not have left child */
std::cout << current->val << " ";
current=current->right;
}
}
return;
}
```
You can test the above function using this program: [preorder_main.cpp](preorder_main.cpp).
For the above test case, the testing program prints:
```console
$ g++ preorder_main.cpp
$ ./a.out
Preorder Traversal using Morris Traversal:
1 2 4 5 6 7 3 8 9
```
## 21.6 Morris Traversal - Post Order
Post order is different, and we need to write some helper functions here.
```cpp
// function to reverse the right-edge path of a subtree
TreeNode* reverse(TreeNode* head) {
TreeNode* prev = nullptr;
TreeNode* next = nullptr;
while (head != nullptr) {
next = head->right;
head->right = prev;
prev = head;
head = next;
}
return prev;
}
// function to traverse and collect nodes along a reversed right edge
void reverseTraverseRightEdge(TreeNode* head) {
TreeNode* tail = reverse(head);
TreeNode* current = tail;
while (current != nullptr) {
std::cout << current->val << " ";
current = current->right;
}
reverse(tail); // restore the original tree structure
}
// Morris Postorder Traversal
void postorderTraversal(TreeNode* root) {
TreeNode* current = root;
TreeNode* rightmost;
while (current != nullptr) {
if (current->left != nullptr) {
rightmost = current->left;
while (rightmost->right != nullptr && rightmost->right != current) {
rightmost = rightmost->right;
}
if (rightmost->right == nullptr) {
rightmost->right = current;
current = current->left;
} else {
rightmost->right = nullptr;
reverseTraverseRightEdge(current->left);
current = current->right;
}
} else {
current = current->right;
}
}
reverseTraverseRightEdge(root); // traverse the final right edge
return;
}
```
You can test the above function using this program: [postorder_main.cpp](postorder_main.cpp).
For the above test case, the testing program prints:
```console
$ g++ postorder_main.cpp
$ ./a.out
Postorder Traversal using Morris Traversal:
4 6 7 5 2 9 8 3 1
```
## Time and Space Complexity in Morris Traversal (in-order, pre-order, post-order)
- Time Complexity: O(N) (each node is visited at most twice)
- Space Complexity: O(1) (no extra space used except for modifying pointers)

Binary file not shown. Size: 24 KiB

View File

@@ -0,0 +1,47 @@
#include <iostream>
#include <stack>
class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
};
// in order traverse a binary tree, iteratively.
void inorderTraversal(TreeNode* root) {
    std::stack<TreeNode*> st;
    TreeNode* current = root;
    while (current != nullptr || !st.empty()) {
        while (current != nullptr) { // reach leftmost node
            st.push(current);
            current = current->left;
        }
        current = st.top(); // process node
        st.pop();
        std::cout << current->val << " ";
        current = current->right; // move to right subtree
    }
}
int main() {
    TreeNode* root = new TreeNode(1);
    root->left = new TreeNode(2);
    root->right = new TreeNode(3);
    root->left->left = new TreeNode(4);
    root->left->right = new TreeNode(5);
    root->left->right->left = new TreeNode(6);
    root->left->right->right = new TreeNode(7);
    root->right->right = new TreeNode(8);
    root->right->right->left = new TreeNode(9);
    std::cout << "Inorder Traversal: ";
    inorderTraversal(root);
    std::cout << std::endl;
    return 0;
}

View File

@@ -0,0 +1,53 @@
#include <iostream>
class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int value) : val(value), left(NULL), right(NULL) {}
};
void inorderTraversal(TreeNode* root) {
    TreeNode *current=root;
    TreeNode *rightmost;
    while(current!=NULL){
        if(current->left!=NULL){
            rightmost=current->left;
            while(rightmost->right!=NULL && rightmost->right!=current){
                rightmost=rightmost->right;
            }
            if(rightmost->right==NULL){ /* first time */
                rightmost->right=current;
                current=current->left;
            }else{ /* second time */
                std::cout << current->val << " ";
                rightmost->right=NULL;
                current=current->right;
            }
        }else{ /* nodes which do not have left child */
            std::cout << current->val << " ";
            current=current->right;
        }
    }
    return;
}
int main() {
    TreeNode* root = new TreeNode(1);
    root->left = new TreeNode(2);
    root->right = new TreeNode(3);
    root->left->left = new TreeNode(4);
    root->left->right = new TreeNode(5);
    root->left->right->left = new TreeNode(6);
    root->left->right->right = new TreeNode(7);
    root->right->right = new TreeNode(8);
    root->right->right->left = new TreeNode(9);
    std::cout << "Inorder Traversal using Morris Traversal:\n";
    inorderTraversal(root);
    std::cout << std::endl;
    return 0;
}

View File

@@ -0,0 +1,50 @@
#include <iostream>
#include <stack>
class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
};
// post order traverse a binary tree, iteratively.
void postorderTraversal(TreeNode* root) {
    if (root == nullptr) return;
    std::stack<TreeNode*> st1, st2;
    st1.push(root);
    while (!st1.empty()) {
        TreeNode* current = st1.top();
        st1.pop();
        st2.push(current); // st2 collects nodes in root-right-left order
        if (current->left) st1.push(current->left);
        if (current->right) st1.push(current->right);
    }
    while (!st2.empty()) { // popping st2 reverses that order: left-right-root
        std::cout << st2.top()->val << " ";
        st2.pop();
    }
}
int main() {
    TreeNode* root = new TreeNode(1);
    root->left = new TreeNode(2);
    root->right = new TreeNode(3);
    root->left->left = new TreeNode(4);
    root->left->right = new TreeNode(5);
    root->left->right->left = new TreeNode(6);
    root->left->right->right = new TreeNode(7);
    root->right->right = new TreeNode(8);
    root->right->right->left = new TreeNode(9);
    std::cout << "Postorder Traversal: ";
    postorderTraversal(root);
    std::cout << std::endl;
    return 0;
}

View File

@@ -0,0 +1,83 @@
#include <iostream>
class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int value) : val(value), left(nullptr), right(nullptr) {}
};
// function to reverse the right-edge path of a subtree
TreeNode* reverse(TreeNode* head) {
    TreeNode* prev = nullptr;
    TreeNode* next = nullptr;
    while (head != nullptr) {
        next = head->right;
        head->right = prev;
        prev = head;
        head = next;
    }
    return prev;
}
// function to traverse and print nodes along a reversed right edge
void reverseTraverseRightEdge(TreeNode* head) {
    TreeNode* tail = reverse(head);
    TreeNode* current = tail;
    while (current != nullptr) {
        std::cout << current->val << " ";
        current = current->right;
    }
    reverse(tail); // restore the original tree structure
}
// Morris Postorder Traversal
void postorderTraversal(TreeNode* root) {
    TreeNode* current = root;
    TreeNode* rightmost;
    while (current != nullptr) {
        if (current->left != nullptr) {
            rightmost = current->left;
            while (rightmost->right != nullptr && rightmost->right != current) {
                rightmost = rightmost->right;
            }
            if (rightmost->right == nullptr) {
                rightmost->right = current;
                current = current->left;
            } else {
                rightmost->right = nullptr;
                reverseTraverseRightEdge(current->left);
                current = current->right;
            }
        } else {
            current = current->right;
        }
    }
    reverseTraverseRightEdge(root); // traverse the final right edge
    return;
}
int main() {
    TreeNode* root = new TreeNode(1);
    root->left = new TreeNode(2);
    root->right = new TreeNode(3);
    root->left->left = new TreeNode(4);
    root->left->right = new TreeNode(5);
    root->left->right->left = new TreeNode(6);
    root->left->right->right = new TreeNode(7);
    root->right->right = new TreeNode(8);
    root->right->right->left = new TreeNode(9);
    std::cout << "Postorder Traversal using Morris Traversal:\n";
    postorderTraversal(root);
    std::cout << std::endl;
    return 0;
}

View File

@@ -0,0 +1,45 @@
#include <iostream>
#include <stack>
class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
};
// pre order traverse a binary tree, iteratively.
void preorderTraversal(TreeNode* root) {
    if (root == nullptr) return;
    std::stack<TreeNode*> st;
    st.push(root);
    while (!st.empty()) {
        TreeNode* current = st.top();
        st.pop();
        std::cout << current->val << " ";
        if (current->right) st.push(current->right); // push right first
        if (current->left) st.push(current->left); // then push left, so it is popped first
    }
}
int main() {
    TreeNode* root = new TreeNode(1);
    root->left = new TreeNode(2);
    root->right = new TreeNode(3);
    root->left->left = new TreeNode(4);
    root->left->right = new TreeNode(5);
    root->left->right->left = new TreeNode(6);
    root->left->right->right = new TreeNode(7);
    root->right->right = new TreeNode(8);
    root->right->right->left = new TreeNode(9);
    std::cout << "Preorder Traversal: ";
    preorderTraversal(root);
    std::cout << std::endl;
    return 0;
}

View File

@@ -0,0 +1,53 @@
#include <iostream>
class TreeNode {
public:
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int value) : val(value), left(nullptr), right(nullptr) {}
};
void preorderTraversal(TreeNode* root) {
    TreeNode *current=root;
    TreeNode *rightmost;
    while(current != nullptr){
        if(current->left != nullptr){
            rightmost=current->left;
            while(rightmost->right!=nullptr && rightmost->right!=current){
                rightmost=rightmost->right;
            }
            if(rightmost->right==nullptr){ /* first time */
                std::cout << current->val << " ";
                rightmost->right=current;
                current=current->left;
            }else{ /* second time */
                rightmost->right=nullptr;
                current=current->right;
            }
        }else{ /* nodes which do not have left child */
            std::cout << current->val << " ";
            current=current->right;
        }
    }
    return;
}
int main() {
    TreeNode* root = new TreeNode(1);
    root->left = new TreeNode(2);
    root->right = new TreeNode(3);
    root->left->left = new TreeNode(4);
    root->left->right = new TreeNode(5);
    root->left->right->left = new TreeNode(6);
    root->left->right->right = new TreeNode(7);
    root->right->right = new TreeNode(8);
    root->right->right->left = new TreeNode(9);
    std::cout << "Preorder Traversal using Morris Traversal:\n";
    preorderTraversal(root);
    std::cout << std::endl;
    return 0;
}

View File

@@ -0,0 +1,361 @@
# Lecture 22 --- Hash Tables
## Today's Lecture
- Hash Tables, Hash Functions, and Collision Resolution <!--(leetcode 1, 705, 706)-->
- Performance of: Hash Tables vs. Binary Search Trees
- Collision resolution: separate chaining vs open addressing
- STL's unordered_set and unordered_map
- Using a hash table to implement a set/map
<!-- Hash functions as functors/function objects (leetcode 1451: Rearrange Words in a Sentence)
Iterators, find, insert, and erase-->
## 22.1 Definition: What's a Hash Table?
- A table implementation with constant time access.
- Like a set, we can store elements in a collection. Or like a map, we can store key-value pair associations in the hash table. But it's even faster to do find, insert, and erase with a hash table! However, hash tables do not store the data in sorted order.
- A hash table is implemented with an array at the top level.
- Each element or key is mapped to a slot in the array by a hash function.
## 22.2 Definition: What's a Hash Function?
- A simple function of one argument (the key) which returns an integer index (a bucket or slot in the array).
- Ideally the function will “uniformly” distribute the keys throughout the range of legal index values (0 → k-1).
- What's a collision?
  - When the hash function maps multiple (different) keys to the same index.
- How do we deal with collisions?
  - One way to resolve this is by storing a linked list of values at each slot in the array.
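
As a small illustration of the definitions above, the sketch below maps an integer key to one of k slots with a modulus. The helper name `slot_for` and the table size used in the note are assumptions made for this example; they are not part of the lecture code.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: map an integer key to a slot index in the range [0, k-1].
unsigned int slot_for(long long key, unsigned int k) {
    assert(k > 0);  // a table must have at least one slot
    // abs() keeps the remainder non-negative for negative keys
    return static_cast<unsigned int>(std::llabs(key) % k);
}
```

With k = 10, `slot_for(1111, 10)` and `slot_for(2221, 10)` both return 1, so these two different keys collide and would have to share slot 1.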
## 22.3 Example: Caller ID
- We are given a phonebook with 50,000 name/number pairings. Each number is a 10-digit number. We need to
create a data structure to look up the name matching a particular phone number. Ideally, name lookup should
be O(1) time expected, and the caller ID system should use O(n) memory (n = 50,000).
- Note: In the toy implementations that follow we use small datasets, but we should evaluate the system scaled
up to handle the large dataset.
- The basic interface:
```cpp
// add several names to the phonebook
add(phonebook, 1111, "fred");
add(phonebook, 2222, "sally");
add(phonebook, 3333, "george");
// test the phonebook
std::cout << identify(phonebook, 2222) << " is calling!" << std::endl;
std::cout << identify(phonebook, 4444) << " is calling!" << std::endl;
```
<!--- Well review how we solved this problem in Lab 9 with an STL vector then an STL map. Finally, well implement the system with a hash table.-->
## 22.4 Caller ID with an STL Vector
```cpp
// create an empty phonebook
std::vector<std::string> phonebook(10000, "UNKNOWN CALLER");
void add(std::vector<std::string> &phonebook, int number, std::string name) {
    phonebook[number] = name;
}
std::string identify(const std::vector<std::string> &phonebook, int number) {
    return phonebook[number];
}
```
Exercise: What's the memory complexity for the vector-based Caller ID system?
What's the expected runtime complexity for identify, insert, and erase?
## 22.5 Caller ID with an STL Map
```cpp
// create an empty phonebook
std::map<int,std::string> phonebook;
void add(std::map<int,std::string> &phonebook, int number, std::string name) {
    phonebook[number] = name;
}
std::string identify(const std::map<int,std::string> &phonebook, int number) {
    std::map<int,std::string>::const_iterator tmp = phonebook.find(number);
    if (tmp == phonebook.end()) {
        return "UNKNOWN CALLER";
    } else {
        return tmp->second;
    }
}
```
Exercise: What's the memory complexity for the map-based Caller ID system?
What's the expected runtime complexity for identify, add, and erase?
## 22.6 Now let's implement Caller ID with a Hash Table
![alt text](phonebook.png "phonebook")
```cpp
#define PHONEBOOK_SIZE 10
class Node {
public:
    int number;
    std::string name;
    Node* next;
};
// create the phonebook, initially all numbers are unassigned
Node* phonebook[PHONEBOOK_SIZE];
for (int i = 0; i < PHONEBOOK_SIZE; i++) {
    phonebook[i] = NULL;
}
// maps a phone number to a slot in the array
int hash_function(int number) {
}
// add a number, name pair to the phonebook
void add(Node* phonebook[PHONEBOOK_SIZE], int number, std::string name) {
}
// given a phone number, determine who is calling
std::string identify(Node* phonebook[PHONEBOOK_SIZE], int number) {
}
```
## 22.7 Exercise: Hash Table Performance
- What's the memory complexity for the hash-table-based Caller ID system?
- What's the expected runtime complexity for identify, insert, and erase?
## 22.8 What makes a Good Hash Function?
- Deterministic: the same input always produces the same hash.
- Goals: fast O(1) computation and a random, uniform distribution of keys throughout the table,
despite the actual distribution of keys that are to be stored.
- For example, using: f(k) = abs(k)%N as our hash function satisfies the first requirement, but may not
satisfy the second.
- Another example of a dangerous hash function on string keys is to add or multiply the ASCII values of each char:
```cpp
unsigned int hash(const std::string& k, unsigned int N) {
    unsigned int value = 0;
    for (unsigned int i=0; i<k.size(); ++i) {
        value += k[i]; // conversion to int is automatic
    }
    return value % N;
}
```
The problem is its high collision rate:
1. Anagrams (e.g., "listen" and "silent") get the same hash.
2. Many different strings can sum to the same value, leading to collisions.
- This can be improved through multiplications that involve the position and value of the key:
```cpp
unsigned int hash(const std::string& k, unsigned int N) {
    unsigned int value = 0;
    for (unsigned int i=0; i<k.size(); ++i) {
        value = value*31 + k[i]; // conversion to int is automatic
    }
    return value % N;
}
```
- The 2nd method is better, but can be improved further. The theory of good hash functions is quite involved and beyond the scope of this course.
- You can run this program [hash_test.cpp](hash_test.cpp) which will show that the second hash function produces a lower collision rate:
```console
$ g++ -Wall -Wextra hash_test.cpp
$ ./a.out
Testing badHash (Summing ASCII values):
Total Collisions: 4914
Execution Time: 0.000142 seconds
Testing betterHash (Multiplication by 31, a prime):
Total Collisions: 4000
Execution Time: 0.000148 seconds
```
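
To make the anagram problem concrete, here is a minimal sketch that restates the two hash functions above under the assumed names `sumHash` and `polyHash` (the lecture calls both of them `hash`). It iterates over `unsigned char` values, which gives the same results as the versions above for plain ASCII strings.

```cpp
#include <cassert>
#include <string>

// Sum of character values: character order is ignored, so anagrams collide.
unsigned int sumHash(const std::string& k, unsigned int N) {
    assert(N > 0);
    unsigned int value = 0;
    for (unsigned char c : k) value += c;
    return value % N;
}

// Multiply-by-31 hash: each character's contribution depends on its position.
unsigned int polyHash(const std::string& k, unsigned int N) {
    assert(N > 0);
    unsigned int value = 0;
    for (unsigned char c : k) value = value * 31 + c;
    return value % N;
}
```

With N = 1000 and a 32-bit `unsigned int`, `sumHash` returns 655 for both "listen" and "silent", while `polyHash` returns 695 for "listen" and 85 for "silent".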
## 22.9 How do we Resolve Collisions? METHOD 1: Separate Chaining
- Each table location stores a linked list of keys (and values) hashed to that location (as shown above in the phonebook hashtable). Thus, the hashing function really just selects which list to search or modify.
- This works well when the number of items stored in each list is small, e.g., an average of 1. Other data structures, such as binary search trees, may be used in place of the list, but these have even greater overhead considering the (hopefully, very small) number of items stored per bin.
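
The idea can be sketched with STL containers instead of hand-written linked lists. The class name `ChainedPhonebook` and its interface are assumptions made for this example; the lecture's own version uses raw `Node*` chains.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separate-chaining table mapping phone numbers to names.
class ChainedPhonebook {
public:
    explicit ChainedPhonebook(std::size_t buckets) : table_(buckets) {
        assert(buckets > 0);
    }

    // Insert a new pair, or overwrite the name if the number is already stored.
    void add(int number, const std::string& name) {
        std::list<std::pair<int, std::string>>& chain = table_[index(number)];
        for (std::pair<int, std::string>& entry : chain) {
            if (entry.first == number) { entry.second = name; return; }
        }
        chain.push_front(std::make_pair(number, name));
    }

    // Search only the one chain selected by the hash function.
    std::string identify(int number) const {
        const std::list<std::pair<int, std::string>>& chain = table_[index(number)];
        for (const std::pair<int, std::string>& entry : chain) {
            if (entry.first == number) return entry.second;
        }
        return "UNKNOWN CALLER";
    }

private:
    // The hash function: selects which chain to search or modify.
    std::size_t index(int number) const {
        return static_cast<std::size_t>(number) % table_.size();
    }

    std::vector<std::list<std::pair<int, std::string>>> table_;
};
```

With 10 buckets, the numbers 1111 and 2221 hash to the same bucket, yet both remain retrievable because they sit in the same chain.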
## 22.10 How do we Resolve Collisions? METHOD 2: Open Addressing
- In open addressing, when the chosen table location already stores a key (or key-value pair), a different table location is sought in order to store the new value (or pair).
- Here are three different open addressing variations to handle a collision during an insert operation:
  - Linear probing: If i is the chosen hash location, then the following sequence of table locations is tested (“probed”) until an empty location is found:
    ```console
    (i+1)%N, (i+2)%N, (i+3)%N, ...
    ```
  - Quadratic probing: If i is the hash location, then the following sequence of table locations is tested:
    ```console
    (i+1)%N, (i+2*2)%N, (i+3*3)%N, (i+4*4)%N, ...
    ```
    More generally, the jth “probe” of the table is (i + c<sub>1</sub>j + c<sub>2</sub>j<sup>2</sup>) mod N where c<sub>1</sub> and c<sub>2</sub> are constants.
  - Secondary hashing: when a collision occurs, a second hash function is applied to compute a new table location. This is repeated until an empty location is found.
- For each of these approaches, the find operation follows the same sequence of locations as the insert operation. The key value is determined to be absent from the table only when an empty location is found.
- When using open addressing to resolve collisions, the erase function must mark a location as “formerly occupied”. If a location is instead marked empty, find may fail to return elements in the table. Formerly occupied locations may (and should) be reused, but only after the find operation has been run to completion.
- Problems with open addressing:
  - Slows dramatically when the table is nearly full (e.g. about 80% or higher). This is particularly problematic for linear probing.
  - Fails completely when the table is full.
  - Cost of computing new hash values.
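
The rules above for insert, find, and erase can be sketched with linear probing as follows. The class name `LinearProbingSet`, the three-state slot marker, and the fixed capacity (no resizing) are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical open-addressing set of ints using linear probing.
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t capacity)
        : slots_(capacity, EMPTY), keys_(capacity, 0) {
        assert(capacity > 0);
    }

    // Returns false if the key is already present or no slot is free.
    bool insert(int key) {
        if (find(key)) return false;  // run find to completion before reusing a slot
        std::size_t start = home(key);
        for (std::size_t j = 0; j < slots_.size(); ++j) {
            std::size_t pos = (start + j) % slots_.size();
            if (slots_[pos] != OCCUPIED) {  // EMPTY and DELETED slots can be reused
                slots_[pos] = OCCUPIED;
                keys_[pos] = key;
                return true;
            }
        }
        return false;  // the table is full
    }

    // Follows the same probe sequence as insert.
    bool find(int key) const { return locate(key) != slots_.size(); }

    // Marks the slot as formerly occupied so later probes continue past it.
    bool erase(int key) {
        std::size_t pos = locate(key);
        if (pos == slots_.size()) return false;
        slots_[pos] = DELETED;
        return true;
    }

private:
    enum State { EMPTY, OCCUPIED, DELETED };

    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    // Returns the index of the slot holding key, or slots_.size() if absent.
    std::size_t locate(int key) const {
        std::size_t start = home(key);
        for (std::size_t j = 0; j < slots_.size(); ++j) {
            std::size_t pos = (start + j) % slots_.size();
            if (slots_[pos] == EMPTY) break;  // only an EMPTY slot proves absence
            if (slots_[pos] == OCCUPIED && keys_[pos] == key) return pos;
        }
        return slots_.size();
    }

    std::vector<State> slots_;
    std::vector<int> keys_;
};
```

With a capacity of 5, inserting 1, 6, and 11 places them in slots 1, 2, and 3. After erasing 6, a find for 11 still succeeds because slot 2 is marked as formerly occupied rather than empty, and a later insert of 16 reuses slot 2.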
## 22.11 Hash Table in STL?
- The Standard Template Library's standard and implementations of hash tables have been slowly evolving over
many years. Unfortunately, the names “hashset” and “hashmap” were spoiled by developers anticipating the
STL standard, so to avoid breaking or having name clashes with code using these early implementations...
- STL's agreed-upon standard for hash tables: unordered_set and unordered_map.
- You can use std::unordered_set the same way as you use std::set: although the internals of the two are different, their external interfaces are the same.
- You can use std::unordered_map the same way as you use std::map: although the internals of the two are different, their external interfaces are the same.
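
As a minimal sketch of that point, the map-based Caller ID functions from section 22.5 carry over with only the container type changed. This is an illustration rather than code from the lecture's files.

```cpp
#include <string>
#include <unordered_map>

// Same interface as the std::map version; only the container type differs.
void add(std::unordered_map<int, std::string>& phonebook, int number, const std::string& name) {
    phonebook[number] = name;
}

std::string identify(const std::unordered_map<int, std::string>& phonebook, int number) {
    std::unordered_map<int, std::string>::const_iterator it = phonebook.find(number);
    if (it == phonebook.end()) {
        return "UNKNOWN CALLER";
    }
    return it->second;
}
```

One visible difference is iteration order: looping over a std::unordered_map does not visit the keys in sorted order.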
<!--- Depending on your OS/compiler, you may need to add the -std=c++11 flag to the compile line (or other
configuration tweaks) to access these more recent pieces of STL. (And this will certainly continue to evolve
in future years!) Also, for many types STL has a good default hash function, so you may not always need to
specify both template parameters!-->
<!--## 20.13 Our Copycat Version: A Set As a Hash Table
- The class is templated over both the key type and the hash function type.
```cpp
template < class KeyType, class HashFunc >
class ds_hashset { ... };
```
- We use separate chaining for collision resolution. Hence the main data structure inside the class is:
```cpp
std::vector< std::list<KeyType> > m_table;
```
- We will use automatic resizing when our table is too full. Resize is expensive of course, so similar to
the automatic reallocation that occurs inside the vector push_back function, we at least double the size of
underlying structure to ensure it is rarely needed.
## 20.14 Our Hash Function (as a Functor or Function Object)
- Next lecture we'll talk about “function objects” or “functors”.... A functor is just a class wrapper around a
function, and the function is implemented as the overloaded function call operator for the class.
- Often the programmer/designer for the program using a hash function has the best understanding of the
distribution of data to be stored in the hash function. Thus, they are in the best position to define a custom
hash function (if needed) for the data & application.
- Here's an example of a (generically) good hash function for STL strings, wrapped up inside of a class:
```cpp
class hash_string_obj {
public:
unsigned int operator() (std::string const& key) const {
// This implementation comes from
// http://www.partow.net/programming/hashfunctions/
unsigned int hash = 1315423911;
for(unsigned int i = 0; i < key.length(); i++){
hash ^= ((hash << 5) + key[i] + (hash >> 2));
}
return hash;
}
};
```
- Once our new type containing the hash function is defined, we can create instances of our hash set object
containing std::string by specifying the type hash_string_obj as the second template parameter to the
declaration of a ds_hashset. E.g.,
```cpp
ds_hashset<std::string, hash_string_obj> my_hashset;
```
- Alternatively, we could use function pointers as a non-type template argument.
(We don't show that syntax here!).
## 20.15 Hash Set Iterators
- Iterators move through the hash table in the order of the storage locations rather than the ordering imposed by (say) an operator<. Thus, the visiting/printing order depends on the hash function and the table size.
- Hence the increment operators must move to the next entry in the current linked list or, if the end of the current list is reached, to the first entry in the next non-empty list.
- The declaration is nested inside the ds_hashset declaration in order to avoid explicitly templating the iterator
over the hash function type.
- The iterator must store:
- A pointer to the hash table it is associated with. This reflects a subtle point about types: even though
the iterator class is declared inside the ds_hashset, this does not mean an iterator automatically knows
about any particular ds_hashset.
- The index of the current list in the hash table.
- An iterator referencing the current location in the current list.
- Because of the way the classes are nested, the iterator class object must declare the ds_hashset class as a friend, but the reverse is unnecessary.
## 20.16 Implementing begin() and end()
- begin(): Skips over empty lists to find the first key in the table. It must tie the iterator being created to
the particular ds_hashset object it is applied to. This is done by passing the this pointer to the iterator
constructor.
- end(): Also associates the iterator with the specific table, assigns an index of -1 (indicating it is not a normal
valid index), and thus does not assign the particular list iterator.
- Exercise: Implement the begin() function.
## 20.17 Iterator Increment, Decrement, & Comparison Operators
- The increment operators must find the next key, either in the current list, or in the next non-empty list.
- The decrement operator must check if the iterator in the list is at the beginning and if so it must proceed to
find the previous non-empty list and then find the last entry in that list. This might sound expensive, but
remember that the lists should be very short.
- The comparison operators must accommodate the fact that when (at least) one of the iterators is the end, the
internal list iterator will not have a useful value.
## 20.18 Insert & Find
- Computes the hash function value and then the index location.
- If the key is already in the list that is at the index location, then no changes are made to the set, but an iterator
is created referencing the location of the key, a pair is returned with this iterator and false.
- If the key is not in the list at the index location, then the key should be inserted in the list (at the front is
fine), and an iterator is created referencing the location of the newly-inserted key a pair is returned with this
iterator and true.
- Exercise: Implement the insert() function, ignoring for now the resize operation.
- Find is similar to insert, computing the hash function and index, followed by a std::find operation.
## 20.19 Erase
- Two versions are implemented, one based on a key value and one based on an iterator. These are based on
finding the appropriate iterator location in the appropriate list, and applying the list erase function.
## 20.20 Resize
- Must copy the contents of the current vector into a scratch vector, resize the current vector, and then re-insert
each key into the resized vector. Exercise: Write resize().
## 20.21 Hash Table Iterator Invalidation
- Any insert operation invalidates all ds_hashset iterators because the insert operation could cause a resize of
the table. The erase function only invalidates an iterator that references the current object.-->
## 22.12 Leetcode Exercises
- [Leetcode problem 1: Two Sum](https://leetcode.com/problems/two-sum/). Solution: [p1_twosum_hash_table.cpp](../../leetcode/p1_twosum_hash_table.cpp).
**Note**: make sure you understand this longest consecutive sequence problem and its solution, because you will re-write this function in the lab.
- [Leetcode problem 128: Longest Consecutive Sequence](https://leetcode.com/problems/longest-consecutive-sequence/). Solution: [p128_longest_consecutive_sequence.cpp](../../leetcode/p128_longest_consecutive_sequence.cpp).

View File

@@ -0,0 +1,26 @@
int hash_function(int number) {
    //return 5; // BAD: always the same slot
    //return number / 1000000000; // BAD: uses only the leading digit of a 10-digit number
    return number % PHONEBOOK_SIZE; // REASONABLY GOOD
}
void add(Node* phonebook[PHONEBOOK_SIZE], int number, const std::string& name) {
    int index = hash_function(number) % PHONEBOOK_SIZE;
    Node* tmp = new Node;
    tmp->name = name;
    tmp->number = number;
    tmp->next = phonebook[index]; // insert at the front of the chain
    phonebook[index] = tmp;
    // what about duplicate / repeated add?
}
std::string identify(Node* phonebook[PHONEBOOK_SIZE], int number) {
    Node* current = phonebook[ hash_function(number) % PHONEBOOK_SIZE ];
    while ( current != NULL && current->number != number ) {
        current = current->next;
    }
    if (current == NULL) return "UNKNOWN CALLER";
    return current->name;
}

View File

@@ -0,0 +1,83 @@
#include <iostream>
#include <vector>
#include <unordered_map>
#include <set>
#include <string>
#include <ctime>
unsigned int badHash(const std::string& k, unsigned int N) {
    unsigned int value = 0;
    for (unsigned int i = 0; i < k.size(); ++i) {
        value += k[i]; // simple sum of ASCII values
    }
    return value % N;
}
unsigned int betterHash(const std::string& k, unsigned int N) {
    unsigned int value = 0;
    unsigned int prime = 31;
    for (unsigned int i = 0; i < k.size(); ++i) {
        // multiplication makes the result depend on both the position and the
        // value of each character; a prime multiplier gives a better distribution
        value = value * prime + k[i];
    }
    return value % N;
}
// a good hash function should distribute values evenly across N buckets.
// note that function pointers are used here
void testCollisions(unsigned int (*hashFunc)(const std::string&, unsigned int),
                    const std::vector<std::string>& testStrings, unsigned int N) {
    std::unordered_map<unsigned int, int> bucketCounts;
    for (const std::string& str : testStrings) {
        unsigned int hashValue = hashFunc(str, N);
        bucketCounts[hashValue]++;
    }
    // count collisions: a bucket holding k > 1 strings contributes k - 1
    int collisions = 0;
    for (const std::pair<const unsigned int, int>& entry : bucketCounts) {
        if (entry.second > 1) {
            collisions += (entry.second - 1);
        }
    }
    std::cout << "Total Collisions: " << collisions << std::endl;
}
// generate many test strings and see how well they spread over N buckets.
std::vector<std::string> generateTestStrings(int count) {
    std::vector<std::string> testStrings;
    for (int i = 0; i < count; i++) {
        std::string str = "str" + std::to_string(i); // example: "str0", "str1"...
        testStrings.push_back(str);
    }
    return testStrings;
}
// note that function pointers are used here
void benchmark(unsigned int (*hashFunc)(const std::string&, unsigned int),
               const std::vector<std::string>& testStrings, unsigned int N) {
    clock_t start = clock();
    for (const std::string& str : testStrings) {
        hashFunc(str, N);
    }
    clock_t end = clock();
    double timeTaken = double(end - start) / CLOCKS_PER_SEC;
    std::cout << "Execution Time: " << timeTaken << " seconds" << std::endl;
}
int main() {
    unsigned int N = 1000; // hash table size
    std::vector<std::string> testStrings = generateTestStrings(5000);
    std::cout << "Testing badHash (Summing ASCII values):\n";
    testCollisions(badHash, testStrings, N);
    benchmark(badHash, testStrings, N);
    std::cout << "\nTesting betterHash (Multiplication by 31, a prime):\n";
    testCollisions(betterHash, testStrings, N);
    benchmark(betterHash, testStrings, N);
    return 0;
}

129
lectures/rb_trees/README.md Normal file
View File

@@ -0,0 +1,129 @@
# Lecture 21 --- Trees, Part IV
Review from Lecture 20
- Breadth-first and depth-first tree search
- Increment/decrement operator
- Tree height, longest-shortest paths, breadth-first search
- Last piece of ds_set: removing an item, erase
- Erase with parent pointers, increment operation on iterators
- Limitations of our ds_set implementation
## Today's Lecture
- Red-Black Trees
- B+ Trees
## 21.1 Red-Black Trees
In addition to the binary search tree properties, the following
red-black tree properties are maintained throughout all
modifications to the data structure:
- Each node is either red or black.
- The root node is always black.
- The NULL child pointers are black.
- Both children of every red node are black.
- Thus, the parent of a red node must also be black.
- All paths from a particular node to a NULL child pointer contain the same
number of black nodes.
![alt text](Red_Black.png "Red_Black Tree example")
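
The color rules listed above can be checked mechanically. The following is a minimal sketch that assumes a hypothetical `RBNode` type with a boolean `red` flag (the lecture does not define a node class for red-black trees). It checks only the color properties, not the binary search tree ordering.

```cpp
// Hypothetical node type, introduced only for this illustration.
struct RBNode {
    int key;
    bool red;
    RBNode* left;
    RBNode* right;
};

// Returns the number of black nodes on every path from n down to a NULL child
// (counting the NULL child itself as black), or -1 if a color rule is violated.
int blackHeight(const RBNode* n) {
    if (n == nullptr) return 1;  // NULL child pointers are black
    if (n->red) {
        // both children of a red node must be black
        if (n->left != nullptr && n->left->red) return -1;
        if (n->right != nullptr && n->right->red) return -1;
    }
    int lh = blackHeight(n->left);
    int rh = blackHeight(n->right);
    // all paths below this node must contain the same number of black nodes
    if (lh == -1 || rh == -1 || lh != rh) return -1;
    return lh + (n->red ? 0 : 1);
}

bool satisfiesColorRules(const RBNode* root) {
    if (root != nullptr && root->red) return false;  // the root must be black
    return blackHeight(root) != -1;
}
```

A black root with two red leaf children passes the check. Recoloring just one of those children black fails it, because the two sides then contain different numbers of black nodes.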
What tree does our **ds_set** implementation produce if we insert the
numbers 1-14 **in order**?
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
The tree at the top is the result using a red-black tree. Notice how the tree is still quite balanced.
Visit these links for an animation of the sequential insertion and re-balancing:
http://babbage.clarku.edu/~achou/cs160fall03/examples/bst_animation/RedBlackTree-Example.html
https://www.cs.usfca.edu/~galles/visualization/RedBlack.html
http://www.youtube.com/watch?v=vDHFF4wjWYU&noredirect=1
What is the best/average/worst case height of a red-black tree with $n$ nodes?
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
What is the best/average/worst case shortest-path from root to leaf node in a red-black tree with $n$ nodes?
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
&nbsp;
## Exercise 21.2
Fill in the tree on the right with the integers 1-7 to make a binary search tree. Also, color each node "red" or "black" so that the tree also fulfills the requirements of a Red-Black tree.
![alt text](Red_Black_fillin.png "Red_Black Tree Fill In example")
Which nodes are red?
**Note:** Red-Black Trees are just one algorithm for **self-balancing binary search trees**. We have many more, including the AVL trees that we discussed last week.
## 21.3 Trinary Tree
A **trinary tree** is similar to a binary tree except that each node has at most 3 children.
Write a **recursive** function named **EqualsChildrenSum** that takes one argument, a pointer to the root of a trinary tree, and returns true if the value at each non-leaf node is the sum of the values of all of its children and false otherwise. In
the examples below, the tree on the left will return true and the tree on the right will return false.
```cpp
class Node {
public:
    int value;
    Node* left;
    Node* middle;
    Node* right;
};
```
![alt text](Trinary_trees.png "Trinary Trees example")
## 21.4 B+ Trees
Unlike binary search trees, nodes in B+ trees (and their predecessor, the B tree) have up to b children. Thus
B+ trees are very flat and very wide. This is good when it is very expensive to move from one node to another.
- B+ trees are supposed to be associative (i.e. they have key-value pairs), but we will just focus on the keys.
- Just like STL map and STL set, these keys and values can be any type, but keys must have an operator<
defined.
- In a B tree, key-value pairs can show up anywhere in the tree; in a B+ tree, all the key-value pairs are in the
leaves and the non-leaf nodes contain duplicates of some keys.
- In either type of tree, all leaves are the same distance from the root.
- The keys are always sorted in a B/B+ tree node, and there are up to b-1 of them. They act like b-1 binary
search tree nodes mashed together.
- In fact, with the exception of the root, nodes will always have between roughly b/2 and b-1 keys (in our
implementation).
- If a B+ tree node has k keys key<sub>0</sub>, key<sub>1</sub>, key<sub>2</sub>, . . . , key<sub>k-1</sub>, it will have k + 1 children. The keys in the leftmost
child must be < key<sub>0</sub>, the next child must have keys such that they are ≥ key<sub>0</sub> and < key<sub>1</sub>, and so on up to
the rightmost child which has only keys ≥ key<sub>k-1</sub>.
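
The child-selection rule in the last bullet amounts to counting how many keys in the node are less than or equal to the search target. A minimal sketch, assuming integer keys stored in a sorted std::vector and a hypothetical helper name `childIndex`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the sorted keys of one B+ tree node, return the
// index (0 through keys.size()) of the child that could contain target.
std::size_t childIndex(const std::vector<int>& keys, int target) {
    assert(std::is_sorted(keys.begin(), keys.end()));
    // child i holds keys >= keys[i-1] and < keys[i], so the answer is the
    // number of keys that are <= target
    return static_cast<std::size_t>(
        std::upper_bound(keys.begin(), keys.end(), target) - keys.begin());
}
```

For a node with keys 10, 20, 30, a search for 5 descends into child 0, a search for 10 or 15 into child 1, and a search for 30 or 35 into child 3.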
A B+ tree visualization can be seen at: https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
![alt text](BplusTrees.png "Bplus_Trees example")
Considerations in a full implementation:
- What happens when we want to add a key to a node that's already full?
- How do we remove values from a node?
- How do we ensure the tree stays balanced?
- How to keep leaves linked together? Why would we want this?
- How to represent key-value pairs?
## 21.5 Exercise
Draw a B+ tree with b=3 with values inserted in the order 1,2,3,4,5,6. Now draw a B+ tree with b=3 and values inserted in the order 6,5,4,3,2,1. The two trees have a different number of levels.
