From 7c8f9a78e06ce4a0b131ab46f99a14f93b7f800d Mon Sep 17 00:00:00 2001 From: Jidong Xiao Date: Sat, 25 Nov 2023 00:03:18 -0500 Subject: [PATCH] adding lecture 25, garbage collection --- lectures/25_garbage_collection/README.md | 281 +++++++++++++++++++++++ 1 file changed, 281 insertions(+) create mode 100644 lectures/25_garbage_collection/README.md diff --git a/lectures/25_garbage_collection/README.md b/lectures/25_garbage_collection/README.md new file mode 100644 index 0000000..edc7912 --- /dev/null +++ b/lectures/25_garbage_collection/README.md @@ -0,0 +1,281 @@ +# Lecture 25 --- Garbage Collection & Smart Pointers + +## Today’s Lecture + +- What is Garbage? +- 3 Garbage Collection Techniques +- Smart Pointers + +## 25.1 What is Garbage? + +- Not everything sitting in memory is useful. Garbage is anything that cannot have any influence on the future +computation. +- With C++, the programmer is expected to perform explicit memory management. You must use delete when +you are done with dynamically allocated memory (which was created with new). +- In Java, and other languages with “garbage collection”, you are not required to explicitly de-allocate the +memory. The system automatically determines what is garbage and returns it to the available pool of memory. +Certainly this makes it easier to learn to program in these languages, but automatic memory management does +have performance and memory usage disadvantages. +- Today we’ll overview 3 basic techniques for automatic memory management. + +## 25.2 The Node class + +- For our discussion today, we’ll assume that all program data is stored in dynamically-allocated instances of the +following simple class. This class can be used to build linked lists, trees, and graphs with cycles: +class Node { +public: + Node(char v, Node* l, Node* r) : + value(v), left(l), right(r) {} + char value; + Node* left; + Node* right; +}; + +## 25.3 Garbage Collection Technique #1: Reference Counting + +1. Attach a counter to each Node in memory. +2. When a new pointer is connected to that Node, increment the counter. +3. When a pointer is removed, decrement the counter. +4. Any Node with counter == 0 is garbage and is available for reuse. + +## 25.4 Reference Counting Exercise + +- Draw a “box and pointer” diagram for the following example, keeping a “reference counter” with each Node. + +```cpp +Node *a = new Node('a', NULL, NULL); +Node *b = new Node('b', NULL, NULL); +Node *c = new Node('c', a, b); +a = NULL; +b = NULL; +c->left = c; +c = NULL; +``` + +- Is there any garbage? + +## 25.5 Memory Model Exercise + +- In memory, we pack the Node instances into a big array. In the toy example below, we have only enough room in memory to store 8 Nodes, which are addressed 100 -> 107. 0 is a NULL address. +- For simplicity, we’ll assume that the program uses only one variable, root, through which it accesses all of the +data. Draw the box-and-pointer diagram for the data accessible from root = 105. + +```console +address 100 101 102 103 104 105 106 107 +value a b c d e f g h +left 0 0 100 100 0 102 105 104 +right 0 100 103 0 105 106 0 0 +root: 105 +``` + +- What memory is garbage? + +## 25.6 Garbage Collection Technique #2: Stop and Copy + +1. Split memory in half (working memory and copy memory). +2. When out of working memory, stop computation and begin garbage collection. +(a) Place scan and free pointers at the start of the copy memory. +(b) Copy the root to copy memory, incrementing free. Whenever a node is copied from working memory, +leave a forwarding address to its new location in copy memory in the left address slot of its old location. +(c) Starting at the scan pointer, process the left and right pointers of each node. Look for their locations +in working memory. If the node has already been copied (i.e., it has a forwarding address), update the +reference. Otherwise, copy the location (as before) and update the reference. +(d) Repeat until scan == free. +(e) Swap the roles of the working and copy memory. + +## 25.7 Stop and Copy Exercise + +Perform stop-and-copy on the following with root = 105: + +```console +WORKING MEMORY +address 100 101 102 103 104 105 106 107 +value a b c d e f g h +left 0 0 100 100 0 102 105 104 +right 0 100 103 0 105 106 0 0 +``` + +```console +COPY MEMORY +address 108 109 110 111 112 113 114 115 +value +left +right +``` + +```console +root: 105 +scan: +free: +``` + +## 25.8 Garbage Collection Technique #3: Mark-Sweep + +1. Add a mark bit to each location in memory. +2. Keep a free pointer to the head of the free list. +3. When memory runs out, stop computation, clear the mark bits and begin garbage collection. +4. Mark +(a) Start at the root and follow the accessible structure (keeping a stack of where you still need to go). +(b) Mark every node you visit. +(c) Stop when you see a marked node, so you don’t go into a cycle. +5. Sweep +(a) Start at the end of memory, and build a new free list. +(b) If a node is unmarked, then it’s garbage, so hook it into the free list by chaining the left pointers. + +## 25.9 Mark-Sweep Exercise + +Let’s perform Mark-Sweep on the following with root = 105: + +```console +address 100 101 102 103 104 105 106 107 +value a b c d e f g h +left 0 0 100 100 0 102 105 104 +right 0 100 103 0 105 106 0 0 +marks +``` + +```console +root: 105 +free: +stack: +``` + +## 25.10 Garbage Collection Comparison + +- Reference Counting: + + fast and incremental + – can’t handle cyclical data structures! + ? requires ∼33% extra memory (1 integer per node) +- Stop & Copy: + – requires a long pause in program execution + + can handle cyclical data structures! + – requires 100% extra memory (you can only use half the memory) + + runs fast if most of the memory is garbage (it only touches the nodes reachable from the root) + + data is clustered together and memory is “de-fragmented” +- Mark-Sweep: + – requires a long pause in program execution + + can handle cyclical data structures! + + requires ∼1% extra memory (just one bit per node) + – runs the same speed regardless of how much of memory is garbage. + It must touch all nodes in the mark phase, and must link together all garbage nodes into a free list. + +## 25.11 Practical Garbage Collection Methodology in C++: Smart Pointers + +Garbage collection looks like an attractive option both when we are quickly drafting a prototype system and +also when we are developing big complex programs that process and rearrange lots of data. + Unfortunately, general-purpose, invisible garbage collection isn’t something we can just tack onto C++, an +enormous beast of a programming language (but that doesn’t stop people from trying!). So is there anything +we can do? Yes, we can use Smart Pointers to gain some of the features of garbage collection. + Some examples below are modified from these nice online references: +http://ootips.org/yonat/4dev/smart-pointers.html +http://www.codeproject.com/KB/stl/boostsmartptr.aspx +http://en.wikipedia.org/wiki/Smart_pointer +http://www.boost.org/doc/libs/1_48_0/libs/smart_ptr/smart_ptr.htm +http://www.acodersjourney.com/2016/05/top-10-dumb-mistakes-avoid-c-11-smart-pointers/ + +## 25.12 What’s a Smart Pointer? + +- The goal is to create a widget that works just like a regular pointer most of the time, except at the beginning +and end of its lifetime. The syntax of how we construct smart pointers is a bit different and we don’t need to +obsess about how & when it will get deleted (it happens automatically). +- Here’s one flavor of a smart pointer (simplified from STL): + +```cpp +template +class auto_ptr { +public: +explicit auto_ptr(T* p = NULL) : ptr(p) {} /* prevents cast/conversion */ +~auto_ptr() { delete ptr; } +T& operator*() { return *ptr; } +T* operator->() { return ptr; } /* fakes being a pointer */ +private: +T* ptr; +}; +``` + +- And let’s start with some example code without smart pointers: + +```cpp +void foo() { +Polygon* p(new Polygon(/* stuff */)); +p->DoSomething(); +delete p; +} +``` +- Here’s how we can re-write the same example with our auto_ptr: + +```cpp +void foo() { +auto_ptr p(new Polygon(/* stuff */); +p->DoSomething(); +} +``` + +- We don’t have to call delete! There’s no memory leak or memory error in this code. Awesome! + +## 25.13 So, What are the Advantages of Smart Pointers? + +- With practice, smart pointers can result in code that is more concise and elegant with fewer errors. Why? ... +- With thoughtful use, smart pointers make it easier to follow the principles of RAII and make code exception safe. In the auto_ptr example above, if DoSomething throws an exception, the memory for object p will be properly deallocated when we leave the scope of the foo function! This is not the case with the original version. +- The STL shared_ptr flavor implements reference counting garbage collection. Awesome2! +- They play nice with STL containers. Say you make an std::vector (or std::list, or std::map, etc.) of regular pointers to Polygon objects, Polygon* (especially handy if this is a polymorphic collection of objects!). +You allocate them all with new, and when you are all finished you must remember to explicitly deallocate each +of the objects. + +```cpp +class Polygon { /*...*/ }; +class Triangle : public Polygon { /*...*/ }; +class Quad : public Polygon { /*...*/ }; +std::vector polys; +polys.push_back(new Triangle(/*...*/)); +polys.push_back(new Quad(/*...*/)); +for (unsigned int i = 0; i < polys.size(); i++) { +delete polys[i]; +} +polys.clear(); +``` + +In contrast, with smart pointers they will be deallocated automagically! + +```cpp +std::vector > polys; +polys.push_back(shared_ptr(new Triangle(/*...*/))); +polys.push_back(shared_ptr(new Quad(/*...*/))); +polys.clear(); // cleanup is automatic! +``` + +## 25.14 Why are Smart Pointers Tricky? + +- Smart pointers do not alleviate the need to master pointers, basic memory allocation & deallocation, copy constructors, destructors, assignment operators, and reference variables. +- You can still make mistakes in your smart pointer code that yield the same types of memory corruption, +segmentation faults, and memory leaks as regular pointers. +- There are several different flavors of smart pointers to choose from (developed for different uses, for common +design patterns). You need to understand your application and the different pitfalls when you select the +appropriate implementation. + +## 25.15 What are the Different Types of Smart Pointers? + +Like other parts of the C++ standard, these tools are still evolving. The different choices reflect different ownership +semantics and different design patterns. There are some smart pointers in STL, and also some in Boost (a C++ +library that further extends the current STL). A quick overview: +- auto_ptr +When “copied” (copy constructor), the new object takes ownership and the old object is now empty. Deprecated +in new C++ standard. +- unique_ptr +Cannot be copied (copy constructor not public). Can only be “moved” to transfer ownership. Explicit ownership +transfer. Intended to replace auto_ptr. std::unique ptr has memory overhead only if you provide it with some +non-trivial deleter. It has time overhead only during constructor (if it has to copy the provided deleter) and +during destructor (to destroy the owned object). +- scoped_ptr (Boost) +“Remembers” to delete things when they go out of scope. Alternate to auto_ptr. Cannot be copied. +- shared_ptr +Reference counted ownership of pointer. Unfortunately, circular references are still a problem. Different subflavors based on where the counter is stored in memory relative to the object, e.g., intrusive_ptr, which +is more memory efficient. std::unique ptr has memory overhead only if you provide it with some non-trivial +deleter. It has time overhead in constructor (to create the reference counter), in destructor (to decrement +the reference counter and possibly destroy the object) and in assignment operator (to increment the reference +counter). +- weak_ptr +Use with shared_ptr. Memory is destroyed when no more shared_ptrs are pointing to object. So each time +a weak_ptr is used you should first “lock” the data by creating a shared_ptr. +- scoped_array and shared_array (Boost) +