Ownership and Smart Pointers¶
Raw Pointer¶
The raw pointer allows us to directly manipulate the memory.
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <type_traits>

struct PlainData
{
    int buffer[1024*8];
}; /* end struct PlainData */

std::ostream & put_ptr(std::ostream & out, void * ptr)
{
    out << std::internal << std::setw(18) << std::setfill('0') << ptr;
    return out;
}

int main(int, char **)
{
    std::cout << "PlainData pointer initialized : ";
    // It is a good practice to initialize a raw pointer to nullptr.
    PlainData * ptr = nullptr;
    // Although a null pointer compares equal to integer 0, do not use the
    // integer literal 0 or the infamous macro NULL to represent nullity.
    put_ptr(std::cout, ptr) << std::endl;

    // The reason to not use 0 or NULL for the null pointer: they are not even
    // of a pointer type!
    static_assert(!std::is_pointer<decltype(0)>::value, "error");
    static_assert(!std::is_pointer<decltype(NULL)>::value, "error");
    // 0 is int
    static_assert(std::is_same<decltype(0), int>::value, "error");
    // The type int is not implicitly convertible to a pointer type (only the
    // literal 0 is accepted as a null pointer constant).
    static_assert(!std::is_convertible<decltype(0), void *>::value, "error");
    // NULL is long with this compiler (the exact type is
    // implementation-defined).
    static_assert(std::is_same<decltype(NULL), long>::value, "error");
    // long is not implicitly convertible to a pointer type, either.
    static_assert(!std::is_convertible<decltype(NULL), void *>::value, "error");

    // Although nullptr is of type std::nullptr_t, not exactly a pointer ...
    static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value, "error");
    static_assert(!std::is_pointer<decltype(nullptr)>::value, "error");
    // It can be converted to a pointer.
    static_assert(std::is_convertible<decltype(nullptr), void *>::value, "error");
    static_assert(std::is_convertible<decltype(nullptr), PlainData *>::value, "error");

    // Allocate memory for PlainData and get the returned pointer.
    std::cout << "PlainData pointer after malloc: ";
    ptr = static_cast<PlainData *>(malloc(sizeof(PlainData)));
    put_ptr(std::cout, ptr) << std::endl;

    // After freeing the memory, the pointer variable itself is not changed.
    std::cout << "PlainData pointer after free : ";
    free(ptr);
    put_ptr(std::cout, ptr) << std::endl;

    // Use new to allocate for and construct PlainData and get the returned
    // pointer.
    std::cout << "PlainData pointer after new : ";
    ptr = new PlainData();
    put_ptr(std::cout, ptr) << std::endl;

    // After delete, the pointer variable is not changed, either.
    std::cout << "PlainData pointer after delete: ";
    delete ptr;
    put_ptr(std::cout, ptr) << std::endl;

    return 0;
}
Execution Results
code/01_pointer/01_raw_pointer.cpp
01_raw_pointer.cpp
$ g++ 01_raw_pointer.cpp -o 01_raw_pointer -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror
01_raw_pointer
$ ./01_raw_pointer
PlainData pointer initialized : 0x0000000000000000
PlainData pointer after malloc: 0x00007fdd5e809800
PlainData pointer after free : 0x00007fdd5e809800
PlainData pointer after new : 0x00007fdd5e809800
PlainData pointer after delete: 0x00007fdd5e809800
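The two points made in the listing above (integer-like null constants are not pointers, and a pointer keeps its old address after deallocation) can be illustrated with a small sketch. The names `which` and `destroy` are hypothetical helpers introduced only for this illustration; they are not part of the course code.

```cpp
#include <string>

// Two overloads that differ only in taking an integer or a pointer.
inline std::string which(int) { return "int"; }
inline std::string which(int *) { return "pointer"; }

// Hypothetical helper: delete the object and null the caller's pointer, so
// that later code can test the pointer instead of holding a stale address.
template <typename T>
void destroy(T *& ptr)
{
    delete ptr;     // deleting a null pointer is a no-op
    ptr = nullptr;  // the variable no longer keeps the dangling address
}
```

Calling `which(0)` selects the `int` overload, while `which(nullptr)` selects the pointer overload, which is one practical reason to prefer `nullptr`. After `destroy(p)`, the pointer `p` compares equal to `nullptr`, so calling `destroy(p)` again is harmless.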
Reference¶
When we see a reference, we know that we should not deallocate / destruct the object.
#include <cstddef>
#include <iomanip>
#include <iostream>

struct PlainData
{
    int buffer[1024*8];
}; /* end struct PlainData */

std::ostream & put_ptr(std::ostream & out, void * ptr)
{
    out << std::internal << std::setw(18) << std::setfill('0') << ptr;
    return out;
}

// The factory function for PlainData.
PlainData * make_data()
{
    PlainData * ptr = new PlainData();
    // (... work to be done before returning.)
    return ptr;
}

void manipulate_with_reference(PlainData & data)
{
    std::cout << "Manipulate with reference : ";
    put_ptr(std::cout, &data) << std::endl;

    for (size_t it=0; it < 1024*8; ++it)
    {
        data.buffer[it] = it;
    }
    // (... more meaningful work before returning.)

    // We should not delete an object passed in by reference: the caller keeps
    // the responsibility to destruct it.
}

int main(int, char **)
{
    PlainData * ptr = nullptr;

    // Obtain the pointer to the object ('resource').
    ptr = make_data();
    std::cout << "PlainData pointer after factory: ";
    put_ptr(std::cout, ptr) << std::endl;

    manipulate_with_reference(*ptr);

    // A good habit when using raw pointer: destruct the object in the scope
    // that we obtain the pointer. In this way, we don't forget to delete it
    // and avoid potential resource leak.
    delete ptr;
    std::cout << "PlainData pointer after delete : ";
    put_ptr(std::cout, ptr) << std::endl;

    return 0;
}
Execution Results
code/01_pointer/02_reference.cpp
02_reference.cpp
$ g++ 02_reference.cpp -o 02_reference -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror
02_reference
$ ./02_reference
PlainData pointer after factory: 0x00007fe94a808800
Manipulate with reference : 0x00007fe94a808800
PlainData pointer after delete : 0x00007fe94a808800
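As a rough illustration of what a reference communicates compared with a pointer, consider the following sketch. `SmallData`, `first_or_zero`, and `first` are hypothetical names used only here.

```cpp
// A small stand-in type for this illustration (hypothetical).
struct SmallData
{
    int buffer[8];
};

// Pointer parameter: the argument may be null, and the signature alone does
// not tell whether the callee is expected to delete the object.
inline int first_or_zero(SmallData const * data)
{
    return data ? data->buffer[0] : 0;
}

// Reference parameter: it is always bound to an object, and by convention the
// callee only borrows the object and never deletes it.
inline int first(SmallData const & data)
{
    return data.buffer[0];
}
```

The pointer version has to guard against null, while the reference version can use the object directly and leaves its lifetime to the caller.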
RAII¶
A better way to manage the resource life cycle than the manual control shown above is RAII (resource acquisition is initialization). The basic idea of RAII is to tie the life cycle of a resource to the life cycle of an object: the constructor acquires the resource and the destructor releases it.
With RAII, we no longer have to delete an object in the same function that creates it. RAII is directly related to the concept of ownership, which we introduce next.
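A minimal sketch of the RAII idea follows. `ScopedBuffer`, `live_count`, and `sum_first` are hypothetical names made up for this illustration, and the live-object counter exists only so the release can be observed.

```cpp
#include <cstddef>

// Hypothetical RAII wrapper: the constructor acquires an int buffer and the
// destructor releases it, so the buffer lives exactly as long as the object.
class ScopedBuffer
{
public:
    explicit ScopedBuffer(std::size_t size)
      : m_size(size), m_data(new int[size]())
    {
        ++live_count();
    }
    ~ScopedBuffer()
    {
        delete[] m_data;
        --live_count();
    }
    // Copying would make two objects release the same buffer, so forbid it.
    ScopedBuffer(ScopedBuffer const &) = delete;
    ScopedBuffer & operator=(ScopedBuffer const &) = delete;

    std::size_t size() const { return m_size; }
    int & operator[](std::size_t it) { return m_data[it]; }

    // Number of ScopedBuffer objects currently alive (demonstration only).
    static int & live_count()
    {
        static int count = 0;
        return count;
    }

private:
    std::size_t m_size;
    int * m_data;
};

// The buffer is released on every path out of the function, including the
// early return, without any explicit delete.
inline int sum_first(std::size_t n)
{
    ScopedBuffer buf(1024);
    if (n > buf.size()) { return -1; }
    int total = 0;
    for (std::size_t it = 0; it < n; ++it)
    {
        buf[it] = static_cast<int>(it);
        total += buf[it];
    }
    return total;
}
```

Whichever way `sum_first()` returns, `ScopedBuffer::live_count()` is back to 0 afterwards, because leaving the scope runs the destructor.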
Ownership¶
In a complicated system, memory is usually not freed immediately after it is allocated; the object is passed around and released somewhere else. Consider the following example, where two worker functions manage memory differently.
Data Class¶
Our data object is large, and we don’t want the expensive overhead from frequent allocation and deallocation.
#include <algorithm>
#include <cstddef>
#include <iostream>

class Data
{

public:

    static constexpr size_t NELEM = 1024*8;

    using iterator = int *;
    using const_iterator = const int *;

    Data()
    {
        std::fill(begin(), end(), 0);
        std::cout << "Data @" << this << " is constructed" << std::endl;
    }

    ~Data()
    {
        std::cout << "Data @" << this << " is destructed" << std::endl;
    }

    const_iterator cbegin() const { return m_buffer; }
    const_iterator cend() const { return m_buffer+NELEM; }
    iterator begin() { return m_buffer; }
    iterator end() { return m_buffer+NELEM; }

    size_t size() const { return NELEM; }
    int operator[](size_t it) const { return m_buffer[it]; }
    int & operator[](size_t it) { return m_buffer[it]; }

    // Return true if every element equals its index, i.e., the buffer has
    // been filled by manipulate_with_reference().
    bool is_manipulated() const
    {
        for (size_t it=0; it < size(); ++it)
        {
            const int v = it;
            if ((*this)[it] != v) { return false; }
        }
        return true;
    }

private:

    // A lot of data that we don't want to reconstruct.
    int m_buffer[NELEM];

}; /* end class Data */

void manipulate_with_reference(Data & data)
{
    std::cout << "Manipulate with reference: " << &data << std::endl;

    for (size_t it=0; it < data.size(); ++it)
    {
        data[it] = it;
    }
    // In a real consumer function we will do much more meaningful operations.

    // However, we should not destruct an object passed in by reference.
}
Separate Memory Operations¶
Memory allocation and deallocation are not handled consistently in worker1() and worker2(). Problems of this kind are commonplace.
// Data and manipulate_with_reference() are those defined in the Data Class
// listing above.

Data * worker1()
{
    // Create a new Data object.
    Data * data = new Data();

    // Manipulate the Data object.
    manipulate_with_reference(*data);

    return data;
}

/*
 * Code in this function is intentionally undisciplined to demonstrate how
 * ownership is messed up.
 */
void worker2(Data * data)
{
    // The prerequisite for the caller to write correct code is to read the
    // code and understand when the object is alive.
    if (data->is_manipulated())
    {
        delete data;
    }
    else
    {
        manipulate_with_reference(*data);
    }
}

int main(int, char **)
{
    Data * data = worker1();
    std::cout << "Data pointer after worker 1: " << data << std::endl;
    worker2(data);
    std::cout << "Data pointer after worker 2: " << data << std::endl;

    // You have to read the code of worker2 to know that data could be
    // destructed. In addition, the Data class doesn't provide a
    // programmatic way to detect whether or not the object is alive. The
    // design of Data, worker1, and worker2 makes it impossible to write
    // memory-safe code.
#ifdef CRASHME // The fenced code causes double free.
    delete data;
    std::cout << "Data pointer after delete: " << data << std::endl;
#endif
}
Execution Results
code/01_pointer/03_ownership.cpp
03_ownership.cpp
$ g++ 03_ownership.cpp -o 03_ownership -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror
03_ownership
$ ./03_ownership
Data @0x7fb287008800 is constructed
Manipulate with reference: 0x7fb287008800
Data pointer after worker 1: 0x7fb287008800
Data @0x7fb287008800 is destructed
Data pointer after worker 2: 0x7fb287008800
03_ownership.cpp with the crashing behavior
$ g++ 03_ownership.cpp -o 03_ownership -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror -DCRASHME
03_ownership
$ ./03_ownership
Data @0x7f8ef9808800 is constructed
Manipulate with reference: 0x7f8ef9808800
Data pointer after worker 1: 0x7f8ef9808800
Data @0x7f8ef9808800 is destructed
Data pointer after worker 2: 0x7f8ef9808800
Data @0x7f8ef9808800 is destructed
03_ownership(75158,0x114718e00) malloc: *** error for object 0x7f8ef9808800: pointer being freed was not allocated
03_ownership(75158,0x114718e00) malloc: *** set a breakpoint in malloc_error_break to debug
What Is Ownership¶
The above example shows the problem caused by a lack of ownership. “Ownership” isn’t officially a language construct in C++, but it is a common concept in many programming languages for dynamic memory management.
To put it simply, when the object is “owned” by a construct or piece of code, it is assumed that it is safe for the piece of code to use that object. The ownership assures the life of the object, and the object is not destructed when it is owned by someone. It also means that the owner is responsible for making sure the object gets destructed when it should be.
As we observed in the above example code, there is no way to express the ownership in the code, and it is unsafe to use the data object after worker2() is called. The way C++ handles the situation is to use smart pointers.
unique_ptr¶
(Modern) C++ provides two owning smart pointers: unique_ptr and shared_ptr. We start with unique_ptr because it is lighter-weight. With the default deleter, a unique_ptr typically takes the same number of bytes as a raw pointer, and it can often be a drop-in replacement for an owning raw pointer.
unique_ptr should be used when there can only be one owner of the pointed-to object.
#include <iostream>
#include <memory>
#include <utility>

// Data and manipulate_with_reference() are those defined in the Data Class
// listing above.

static_assert(sizeof(Data *) == sizeof(std::unique_ptr<Data>), "unique_ptr should take only a word");

std::unique_ptr<Data> worker1()
{
    // Create a new Data object.
    std::unique_ptr<Data> data = std::make_unique<Data>();

    // Manipulate the Data object.
    manipulate_with_reference(*data);

    return data;
}

void worker2(std::unique_ptr<Data> data)
{
    if (data->is_manipulated())
    {
        data.reset();
    }
    else
    {
        manipulate_with_reference(*data);
    }
    // Either way, the object does not outlive this function: the by-value
    // parameter owns it and destructs it when going out of scope.
}

int main(int, char **)
{
    std::unique_ptr<Data> data = worker1();
    std::cout << "Data pointer after worker 1: " << data.get() << std::endl;

#ifdef COPYNOWORK
    // unique_ptr cannot be copied; this line does not compile.
    worker2(data);
#else
    // Ownership is explicitly transferred to worker2.
    worker2(std::move(data));
#endif
    // A moved-from unique_ptr is guaranteed to be null.
    std::cout << "Data pointer after worker 2: " << data.get() << std::endl;

    data.reset();
    std::cout << "Data pointer after delete: " << data.get() << std::endl;
}
Execution Results
04_unique.cpp
$ g++ 04_unique.cpp -o 04_unique -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror
04_unique
$ ./04_unique
Data @0x7fee5a008800 is constructed
Manipulate with reference: 0x7fee5a008800
Data pointer after worker 1: 0x7fee5a008800
Data @0x7fee5a008800 is destructed
Data pointer after worker 2: 0x0
Data pointer after delete: 0x0
04_unique.cpp
$ g++ 04_unique.cpp -o 04_unique -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror -DCOPYNOWORK
04_unique.cpp:97:13: error: call to implicitly-deleted copy constructor of 'std::unique_ptr<Data>'
    worker2(data);
            ^~~~
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/memory:2518:3: note: copy constructor is implicitly deleted because
      'unique_ptr<Data, std::__1::default_delete<Data> >' has a user-declared move constructor
  unique_ptr(unique_ptr&& __u) _NOEXCEPT
  ^
04_unique.cpp:79:36: note: passing argument to parameter 'data' here
void worker2(std::unique_ptr<Data> data)
                                   ^
1 error generated.
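The function signatures in the example above already document who owns the object. A compact sketch of the three common roles follows; `Payload`, `make_payload`, `increment`, and `consume` are hypothetical names standing in for Data and the workers.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the Data class.
struct Payload
{
    int value = 0;
};

// Source: the caller receives ownership of the newly created object.
inline std::unique_ptr<Payload> make_payload(int value)
{
    std::unique_ptr<Payload> ptr = std::make_unique<Payload>();
    ptr->value = value;
    return ptr;
}

// Borrow: a reference says "I use the object but do not own it".
inline void increment(Payload & payload)
{
    ++payload.value;
}

// Sink: taking unique_ptr by value says "I take over ownership". The object
// is destructed when the parameter goes out of scope.
inline int consume(std::unique_ptr<Payload> payload)
{
    return payload->value;
}
```

A caller writes `consume(std::move(p))` to hand the object over, and `p` is null afterwards, so the transfer is visible at the call site.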
Raw Pointers vs Smart Pointers¶
The rule of thumb is to always start with smart pointers. When in doubt, use unique_ptr. unique_ptr forces a developer to think clearly about whether or not multiple owners are necessary. Only use shared_ptr when it is absolutely necessary. Reference counting is much more expensive than it looks: a shared_ptr needs a separately managed control block, and the count is typically updated with atomic operations.
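For reference, the basic behavior of the shared_ptr reference count can be sketched as follows. `Payload` and `use_counts` are hypothetical names for this illustration only.

```cpp
#include <array>
#include <memory>

// Hypothetical stand-in for a shared resource.
struct Payload
{
    int value = 0;
};

// Record the reference count at three moments: with one owner, with two
// owners, and after the second owner has gone out of scope.
inline std::array<long, 3> use_counts()
{
    std::array<long, 3> counts{};
    std::shared_ptr<Payload> first = std::make_shared<Payload>();
    counts[0] = first.use_count();
    {
        std::shared_ptr<Payload> second = first;  // copying adds an owner
        counts[1] = second.use_count();
    }
    counts[2] = first.use_count();  // the inner owner is gone
    return counts;
}
```

The recorded counts are 1, 2, and 1. Every copy and every destruction has to update the shared counter, which is part of the cost that unique_ptr avoids.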
Cyclic Reference¶
When two objects use a pair of shared_ptr to point to each other, the cyclic reference creates a memory leak:
#include <iostream>
#include <memory>

class Child; // Forward declaration: Data refers to Child before its definition.

class Data
  : public std::enable_shared_from_this<Data>
{
public:
    // This is an excerpt: the buffer and the constructor/destructor messages
    // of Data are omitted. make() is written here as a minimal factory and is
    // an assumption about the full source file.
    static std::shared_ptr<Data> make() { return std::make_shared<Data>(); }
    std::shared_ptr<Child> child() const { return m_child; }
    std::shared_ptr<Child> & child() { return m_child; }
private:
    std::shared_ptr<Child> m_child;
};

class Child
  : public std::enable_shared_from_this<Child>
{
private:
    class ctor_passkey {};
public:
    Child() = delete;
    Child(std::shared_ptr<Data> const & data, ctor_passkey const &) : m_data(data) {}
    static std::shared_ptr<Child> make(std::shared_ptr<Data> const & data)
    {
        std::shared_ptr<Child> ret = std::make_shared<Child>(data, ctor_passkey());
        data->child() = ret;
        return ret;
    }
private:
    std::shared_ptr<Data> m_data;
};

int main(int, char **)
{
    std::shared_ptr<Data> data = Data::make();
    std::shared_ptr<Child> child = Child::make(data);
    std::cout << "data.use_count(): " << data.use_count() << std::endl;
    std::cout << "child.use_count(): " << child.use_count() << std::endl;

    std::weak_ptr<Data> wdata(data);
    std::weak_ptr<Child> wchild(child);

    data.reset();
    std::cout << "wdata.use_count() after data.reset(): " << wdata.use_count() << std::endl;
    std::cout << "wchild.use_count() after data.reset(): " << wchild.use_count() << std::endl;

    child.reset();
    std::cout << "wdata.use_count() after child.reset(): " << wdata.use_count() << std::endl;
    std::cout << "wchild.use_count() after child.reset(): " << wchild.use_count() << std::endl;
    // Oops, the reference count doesn't reduce to 0!
}
Execution Results
04_cyclic.cpp
$ g++ 04_cyclic.cpp -o 04_cyclic -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror
04_cyclic
$ ./04_cyclic
Data @0x7f8f48d00018 is constructed
data.use_count(): 2
child.use_count(): 2
wdata.use_count() after data.reset(): 1
wchild.use_count() after data.reset(): 2
wdata.use_count() after child.reset(): 1
wchild.use_count() after child.reset(): 1
Use weak_ptr to Break Cyclic Reference¶
In the above demonstration we use weak_ptr to get the reference count without increasing it. weak_ptr can also be used to break the cyclic reference. In the following example, the Child object replaces shared_ptr with weak_ptr to point to Data:
// The Data class is the same as in the cyclic-reference example; only Child
// changes.
class Child
  : public std::enable_shared_from_this<Child>
{
private:
    class ctor_passkey {};
public:
    Child() = delete;
    Child(std::shared_ptr<Data> const & data, ctor_passkey const &) : m_data(data) {}
    static std::shared_ptr<Child> make(std::shared_ptr<Data> const & data)
    {
        std::shared_ptr<Child> ret = std::make_shared<Child>(data, ctor_passkey());
        data->child() = ret;
        return ret;
    }
private:
    // Replace shared_ptr with weak_ptr to Data.
    std::weak_ptr<Data> m_data;
};

int main(int, char **)
{
    std::shared_ptr<Data> data = Data::make();
    std::shared_ptr<Child> child = Child::make(data);
    std::cout << "data.use_count(): " << data.use_count() << std::endl;
    std::cout << "child.use_count(): " << child.use_count() << std::endl;

    std::weak_ptr<Data> wdata(data);
    std::weak_ptr<Child> wchild(child);

    child.reset();
    std::cout << "wdata.use_count() after child.reset(): " << wdata.use_count() << std::endl;
    std::cout << "wchild.use_count() after child.reset(): " << wchild.use_count() << std::endl;

    data.reset();
    std::cout << "wdata.use_count() after data.reset(): " << wdata.use_count() << std::endl;
    std::cout << "wchild.use_count() after data.reset(): " << wchild.use_count() << std::endl;
}
Execution Results
05_weak.cpp
$ g++ 05_weak.cpp -o 05_weak -std=c++17 -g -O3 -m64 -Wall -Wextra -Werror
05_weak
$ ./05_weak
Data @0x7fe6f8500018 is constructed
data.use_count(): 1
child.use_count(): 2
wdata.use_count() after child.reset(): 1
wchild.use_count() after child.reset(): 1
Data @0x7fe6f8500018 is destructed
wdata.use_count() after data.reset(): 0
wchild.use_count() after data.reset(): 0
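Because a weak_ptr does not keep the object alive, code that holds one has to check before each access. A minimal sketch of that pattern follows; `Payload` and `read_or` are hypothetical names for this illustration.

```cpp
#include <memory>

// Hypothetical stand-in for the pointed-to object.
struct Payload
{
    int value = 0;
};

// Read through a weak_ptr without assuming the object is still alive.
// Returns fallback when the object has already been destructed.
inline int read_or(std::weak_ptr<Payload> const & weak, int fallback)
{
    // lock() returns an empty shared_ptr if the object is gone; otherwise the
    // returned shared_ptr keeps the object alive during the access.
    if (std::shared_ptr<Payload> strong = weak.lock())
    {
        return strong->value;
    }
    return fallback;
}
```

While an owner exists, `read_or()` returns the stored value; after the last shared_ptr is reset, `weak.expired()` becomes true and `read_or()` returns the fallback.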
Reminder: Avoid weak_ptr¶
Using weak_ptr to break a cyclic reference should only be considered a workaround, rather than a full resolution. We sometimes need it since the reference cycle may not be as obvious as it is in our example. For example, there may be 3 or 4 levels of references in the cycle. A weak_ptr can be assigned from a shared_ptr and converted back to one with lock(), so when we are troubleshooting resource-leaking issues, replacing shared_ptr with weak_ptr can work as a quick-and-dirty hotfix.
The right treatment is to sort out the ownership. It’s not easy when the system is complex. The rule of thumb, as we mentioned earlier, is to avoid shared_ptr unless you really need it. Most of the time the need appears in higher-level, heavy-weight containers rather than in lower-level small objects. For small objects, we should try to limit the life cycle and use raw pointers or the stack.
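One possible way to sort out the ownership in a parent-child relation like the one above is sketched here, under the assumption that the child never outlives its parent. `Owner` and `Part` are hypothetical names; this is not the design of the course code.

```cpp
#include <memory>

class Owner;

// The part never outlives its owner, so a non-owning raw pointer back to the
// owner is enough and no reference counting is involved.
class Part
{
public:
    explicit Part(Owner * owner) : m_owner(owner) {}
    Owner * owner() const { return m_owner; }
private:
    Owner * m_owner;  // non-owning back reference
};

class Owner
{
public:
    Owner() : m_part(std::make_unique<Part>(this)) {}
    // Copying or moving would leave the part pointing at the old owner.
    Owner(Owner const &) = delete;
    Owner & operator=(Owner const &) = delete;

    Part & part() { return *m_part; }
private:
    std::unique_ptr<Part> m_part;  // the sole owner of the part
};
```

There is exactly one owner on each side of the relation, so no cycle of owning pointers can form, and the part is destructed together with its owner.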
Exercises¶
- Write code so that when std::unique_ptr is destructed, the object it points to is not destructed.
- Create vectors of 1,000,000 elements of (i) raw pointers, (ii) unique_ptr, and (iii) shared_ptr, respectively, and measure the difference in performance.
- Compare the runtime performance between shared_ptr(new Type) and make_shared<Type>. Explain why there is a difference in performance.