SatView: Pointer Perfect, Part 2: Construction / Destruction

In this second article in our series about pointers, J. Nakamura explains malloc and free, and new and delete. He also examines the this pointer and discusses the difference between copying pointers and copying memory (and the hazards of shallow vs. deep copying). Then he points out the potential problems when crossing DLL boundaries, and finally takes us into the exotic(?) world of pointer pointers.

December 06, 2004

When you create a pointer with the address-of operator (&), as described in Part 1, you are usually pointing at objects that were created on the stack. More often you will want to create objects on the heap (since the stack has a limited size), in which case you will have to use malloc/free or new/delete.

The stack is the section of memory used to store local variables and function parameters (among other things), and it keeps track of function execution in your application. This means that upon leaving a function, all local variables created in that function are popped off the stack, i.e. destroyed.

Use the heap when you need large chunks of data (e.g. large arrays of objects) that would risk overflowing the stack, when data has to stay alive across function calls, or when you want to delay the construction of large objects.
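As a minimal sketch of data that stays alive across function calls, the hypothetical helper makeSquares below allocates an array on the heap and hands it to its caller. The function names are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: allocates 'count' ints on the heap and fills them.
// The array outlives this function; the caller owns it and must delete[] it.
int *makeSquares(std::size_t count) {
    int *values = new int[count];
    for (std::size_t i = 0; i < count; ++i)
        values[i] = static_cast<int>(i * i);
    return values;   // returning the address of a local array would dangle
}

// Demonstrates that the data is still valid after makeSquares() has returned.
int sumSquares(std::size_t count) {
    int *values = makeSquares(count);
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += values[i];
    delete[] values;  // heap memory is released explicitly, not on scope exit
    return sum;
}
```

Calling sumSquares(4) adds 0, 1, 4 and 9. The price of the longer lifetime is that somebody has to remember the delete[].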

Malloc & Free vs. New & Delete

malloc & free originate from C, while new & delete only work with a C++ compiler. Here lies their most important difference as well: malloc & free don’t understand constructors and destructors.

Let's take a look at the following two lines of code:

MyObj *pObj1 = static_cast<MyObj*>( malloc ( sizeof ( MyObj ) ) );
MyObj *pObj2 = new MyObj;

The first line creates a pointer to a chunk of memory that is large enough to hold MyObj, but there was no MyObj object constructed in that piece of memory. The second line creates a pointer to memory where MyObj has been constructed.

Now even if you construct a MyObj object in the memory pObj1 points at (placement new does exactly that), you still have a problem when freeing that memory. Just as malloc doesn’t construct, free doesn’t destruct: the destructor never runs, so any resources the object owns are leaked.
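One way to construct an object in memory obtained from malloc is placement new, which then requires an explicit destructor call before free. The sketch below assumes a stand-in MyObj that merely counts live instances; the counter and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>      // placement new

// Hypothetical stand-in for the article's MyObj; counts live instances.
struct MyObj {
    static int alive;
    MyObj()  { ++alive; }
    ~MyObj() { --alive; }
};
int MyObj::alive = 0;

// Returns the number of live objects observed while the object existed,
// or -1 if the allocation failed.
int constructInMallocedMemory() {
    void *raw = std::malloc(sizeof(MyObj));   // raw bytes, no MyObj yet
    if (raw == NULL)
        return -1;
    MyObj *pObj = new (raw) MyObj;            // placement new runs the constructor
    int aliveDuring = MyObj::alive;
    pObj->~MyObj();                           // the destructor must be called by hand
    std::free(raw);                           // free only returns the bytes
    return aliveDuring;
}
```

Leaving out the explicit destructor call would still release the bytes, but MyObj::alive would stay at 1, which is the leak described above in miniature.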

When you are using C functions, you might run into situations where you are given the responsibility to clean up chunks of memory (strdup, for example, returns memory that you must release with free). Be careful to match free with malloc and delete with new: mixing them is undefined behavior, and the results can be disastrous!

Now take a look at the following lines of code:

  MyObj *pObjArray = new MyObj[100];
  /* some code here */
  delete pObjArray;

What is wrong with this picture? Well, new MyObj[100] allocates enough memory for 100 MyObj objects and constructs them in that chunk of memory. The behavior of delete pObjArray is undefined, however. Most likely 99 of the 100 MyObj objects are not destructed! The proper way to delete allocated arrays on the heap is with the delete[] operator:

  delete[] pObjArray;
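To check that delete[] really destroys every element, here is a small sketch with a hypothetical element type Counted that counts its destructor calls.

```cpp
#include <cassert>

// Hypothetical element type that counts how many destructors have run.
struct Counted {
    static int destroyed;
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

// Allocates and releases an array, returning the number of destructor calls.
int destroyArray(int count) {
    Counted::destroyed = 0;
    Counted *pArray = new Counted[count];
    delete[] pArray;          // array form: runs the destructor for every element
    return Counted::destroyed;
}
```

With the array form, destroyArray(100) reports 100 destructor calls. The sketch deliberately does not try the mismatched delete, since its behavior is undefined.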

The This Pointer

Every non-static member function in C++ receives a hidden parameter known as the this pointer. It points to the object (the class instance) on which the function was called, hence the name "this". The compiler inserts the this pointer implicitly when you access member variables and member functions from inside such a function.

void MyClass::foo() {
  myVar = 20;
}

can therefore be read as

void MyClass::foo() {
  this->myVar = 20;
}

You will also encounter the this pointer when writing the assignment operator for a class:

MyClass& MyClass::operator =(MyClass const &rhs) {
  if (&rhs != this)
    myVar = rhs.myVar;
  return *this;
}

Note how the dereference operator applied to the this pointer in the return statement yields a reference to that very same object!

Copying Pointers vs. Copying Memory

If a member variable points to a piece of memory, you can either copy the variable (i.e. the pointer) or the memory it points to. When you copy the pointer, the data in that memory becomes shared between different objects, which can produce nasty, unexpected side effects (such as access violations when one object frees memory that another is still using). This is what we call a "shallow copy". When you copy the memory instead of the pointer, we call it a "deep copy".

It is important to know that when your class doesn’t define a copy constructor or assignment operator, the compiler will implicitly create these for you. Since the generated versions copy each member in turn, a pointer member is copied as a pointer and the copy will be shallow! So if your class contains pointers as member variables, make sure that you always declare a copy constructor and an assignment operator... even if it is just to declare them private to make the class non-copyable. E.g.:

class MyNonCopyable {
public:
  MyNonCopyable();
  ~MyNonCopyable();
  void foo();
private:
  /* this hides the copy constructor. */
  MyNonCopyable(MyNonCopyable const &);
  /* this hides the assignment operator. */
  MyNonCopyable& operator=(MyNonCopyable const &);
private:
  MyObj *m_pMyObj;
};

A deep copy in an assignment operator requires you to free the memory pointed at by the member variable first, before you acquire new memory to contain a copy of the data from rhs (otherwise your application would be leaking memory). When the object is assigned to itself, this would mean we lose the data, because we cannot copy the data from ourselves after it has been freed! Not exactly what we had in mind... therefore it can be useful to compare the address of rhs with this to prevent this silly case from happening.
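The deep copy described above can be sketched as follows. Buffer is a hypothetical class invented for this illustration; it owns a character array and copies the memory, not the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class owning a heap buffer; illustrates a deep copy.
class Buffer {
public:
    explicit Buffer(const char *text)
        : m_pData(new char[std::strlen(text) + 1]) {
        std::strcpy(m_pData, text);
    }
    // Copy constructor: allocate our own memory and copy the contents.
    Buffer(Buffer const &rhs)
        : m_pData(new char[std::strlen(rhs.m_pData) + 1]) {
        std::strcpy(m_pData, rhs.m_pData);
    }
    Buffer &operator=(Buffer const &rhs) {
        if (&rhs != this) {                 // guard against self-assignment
            delete[] m_pData;               // release the old memory first
            m_pData = new char[std::strlen(rhs.m_pData) + 1];
            std::strcpy(m_pData, rhs.m_pData);
        }
        return *this;
    }
    ~Buffer() { delete[] m_pData; }
    const char *c_str() const { return m_pData; }
private:
    char *m_pData;
};

// Returns true when the copy owns separate memory with equal contents.
bool copyIsDeep() {
    Buffer a("hello");
    Buffer b("x");
    b = a;                  // deep copy through operator=
    Buffer &alias = b;
    b = alias;              // self-assignment is harmless thanks to the guard
    return b.c_str() != a.c_str() && std::strcmp(b.c_str(), a.c_str()) == 0;
}
```

Because a and b each own their memory, both destructors can safely call delete[] when copyIsDeep returns. With a shallow copy the same memory would be deleted twice.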

The following use of the this pointer should make you cringe or at least raise your eyebrows:

delete this;

Think about it! Do you realize the implications of this statement? I won’t tell you not to use it, but you must understand its consequences. It's all about when you use it.

When you are working with COM, you will run into this statement a lot. The simple reason for this is that classes have to be derived from the IUnknown interface, which forces classes to be reference counted. This means that you do not construct/destruct objects using new/delete, but that you must obtain pointers to their instances through the QueryInterface(REFIID, void**) function and ‘free’ them using Release(). You are no longer directly responsible for the memory allocated. Here's a code example:

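  HRESULT MyCOMClass::QueryInterface(REFIID riid, void** ppv) {
    // The parameter is called ppv here because "interface" is a macro
    // in the Windows headers.
    // Write through the pointer pointer so the caller receives the result.
    if (riid == IID_IUnknown)
      // Cast via one interface: both interfaces derive from IUnknown,
      // so a direct cast of this to IUnknown* would be ambiguous.
      *ppv = static_cast<IUnknown*>(static_cast<IMyInterface01*>(this));
    else if (riid == IID_IMyInterface01)
      *ppv = static_cast<IMyInterface01*>(this);
    else if (riid == IID_IMyInterface02)
      *ppv = static_cast<IMyInterface02*>(this);
    else
      *ppv = NULL;

    if (*ppv) {
      reinterpret_cast<IUnknown*>(*ppv)->AddRef();
      return S_OK;
    }
    else
      return E_NOINTERFACE;
  }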

COM forces you to implement a pattern also known as “the extension interface.” This allows you to export multiple interfaces per component, which prevents bloating of these interfaces. It also makes it possible for you to easily extend and modify the functionality of these components, without breaking them. I will describe this pattern in more detail in the future.

The reference counting is done through the AddRef() and Release() functions. For example:

  ULONG MyCOMClass::AddRef() {
    return ++m_RefCount;
  }

  ULONG MyCOMClass::Release() {
    ULONG result = --m_RefCount;
    if (result == 0)
      delete this;
    return result;
  }

So, did you notice the delete this? Even though it is dangerous to use, it is quite necessary here.

You can easily introduce a bug here (and trust me... I've found them in COM books!) by writing the following code:

  ULONG MyCOMClass::Release() {
    if (--m_RefCount == 0)
      delete this;
    return m_RefCount;
  }

Do you see the potential danger in this code? Think about what delete this does. It calls the destructor of the object of which we are currently executing a member function, and then releases its memory! This means that returning the contents of a member variable after that very same object has been destructed is undefined behavior and can yield unexpected results!

Sure, this will work fine during development, but don’t be surprised if it blows up in your client's production environment.
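To experiment with the safe pattern outside of COM, here is a minimal sketch of a reference-counted class. RefCounted is hypothetical and not a real COM object: it starts its count at 1, uses unsigned long instead of ULONG, and is not thread safe.

```cpp
#include <cassert>

// Hypothetical reference-counted class (not a real COM object).
class RefCounted {
public:
    static int destroyed;                 // counts destructor calls for the demo
    RefCounted() : m_RefCount(1) {}       // assumption: the creator holds one reference
    ~RefCounted() { ++destroyed; }
    unsigned long AddRef() { return ++m_RefCount; }
    unsigned long Release() {
        unsigned long result = --m_RefCount;  // copy to a local first
        if (result == 0)
            delete this;                      // no member access after this line
        return result;                        // safe: 'result' lives on the stack
    }
private:
    unsigned long m_RefCount;
};
int RefCounted::destroyed = 0;

// Returns the value of the final Release() call.
unsigned long releaseTwice() {
    RefCounted *pObj = new RefCounted;    // count starts at 1
    pObj->AddRef();                       // count is 2
    pObj->Release();                      // count is 1, object still alive
    return pObj->Release();               // count is 0, object deletes itself
}
```

After releaseTwice returns, the pointer it used is dangling and must not be touched again. The local copy of the count is the only thing that survives the delete this.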

Potential Problems When Crossing DLL Boundaries

You cannot safely allocate memory in a DLL (either explicitly with malloc / new or implicitly with the aforementioned strdup), pass the pointer across the DLL boundary, and free it in the application when the DLL and the application are using different copies of the CRT (C run-time) libraries. Doing so will cause a nasty memory access violation or worse: corrupt the heap! When you bump into an assert in the CRT lib when it tries to execute free(), you will know you have run into this very problem.

Memory is only valid for the copy of the CRT in which it was allocated. This means that when you are using two different CRT libraries (LIBCMT.LIB and MSVCRT.LIB for example) together, you cannot expect one to correctly free something the other has allocated. Because they most probably are using different heap managers, the heap runs the risk of becoming silently corrupted when you run the application in release mode! Pity the programmer who has to track down the bug that crashes his application this way.

It seems like something trivial to avoid, but I can guarantee you will run into that assert in free() when you are using COM and utilizing SAFEARRAYs and BSTRs. Hopefully I have just saved you some future headaches.
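A common way to sidestep the problem is to let the module that allocates the memory also free it. The sketch below assumes two hypothetical functions, MyLib_CopyString and MyLib_FreeString, that a DLL would export as a pair; here they are compiled into one program so the example can be built anywhere.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical functions that a DLL would export. Both live in the same
// module, so malloc and free always come from the same copy of the CRT.
char *MyLib_CopyString(const char *text) {
    char *copy = static_cast<char *>(std::malloc(std::strlen(text) + 1));
    if (copy != NULL)
        std::strcpy(copy, text);
    return copy;
}

void MyLib_FreeString(char *text) {
    std::free(text);   // released by the module that allocated it
}

// Caller side: never calls free() directly on memory owned by the library.
bool roundTrip() {
    char *copy = MyLib_CopyString("hello");
    if (copy == NULL)
        return false;
    bool same = std::strcmp(copy, "hello") == 0;
    MyLib_FreeString(copy);
    return same;
}
```

The application only ever sees the pointer and the matching release function, so it does not matter which CRT the library was linked against.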

Pointer Pointers

So a pointer is a variable that stores an address and can be null. This means that a pointer can store the address of another pointer. That’s right: a pointer to a pointer.

If the term pointer pointers sounds exotic to you, don’t worry. You have been using them all along:

int main(int argc, char *argv[]) { return 0; }

See, you have been using them all the time! argv is an array of pointers to char, and because an array parameter is really a pointer, it could have been written as char **argv (main is written many different ways, actually). If you define an array char arry[25]; then arry can be used as a char*, simply because an array decays to a pointer to its first element. Each element of argv is such a char*, so argv itself is a pointer to a char*. You can roughly translate argv as a list of lists of characters.
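To make the two levels of indirection concrete, the following sketch walks an argv-style array through a char** parameter. The function names and the fake argument list are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sums the lengths of all strings in an argv-style array.
// 'args' points to the first char* in a list of 'count' C strings.
std::size_t totalLength(int count, char **args) {
    std::size_t total = 0;
    for (int i = 0; i < count; ++i)
        total += std::strlen(args[i]);   // args[i] is a char*, args[i][j] a char
    return total;
}

// Builds a small stand-in for argv and measures it.
std::size_t demoTotalLength() {
    char program[] = "app";
    char option[]  = "-v";
    char *fakeArgv[] = { program, option };  // array of char*, decays to char**
    return totalLength(2, fakeArgv);
}
```

The array fakeArgv is passed without an explicit cast, which is the same decay that turns char *argv[] into char **argv.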

Using the techniques we already know we can create pointer pointers expanding on the first example from Part 1:

int main(int argc, char *argv[]) {
  int myVar = 10;
  int *pPtr = &myVar;
  int **ppPtr = &pPtr;
  return 0;
}

Again we compile it and gaze at it in the debugger:

0x0012FEB8  cccccccc  ÌÌÌÌ
0x0012FEBC  0012fec8  Èþ..  ppPtr  addr:0x0012FEBC  value:0x0012FEC8
0x0012FEC0  cccccccc  ÌÌÌÌ
0x0012FEC4  cccccccc  ÌÌÌÌ
0x0012FEC8  0012fed4  Ôþ..  pPtr  addr:0x0012FEC8  value:0x0012FED4
0x0012FECC  cccccccc  ÌÌÌÌ
0x0012FED0  cccccccc  ÌÌÌÌ
0x0012FED4  0000000a  ....  myVar addr:0x0012FED4  value:10
0x0012FED8  cccccccc  ÌÌÌÌ

You can see that ppPtr holds the address to pPtr (0x0012FEC8), which holds the address to myVar (0x0012FED4). The concept holds!
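The same chain can be followed in code: one dereference of the pointer pointer yields the inner pointer, two dereferences reach the variable itself. A minimal sketch, using a hypothetical function name:

```cpp
#include <cassert>

// Writes to myVar through two levels of indirection and returns its value.
int writeThroughPointerPointer() {
    int myVar = 10;
    int *pPtr = &myVar;
    int **ppPtr = &pPtr;

    assert(*ppPtr == pPtr);   // one dereference yields the inner pointer
    **ppPtr = 20;             // two dereferences reach myVar itself
    return myVar;
}
```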

Let's take a little sidestep here. If you were not allowed to use references, but would like to return a value via a function parameter, how would you do it? Here is a function declaration that returns a boolean value to tell you whether the function succeeded or not and a counter:

  bool foo(int *count);

By passing the address of an int variable you declared before calling the function, a result can be returned from inside the function by assigning a value to be held at that address. What if you want to return an address to something that is much larger than a primitive? Then you can use pointer pointers. Use them when a function wants to return the address of a dynamically allocated piece of memory through one of its function parameters. (COM actually forces you to do it this way, since pointers are capable of crossing programming language barriers and C++ references are not.)
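Here is one possible body for such a function. What foo actually counts is not specified in the article, so this sketch simply assumes it counts the spaces in a fixed string.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical implementation of foo(): counts the spaces in a fixed string
// and reports the result through the 'count' out-parameter.
bool foo(int *count) {
    if (count == NULL)
        return false;             // nowhere to store the result
    const char text[] = "a b c";
    *count = 0;
    for (std::size_t i = 0; text[i] != '\0'; ++i)
        if (text[i] == ' ')
            ++*count;
    return true;
}

// Caller side: declare the variable first, then pass its address.
int callFoo() {
    int count = -1;
    return foo(&count) ? count : -1;
}
```

The return value carries success or failure, while the actual result travels through the pointer.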

Since functions in COM always return HRESULT, the only way to return a polymorphic object (e.g. the pointer to the base class of a newly created derived object) is by storing that pointer in a pointer that can hold it:

  HRESULT foo (MyBaseClass **newObj);

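In the foo function you can create a new MyDerivedClass and store its address through the newObj pointer pointer. The conversion from a derived class pointer to a base class pointer is implicit, so no cast is needed (a reinterpret_cast would even be wrong here, because it does not adjust the pointer when multiple inheritance is involved):

  *newObj = new MyDerivedClass;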
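Putting both sides together, a caller passes the address of its own pointer and receives the new object through it. The sketch below uses a plain long return value and a hypothetical function createObject instead of HRESULT and foo, so that it builds without the Windows headers. The two classes are stand-ins as well.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hierarchy standing in for MyBaseClass / MyDerivedClass.
struct MyBaseClass {
    virtual ~MyBaseClass() {}
    virtual int id() const { return 1; }
};
struct MyDerivedClass : MyBaseClass {
    virtual int id() const { return 2; }
};

// Plain long instead of HRESULT so the sketch builds without Windows headers.
long createObject(MyBaseClass **newObj) {
    if (newObj == NULL)
        return -1;                       // stand-in for a failure code
    *newObj = new MyDerivedClass;        // derived-to-base conversion is implicit
    return 0;                            // stand-in for a success code
}

// Caller side: pass the address of a pointer, then use and delete the object.
int createAndQuery() {
    MyBaseClass *pObj = NULL;
    if (createObject(&pObj) != 0)
        return -1;
    int result = pObj->id();             // virtual call reaches MyDerivedClass
    delete pObj;
    return result;
}
```

In real COM code the caller would call Release() rather than delete, for the reasons given in the section on DLL boundaries.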

Pointer arithmetic combined with pointer pointers allows you to do some pretty nifty things. Assume pMem is a character pointer to a piece of memory that was read in binary form from a file, and pOffset is a character pointer whose value holds the offset of an object inside that file. You could then move pMem to that offset with the following statement (using C-style casts on purpose to maintain readability, and uintptr_t from <cstdint> rather than unsigned int so that pointer values are not truncated on 64-bit platforms):

*((char **)(&pMem))=((char *)((uintptr_t)pOffset)+((uintptr_t)pMem));
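The same idea can be written less cryptically when the offset is kept as an integer rather than disguised as a pointer. The following sketch assumes a hypothetical file image with a 6-byte header and uses invented helper names.

```cpp
#include <cassert>
#include <cstddef>

// Returns a pointer to the object stored 'offset' bytes into the file image.
const char *objectAt(const char *pBase, std::size_t offset) {
    return pBase + offset;     // char pointer arithmetic moves in single bytes
}

// Applies an offset in place through a pointer pointer.
void advance(const char **ppMem, std::size_t offset) {
    *ppMem += offset;
}

// Skips the header and returns the first byte of the payload.
char demoOffsets() {
    const char image[] = "HEADERpayload";   // pretend this was read from a file
    const char *pMem = image;
    advance(&pMem, 6);                      // skip the 6-byte "header"
    assert(pMem == objectAt(image, 6));
    return *pMem;
}
```

Keeping offsets as std::size_t avoids the integer casts altogether, at the cost of storing them separately from the pointers they will eventually become.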

There are better ways to construct binary file formats, but this one shows nicely how complicated the use of pointers can become. Next time we will look at smart pointers: C++ classes that help you manage all these crazy pointers.

© 2003-2012 by Developer Shed. All rights reserved.