SatView: Pointer Perfect, part 4

In our final article in the series covering pointers, J. Nakamura discusses the various pitfalls of wild pointers and how to avoid them.

December 27, 2004

In the previous articles I have shown you how to use pointers, pointer pointers and smart pointers. To round it off I thought it would be nice to take a look at some of the common pitfalls you might wander into using pointers. It is quite easy to say that using pointers can be quite troublesome, but I learn best from examples; so let me demonstrate what not to do with pointers.

Wild Pointers

A wild pointer is a pointer that refers to garbage. We have seen how we can retrieve pointers to objects and how we can create them. The most common error you could make when using a pointer is using one that is uninitialized.

Uninitialized Pointers

Sometimes it is not necessary to immediately instantiate a pointer.

class Base {
public:
virtual ~Base();
/* more stuff here */
};

class DerivedA : public Base {
public:
virtual ~DerivedA();
/* more stuff here */
};

class DerivedB : public Base {
public:
virtual ~DerivedB();
/* more stuff here */
};

Base* foo(EFlag flag) {
  Base *result;   // deliberately left uninitialized for this demonstration
  switch (flag) {
    case FLAGA:
      result = new DerivedA;   // a derived pointer converts to Base* implicitly
      break;
    case FLAGB:
      result = new DerivedB;
      break;
  }
  return result;
}

The code above demonstrates bad usage of the switch statement. You should always implement the default case (even when you are sure that only FLAGA and FLAGB exist!) and fire an assert (or return/deal with an error code) when that code is hit. My purpose, however, is to demonstrate what an uninitialized pointer looks like: when flag is neither FLAGA nor FLAGB, the pointer foo() returns is uninitialized.

At the time of implementation, the programmer who wrote this might have been certain that only FLAGA and FLAGB are used for EFlag… but this might (read: will) change in the future. When a FLAGC is introduced and this code is overlooked, the contents of the result are uninitialized. This means that it points to garbage. This can become really troublesome to detect, as your application might still function correctly for a while…or every other time! It is better to immediately crash than have trash around. Always initialize pointers you are not using yet to NULL; this will give you the opportunity to check the validity of a pointer before you try to use it:

void test() {
  Base *pBase = foo(FLAGA);
  assert(pBase);
  pBase->MoreStuffHere();
}
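A minimal sketch of that advice might look as follows. The article does not show the definition of EFlag, so the enum below, the id() member and the name makeObject are assumptions made for illustration only. The pointer is set to NULL before the switch and the default case leaves it that way, so the caller can always test the result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the article's types; the real EFlag is not shown.
enum EFlag { FLAGA, FLAGB, FLAGC };

class Base {
public:
  virtual ~Base() {}
  virtual char id() const = 0;   // illustrative member, not from the article
};

class DerivedA : public Base {
public:
  virtual char id() const { return 'A'; }
};

class DerivedB : public Base {
public:
  virtual char id() const { return 'B'; }
};

// Returns NULL for any flag it does not know, so callers can test the result.
Base* makeObject(EFlag flag) {
  Base *result = NULL;           // initialized before the switch
  switch (flag) {
    case FLAGA: result = new DerivedA; break;
    case FLAGB: result = new DerivedB; break;
    default:    break;           // unknown flag: result stays NULL
  }
  return result;
}

// Usage: check the pointer before dereferencing it.
void demoMakeObject() {
  Base *known = makeObject(FLAGA);
  assert(known != NULL);
  delete known;

  Base *unknown = makeObject(FLAGC);   // a flag the switch does not handle
  assert(unknown == NULL);             // detectable, instead of garbage
}
```

Returning NULL is only one option; firing an assert inside the default case, as suggested above, catches the overlooked flag even earlier in a debug build.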

Fence Post Errors

When you construct an array of objects in memory, there is always the possibility that you might accidentally try to access or set an object outside the bounds of this array. C/C++ arrays can be confusing to index when you are not used to thinking of the first object as having index 0. The ‘off by one’ error is easily made:

char arry[10];   // declare array of chars
arry[10]=1;     // mistake! This is the 11th element of arry!

This is the reason why loops iterate from 0 to size-1:

for (int idx=0; idx<10; ++idx) { /* do operation */ }

Errors like these can be very subtle, however, and since we are dealing with pointers, let's see how a fence post error can create a wild pointer:

struct Fence {
  int arry[10];
  int *pInt;
};

void test() {
  Fence fence;
  memset(&fence, 0, sizeof(Fence)); // initializing all fence vars to 0!
  fence.pInt = new int(2);
  fence.arry[10] = 1;
  (void)printf("fence.pInt = %d.\n", *fence.pInt);
  delete fence.pInt;
}

If you compile and run the code above, Win32 will come up with the following complaint:

The instructions at “0x00414137” referenced memory at “0x00000001”. The memory could not be “read”.

How is it possible that fence.pInt, which was properly initialized with new int(2), suddenly points at address 0x00000001? To understand this you must look at the way the compiler constructs the memory layout for Fence:

0x0012FDA0  00000000  ....  <- fence.arry[0]
0x0012FDA4  00000000  ....  <- fence.arry[1]
0x0012FDA8  00000000  ....  <- fence.arry[2]
0x0012FDAC  00000000  ....  <- fence.arry[3]
0x0012FDB0  00000000  ....  <- fence.arry[4]
0x0012FDB4  00000000  ....  <- fence.arry[5]
0x0012FDB8  00000000  ....  <- fence.arry[6]
0x0012FDBC  00000000  ....  <- fence.arry[7]
0x0012FDC0  00000000  ....  <- fence.arry[8]
0x0012FDC4  00000000  ....  <- fence.arry[9]
0x0012FDC8  002f11b0  °./.  <- fence.pInt pointing at 0x002F11B0

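The address of arry is 0x0012FDA0, and that address plus 10*sizeof(int) bytes (the location of arry[10]) equals 0x0012FDC8… the address of fence.pInt! So by setting arry[10] to 1, we are effectively overwriting the pointer fence.pInt, and we crash when we try to dereference it. Keep in mind that writing past the end of an array is undefined behavior; this particular outcome depends on how the compiler happened to lay out Fence.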
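One way to turn a silent overwrite like this into a detectable error is to use a container with checked access. The sketch below swaps the built-in array for a std::vector and uses its at() member, which throws std::out_of_range for an invalid index; the helper name setChecked is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes value at index, or reports failure instead of
// silently writing past the end the way arry[10] = 1 does.
bool setChecked(std::vector<int> &values, std::size_t index, int value) {
  try {
    values.at(index) = value;   // at() throws std::out_of_range when index >= size()
    return true;
  } catch (const std::out_of_range &) {
    return false;
  }
}

void demoSetChecked() {
  std::vector<int> values(10, 0);
  assert(setChecked(values, 9, 1));     // index 9 is the last valid element
  assert(!setChecked(values, 10, 1));   // index 10 is rejected, nothing is overwritten
}
```

Checked access costs a comparison per call, so it is a trade-off rather than a free fix; operator[] on a vector remains unchecked, just like the built-in array.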

Dangling Pointers

A dangling pointer is a pointer that once pointed to a valid object, but is dereferenced after that object was freed. The easiest way to demonstrate this is:

int *ptr = new int(2);
delete ptr;
*ptr = 3; // oops accessing dangling pointer!

Well nobody is going to make a mistake as obvious as this one (at least not that often), but unfortunately mistakes sometimes only turn out to be obvious after you’ve made them! The next piece of code should be fairly easy to recognize as being wrong too:

std::list<int>* foo() {
  std::list<int> result;
  /* perform operations to fill result list here */
  return &result;
}

The idea behind this function might be clear, but it was clearly forgotten that the result is constructed on the stack. This means that the memory it occupies is freed when we leave the function, which leaves the pointer we returned dangling in the wild!
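A simple way around this, sketched here under the assumption that the caller just needs the filled list, is to return the list by value instead of returning a pointer to it. The function name makeList and its count parameter are illustrative.

```cpp
#include <cassert>
#include <list>

// Returns the list by value: the caller gets its own object, so no pointer
// to a dead stack variable ever leaves the function.
std::list<int> makeList(int count) {
  std::list<int> result;
  for (int i = 0; i < count; ++i) {
    result.push_back(i);
  }
  return result;
}

void demoMakeList() {
  std::list<int> values = makeList(3);
  assert(values.size() == 3);
  assert(values.front() == 0 && values.back() == 2);
}
```

If copying the list is a concern, another option is to let the caller pass in a list by reference and fill that one.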

A nasty way to create dangling pointers is to forget that the compiler implicitly creates a copy constructor and assignment operator for your class when you don’t declare them. This becomes a problem when you use pointers as member variables, since the compiler-generated versions simply copy each member, which creates shallow copies of your class. A shallow copy has copied the pointer address instead of the memory at which it is pointing, so two objects end up believing they own the same memory! This can lead to all sorts of trouble.

You either have to hide the copy constructor and assignment operator by making them private and thus making your class non-copyable, or define them to make a copy of the memory the pointer members are holding (creating a deep copy of your class).

Let's demonstrate a shallow copy.

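class Shallow {
public:
  explicit Shallow(int value) : m_pInt(new int(value)) {}
  ~Shallow() { try { delete m_pInt; } catch (...) {} }
  int *m_pInt;
};

void foo(Shallow param) {
  (void)printf("pointer param.m_pInt is 0x%p.\n", static_cast<void*>(param.m_pInt));
  /* perform some operations here */
}

void test() {
  Shallow shallow(2);
  (void)printf("pointer shallow.m_pInt is 0x%p.\n", static_cast<void*>(shallow.m_pInt));
  foo(shallow);
  (void)printf("shallow.m_pInt has value 0x%x.\n", *shallow.m_pInt);
}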

Running the code above in debug mode will yield the following output (or something similar):

pointer shallow.m_pInt is 0x002F0930.
pointer param.m_pInt is 0x002F0930.
shallow.m_pInt has value 0xdddddddd.

and a “Debug Assertion Failed!” when using Microsoft Visual Studio .Net 2003. It captured the fact that shallow was trying to delete its m_pInt which already was deleted by param when foo() exited! Because shallow was passed by value to foo() a shallow copy was constructed for it and the resulting (shallow) param inherited the pointer which it correctly freed upon leaving the function.
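For comparison, here is a minimal sketch of the deep-copy alternative. The class name Deep and the function modifyCopy are made up for this example; the point is only that the copy constructor allocates its own int, so destroying the copy leaves the original untouched.

```cpp
#include <cassert>

// Hypothetical counterpart to Shallow that owns a private copy of its int.
class Deep {
public:
  explicit Deep(int value) : m_pInt(new int(value)) {}
  // Copy constructor: allocate a new int holding the same value.
  Deep(Deep const &other) : m_pInt(new int(*other.m_pInt)) {}
  // Assignment: both objects already own storage, so copy the value only.
  Deep& operator=(Deep const &other) {
    *m_pInt = *other.m_pInt;
    return *this;
  }
  ~Deep() { delete m_pInt; }
  int *m_pInt;
};

// Takes its argument by value, like foo() above, and changes the copy.
void modifyCopy(Deep param) {
  *param.m_pInt = 99;   // only touches the copy's own int
}

void demoDeep() {
  Deep original(2);
  modifyCopy(original);             // the copy is destroyed on return
  assert(*original.m_pInt == 2);    // original is still valid and unchanged
}
```

The other route mentioned earlier, declaring the copy constructor and assignment operator private, avoids the extra allocation by forbidding copies altogether.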

Basically the error demonstrated in the code above looks a lot like what could happen when you use the std::auto_ptr. Maybe this provides a good explanation as to why the Standards Committee has chosen to make the auto_ptr behave the way it does?

Performance Hit When Initializing an Array of Objects

Something you might easily overlook is that declaring an array of objects invokes the constructor for each object in that array. When the object takes some time to create, declaring an array of these objects can take a noticeable toll on the performance of your application.

When you are not sure whether you will need all 1000 objects in the array, it might be wiser to reserve 1000 pointers instead of the objects themselves. Of course don’t forget to initialize all the pointers to NULL and to create/delete them as you need.
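The idea can be sketched as follows. The classes Expensive and LazyPool are hypothetical; the static counter exists only to show that no object is constructed until it is first requested.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical expensive class; the counter just records constructor calls.
class Expensive {
public:
  Expensive() { ++s_constructed; }
  static int s_constructed;
};
int Expensive::s_constructed = 0;

// Hypothetical pool: holds 1000 pointers and constructs objects on demand.
class LazyPool {
public:
  static const std::size_t kSize = 1000;
  LazyPool() {
    for (std::size_t i = 0; i < kSize; ++i) m_slots[i] = NULL;
  }
  ~LazyPool() {
    for (std::size_t i = 0; i < kSize; ++i) delete m_slots[i];  // deleting NULL is a no-op
  }
  // Creates the object at index on first use; index must be below kSize.
  Expensive& get(std::size_t index) {
    assert(index < kSize);
    if (m_slots[index] == NULL) m_slots[index] = new Expensive;
    return *m_slots[index];
  }
private:
  LazyPool(LazyPool const &);              // non-copyable: declared, never defined
  LazyPool& operator=(LazyPool const &);
  Expensive *m_slots[kSize];
};
```

Creating a LazyPool constructs no Expensive objects at all; calling get(3) twice and get(7) once constructs exactly two. The copy operations are hidden because a compiler-generated copy of the pool would be shallow.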

Programmers aware of this behavior are sometimes still caught by surprise when using the std::vector. The vector is very handy to use instead of built-in arrays, because it can be treated like a built-in array (this is in fact one of its requirements) and can grow dynamically as well. Don’t let this mislead you: unless you take the necessary precautions, you will be creating more objects than the ones you actually put into it.

It is in fact cheaper to construct a built-in array of 10 objects than to use a plain std::vector when you will only need 5 of the objects! Let me demonstrate.

class MyClass {
public:
  MyClass() { (void)printf(">> MyClass constructed <<\n"); }
  MyClass(MyClass const &other)
    { (void)printf("** MyClass copy constructed **\n"); }
  MyClass& operator=(MyClass const &other)
    { (void)printf("** MyClass re-assigned **\n"); return *this; }
  ~MyClass() { (void)printf(">> MyClass destructed <<\n"); }
};

void test()
{
  MyClass myClass;
  (void)printf("creating an array[5].\n");
  MyClass arry[5];
  int idx;
  for (idx=0; idx<5; ++idx)
  {
    (void)printf("add #%d.\n", idx);
    arry[idx] = myClass;
  }
  (void)printf("DONE.\n\n");

  (void)printf("creating a plain vector.\n");
  std::vector<MyClass> vect;
  for (idx=0; idx<5; ++idx)
  {
    (void)printf("add #%d.\n", idx);
    vect.push_back(myClass);
  }
  (void)printf("DONE.\n\n");

  (void)printf("creating a vector... ");
  std::vector<MyClass> vect2;
  (void)printf("and reserving space for 5.\n");
  vect2.reserve(5);
  for (idx=0; idx<5; ++idx)
  {
    (void)printf("add #%d.\n", idx);
    vect2.push_back(myClass);
  }
  (void)printf("DONE.\n\n");
}

Let's take a look at what the first part, creating an array[5], generates for output:

creating an array[5].

>> MyClass constructed <<
>> MyClass constructed <<
>> MyClass constructed <<
>> MyClass constructed <<
>> MyClass constructed <<
add #0.
** MyClass re-assigned **
add #1.
** MyClass re-assigned **
add #2.
** MyClass re-assigned **
add #3.
** MyClass re-assigned **
add #4.
** MyClass re-assigned **
DONE

Nothing surprising here…exactly what we expected: MyClass was constructed 5 times, just by declaring an array[5] of it. Then myClass was copied into 5 objects in that array by using the assignment operator. Now take a look at the output of the second part; the plain vector getting MyClass pushed into it 5 times:

creating a plain vector.

>> MyClass constructed <<
add #0.
** MyClass copy constructed **
** MyClass copy constructed **
>> MyClass destructed <<
add #1.
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
>> MyClass destructed <<
>> MyClass destructed <<
add #2.
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
>> MyClass destructed <<
>> MyClass destructed <<
>> MyClass destructed <<
add #3.
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
>> MyClass destructed <<
>> MyClass destructed <<
>> MyClass destructed <<
>> MyClass destructed <<
add #4.
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
** MyClass copy constructed **
>> MyClass destructed <<
>> MyClass destructed <<
>> MyClass destructed <<
>> MyClass destructed <<
>> MyClass destructed <<
DONE.

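What is going on here, you might wonder? It is quite easy to explain: in this run, std::vector reallocated its memory every time we pushed another myClass into it, each time making room for the current objects plus the new one. This is typical for the first few elements only; push_back is required to take amortized constant time, so implementations grow the capacity in larger steps as the vector gets bigger. The first “copy constructed” message and the last “MyClass destructed” message are for a temporary object holding a copy of myClass.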

The other messages are for copy constructing the current objects into the reallocated memory plus one for the object we are pushing into the vector. The destruction messages are for the objects being freed from the current allocated memory since they are not needed anymore.

It is pretty clear that using a built-in array of 10 objects is much cheaper than the way we are using the vector in the sample above! The third part shows proper usage of the std::vector.

creating a vector... and reserving space for 5.
add #0.
** MyClass copy constructed **
add #1.
** MyClass copy constructed **
add #2.
** MyClass copy constructed **
add #3.
** MyClass copy constructed **
add #4.
** MyClass copy constructed **
DONE.

This is exactly the type of behavior we want to see from the std::vector! It goes without saying that you need to understand what you are doing and how libraries (like the STL) behave when you are using them.
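To measure the difference rather than read it from a log, the copies can be counted. The sketch below uses a hypothetical class Counted and a helper copiesForPushBack, neither of which appears in the article. With reserve() called first, each push_back costs exactly one copy; without it, the total depends on the library's growth strategy, so no specific number is claimed for that case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counting class: tallies copy constructions instead of printing.
class Counted {
public:
  Counted() {}
  Counted(Counted const &) { ++s_copies; }
  static int s_copies;
};
int Counted::s_copies = 0;

// Pushes count copies of one object into a vector and returns how many copy
// constructions that took. With reserveFirst set, capacity is requested up front.
int copiesForPushBack(int count, bool reserveFirst) {
  Counted source;
  std::vector<Counted> vect;
  if (reserveFirst) vect.reserve(static_cast<std::size_t>(count));
  Counted::s_copies = 0;
  for (int i = 0; i < count; ++i) vect.push_back(source);
  return Counted::s_copies;
}

void demoReserve() {
  // After reserve(5) no reallocation occurs: one copy per push_back.
  assert(copiesForPushBack(5, true) == 5);
  // Without reserve the vector may reallocate and copy existing elements again.
  assert(copiesForPushBack(5, false) >= 5);
}
```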

Always measure your code before assuming where performance bottlenecks might reside. If large arrays of expensive objects turn out to be the problem, resorting to pointers might solve it.

Object Slicing

It sounds pretty fictional, but have you ever heard of "Object Slicing" in C++? Oh the fun you can have with inheritance and base class usage! Where I complained that pointer casting could show no respect for identity, pointers actually help you save identity in the following case:

class MyBaseClass {
public:
  virtual ~MyBaseClass() {}
  virtual void bar() const { (void)printf("MyBaseClass::bar();\n"); }
};

class MyDerivedClass : public MyBaseClass {
public:
  virtual ~MyDerivedClass() {}
  virtual void bar() const { (void)printf("MyDerivedClass::bar();\n"); }
};

/* this will slice */
void foo(MyBaseClass param) {
  param.bar();
}

void test() {
  MyDerivedClass myObj;
  foo(myObj);
}

The test() function calls foo() and passes an object by value. This object is correctly derived from MyBaseClass, but when we pass it to foo() only the copy constructor of MyBaseClass is called! The parameter is therefore a genuine MyBaseClass object that uses the virtual table of MyBaseClass, and it knows nothing about the overriding function in the derived class. So the output is “MyBaseClass::bar();”!

One way to prevent this from happening is by passing the object by pointer or by reference, making foo() use the same instance and leaving the vtable intact:

/* this will not slice */
void foo(MyBaseClass *param) {
  param->bar();
}

or

/* neither will this */
void fooRef(MyBaseClass &param) {
  param.bar();
}

Either function will output “MyDerivedClass::bar();”. You see that slicing is nothing more than casting and copying a derived object back to its base object, making it smaller than the original derived version. The usage of pointers or references prevents copies being made and thus prevents possible slicing of objects.
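The three ways of passing can be compared side by side in one small sketch. The classes Animal and Dog and the three functions are hypothetical; they return the name as a string instead of printing it, which makes the difference easy to check.

```cpp
#include <cassert>
#include <string>

class Animal {                      // hypothetical base class
public:
  virtual ~Animal() {}
  virtual std::string name() const { return "Animal"; }
};

class Dog : public Animal {         // hypothetical derived class
public:
  virtual std::string name() const { return "Dog"; }
};

// By value: the argument is copied into a plain Animal, so the Dog part is sliced off.
std::string nameByValue(Animal param) { return param.name(); }

// By reference and by pointer: the original Dog object is used, no copy is made.
std::string nameByReference(Animal const &param) { return param.name(); }
std::string nameByPointer(Animal const *param) { return param->name(); }

void demoSlicing() {
  Dog dog;
  assert(nameByValue(dog) == "Animal");      // sliced
  assert(nameByReference(dog) == "Dog");     // identity preserved
  assert(nameByPointer(&dog) == "Dog");      // identity preserved
}
```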

I hope you found these articles about pointers useful and that you will stick around through the remaining series, all the way up to the SatView application. Any questions can be mailed to jun.nakamura@gmail.com.
