
13 — Memory Management

Processes and Memory

Separate Address Spaces
Address Spaces
Virtual Memory

32-Bit Programs

Integer Size
Type Modifiers and Macros
Address Calculations
Library Functions
Memory Models
Selector Functions

Simple Memory Management

Memory Allocation via malloc and new
The Problem of Stray Pointers
Sharing Memory Between Applications

Virtual Memory and Advanced Memory Management

Win32 Virtual Memory Management
Virtual Memory Functions
Heap Functions
Windows API and C Run-time Memory Management
Miscellaneous and Obsolete Functions
Memory-Mapped Files and Shared Memory
Shared Memory and Based Pointers

Threads and Memory Management

Interlocked Variable Access
Thread-Local Storage

Accessing Physical Memory and I/O Ports
Summary

13 — Memory Management

With the advent of 32-bit Windows, memory management has become a much prettier subject than before. The immense mess of segments and selectors, along with all the other paraphernalia of memory management in 16-bit mode on the segmented Intel processor architecture, is completely and irreversibly gone. Memory management has become so greatly simplified that for most applications, malloc or new is all that is needed; were this an introductory-level guide, I would probably be justified in ending this chapter right here and moving on to a different subject.

That said, Win32 memory management does have its own intricacies. However, programmers are no longer forced to learn about these to perform even the simplest tasks.

Processes and Memory

Win32 provides a sophisticated memory management scheme. The two most distinguishing characteristics of this are the ability to run applications in separate address spaces, and the ability to expand the amount of memory available for allocation through the use of swap files. Both of these capabilities are part of Win32 virtual memory management.

Separate Address Spaces

For programmers familiar with 16-bit Windows, one of the ideas that is hardest to get used to is the notion that an address no longer represents a well-defined spot in physical memory. While one process may find a data item at address 0x10000000, another process may have a piece of its code running there; yet another process may regard that address as invalid. How is this accomplished?

The addresses Win32 applications use are often referred to as logical addresses. Every Win32 process has the entire range of 32-bit addresses available for its use (with some operating system specific restrictions, as we will see shortly). When a Win32 process references data at a logical address, the computer's memory management hardware intervenes and translates the address into a physical address (more on this later). The same logical address may (and under most circumstances, does) translate into different physical addresses for different processes.

This mechanism has several consequences. Most are beneficial, but some actually render certain programming tasks a bit harder to accomplish.

The most obvious benefit of having separate logical address spaces is that processes can no longer accidentally overwrite code or data belonging to another process. Invalid pointers may still cause the death of the offending process but can no longer mangle data in the address space of other processes or the operating system.

On the other hand, the fact that processes no longer share the same logical address space renders the development of cooperating processes more difficult. It is no longer possible to send the address of an object in memory to another process and expect that process to be able to make use of it. That address only makes sense in the context of the sending application; in the context of the application that receives it, it is meaningless, representing a random spot in memory.

Fortunately, the Win32 API offers a set of new mechanisms for cooperating applications to use. Among these is the ability to use shared memory. Essentially, shared memory is a block of physical memory that is mapped into the logical address space of several processes. By writing data into, or reading data from, a block of shared memory, applications can cooperate.

NOTE: Under the simplified memory management regime of Win32s, all Win32 applications share the same address space.

Address Spaces

Earlier I hinted that the use of 32-bit addresses within the logical address space of a process is not entirely unrestricted. Indeed, there are some limitations. Some address ranges are reserved for use by the operating system. Moreover, the restrictions are not the same in the different Win32 environments.

Using 32-bit addresses with byte-addressable memory means a total address space of 4GB (2^32 = 4,294,967,296 bytes). Of this, Windows reserves the upper 2GB for its own use, while leaving the lower 2GB available for use by the application.

Windows 95 further reserves the lower 4MB of the address space. This area, often referred to as the Compatibility Arena in Microsoft documentation, exists for compatibility with 16-bit DOS and Windows applications.

I mentioned that Win32 applications run in separate address spaces. This is true inasmuch as the nonreserved areas of the logical address space are concerned. However, the situation of the reserved areas is somewhat different.

Under Windows 95, all reserved areas are shared. In other words, if one application finds a particular object at a memory location in one of the two reserved areas (lower 4MB or upper 2GB), all other applications are guaranteed to find the same object there. However, applications should not rely on this behavior; otherwise, the program will be incompatible with Windows NT (and thus not qualify for the new Microsoft logo program). Besides, as we see shortly, there are easy ways for applications to request a shared area in memory explicitly, and that mechanism works well under both Windows NT and Windows 95.

Windows 95 further divides the upper 2GB into two additional arenas. The arena between 2GB and 3GB is the shared arena that holds shared memory, memory mapped files, and some 16-bit components. The reserved system arena between 3GB and 4GB is where all of the operating system's privileged code resides. This arena is not addressable by nonprivileged application programs.
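The Windows 95 layout described above can be summarized in a few lines of code. The following is only an illustrative sketch: the function arenaOf and the constant names are hypothetical (they are not part of the Win32 API), and the boundaries are simply the 4MB, 2GB, and 3GB figures quoted in the text.

```cpp
#include <cstdint>
#include <string>

// Arena boundaries for Windows 95, taken from the figures quoted in the text.
// The names are made up for this sketch.
constexpr std::uint32_t kCompatibilityEnd = 0x00400000u; // 4MB
constexpr std::uint32_t kPrivateEnd       = 0x80000000u; // 2GB
constexpr std::uint32_t kSharedEnd        = 0xC0000000u; // 3GB

// Hypothetical helper: names the arena that a 32-bit logical address falls in.
std::string arenaOf(std::uint32_t address)
{
    if (address < kCompatibilityEnd) return "compatibility"; // lower 4MB
    if (address < kPrivateEnd)       return "private";       // 4MB to 2GB
    if (address < kSharedEnd)        return "shared";        // 2GB to 3GB
    return "system";                                         // 3GB to 4GB
}
```

With these boundaries, the example address 0x10000000 used earlier lands in the private part of the address space, while 0x80000000 is the first address of the shared arena.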

Virtual Memory

In the previous sections, I discreetly avoided one question. How exactly are logical addresses mapped to physical memory? After all, most computers do not have enough memory to hold several times the 4GB of memory that each application can address. (Even if they did, the power and heat dissipation requirements of that much memory might represent somewhat of a problem. Or the price.)

The answer is, not all logical addresses of an application are actually mapped to physical storage; and those that are may not be mapped to physical memory.

Ever since the introduction of Windows 3.1, Windows has been able to use a swap file. The swap mechanism expands the amount of memory that the system can use by storing unused blocks of data on disk and loading them as needed. While swap files are several orders of magnitude slower than RAM, their use enables the system to run more applications or applications that are more resource-intensive.

The reason swap files can be used efficiently is that most applications allocate blocks of memory that are rarely used. For example, if you use a word processor to edit two documents simultaneously, it may happen that while you work on one document, you do not touch the other for extended periods of time. The operating system may free up the physical memory in which the other document resides by swapping the document to disk; the physical memory then becomes available for other applications. When, after some time, you switch to the other document, you may notice some disk activity and a slight delay before the document is displayed; this is when the operating system loads the relevant portions of the swap file back into memory, possibly swapping out other blocks of data that have not been used recently.

Figure 13.1 shows how the operating system and the computer's hardware accomplish the mapping of logical addresses. A table that is often called the page table contains information on all blocks or pages of memory. In effect, this table maps blocks in an application's logical address space to blocks in physical memory or portions of the swap file.

Figure 13.1. Mapping of logical addresses to physical memory.

When a logical address is mapped to actual physical memory, the address is translated and the data is read or written as requested. Because the translation is carried out by the processor's hardware, it adds little or no overhead to ordinary memory accesses.

When the logical address maps to a block in the system's swap file, a different series of events takes place. The attempt to reference such an invalid address triggers the operating system into action. The operating system loads the requested block of data from the swap file into memory, possibly swapping out other blocks of data from memory to disk to make space. Once the requested data is in physical memory and the page table is updated, control is returned to the application. The access to the requested memory location can now be completed successfully. All this is completely transparent to the application; the only sign that would indicate that the requested block of memory was not readily available is the delay caused by the swapping operation.
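To make the sequence of events more concrete, here is a toy model of a per-process page table. It is a sketch only, not how Windows implements paging: the class ToyAddressSpace, the 4KB page size, and the trick of handing out frame numbers starting at 100 are all assumptions made for the illustration. Real translation is done by the processor, and the swap-in is done by the operating system.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>

constexpr std::uint32_t kPageSize = 4096; // assumed page size

enum class PageState { Resident, SwappedOut };

struct PageEntry {
    PageState     state;
    std::uint32_t location; // frame number if resident, swap slot otherwise
};

// Toy page table for one process: logical page number -> entry.
class ToyAddressSpace {
public:
    void map(std::uint32_t page, PageEntry entry) { table_[page] = entry; }

    // Translates a logical address into a "physical" one. A swapped-out page
    // triggers a simulated page fault that brings the page into a free frame.
    std::uint32_t translate(std::uint32_t logical)
    {
        auto it = table_.find(logical / kPageSize);
        if (it == table_.end())
            throw std::out_of_range("address not mapped"); // truly invalid address
        PageEntry& entry = it->second;
        if (entry.state == PageState::SwappedOut) {
            ++pageFaults_;                     // the operating system steps in
            entry.state    = PageState::Resident;
            entry.location = nextFreeFrame_++; // pretend the page was read from disk
        }
        return entry.location * kPageSize + logical % kPageSize;
    }

    int pageFaults() const { return pageFaults_; }

private:
    std::map<std::uint32_t, PageEntry> table_;
    std::uint32_t nextFreeFrame_ = 100; // arbitrary first free frame
    int pageFaults_ = 0;
};
```

Two ToyAddressSpace objects stand for two processes: each can map logical page 0x10000 (address 0x10000000) to a different frame, and only the one whose page is swapped out incurs a page fault, the first time the address is touched.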

The fact that logical addresses may map to physical memory locations, blocks of the swap file, or nothing at all implies interesting possibilities. Furthermore, the existence of a mechanism that maps the contents of a file (namely, the swap file) to logical addresses also carries the potential for useful features. Indeed, the Win32 API provides the means for applications to explicitly manage virtual memory and to access disk data through memory mapped files. These and other memory management mechanisms are explored in the next section.

32-Bit Programs

Because most Windows programmers have extensive experience in programming 16-bit Windows, it is perhaps helpful to begin our review of 32-bit memory management issues with the differences between 16- and 32-bit programs. A number of issues, such as integer size, the disappearance of the far and near specifiers, or differences in address calculations affect coding practices.

Integer Size

One of the most striking differences between the 16-bit and 32-bit environments can be demonstrated by the simple example shown in Listing 13.1. This program can be compiled from the command line using cl intsize.cpp.

Listing 13.1: Determining integer size.

#include <iostream.h>

void main(void)
{
    cout << "sizeof(int) = " << sizeof(int);
}

When you run this program, it prints the following result:

sizeof(int) = 4

UNIX programmers are probably relieved to see this result. The nightmare of trying to port UNIX programs that implicitly rely on integers and pointers both being of the same size (32 bits) is gone. Programmers of 16-bit Windows, on the other hand, are facing the added difficulty of having to review older code for any signs of an explicit dependence on the 16-bit integer size.

One thing that has not changed is the size of types defined by Windows. Specifically, the types WORD and DWORD remain 16 and 32 bits wide, respectively. Using these types when saving application data to disk ensures that the contents of a disk file remain readable by both the 16-bit and the 32-bit versions of the same application. In contrast, if an application used the int type when writing to disk, the contents of the disk file would depend on the operating system.
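As a rough illustration of the idea, the following sketch writes a small record header using fixed-width fields only. Everything in it is an assumption made for the example: the RecordHeader structure and the serialize and deserialize functions are hypothetical, and the aliases Word and Dword merely stand in for the Windows WORD and DWORD types so that the fragment does not depend on windows.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for WORD and DWORD (assumption: windows.h is not included here).
using Word  = std::uint16_t;
using Dword = std::uint32_t;

// Hypothetical file record header: a version number and a payload length.
struct RecordHeader {
    Word  version;
    Dword length;
};

// Writes the header byte by byte in little-endian (Intel) order. The result
// is always 6 bytes, regardless of the width of int or of structure padding.
std::vector<unsigned char> serialize(const RecordHeader& header)
{
    std::vector<unsigned char> out;
    out.push_back(static_cast<unsigned char>(header.version & 0xFF));
    out.push_back(static_cast<unsigned char>(header.version >> 8));
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((header.length >> shift) & 0xFF));
    return out;
}

// Reads the header back; expects at least the 6 bytes written by serialize.
RecordHeader deserialize(const std::vector<unsigned char>& in)
{
    RecordHeader header;
    header.version = static_cast<Word>(in[0] | (in[1] << 8));
    header.length  = 0;
    for (std::size_t i = 0; i < 4; ++i)
        header.length |= static_cast<Dword>(in[2 + i]) << (8 * i);
    return header;
}
```

Had the length field been declared as int, a 16-bit build would have written 2 bytes for it and a 32-bit build 4 bytes, and neither version could read the other's files.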

Type Modifiers and Macros

An obvious consequence of 32-bit addressing is that you no longer need to use type modifiers to distinguish between near and far pointers, or to specify huge data. Does this mean that existing programs must be modified and all references to the _near, _far, or _huge keywords must be removed? Fortunately not; the 32-bit C/C++ compiler simply ignores these keywords to ensure backward compatibility.

Similarly, all the types that used to be defined in the windows.h header file, such as LPSTR for a far pointer to characters or LPVOID for a far pointer to a void type, still remain available. In the 32-bit environment, these types are simply defined to be equivalent to their near counterparts; thus, LPSTR is the same as PSTR, and LPVOID is the same as PVOID. To maintain backward compatibility (should you ever need to recompile your code with a 16-bit compiler) it is generally a good idea to continue using the correct types. This is further encouraged by the fact that the published interface to most Windows functions uses the correct (near or far) types.

Address Calculations

Naturally, if your program performs address calculations specific to the segmented Intel architecture, it needs to be modified. (Such calculations would also be in violation of the platform-independent philosophy of the Win32 API, making it difficult to compile your program under Windows NT on the MIPS, Alpha, or other platforms.)

A particular case concerns the use of the LOWORD macro. In Windows 3.1, memory allocated with GlobalAlloc was aligned on a segment boundary, with the offset set to 0. Some programmers used this fact to set addresses by simply modifying the low word of a pointer variable using the LOWORD macro. Under the Win32 API, the assumption that an allocated memory block starts on a segment boundary is no longer valid. The questionable practice of using LOWORD this way will no longer work.
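The arithmetic behind this pitfall can be shown with plain integers. This is a simplified model, not real pointer code: addresses are represented as 32-bit numbers, and the two helper functions are hypothetical names chosen for the sketch.

```cpp
#include <cstdint>

// Mimics the old trick: overwrite the low 16 bits of an address value.
std::uint32_t withLowWord(std::uint32_t address, std::uint16_t offset)
{
    return (address & 0xFFFF0000u) | offset;
}

// The safe alternative: ordinary offset arithmetic from the block's base.
std::uint32_t withOffset(std::uint32_t base, std::uint16_t offset)
{
    return base + offset;
}
```

For a block whose base has a zero low word, such as 0x12340000, both functions yield 0x12340010 for an offset of 0x10. For a base such as 0x00A31F40, withOffset yields 0x00A31F50, while withLowWord yields 0x00A30010, an address that lies before the start of the block.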

Library Functions

In the 16-bit environment, many functions had two versions: one for near addresses, and one for far addresses. It was often necessary to use both. For example, in medium model programs, one frequently had to use _fstrcpy to copy characters from or to a far memory location. In the 32-bit environment, these functions are obsolete.

The header file windowsx.h defines these obsolete function names to refer to their regular counterparts. By including this file in programs that contain older source code, you can avoid having to comb through your source files manually to remove or change these obsolete function references.

Memory Models

Ever since the introduction of the IBM PC, programmers have learned to hate the multitude of compiler switches and options that control addressing behavior. Tiny, small, compact, medium, large, huge, and custom memory models, address conversions, 64KB code and data segments: to make a long story short, in 32-bit Windows, this nightmare is over. There is only one memory model, in which both code and data reside in a flat 32-bit address space.

Selector Functions

The Windows 3.1 API contains a set of functions (for example, AllocSelector, FreeSelector) that enable applications to directly manipulate physical memory. These functions are not available in the Win32 API; 32-bit applications should not attempt to manipulate physical memory in any way. Dealing with physical memory is a task best left to device drivers.

Simple Memory Management

As I mentioned at the beginning of this chapter, memory allocation in the 32-bit environment is greatly simplified. It is no longer necessary to separately allocate memory and lock it for use. The distinction between global and local heaps has disappeared. On the other hand, the 32-bit environment presents a set of new challenges.

Memory Allocation via malloc and new

The venerable set of memory management functions in Windows versions prior to 3.1, such as GlobalAlloc and GlobalLock, addressed a problem specific to real mode programming of the 80x86 processor family. Because applications used actual physical addresses to access objects in memory, there was no other way for the operating system to perform memory management functions. It was necessary for applications to abide by a convoluted mechanism by which they regularly relinquished control of these objects. This enabled the operating system to move these objects around as necessary. In other words, applications had to actively take part in memory management and cooperate with the operating system. Because malloc not only allocated memory but also locked it in place, use of this function caused dangerous fragmentation of available memory.

Windows 3.1 uses Intel processors in protected mode. In protected mode, applications no longer have access to physical addresses. The operating system is able to move a memory block around even while applications hold valid addresses to it that they obtained through a call to GlobalLock or LocalLock. Using malloc not only became safe, it became the recommended practice. Several implementations of this function (such as those in Microsoft C/C++ Version 7 and later) also solved another problem. Because of a system-wide limit of 8,192 selectors, the number of times applications could call memory allocation functions without subsequently freeing up memory was limited. By providing a suballocation scheme, the newer malloc implementations greatly helped applications that routinely allocated a large number of small memory blocks.
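The idea of suballocation is easy to sketch. The class below is a deliberately minimal illustration and an assumption on my part, not the scheme used by any actual malloc implementation: it obtains one large block up front and carves small blocks out of it, so that many small requests cost only one system-level allocation. It never returns individual blocks to the pool.

```cpp
#include <cstddef>
#include <vector>

// Hands out small blocks carved from one large block obtained up front.
class SubAllocator {
public:
    explicit SubAllocator(std::size_t capacity) : pool_(capacity), used_(0) {}

    // Returns a pointer to `size` bytes inside the pool, or nullptr when the
    // pool cannot satisfy the request.
    void* allocate(std::size_t size)
    {
        const std::size_t align = alignof(std::max_align_t);
        // Round the next free offset up to a suitably aligned position.
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > pool_.size() || size > pool_.size() - start)
            return nullptr;
        used_ = start + size;
        return pool_.data() + start;
    }

    std::size_t bytesUsed() const { return used_; }

private:
    std::vector<unsigned char> pool_; // the single large block
    std::size_t used_;                // offset of the first unused byte
};
```

In 16-bit terms, the large block would consume a single selector no matter how many small blocks are handed out from it, which is why suballocation relieved the pressure on the selector limit.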

The 32-bit environment further simplifies memory allocation by eliminating the difference between global and local heaps. (It is actually possible, although definitely not recommended, to allocate memory with GlobalAlloc and free it using LocalFree.)

The bottom line? In a Win32 application, allocate memory with malloc or new, release it with free or delete, and let the operating system worry about all other aspects of memory management. For most applications, this approach is perfectly sufficient.

The Problem of Stray Pointers

Working with a 32-bit linear address space has one unexpected consequence. In the 16-bit environment, every call to GlobalAlloc reserved a new selector. In protected mode in the Intel segmented architecture, selectors define blocks of memory; as part of the selector, the length of the block is also specified. Attempting to address memory outside the allocated limits of a selector resulted in a protection violation.

In the 32-bit environment, automatic and static objects, global and local dynamically allocated memory, the stack, and everything else belonging to the same application share the application's address space and are accessed through flat 32-bit addresses. The operating system is less likely to catch stray pointers. The possibility of memory corruption through such pointers is greater, which increases the programmer's responsibility for ensuring that pointers stay within their intended bounds.

Consider, for example, the following code fragment:

HGLOBAL hBuf1, hBuf2;
LPSTR lpszBuf1, lpszBuf2;

hBuf1 = GlobalAlloc(GPTR, 1024);
hBuf2 = GlobalAlloc(GPTR, 1024);
lpszBuf1 = (LPSTR)GlobalLock(hBuf1);
lpszBuf2 = (LPSTR)GlobalLock(hBuf2);
lpszBuf1[2000] = 'X';   /* Error! Index 2000 is past the end of the 1024-byte buffer. */

In this code fragment, an attempt is made to write past the boundaries of the first buffer allocated via GlobalAlloc. In the 16-bit environment, this results in a protection violation when the attempt is made to address a memory location outside the limits of the selector reserved by the first GlobalAlloc call. In the 32-bit environment, however, the memory location referenced by lpszBuf1[2000] is probably valid, pointing to somewhere inside the second buffer. An attempt to write to this address will succeed and corrupt the contents of the second buffer.
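Since the operating system can no longer be counted on to catch such errors, one defensive option is to check indexes in the program itself. The sketch below shows one way of doing so in standard C++, which is my choice for the illustration rather than anything prescribed by Win32: it uses std::vector and its range-checked at member function, and the helper tryStore is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stores `value` at `index` if the index lies inside the buffer. Returns false,
// leaving all memory untouched, when the index is out of bounds.
bool tryStore(std::vector<char>& buffer, std::size_t index, char value)
{
    try {
        buffer.at(index) = value; // at() throws std::out_of_range on a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With two 1,024-byte buffers, a call such as tryStore(buffer1, 2000, 'X') is rejected instead of silently writing into whatever happens to follow the first buffer in memory.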

On the bright side, it is practically impossible for an application to corrupt another application's memory space through stray pointers. This increases the overall stability of the operating system.

Sharing Memory Between Applications

Because each 32-bit application has a private virtual address space, it is no longer possible for such applications to share memory by simply passing pointers to each other in Windows messages. The GMEM_DDESHARE flag is no longer functional. Passing the handle of a 32-bit memory block to another application is meaningless and futile; the handle only refers to a random spot in the private virtual address space of the recipient program.

If it is necessary for two applications to communicate using shared memory, they can do this by using the DDEML library or by using memory mapped files, which are described later in this chapter.

Virtual Memory and Advanced Memory Management

In the Win32 programming environment, applications have improved control over how they allocate and use memory. An extended set of memory management functions is provided. Figure 13.2 shows the different levels of memory management functions in the Win32 API.

Figure 13.2. Memory management functions in the 32-bit environment.

Win32 Virtual Memory Management

Figure 13.1 might appear to suggest that pages of virtual memory must always be mapped to either physical memory or a paging (or swap) file. This is not the case; Win32 memory management makes a distinction between reserved pages and committed pages. A committed page of virtual memory is a page that is backed by physical storage, either in physical memory or in the paging file. In contrast, a reserved page is not backed by physical storage at all.

Why would you want to reserve addresses without allocating corresponding physical storage? One possibility is that you might not know in advance how much space is needed for a certain operation. This mechanism enables you to reserve a contiguous range of addresses in the virtual memory space of your process, without committing physical resources to it until such resources are actually needed. When a reference to an uncommitted page is made, the operating system generates an exception that your program can catch through structured exception handling. In turn, your program can instruct the operating system to commit the page, and then it can continue the processing that was interrupted by the exception. Incidentally, this is how Windows 95 performs many of its own memory management functions, such as stack allocation or manipulating the page table itself.

One real-life example concerns sparse matrices, which are two-dimensional arrays that have most of their array elements equal to zero. Sparse matrices appear frequently in technical applications. It is possible to reserve memory for the entire matrix but commit only those pages that contain nonzero elements, thus reducing the consumption of physical resources significantly while still keeping the application code simple.

Virtual Memory Functions

An application can reserve memory through the VirtualAlloc function. With this function, the application can explicitly specify the address and the size of the memory block about to be reserved. Additional parameters specify the type of the allocation (committed or reserved) and access protection flags. For example, the following code reserves 1MB of memory, starting at address 0x10000000, for reading and writing:

VirtualAlloc((LPVOID)0x10000000, 0x00100000, MEM_RESERVE, PAGE_READWRITE);

Later, the application can commit pages of memory by repeated calls to the VirtualAlloc function. Memory (reserved or committed) can be freed using VirtualFree.

A special use of VirtualAlloc concerns the establishment of guard pages. Guard pages act as one-shot alarms, raising an exception when the application attempts to access them. Guard pages can thus be used to protect against stray pointers that point past array boundaries, for example.

VirtualLock can be used to lock a memory block in physical memory (RAM), preventing the system from swapping out the block to the paging file on disk. This can be used to ensure that critical data can be accessed without disk I/O. This function should be used sparingly because it can severely degrade system performance by restricting the operating system's capability to manage memory. Memory that was locked through VirtualLock can be unlocked using the VirtualUnlock function.

An application can change the protection flags of committed pages of memory using the VirtualProtect function. VirtualProtectEx can be used to change the protection flags of a block of memory belonging to another process. Finally, VirtualQuery can be used to obtain information about pages of memory; VirtualQueryEx obtains information about memory owned by another process.

Listing 13.2 shows another command line application, one that demonstrates the use of virtual memory functions. This program can be compiled with cl -GX sparse.cpp.

Listing 13.2. Handling sparse matrices using virtual memory management.

#include <iostream.h>
#include <stdlib.h>
#include <windows.h>

#define PAGESIZE 0x1000

void main(void)
{
    double (*pdMatrix)[10000];
    double d;
    LPVOID lpvResult;
    int x, y, i, n;

    // Reserve address space for the whole matrix; nothing is committed yet.
    pdMatrix = (double (*)[10000])VirtualAlloc(NULL,
                            100000000 * sizeof(double),
                            MEM_RESERVE, PAGE_NOACCESS);
    if (pdMatrix == NULL)
    {
        cout << "Failed to reserve memory.\n";
        exit(1);
    }
    n = 0;
    for (i = 0; i < 10; i++)
    {
        x = rand() % 10000;
        y = rand() % 10000;
        d = (double)rand();
        cout << "MATRIX[" << x << ',' << y << "] = " << d << '\n';
        try
        {
            pdMatrix[x][y] = d;
        }
        catch (...)
        {
            // The page holding this element has not been committed yet.
            if (d != 0.0)
            {
                n++;
                // Commit only the page that holds this element. The size is
                // sizeof(double), not PAGESIZE: VirtualAlloc commits every
                // page touched by the range, so a PAGESIZE range starting
                // in mid-page would also commit the following page.
                lpvResult = VirtualAlloc((LPVOID)(&pdMatrix[x][y]),
                             sizeof(double), MEM_COMMIT, PAGE_READWRITE);
                if (lpvResult == NULL)
                {
                    cout << "Cannot commit memory.\n";
                    exit(1);
                }
                pdMatrix[x][y] = d;
            }
        }
    }
    cout << "Matrix populated, " << n << " pages used.\n";
    cout << "Total bytes committed: " << n * PAGESIZE << '\n';
    for(;;)
    {
        cout << "   Enter row: ";
        cout.flush();
        cin >> x;
        cout << "Enter column: ";
        cout.flush();
        cin >> y;
        try
        {
            d = pdMatrix[x][y];
        }
        catch (...)
        {
            // Reading an uncommitted page: treat the element as zero.
            cout << "Exception handler was invoked.\n";
            d = 0.0;
        }
        cout << "MATRIX[" << x << ',' << y << "] = " << d << '\n';
    }
}

This program creates a double-precision matrix of 10,000 by 10,000 elements. However, instead of allocating a whopping 800,000,000 bytes of memory, it only allocates memory on an as-needed basis. This mechanism is especially suitable for matrices that have very few nonzero elements; in this example, only 10 out of 100,000,000 elements are set to random nonzero values.

The program first reserves, but does not commit, 800,000,000 bytes of memory for the matrix. Next, it assigns random values to 10 randomly selected elements. If the element falls on a page of virtual memory that is not yet committed (has no backing in physical memory or in the paging file), an exception is raised. The exception is caught using the C++ exception handling mechanism. The exception handler checks whether the value to be assigned is nonzero; if so, it commits the page in question and repeats the assignment.
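The page arithmetic at work in Listing 13.2 can be checked separately from the Win32 calls. The following sketch computes which page of the reserved region a matrix element falls on and how many distinct pages a set of elements requires. It rests on a few assumptions: the function names are hypothetical, the region is taken to start on a page boundary, and a double is taken to occupy 8 bytes.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

constexpr std::size_t kPageSize = 0x1000; // same value as PAGESIZE in the listing
constexpr std::size_t kColumns  = 10000;  // row length of the matrix

// Index of the page, counted from the start of the matrix, that holds
// element [row][col].
std::size_t pageOfElement(std::size_t row, std::size_t col)
{
    return (row * kColumns + col) * sizeof(double) / kPageSize;
}

// Number of distinct pages that must be committed to hold the given elements.
std::size_t pagesNeeded(const std::vector<std::pair<std::size_t, std::size_t>>& elements)
{
    std::set<std::size_t> pages;
    for (const auto& element : elements)
        pages.insert(pageOfElement(element.first, element.second));
    return pages.size();
}
```

For instance, elements [41][8467] and [41][8400] both fall on page 817 of the region, so once the first has been committed, the second can be read or written without raising an exception. Element [1][1] falls on page 19, which remains uncommitted unless something is stored there.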

NOTE: In this simple example, we assumed that the exception we catch is always a Win32 structured exception indicating a memory access violation. In complex programs, this assumption may not always be valid and a more elaborate exception handling mechanism may be necessary to reliably identify exceptions.

In the last part of the program, the user is invited to enter row and column index values. The program then attempts to retrieve the value of the specified matrix element. If the element falls on a page that has not been committed, an exception is raised; this time, it is interpreted as an indication that the selected matrix element is zero.

The rudimentary user-interface loop of this program does not include a halting condition; the program can be stopped using Ctrl+C.

The program's output looks similar to the following:

MATRIX[41,8467] = 6334
MATRIX[6500,9169] = 15724
MATRIX[1478,9358] = 26962
MATRIX[4464,5705] = 28145
MATRIX[3281,6827] = 9961
MATRIX[491,2995] = 11942
MATRIX[4827,5436] = 32391
MATRIX[4604,3902] = 153
MATRIX[292,2382] = 17421
MATRIX[8716,9718] = 19895
Matrix populated, 10 pages used.
Total bytes committed: 40960
   Enter row: 41
Enter column: 8467
MATRIX[41,8467] = 6334
   Enter row: 41
Enter column: 8400
MATRIX[41,8400] = 0
   Enter row: 1
Enter column: 1
Exception handler was invoked.
MATRIX[1,1] = 0
   Enter row:

Heap Functions

In addition to their default heap, processes can create additional heaps using the HeapCreate function. Heap management functions can then be used to allocate and free memory blocks in the newly created private heap. A possible use of this mechanism involves the creation of a private heap at startup, specifying a size that is sufficient for the application's memory allocation needs. If HeapCreate fails, the process can report the problem and terminate before it has done any work; if HeapCreate succeeds, the process is assured that the memory it requires is present and available.

After a heap is created via HeapCreate, processes can allocate memory from it using HeapAlloc. HeapRealloc can be used to change the size of a previously allocated memory block, and HeapFree deallocates memory blocks and returns them to the heap. The size of a previously allocated block can be obtained using HeapSize.

It is important to note that the memory allocated by HeapAlloc is no different from memory obtained using the standard memory allocation functions such as GlobalAlloc, LocalAlloc, or malloc.

Heap management functions can also be used on the default heap of the process. A handle to the default heap can be obtained using GetProcessHeap. The function GetProcessHeaps returns a list of all heap handles owned by the process.

A heap can be destroyed using the function HeapDestroy. This function should not be used on the default heap handle of the process that is returned by GetProcessHeap. (Destroying the default heap would mean destroying the application's stack, global and automatic variables, and so on, with obviously disastrous consequences.)

The function HeapCompact attempts to compact the specified heap by coalescing adjacent free blocks of memory and decommitting large free blocks. Note that objects allocated on the heap by HeapAlloc are not movable, so the heap can easily become fragmented. HeapCompact will not defragment a badly fragmented heap.
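Coalescing is simple to picture with a list of free blocks. The sketch below is a toy model and not the algorithm Windows uses internally: the FreeBlock structure and the coalesce function are hypothetical, and each block is described only by its offset within the heap and its size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A free region of a heap, described by its offset and size in bytes.
struct FreeBlock {
    std::size_t offset;
    std::size_t size;
};

// Merges free blocks that touch each other. Free blocks separated by an
// allocated (and therefore immovable) block remain separate.
std::vector<FreeBlock> coalesce(std::vector<FreeBlock> blocks)
{
    std::sort(blocks.begin(), blocks.end(),
              [](const FreeBlock& a, const FreeBlock& b) { return a.offset < b.offset; });
    std::vector<FreeBlock> merged;
    for (const FreeBlock& block : blocks) {
        if (!merged.empty() &&
            merged.back().offset + merged.back().size == block.offset)
            merged.back().size += block.size; // adjacent: grow the previous block
        else
            merged.push_back(block);          // a gap in between: keep it separate
    }
    return merged;
}
```

Given free blocks at offsets 0 and 16 (16 bytes each) and another at offset 64 (32 bytes), the first two merge into a single 32-byte block, but the third stays apart because the memory between offsets 32 and 64 is still allocated. A request for 64 contiguous bytes would fail even though 64 bytes are free in total, which is exactly the kind of fragmentation that coalescing cannot cure.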

Windows API and C Run-time Memory Management

At the top of the hierarchy of memory management functions are the standard Windows and C run-time memory management functions. As noted earlier, these functions are likely to prove adequate for the memory management requirements of most applications. Handle-based memory management functions provided in the Windows API include GlobalAlloc and LocalAlloc, GlobalLock and LocalLock, GlobalFree and LocalFree. The C/C++ run-time library contains the malloc family of functions (malloc, realloc, calloc, free, and other functions). These functions are safe to use and provide compatibility with the 16-bit environment, should it become necessary to build applications that can be compiled as both 16-bit and 32-bit programs.

Miscellaneous and Obsolete Functions

In addition to the API functions already described, a number of miscellaneous functions are also available to the Win32 programmer. Several other functions that were available under Windows 3.1 have been deleted or become obsolete.

Memory manipulation functions include CopyMemory, FillMemory, MoveMemory, and ZeroMemory. These functions are equivalent to their C run-time counterparts such as memcpy, memmove, or memset.
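The difference between copying and moving matters when the source and destination ranges overlap. The following sketch uses only the C run-time functions, so that it does not depend on windows.h; the function shiftRight is a hypothetical example. MoveMemory corresponds to memmove, CopyMemory to memcpy, and FillMemory and ZeroMemory to memset.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Shifts the contents of a buffer to the right by `shift` bytes, in place,
// and pads the vacated bytes with '.'. The source and destination ranges
// overlap, so memmove (the counterpart of MoveMemory) is required; memcpy
// (the counterpart of CopyMemory) is only defined for non-overlapping ranges.
std::string shiftRight(std::string text, std::size_t shift)
{
    if (shift == 0 || shift >= text.size())
        return text; // nothing sensible to shift
    std::memmove(&text[shift], &text[0], text.size() - shift);
    std::memset(&text[0], '.', shift); // counterpart of FillMemory
    return text;
}
```

Calling shiftRight("ABCDEF", 2) produces "..ABCD". One detail to watch when translating between the two families: FillMemory takes the length before the fill byte, whereas memset takes the fill value before the length.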

A set of Windows API functions is provided to verify whether a given pointer provides a specific type of access to an address or range of addresses. These functions are IsBadCodePtr, IsBadStringPtr, IsBadReadPtr, and IsBadWritePtr. For the latter pair, huge versions (IsBadHugeReadPtr, IsBadHugeWritePtr) are also provided for backward compatibility with Windows 3.1.

Information about available memory can be obtained using GlobalMemoryStatus. This function replaces the obsolete GetFreeSpace function.
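
A minimal sketch of such a query might look like the following. The function names QueryMemoryStatus and PrintMemoryStatus are hypothetical; the MEMORYSTATUS structure and its members come from the Windows headers. The #ifdef guard is present only because the function is not available outside Windows.

```cpp
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32   // GlobalMemoryStatus exists only on Windows
#include <windows.h>

// Hypothetical helper: returns a snapshot of the system's memory status.
MEMORYSTATUS QueryMemoryStatus()
{
    MEMORYSTATUS ms;
    ZeroMemory(&ms, sizeof(ms));
    ms.dwLength = sizeof(ms);
    GlobalMemoryStatus(&ms);    // returns no value; it simply fills in 'ms'
    return ms;
}

// Hypothetical helper: prints a few of the reported values.
void PrintMemoryStatus()
{
    MEMORYSTATUS ms = QueryMemoryStatus();
    assert(ms.dwMemoryLoad <= 100);   // reported as a percentage

    printf("Memory load:     %lu%%\n", (unsigned long)ms.dwMemoryLoad);
    printf("Physical memory: %lu KB free of %lu KB\n",
           (unsigned long)(ms.dwAvailPhys / 1024),
           (unsigned long)(ms.dwTotalPhys / 1024));
    printf("Paging file:     %lu KB free of %lu KB\n",
           (unsigned long)(ms.dwAvailPageFile / 1024),
           (unsigned long)(ms.dwTotalPageFile / 1024));
    printf("Virtual memory:  %lu KB free of %lu KB\n",
           (unsigned long)(ms.dwAvailVirtual / 1024),
           (unsigned long)(ms.dwTotalVirtual / 1024));
}
#endif
```

The values are a snapshot; another process may allocate or release memory a moment later, so they are best treated as hints rather than guarantees.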

Other obsolete functions include all functions that manipulate selectors (for example, AllocSelector, ChangeSelector, FreeSelector); manipulate the processor's stack (SwitchStackBack, SwitchStackTo); manipulate segments (LockSegment, UnlockSegment); or manipulate MS-DOS memory (GlobalDOSAlloc, GlobalDOSFree).

Memory-Mapped Files and Shared Memory

Earlier in this chapter, I mentioned that applications are no longer capable of communicating using global memory created with the GMEM_DDESHARE flag. Instead, they must use memory-mapped files to share memory. What are memory-mapped files?

Normally, the virtual memory mechanism enables an operating system to map nonexistent memory to a disk file, called the paging file. It is possible to look at this the other way around and see the virtual memory mechanism as a method of referring to the contents of a file, namely the paging file, through pointers as if the paging file were a memory object. In other words, the mechanism maps the contents of the paging file to memory addresses. If this can be done with the paging file, why not with other files? Memory-mapped files represent this natural extension to the virtual memory management mechanism.

You can create a file mapping by using the CreateFileMapping function. You can also use the OpenFileMapping function to enable an application to open an existing named mapping. The MapViewOfFile function maps a portion of the file to a block of virtual memory.

The special thing about memory-mapped files is that they are shared between applications. That is, if two applications open the same named file mapping, they will, in effect, create a block of shared memory.

Isn't it overkill to be forced to use a disk file when the objective is merely to share a few bytes between two applications? Actually, it is not necessary to explicitly open and use a disk file in order to obtain a mapping in memory. Applications can pass the special handle value 0xFFFFFFFF (defined in the Windows headers as INVALID_HANDLE_VALUE) to CreateFileMapping in place of a file handle in order to obtain a mapping backed by the system paging file itself. This, in effect, creates a block of shared memory.

Listings 13.3 and 13.4 demonstrate the use of shared memory objects for intertask communication. They implement a very simple mechanism where one program, the client, deposits a simple message (a null-terminated string) in shared memory for the other program. This other program, the server, receives the message and displays it. These programs are written for the Windows 95 or Windows NT command line. To see how they work, start two MS-DOS windows, start the server program first in one of the windows, and then start the client program in the other. The client sends its message to the server; the server, in turn, displays the message it receives and then terminates.

Listing 13.3. Intertask communication using shared memory: The server.

#include <iostream>
#include <windows.h>

int main()
{
    HANDLE hmmf;
    LPSTR lpMsg;

    // INVALID_HANDLE_VALUE (0xFFFFFFFF in Win32) requests a mapping that is
    // backed by the system paging file instead of a disk file.
    hmmf = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                             PAGE_READWRITE, 0, 0x1000, "MMFDEMO");
    if (hmmf == NULL)
    {
        std::cout << "Failed to allocate shared memory.\n";
        return 1;
    }
    lpMsg = (LPSTR)MapViewOfFile(hmmf, FILE_MAP_WRITE, 0, 0, 0);
    if (lpMsg == NULL)
    {
        std::cout << "Failed to map shared memory.\n";
        CloseHandle(hmmf);
        return 1;
    }
    lpMsg[0] = '\0';                        // mark the buffer as empty
    while (lpMsg[0] == '\0') Sleep(1000);   // poll once a second
    std::cout << "Message received: " << lpMsg << '\n';
    UnmapViewOfFile(lpMsg);
    CloseHandle(hmmf);
    return 0;
}

Listing 13.4. Intertask communication using shared memory: The client.

#include <iostream>
#include <string.h>
#include <windows.h>

int main()
{
    HANDLE hmmf;
    LPSTR lpMsg;

    // Using the same name as the server yields the same shared memory object.
    hmmf = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                             PAGE_READWRITE, 0, 0x1000, "MMFDEMO");
    if (hmmf == NULL)
    {
        std::cout << "Failed to allocate shared memory.\n";
        return 1;
    }
    lpMsg = (LPSTR)MapViewOfFile(hmmf, FILE_MAP_WRITE, 0, 0, 0);
    if (lpMsg == NULL)
    {
        std::cout << "Failed to map shared memory.\n";
        CloseHandle(hmmf);
        return 1;
    }
    strcpy(lpMsg, "This is my message.");   // deposit the message for the server
    std::cout << "Message sent: " << lpMsg << '\n';
    UnmapViewOfFile(lpMsg);
    CloseHandle(hmmf);
    return 0;
}

These two programs are nearly identical. They both start by creating a file mapping of the system paging file with the name MMFDEMO. After the mapping is successfully created, the server sets the first byte of the mapping to zero and enters a wait loop, checking once a second to see whether the first byte is nonzero. The client, in turn, deposits a message string at the same location and exits. When the server notices that the data is present, it prints the result and also exits.

Both programs can be compiled from the command line: cl mmfsrvr.cpp and cl mmfclnt.cpp.

Shared Memory and Based Pointers

A shared memory-mapped file object may not necessarily appear at the same address in all processes. While shared memory objects are mapped to identical locations in the address spaces of Windows 95 processes, the same is not true in Windows NT. This can be a problem if applications want to include pointers in the shared data. One solution to this problem is to use based pointers and set them to be relative to the start of the mapping area.

Based pointers are a Microsoft-specific extension of the C/C++ language. A based pointer is declared using the __based keyword, in a fashion similar to the following:

void *vpBase;
void __based(vpBase) *vpData;

References through the based pointer always point to data relative to the specified base. Their utility extends beyond shared memory; based pointers can also be very useful when saving data that contains pointers to disk.
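
Since __based is specific to the Microsoft compiler, the following sketch emulates the same idea by hand: the shared block stores a byte offset instead of a raw pointer, and each process resolves that offset against the address at which it has mapped the block. The names SharedRecord, StoreText, and LoadText are hypothetical and serve only to illustrate the technique.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// A record meant to live at the start of a shared mapping. Instead of a raw
// pointer it stores the distance in bytes from the start of the mapping.
struct SharedRecord
{
    size_t textOffset;   // where the null-terminated text begins in the block
};

// Writes a record at 'base' and the text right behind it.
// Returns false if the block is too small to hold both.
bool StoreText(void* base, size_t capacity, const char* text)
{
    assert(base != NULL && text != NULL);

    size_t length = strlen(text) + 1;                // include the '\0'
    if (capacity < sizeof(SharedRecord) + length)
        return false;

    SharedRecord* record = static_cast<SharedRecord*>(base);
    record->textOffset = sizeof(SharedRecord);
    memcpy(static_cast<char*>(base) + record->textOffset, text, length);
    return true;
}

// Resolves the stored offset against the caller's own view of the block.
const char* LoadText(const void* base)
{
    const SharedRecord* record = static_cast<const SharedRecord*>(base);
    return static_cast<const char*>(base) + record->textOffset;
}
```

One process would call StoreText with the address returned by its MapViewOfFile call, and another would call LoadText with its own view address. The two addresses may differ, yet the offset remains valid in both.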

Threads and Memory Management

The multithreaded nature of 32-bit Windows presents some additional challenges when it comes to memory management. As threads may concurrently access the same objects in memory, it is possible that one thread's operation on a variable is interrupted by another; obviously, a synchronization mechanism is needed to avoid this. In other situations, threads may want private copies of a data object, instead of a shared copy.

Interlocked Variable Access

The first of the two problems I mentioned is solved in many cases by interlocked variable access. This mechanism enables a thread to change the value of an integer variable and check the result without the possibility of being interrupted by another thread.

Under normal circumstances, if you increment or decrement a variable within a thread, it is possible that another thread changes the value of this variable once again before the first thread has a chance to examine its value. The functions InterlockedIncrement and InterlockedDecrement can be used to atomically increment or decrement a 32-bit value and check the result. A third function, InterlockedExchange, can be used to atomically set a variable's value and retrieve the old value, without the fear of being interrupted by another thread.
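
A typical use is a reference count on an object shared by several threads, where exactly one thread must notice that the count has reached zero. The sketch below is only an illustration: SharedObject, AddRef, Release, and the Atomic... wrappers are hypothetical names. As an assumption made purely so that the code also builds outside Windows, the wrappers fall back to std::atomic there; that fallback is not part of the Win32 API.

```cpp
#include <assert.h>

#ifdef _WIN32
#include <windows.h>

typedef volatile LONG Counter;

// Each call returns the value the documentation describes: the new value
// for increment and decrement, the previous value for exchange.
inline long AtomicIncrement(Counter* c)        { return InterlockedIncrement(c); }
inline long AtomicDecrement(Counter* c)        { return InterlockedDecrement(c); }
inline long AtomicExchange(Counter* c, long v) { return InterlockedExchange(c, v); }
#else
// Assumption: std::atomic stands in for the Interlocked functions here.
#include <atomic>

typedef std::atomic<long> Counter;

inline long AtomicIncrement(Counter* c)        { return c->fetch_add(1) + 1; }
inline long AtomicDecrement(Counter* c)        { return c->fetch_sub(1) - 1; }
inline long AtomicExchange(Counter* c, long v) { return c->exchange(v); }
#endif

// A shared object whose last user is responsible for cleaning it up.
struct SharedObject
{
    Counter refs;
    bool    destroyed;

    SharedObject() : refs(1), destroyed(false) {}   // the creator holds one reference
};

void AddRef(SharedObject* obj)
{
    AtomicIncrement(&obj->refs);
}

// Returns true for exactly one caller: the one whose decrement reached zero.
// The decrement and the test use the same returned value, so no other
// thread can slip in between the two.
bool Release(SharedObject* obj)
{
    if (AtomicDecrement(&obj->refs) == 0)
    {
        obj->destroyed = true;   // stands in for the real cleanup
        return true;
    }
    return false;
}
```

The exchange wrapper lends itself to a simple "first one wins" flag: of all the threads that set the flag to 1, only the one that receives 0 as the old value was first.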

Thread-Local Storage

While automatic variables are always local to the instance of the function in which they are allocated, the same is not true for global or static objects. If your code relies heavily on such objects, it may prove to be very difficult to make your application thread-safe.

Fortunately, the Win32 API offers a mechanism to allocate thread-local storage. The TlsAlloc function can be used to reserve a TLS index, which identifies a DWORD-sized slot of which every thread has its own copy. Threads can use this slot, for example, to store a pointer to a private block of memory through the TlsSetValue and TlsGetValue functions. The TlsFree function can be used to release the TLS index.
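
The following sketch shows one way these four functions might be combined to give every thread a private counter. All function names and the global g_tlsIndex are hypothetical; allocating the private block from the process heap is merely one possible choice. The #ifdef guard is present only because the TLS functions do not exist outside Windows.

```cpp
#include <assert.h>

#ifdef _WIN32   // the TLS functions exist only on Windows
#include <windows.h>

static DWORD g_tlsIndex = TLS_OUT_OF_INDEXES;

// Call once, before any thread uses the slot (for example at program start).
bool InitThreadData()
{
    g_tlsIndex = TlsAlloc();
    return g_tlsIndex != TLS_OUT_OF_INDEXES;
}

// Returns the calling thread's private counter, allocating it on first use.
// A fresh slot holds NULL, which is how the first use is detected.
int* GetThreadCounter()
{
    assert(g_tlsIndex != TLS_OUT_OF_INDEXES);

    int* counter = static_cast<int*>(TlsGetValue(g_tlsIndex));
    if (counter == NULL)
    {
        counter = static_cast<int*>(
            HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(int)));
        if (counter != NULL && !TlsSetValue(g_tlsIndex, counter))
        {
            HeapFree(GetProcessHeap(), 0, counter);
            counter = NULL;
        }
    }
    return counter;
}

// Each thread calls this before it exits to release its own block.
void FreeThreadCounter()
{
    void* block = TlsGetValue(g_tlsIndex);
    if (block != NULL)
    {
        HeapFree(GetProcessHeap(), 0, block);
        TlsSetValue(g_tlsIndex, NULL);
    }
}

// Call once, after all threads have finished with the slot.
void ShutdownThreadData()
{
    TlsFree(g_tlsIndex);
    g_tlsIndex = TLS_OUT_OF_INDEXES;
}
#endif
```

TlsFree releases only the index, not the blocks that the threads stored in it, which is why each thread frees its own block first.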

If this doesn't sound easy, don't despair. The Visual C++ compiler provides an alternative mechanism that is much easier to use. Data objects can be declared thread local using the __declspec(thread) type modifier. For example:

__declspec(thread) int i;

Using __declspec(thread) is problematic in DLLs because of a problem in extending the global memory allocation of a DLL at runtime to accommodate thread-local objects. It is recommended that you use the TLS APIs, such as TlsAlloc, in code that is intended to run in a DLL.

Accessing Physical Memory and I/O Ports

Programmers of 16-bit Windows are used to the idea of accessing physical memory or the input/output ports of Intel processors directly. For example, it is possible to write a 16-bit application that accesses a custom hardware device through memory-mapped I/O. It is natural to expect that those programming practices can be carried over to the 32-bit operating system.

However, this is not the case. Win32 is a platform-independent operating system specification. As such, anything that introduces platform (hardware) dependence is fundamentally incompatible with the operating system. This includes all kinds of access to actual physical hardware, such as ports, physical memory addresses, or anything else.

So what can you do if your task is to write an application that communicates directly with hardware? The answer is that you require one of the various DDKs (Device Driver Kits). Through the DDK, it is possible to create a driver library that encapsulates all low-level access to the device and keep your high-level Win32 application free of platform dependencies.

Summary

Memory management in Win32 is markedly different from memory management in 16-bit Windows. Applications need no longer be concerned about issues related to the Intel segmented architecture; on the other hand, new capabilities mean new responsibilities for the programmer.

Win32 applications run in separate address spaces. A pointer in the context of one application is meaningless in the context of another. All applications have access to a 4GB address space through 32-bit addresses (although the different Win32 implementations reserve certain portions of this address space for special purposes).

Win32 operating systems use virtual memory management to map a logical address in an application's address space to a physical address in memory or a block of data in the system's swap or paging file. Applications can explicitly use virtual memory management capabilities to create memory-mapped files, or to reserve, but not commit, huge blocks of virtual memory.

Memory-mapped files offer a very efficient intertask communication mechanism. By gaining access to the same memory-mapped file object, two or more applications can utilize such a file as shared memory.

Special features address the unique problems of memory management in threads. Through interlocked variable access, threads can perform atomic operations on shared objects. Through thread-local storage, threads can allocate privately owned objects in memory.

Many of the old Windows and DOS memory management functions are no longer available. Because of the platform independence of Win32, applications can no longer access physical memory directly. If it is necessary to directly access hardware (as in the case when custom hardware is used), it may be necessary to utilize the appropriate Device Driver Kit.