Pro .NET Memory Management: For Better Code, Performance, and Scalability
Ebook, 1,594 pages, 14 hours

About this ebook

Understand .NET memory management internal workings, pitfalls, and techniques in order to effectively avoid a wide range of performance and scalability problems in your software. Despite automatic memory management in .NET, there are many advantages to be found in understanding how .NET memory works and how you can best write software that interacts with it efficiently and effectively. Pro .NET Memory Management is your comprehensive guide to writing better software by understanding and working with memory management in .NET.
Thoroughly vetted by the .NET Team at Microsoft, this book contains 25 valuable troubleshooting scenarios designed to help diagnose challenging memory problems. Readers will also benefit from a multitude of .NET memory management “rules” to live by that introduce methods for writing memory-aware code and the means for avoiding common, destructive pitfalls.

What You'll Learn
  • Understand the theoretical underpinnings of automatic memory management
  • Take a deep dive into every aspect of .NET memory management, including detailed coverage of garbage collection (GC) implementation, that would otherwise take years of experience to acquire
  • Get practical advice on how this knowledge can be applied in real-world software development
  • Use practical knowledge of tools related to .NET memory management to diagnose various memory-related issues
  • Explore various aspects of advanced memory management, including use of Span and Memory types

Who This Book Is For

.NET developers, solution architects, and performance engineers
Language: English
Publisher: Apress
Release date: Nov 12, 2018
ISBN: 9781484240274

    Book preview

    Pro .NET Memory Management - Konrad Kokosa

    © Konrad Kokosa 2018
    Konrad Kokosa, Pro .NET Memory Management, https://doi.org/10.1007/978-1-4842-4027-4_1

    1. Basic Concepts

    Konrad Kokosa, Warsaw, Poland

    Let’s start with a simple, yet very important question. When should you care about .NET memory management if it is all automated? Should you care at all? As you probably expect from the fact that I wrote such a book - I strongly encourage you to keep memory in mind in every development situation. This is simply a matter of our professionalism. A consequence of how we conduct our work. Are we trying to do our best or just get the job done? If we care about the quality of our work, we should worry not only about whether our piece of work merely works. We should be worried about how it works. Is it optimal in terms of CPU and memory usage? Is it maintainable, testable, open for extension but closed for modification? Is our code SOLID? I believe all those questions distinguish beginners from more advanced, experienced programmers. The former are mainly interested in getting the job done and do not care much about the above-mentioned, nonfunctional aspects of their work. The latter are experienced enough to have the mental processing power left to consider the quality of their work. I believe everyone wants to be like that. But this is, of course, not a trivial thing. Writing elegant code, without any bugs, with every possible nonfunctional requirement fulfilled, is really hard.

    But should such a desire for mastery be the only prerequisite for gaining deeper knowledge about .NET memory management? Memory corruptions revealing themselves as AccessViolationException are extremely rare.¹ An uncontrolled increase in memory usage can also seem rare. Do we have anything to be worried about then? As the .NET runtime has a sophisticated Microsoft implementation, luckily we do not have to think about memory aspects a lot. But, on the other hand, whenever I have been involved in analyzing performance problems of big .NET-based applications, memory consumption problems were always high on the list of issues. Does a memory leak cause trouble in the long-term view, after days of continuous running? On the Internet we can find a funny meme about a memory leak that was not fixed in the software of some particular combat missile, because the memory was enough before the missile reached its destination. Is our system such a one-time missile? Do we realize whether automatic memory management introduces a big overhead for our application or not? Maybe we could use only two servers instead of ten? And further, we are not memory free even in the times of serverless cloud computing. One example is Azure Functions, which are billed based on a measure called gigabyte seconds (GB-s). It is calculated by multiplying the average memory size in gigabytes by the time in seconds it takes to execute a particular function. Memory consumption directly translates into the money we spend.
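
    As a quick worked example of that billing formula (all the numbers below are invented purely for illustration, they do not come from the book or from any real pricing sheet):

    #include <stdio.h>

    int main(void)
    {
        // Hypothetical workload: a function using 0.5 GB of memory on average,
        // running for 10 seconds, executed 1,000,000 times a month.
        double averageMemoryGB   = 0.5;
        double durationSeconds   = 10.0;
        double executionsPerMonth = 1000000.0;

        // GB-s = average memory size in GB * execution time in seconds
        double gbSeconds = averageMemoryGB * durationSeconds * executionsPerMonth;
        printf("Monthly consumption: %.0f GB-s\n", gbSeconds);   // 5000000 GB-s
        return 0;
    }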

    In each such case, we begin to realize that we have no idea where to start looking for the real cause or for valuable measurements. This is the place where we begin to understand that it is worthwhile to know the internal mechanisms of our applications and the underlying runtime.

    In order to deeply understand memory management in .NET, it is best to start from scratch, no matter whether you are a novice programmer or a very advanced one. I would recommend that we go through the theoretical introduction in this chapter together. This will establish a common level of knowledge and understanding of concepts that will be used throughout the rest of the book. So that this is not simply boring theory, I sometimes refer to specific technologies. We will have a chance to get a little history of software development, as it fits well with the development of concepts related to memory management. We will also note some small curiosities that I hope will prove interesting to you. Knowing the history is always one of the best ways to get a broader perspective on a topic.

    But do not be afraid. This is not a history book. I will not describe the biographies of all the engineers involved in developing garbage collection algorithms since the 1950s. An ancient history background won’t be necessary either. But still, I hope you will find it interesting to know how this topic evolved and where we are now on the historical timeline. This will also allow us to compare the .NET approach to the many other languages and runtimes you might hear about from time to time.

    Memory-Related Terms

    Before we begin, it is useful to take a look at some very important definitions, without which it is difficult to imagine discussing the topic of memory:

    bit - the smallest unit of information used in computer technology. It represents two possible states, usually meaning the numerical values 0 and 1 or the logical values true and false. We briefly mention how modern computers store single bits in Chapter 2. To represent bigger numerical values, a combination of multiple bits needs to be used, encoding them as a binary number, explained below. When specifying data sizes, bits are denoted with the lowercase letter b.

    binary number - an integer numerical value represented as a sequence of bits. Each successive bit determines the contribution of the successive power of 2 to the sum making up the given value. For example, to represent the number 5 we can use three successive bits with values 1, 0, and 1 because 1x1 + 0x2 + 1x4 equals 5. An n-bit binary number can represent a maximum value of 2^n - 1. There is also often an additional bit dedicated to representing the sign of the value, to encode both positive and negative numbers. There are also other, more complex ways to encode numeric values in binary form, especially for floating-point numbers. (A short code sketch of the basic integer encoding follows these definitions.)

    binary code - instead of numerical values, a sequence of bits can represent a specified set of different data - like characters of text. Each bit sequence is assigned to a specific piece of data. The most basic code, and the most popular one for many years, was ASCII, which uses a 7-bit binary code to represent text and other characters. There are other important binary codes, like the opcodes encoding instructions that tell the computer what it should do.

    byte - historically, a sequence of bits for encoding a single character of text using a specified binary code. The most common byte size is 8 bits, although it depends on the computer architecture and may vary between different ones. Because of this ambiguity, there is a more precise term, octet, which means exactly an 8-bit-long data unit. Nevertheless, it is the de facto standard to understand the byte as an 8-bit-long value, and as such it has become an unquestionable standard for defining data sizes. It is currently unlikely to encounter an architecture with anything other than 8-bit-long bytes. Hence, when specifying data sizes, bytes are denoted with the uppercase letter B.
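
    Here is a minimal C sketch of the binary number definition above (the bit values are the ones from the example with the number 5; everything else is invented for illustration):

    #include <stdio.h>

    int main(void)
    {
        // Bits of the number 5, least significant first: 1, 0, 1
        int bits[] = { 1, 0, 1 };
        int n = sizeof(bits) / sizeof(bits[0]);

        unsigned value = 0;
        unsigned weight = 1;               // successive powers of 2: 1, 2, 4, ...
        for (int i = 0; i < n; i++)
        {
            value += bits[i] * weight;     // 1*1 + 0*2 + 1*4 = 5
            weight *= 2;
        }

        unsigned maxValue = (1u << n) - 1; // an n-bit number can hold at most 2^n - 1
        printf("value = %u, max for %d bits = %u\n", value, n, maxValue);
        return 0;
    }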

    When specifying the size of data, we use the most common multiples (prefixes) determining its order of magnitude. This is a cause of constant confusion and misunderstanding, which is worth explaining at this point. The overwhelmingly popular terms such as kilo, mega, and giga mean multiples of thousands. One kilo is 1000 (and we denote it with the lowercase letter k), one mega is 1 million (uppercase letter M), and so on. On the other hand, a popular approach is sometimes to express orders of magnitude in successive multiples of 1024. In such cases, we talk about one kibi, which is 1024 (denoted as Ki), one mebi, which is 1024*1024 (denoted as Mi), one gibi (Gi), which is 1024*1024*1024, and so on. This introduces common ambiguity. When someone talks about 1 gigabyte, they may mean 1 billion bytes (1 GB) or 1024^3 bytes (1 GiB), depending on the context. In practice, very few care about the precise use of those prefixes. It is absolutely common to specify the size of memory modules in today’s computers in gigabytes (GB) when they are truly gibibytes (GiB), or the opposite in the case of hard drive storage. Even the JEDEC Standard 100B.01 Terms, Definitions, and Letter Symbols for Microcomputers, Microprocessors, and Memory Integrated Circuits refers to the common usage of K, M, and G as multiples of 1024 without explicitly deprecating it. In such situations, we are just left to common sense in understanding those prefixes from the context.
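
    The gap between the decimal and binary prefixes is easy to quantify; a short sketch (the 16 GiB module size is just an example):

    #include <stdio.h>

    int main(void)
    {
        unsigned long long oneGB  = 1000ULL * 1000 * 1000;   // 1 gigabyte  = 10^9 bytes
        unsigned long long oneGiB = 1024ULL * 1024 * 1024;   // 1 gibibyte  = 2^30 bytes

        // A "16 GB" memory module is in fact 16 GiB, which is roughly 7.4% more
        // than 16 decimal gigabytes.
        printf("1 GB  = %llu bytes\n", oneGB);
        printf("1 GiB = %llu bytes\n", oneGiB);
        printf("16 GiB expressed in decimal GB = %.2f GB\n", 16.0 * oneGiB / oneGB);
        return 0;
    }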

    Currently we are very used to terms such as RAM or persistent storage installed in our computers. Even smartwatches are now equipped with 8 GiB of RAM. We can easily forget that the first computers were not equipped with such luxuries. You could say that they were not equipped with anything. A look at the short history of computer development will allow us to look differently at memory itself. Let’s start from the beginning.

    We should bear in mind that it is very disputable which device can be named the very first computer. Likewise, it is very hard to name the one and only inventor of the computer. This is just a matter of how we define what a computer really is. So instead of starting endless discussions about what and who was first, let’s just look at some of the oldest machines and what they offered to programmers, although the word programmer was to be coined many years later. At the beginning, they were called coders or operators.

    It should be emphasized that the machines that may be defined as the first computers were not fully electronic, but electromechanical. For this reason, they were very slow and, despite their impressive size, offered very little. The first of these programmable electromechanical computers was designed in Germany by Konrad Zuse and named the Z3. It weighed one ton! One addition took about one second and a single multiplication took three seconds! Built from 2,000 electromechanical relays, it offered an arithmetic unit capable of add, subtract, multiply, divide, and square root operations only. The arithmetic unit also included two 22-bit memory storages used for calculations. It offered 64 general-purpose memory cells as well, each 22 bits long. Nowadays we could say it offered 176 bytes of internal memory for data!

    The data was typed via a special keyboard, and the program was read during calculation from punched celluloid film. The possibility of storing a program in the internal computer memory was to be implemented a few years later, and we will come back to it shortly, although Zuse was fully aware of this idea. In the context of the book you are reading, the more important question is that of access to the Z3’s memory. Programming the Z3, we had at our disposal only nine instructions! One of them allows you to load the value of one of the 64 memory cells into the memory storage of the arithmetic unit. Another was to save the value back. And that’s all when it comes to memory management in this very first computer. Although the Z3 was ahead of its time in many ways, for political reasons and due to the outbreak of World War II, its impact on the development of computers was negligible. Zuse kept developing his line of computers for many years after the war, and its latest incarnation, the Z22, was built in 1955.

    During the war and shortly after, the main centers of development of computer science were the United States and the United Kingdom. One of the first computers built in the United States was the Harvard Mark I, developed by IBM in collaboration with Harvard University and called the Automatic Sequence Controlled Calculator. It was also electromechanical, like the Z3 mentioned before. It was enormous in size, measuring 8 feet high, 51 feet long, and 3 feet deep. And it weighed 5 tons! It is called the biggest calculating machine ever. Built over a few years, it ran its first programs at the end of the Second World War, in 1944. It served the Navy, but also John von Neumann during his work in the Manhattan Project on the first atomic bomb. Despite its size, it offered only 72 memory slots for 23-digit numbers with sign. Such a slot was called an accumulator - a dedicated small memory place where intermediate arithmetic and logic results are stored. Translated into today’s measures, we could say that this 5-ton machine provided access to 72 memory slots, each 78 bits long (we need 78 bits to represent such a big 23-digit number); therefore, it offered 702 bytes of memory! Programs were then de facto a series of mathematical calculations operating on those 72 memory slots. Those were the first-generation programming languages (denoted as 1GL), or machine languages, where programs were stored on punched tape, which was physically fed into the machine as needed, or operated by front panel switches. It could proceed with only three additions or subtractions per second. A single multiplication took 20 seconds and the calculation of sin(x) took one minute! Just like in the Z3, memory management did not exist in this machine at all - you could only read or write a value to one of the mentioned memory cells.

    What is interesting for us is that the term Harvard architecture originated from this computer (see Figure 1-1). In accordance with this architecture, the storage of the program and the storage of data are physically separated. Such data is processed by some kind of electronic or electromechanical device (like a Central Processing Unit). Such a device is often also responsible for controlling Input/Output devices like punch card readers, keyboards, or display devices. Although the Z3 and Mark I computers used this architecture because of its simplicity, it is not completely forgotten nowadays. As we will see in Chapter 2, it is used today in almost every computer as the modified Harvard architecture. And we will even see its influence on programs that we write on a daily basis.

    Figure 1-1. Harvard architecture diagram

    The much better-known ENIAC, completed in 1946, was already an electronic device based on vacuum tubes. It offered mathematical operation speeds thousands of times better than the Mark I. However, in terms of memory it still looked very unattractive. It offered only twenty 10-digit signed accumulators, and there was no internal memory to store programs. Simply put, due to World War II, the priority was to build machines as fast as possible, for military purposes, not to build something sophisticated.

    But academics like Konrad Zuse, Alan Turing, and John von Neumann were investigating the idea of using the computer’s internal memory to store the program together with its data. This would allow much easier programming (and especially, reprogramming) than coding via punched cards or mechanical switches. In 1945 John von Neumann wrote an influential paper named First Draft of a Report on the EDVAC, in which he described what became known as the von Neumann architecture. It should be stated that it was not solely von Neumann’s concept, as he was inspired by other academics of his time.

    The von Neumann architecture shown in Figure 1-2 is a simplified Harvard architecture in which there is a single memory unit for storing both the data and the program. It surely reminds you of a current computer, and this is not without reason. From a high-level point of view, this is exactly how modern computers are still constructed, where the von Neumann and Harvard architectures meet in the modified Harvard architecture.

    Figure 1-2. Von Neumann architecture diagram

    The Manchester Small-Scale Experimental Machine (SSEM, nicknamed Baby), built in 1948, and Cambridge’s EDSAC, built in 1949, were the world’s first computers that stored program instructions and data in the same space and hence incorporated the von Neumann architecture. Baby was much more modern and innovative because it was the first computer using a new kind of storage - Williams tubes, based on cathode ray tubes (CRT). Williams tubes can be seen as the very first Random Access Memory (RAM), explained below. The SSEM had a memory of 32 cells, each 32 bits long. So, we can say that the first computer with RAM had 128 bytes of it! This is the journey we are taking: from 128 bytes in 1949 to a typical 16 gibibytes in 2018. Nevertheless, Williams tubes became a standard at the turn of the 1940s and 1950s, when a lot of other computers were built.

    This leads us historically to a perfect moment to explain all the basic concepts of computer architecture. They are all gathered below and shown in Figure 1-3:

    memory - responsible for storing data and the program itself. The way memory is implemented has evolved significantly over time, starting from the above-mentioned punched cards, through magnetic tapes and cathode ray tubes, to the transistors used currently. Memory can be further divided into two main subcategories:

    Random Access Memory (RAM) - allows us to read data with the same access time irrespective of the memory region we access. In practice, as we will see in Chapter 2, modern memory fulfills this condition only approximately, for technological reasons.

    Non-uniform access memory - the opposite of RAM; the time required to access memory depends on its location on the physical storage. This obviously includes punched cards, magnetic tapes, classical hard disks, CDs and DVDs, and so on, where the storage medium has to be positioned (for example, rotated) correctly before accessing.

    address - represents a specific location within the entire memory area. It is typically expressed in terms of bytes, as a single byte is the smallest possible addressing granularity on many platforms.

    arithmetic and logic unit (ALU) - responsible for performing operations like addition and subtraction. This is the core of the computer, where most of the work is done. Nowadays computers include more than one ALU, allowing for parallelization of computation.

    control unit - decodes program instructions (opcodes) read from memory. Based on the internal description of each instruction, it knows which arithmetic or logical operation should be performed and on which data.

    register - a memory location quickly accessible from the ALU and/or control unit (which we can collectively refer to as execution units), usually contained within them. The accumulators mentioned before are a special, simplified kind of register. Registers are extremely fast in terms of access time, and there is in fact no place for data closer to the execution units.

    word - the fixed-size basic unit of data used in a particular computer design. It is reflected in many design areas, like the size of most registers, the maximum address, or the largest block of data transferred in a single operation. Most commonly it is expressed as a number of bits (referred to as the word size or word length). Most computers today are 32-bit or 64-bit, so they have 32-bit or 64-bit word lengths respectively, 32-bit or 64-bit-long registers, and so on.

    The von Neumann architecture incarnated in the SSEM and EDSAC machines leads us to the term stored-program computer, which is obvious nowadays but was not at the beginning of the computer era. In such a design, the program code to be executed is stored in memory so it can be accessed like normal data - including such useful operations as modifying it and overwriting it with new program code.

    The control unit contains an additional register, called the instruction pointer (IP) or program counter (PC), to point to the currently executing instruction. Normal program execution is as simple as incrementing the address stored in the PC to the succeeding instructions. Things like loops or jumps are as easy as changing the value of the instruction pointer to another address, designating where we want to move the program execution.

    Figure 1-3. Stored-program computer diagram - memory + instruction pointer
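
    The instruction pointer mechanism can be sketched in a few lines of C. This is only an illustrative toy interpreter (the opcodes and the program are invented), but it shows that sequential execution is just incrementing the PC, while a jump simply overwrites it:

    #include <stdio.h>

    enum opcode { OP_PRINT, OP_JUMP, OP_HALT };

    struct instr { enum opcode op; int arg; };

    int main(void)
    {
        // A tiny "program" stored in memory, addressable by index.
        struct instr program[] = {
            { OP_PRINT, 1 },   // address 0
            { OP_PRINT, 2 },   // address 1
            { OP_JUMP,  4 },   // address 2: jump over address 3
            { OP_PRINT, 3 },   // address 3: never executed
            { OP_HALT,  0 }    // address 4
        };

        int pc = 0;                               // program counter
        for (;;)
        {
            struct instr i = program[pc++];       // fetch and advance PC
            if (i.op == OP_PRINT)      printf("%d\n", i.arg);
            else if (i.op == OP_JUMP)  pc = i.arg;    // a jump just overwrites PC
            else                       break;         // OP_HALT
        }
        return 0;
    }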

    The first computers were programmed using binary code that directly described the executed instructions. However, with the increasing complexity of programs, this solution became increasingly burdensome. New programming languages (denoted as second-generation programming languages - 2GL) were designed, describing the code in a more accessible way by means of so-called assembly code. This is a textual and very concise description of the individual instructions executed by the processor. It was much more convenient than direct binary encoding. Then even higher-level languages were designed (3GL), such as the well-known C, C++, or Pascal.

    What is interesting to us is that all these languages must be transformed from text into binary form and then put into the computer memory. The process of such a transformation is called compilation, and the tool that runs it is called a compiler. In the case of assembly code, we rather call it assembling, done by an assembler tool. In the end, the result is a program in binary code format that may later be executed - a sequence of opcodes and their arguments (operands).

    Equipped with this basic knowledge, we can now begin our journey in the memory management topic.

    The Static Allocation

    Most of the very first programming languages allowed only static memory allocation - the amount and the exact location of the memory needed had to be known at compilation time, before even executing the program. With fixed and predefined sizes, memory management was trivial. All the major ancient-times programming languages, from machine or assembly code to the first versions of FORTRAN and ALGOL, had such limited possibilities. But this approach has many drawbacks as well. Static memory allocation can easily lead to inefficient memory usage - not knowing in advance how much data will be processed, how do we know how much memory we should allocate? This makes programs limited and inflexible. In general, such a program would have to be compiled again to process bigger data volumes.

    In the very first computers, all allocations were static because the memory cells used (accumulators, registers, or RAM memory cells) were determined during program encoding. So, the defined variables lived for the whole lifetime of the program. Nowadays we still use static allocation in this sense when creating static global variables and the like, stored in a special data segment of the program. We will see in later chapters where they are stored in the case of .NET programs.
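
    A minimal C sketch of static allocation and its main limitation (the array size and the data here are of course invented):

    #include <stdio.h>

    #define MAX_ITEMS 100              // size fixed at compile time

    // Statically allocated: the storage exists for the whole program lifetime
    // and its size cannot change at run time.
    static int data[MAX_ITEMS];
    static int count;

    int main(void)
    {
        // If the input ever contains more than MAX_ITEMS values, we either
        // lose data or have to recompile with a bigger constant - the
        // inflexibility described above.
        for (int i = 0; i < MAX_ITEMS; i++)
            data[count++] = i;

        printf("Stored %d items in a statically allocated array\n", count);
        return 0;
    }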

    The Register Machine

    So far, we have seen examples of machines that were using registers (or accumulators as a special case) to operate their Arithmetic Logic Units (ALUs). A machine that constitutes such a design is called a register machine. This is because while executing programs on such a computer, we are in fact making calculations on registers. If we want to add, divide, or do anything else, we must load the proper data from memory into the proper registers. Then we call specific instructions to invoke the proper operations on them, and then another one to store the result from one of the registers back into memory.

    Let’s suppose we want to write a program that calculates the expression s=x+(2*y)+z on a computer with two registers - named A and B. Let’s assume also that s, x, y, and z are addresses of memory cells with some values stored there. We assume also some low-level pseudo-assembly code with instructions like Load, Add, and Multiply. Such a theoretical machine can be programmed with the following simple program (see Listing 1-1).

    Load      A, y        // A = y

    Multiply  A, 2        // A = A * 2 = 2 * y

    Load      B, x        // B = x

    Add       A, B        // A = A + B = x + 2 * y

    Load      B, z        // B = z

    Add       A, B        // A = A + B = x + 2 * y + z

    Store     s, A        // s = A

    Listing 1-1

    Pseudo-code of a sample program realizing the s=x+(2*y)+z calculation on a simple, two-register register machine. Comments show the registers’ state after executing each instruction.

    If this code reminds you of x86 or any other assembly code you have ever learned - this is not a coincidence! This is because most modern computers are a kind of complex register machine. All the Intel and AMD CPUs we use in our computers operate in such a way. When writing x86/x64-based assembly code, we operate on general-purpose registers like eax, ebx, ecx, etc. There are, of course, many more instructions, other specialized registers, etc. But the concept behind it is the same.
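
    To make the idea more concrete, here is a tiny C sketch that emulates the two-register machine from Listing 1-1; the registers A and B are just ordinary variables here, and the input values are made up:

    #include <stdio.h>

    int main(void)
    {
        // "Memory cells" holding the input values and the result.
        int x = 1, y = 2, z = 3, s = 0;

        // The two general-purpose registers of our toy machine.
        int A, B;

        A = y;          // Load      A, y
        A = A * 2;      // Multiply  A, 2
        B = x;          // Load      B, x
        A = A + B;      // Add       A, B
        B = z;          // Load      B, z
        A = A + B;      // Add       A, B
        s = A;          // Store     s, A

        printf("s = %d\n", s);   // s = x + (2 * y) + z = 8
        return 0;
    }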

    Note

    Could one imagine a machine with an instruction set that allows us to execute an operation directly on memory, without the need to load data into registers? Following our pseudo-assembly language, such code could look much more succinct and higher level, because there are no additional load/store instructions between memory and registers:

    Multiply        s, y, 2     // s = 2 * y

    Add             s, x        // s = s + x = 2 * y + x

    Add             s, z        // s = s + z = 2 * y + x + z

    Yes, there were such machines, like the IBM System/360, but nowadays I am not aware of any production-used computer of this kind.

    The Stack

    Conceptually, the stack is a data structure that can be simply described as a last in, first out (LIFO) list. It allows two main operations: adding some data on top of it (push) and taking some data from the top of it (pop), illustrated in Figure 1-4.

    Figure 1-4. Pop and push stack operations. This is a conceptual drawing only, not related to any particular memory model and implementation.
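
    A minimal push/pop sketch in C (with an arbitrarily chosen capacity and no overflow checks), just to show the LIFO behavior pictured in Figure 1-4:

    #include <stdio.h>

    #define CAPACITY 8

    static int stack[CAPACITY];
    static int top = 0;                 // index of the first free slot

    static void push(int value) { stack[top++] = value; }   // add on top
    static int  pop(void)       { return stack[--top]; }    // remove from top

    int main(void)
    {
        push(1);
        push(2);
        push(3);
        // Last in, first out: prints 3, 2, 1
        while (top > 0)
            printf("%d\n", pop());
        return 0;
    }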

    From the very beginning, the stack became inherently related to computer programming, mainly because of the concept of the subroutine. Today’s .NET heavily uses a call stack and stack concepts, so let’s look at how it all started. The original meaning of the stack as a data structure is still valid (for example, there is a Stack collection available in .NET), but let’s now look at how it evolved into the more general meaning of computer memory organization.

    The very first computers we were talking about earlier allowed only sequential program execution, reading each instruction one after another from the punched card or film. But the idea of writing some parts of programs (subroutines) that could be reused from different points of the whole program was obviously very tempting. The possibility of calling different parts of the program required, of course, the code to be addressable, as we somehow need to point to which other part of the program we want to call. The very first approach was used by the famous Grace Hopper in the A-0 system - called the first compiler. She encoded a set of different programs on a tape, giving each a succeeding number to allow the computer to find it. Then a program consisted of a sequence of numbers (program indexes) and their parameters. Although this is indeed calling subroutines, it is obviously very limited. A program could only call subroutines one after another, and no nested calls were allowed.

    Nested calls require a little more complicated approach because the computer must somehow remember where to continue execution (where to return) after executing a specific subroutine. The return address stored in one of the accumulators was the very first approach, invented by David Wheeler on the EDSAC machine (a method called the “Wheeler jump”). But in his simplified approach, recursive calls - calling the same subroutine from itself - were not possible.

    The stack concept as we know it today in the context of computer architecture was probably first mentioned by Alan Turing in his report describing the Automatic Computing Engine (ACE), written in the early 1940s. It described a concept of a von Neumann-like machine, which was in fact a stored-program computer. Besides many other implementation details, he described two instructions - BURY and UNBURY - operating on the main memory and accumulators:

    When calling a subroutine (BURY), the address of the currently executing instruction, incremented by one to point to the next (returning) instruction, was stored in memory, and another temporary storage, serving as a stack pointer, was incremented by 1.

    When returning from the subroutine (UNBURY), the opposite action was taken.

    This constituted the very first implementation of the stack in terms of a LIFO-organized place for subroutine return addresses. This solution is still used in modern computers, and although it has obviously evolved considerably since then, the foundations are still the same.

    The stack is a very important aspect of memory management because when programming in .NET, a lot of our data may be placed there. Let’s take a closer look at the stack and its use in function calls. We will use the example program from Listing 1-2, written in C-like pseudo-code, in which main calls fun1 (passing two arguments a and b), which has two local variables x and y. Then at some moment function fun1 calls function fun2 (passing a single argument n), which has a single local variable z.

    void main()

    {

       ...

       fun1(2, 3);

       ...

    }

    int fun1(int a, int b)

    {

       int x, y;

       ...

       fun2(a+b);

    }

    int fun2(int n)

    {

       int z;

       ...

    }

    Listing 1-2

    Pseudo-code of a program calling a function inside another function

    At first, imagine a continuous memory area designed to handle the stack, drawn in such a way that subsequent memory cells have addresses growing upward (see the left part of Figure 1-5a), and also a second memory region where your program code resides (see the right part of Figure 1-5a), organized in the same way. As the code of the functions does not have to lie next to each other, the main, fun1, and fun2 code blocks have been drawn separately. The execution of the program from Listing 1-2 can be described in the following steps:

    1.

    Just before calling fun1 inside main (see Figure 1-5a). Obviously, as the program is already running, some stack region has already been created (the grayed part of the stack region in Figure 1-5a). The stack pointer (SP) keeps an address indicating the current boundary of the stack. The program counter (PC) points somewhere inside the main function (we marked this as address A1), just before the instruction to call fun1.

    Figure 1-5a. Stack and code memory regions - at the moment before calling function fun1 from Listing 1-2

    2.

    After calling fun1 inside main (see Figure 1-5b). When the function is called, the stack is extended by moving SP to make room for the necessary information. This additional space includes:

    Arguments - all function arguments can be saved on the stack. In our sample, arguments a and b were stored there.

    Return address - to make it possible to continue main function execution after executing fun1, the address of the next instruction just after the function call is saved on the stack. In our case we denoted it as the A1+1 address (pointing to the next instruction after the instruction at address A1).

    Local variables - a place for all local variables, which can also be saved on the stack. In our sample, variables x and y were stored there.

    The structure placed on the stack when a subroutine is called is named an activation frame. In a typical implementation, the stack pointer is decremented by an appropriate offset to point to the place where a new activation frame can start. That is why it is often said that the stack grows downward.

    Figure 1-5b. Stack and code memory regions - at the moment after calling function fun1 from Listing 1-2

    3.

    After calling fun2 inside fun1 (see Figure 1-5c). The same pattern of creating a new activation frame is repeated. This time it contains a memory region for the argument n, the return address A2+1, and the z local variable.

    Figure 1-5c. Stack and code memory regions - at the moment after calling function fun2 from fun1

    An activation frame is also referred to more generally as a stack frame, meaning any structured data saved on the stack for a specific purpose.

    As we can see, subsequent nested subroutine calls just repeat this pattern, adding a single activation frame per call. The more nested the subroutine calls, the more activation frames will be on the stack. This of course makes infinitely nested calls impossible, as they would require memory for an infinite number of activation frames.² If you have ever encountered StackOverflowException, this is the case. You have called so many nested subroutines that the memory limit for the stack has been hit.

    Bear in mind that the mechanism presented here is merely exemplary and very general. Actual implementations may vary between architectures and operating systems. We will look closely at how activation frames and the stack are used by .NET in later chapters.

    When a subroutine ends, its activation frame is discarded just by incrementing the stack pointer by the size of the current activation frame, while the saved return address is used to set the PC accordingly to continue execution of the calling function. In other words, what was inside the stack frame (local variables, parameters) is no longer needed, so incrementing the stack pointer is enough to free the memory used so far. That data will simply be overwritten by the next stack usage (see Figure 1-6).

    Figure 1-6. Stack and code memory regions - after returning from function fun1 both activation frames are discarded

    Regarding implementation, both SP and PC are typically stored in dedicated registers. At this point, the size of the address itself and of the discussed memory areas and registers is not particularly important.

    A stack in modern computers is supported both by the hardware (by providing dedicated registers for stack pointers) and by the software (by the operating system abstraction of a thread and the part of its memory designated as a stack).

    It is worth noticing that one can imagine a lot of different stack implementations from the hardware architecture point of view. The stack can be stored in a dedicated memory block inside the CPU or on a dedicated chip. It can also reuse the general computer memory. The latter is exactly the case in most modern architectures, where the stack is just a fixed-size region of the process memory. There can even be implementations with a multiple-stack architecture. In such an exemplary case, the stack for return addresses could be separated from the stack with data - parameters and local variables. This can be beneficial for performance reasons because it allows simultaneous access to the two separate stacks. It also allows additional tuning of CPU pipelining and other low-level mechanisms. Nevertheless, in current personal computers, the stack is just a part of the main memory.
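
    The downward growth of the stack described above can be observed directly by printing the addresses of local variables in nested calls. The following rough sketch mirrors Listing 1-2; the exact addresses, and the gaps between them, are implementation specific and will differ between compilers and operating systems:

    #include <stdio.h>

    void fun2(int n)
    {
        int z = n;
        // On typical implementations this address is lower than the ones
        // printed by fun1 and main - the stack grows downward.
        printf("fun2: &z = %p\n", (void*)&z);
    }

    void fun1(int a, int b)
    {
        int x = a, y = b;
        printf("fun1: &x = %p\n", (void*)&x);
        fun2(x + y);
    }

    int main(void)
    {
        int local = 0;
        printf("main: &local = %p\n", (void*)&local);
        fun1(2, 3);
        return 0;
    }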

    FORTRAN can be seen as the very first broadly used high-level, general-purpose programming language. But from 1954, when it was first defined, only static allocation was possible. All arrays had to have their sizes defined at compile time, and all allocations were static. ALGOL was another very important language that more or less directly inspired a myriad of other languages (like C/C++, Pascal, Basic, and - through Simula and Smalltalk - all modern object-oriented languages like Python or Ruby). ALGOL 60 had only stack allocation - together with dynamic arrays (with a size specified by a variable). Alan Perlis, a notable member of the team that created ALGOL, said:

    Algol 60 would have been impossible to adequately process in a reasonable way without the concept of stacks. Though we had stacks before, only in Algol 60 did stacks come to take a central place in the design of processors.

    While the family of ALGOL and FORTRAN languages was mainly used by the scientific community, there was another stream of development for business-oriented programming languages, starting from A-0 and FLOW-MATIC, through COMTRAN, to the more widely known COBOL (Common Business-Oriented Language). All of them lacked explicit memory management, operating mainly on primitive data types like numbers and strings.

    The Stack Machine

    Before we move on to other memory concepts, let’s stay for a while in a stack-related context - the so-called stack machines. In contrast to the register machine, in the stack machine all instructions operate on a dedicated expression stack (or evaluation stack). Please bear in mind that this stack does not have to be the same stack we were talking about before. Hence, such a machine could have both an additional expression stack and a general-purpose stack. There may be no registers at all. In such a machine, by default, instructions take their arguments from the top of the expression stack - as many as they require. The result is also stored on the top of the stack. In such cases, they are called pure stack machines, as opposed to impure implementations, where operations can access values not only at the top of the stack but also deeper.

    How exactly do operations on the expression stack look? For example, a hypothetical Multiply instruction (without any argument) will pop two values from the top of the evaluation stack, multiply them, and put the result back on the evaluation stack (see Figure 1-7).

    Figure 1-7. Hypothetical Multiply instruction in a stack machine - pops two elements from the stack and pushes the result of multiplying them

    Let’s go back to the sample s=x+(2*y)+z expression from the register machine example and rewrite it in the stack machine manner (see Listing 1-3).

                        // empty stack

    Push 2              // [2] - single stack element of value 2

    Push y              // [2][y] - two stack elements of value 2 and y

    Multiply            // [2*y]

    Push x              // [2*y][x]

    Add                 // [2*y+x]

    Push z              // [2*y+x][z]

    Add                 // [2*y+x+z]

    Pop s               // [] (with side effect of writing the value under s)

    Listing 1-3

    Pseudo-code of a simple stack machine realizing the s=x+(2*y)+z calculation. Comments show the evaluation stack state.
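
    As a small illustration, Listing 1-3 can be emulated in C with an explicit evaluation stack; the helper names and the input values below are invented for this sketch:

    #include <stdio.h>

    static int eval[16];                  // the evaluation stack
    static int top = 0;

    static void push(int v)    { eval[top++] = v; }
    static int  pop(void)      { return eval[--top]; }
    static void add(void)      { int b = pop(), a = pop(); push(a + b); }
    static void multiply(void) { int b = pop(), a = pop(); push(a * b); }

    int main(void)
    {
        int x = 1, y = 2, z = 3, s;

        push(2);            // [2]
        push(y);            // [2][y]
        multiply();         // [2*y]
        push(x);            // [2*y][x]
        add();              // [2*y+x]
        push(z);            // [2*y+x][z]
        add();              // [2*y+x+z]
        s = pop();          // []  (the value is stored under s)

        printf("s = %d\n", s);   // s = x + (2*y) + z = 8
        return 0;
    }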

    This concept leads to very clear and understandable code. The main advantages can be described as follows:

    There is no problem regarding how and where to store temporary values - whether they should go into registers, onto the stack, or into main memory. Conceptually this is easier than trying to manage all those possible targets optimally. Thus, it simplifies the implementation.

    Opcodes can be shorter in terms of required memory, as there are many no-operand or single-operand instructions. This allows efficient binary encoding of the instructions and hence produces dense binary code. So even though the number of instructions can be bigger than in the register-based approach because of more load/store operations, this is still beneficial.

    This was an important advantage in the early days of computers, when memory was very expensive and limited. It can also be beneficial today in the case of downloadable code for smartphones or web applications. Dense binary encoding of instructions also implies better CPU cache usage.

    Despite its advantages, the stack machine concept was rarely implemented in hardware itself. One notable exception was the Burroughs machines, like the B5000, which included a hardware implementation of the stack. Nowadays there is probably no widely used machine that could be described as a stack machine. A notable exception, though, is the x87 floating-point unit (inside x86-compatible CPUs), which was designed as a stack machine, and because of backward compatibility it is still programmed as such even today.

    So why mention these kinds of machines at all? Because such an architecture is a great way of designing platform-independent virtual machines or execution engines. Sun’s Java Virtual Machine and the .NET runtime are perfect examples of stack machines. They are executed underneath by the well-known register machines of the x86 or ARM architecture, but that doesn’t change the fact that they realize stack machine logic. We will see this clearly when describing .NET’s Intermediate Language (IL) in Chapter 4. Why have the .NET runtime and the JVM (Java Virtual Machine) been designed that way? As always, there is some mix of engineering and historical reasons. Stack machine code is higher level and abstracts away the actual underlying hardware better. Microsoft’s runtime or Sun’s JVM could have been written as register machines, but then, how many registers would be necessary? As they are only virtual, the best answer is - an infinite number of registers. Then we would need a way of handling and reusing them. What would an optimal, abstract register-based machine look like?

    If we put such problems aside by letting something else (the Java or .NET runtime, in this case) make platform-specific optimizations, it will translate either register-based or stack-based mechanisms into the specific register-based architecture underneath. But stack-based machines are conceptually simpler. A virtual stack machine (one that is not executed by a real, hardware stack machine) can provide good platform independence while still producing high-performance code. Putting this together with the mentioned better code density makes it a good choice for a platform meant to run on a wide range of devices. That was probably the reason why Sun decided to choose that path when Java was invented for small devices like set-top boxes. Microsoft, while designing .NET, followed that path as well. The stack machine concept is simply elegant and simple, and it just works. This makes implementing a virtual machine a nicer engineering task!

    On the other hand, register-based virtual machine designs are much closer to the design of the real hardware they are running on. This is very helpful in terms of possible optimizations. Advocates of this approach say that much better performance can be achieved, especially in interpreted runtimes. The interpreter has much less time to perform any advanced optimizations, so the more the interpreted code resembles the machine code, the better. Additionally, operating on the most frequently used set of registers provides great cache locality of reference.³

    As always, when making a decision, you need to make some compromises. The dispute between advocates of both approaches is long and unresolved. Nevertheless, the fact is that currently the .NET execution engine is implemented as a stack machine, although not a completely pure one - we will notice this in Chapter 4. We will also see how the evaluation stack is mapped to the underlying hardware consisting of registers and memory.

    Note

    Are all virtual machines and execution engines stack machines? Absolutely not! One notable exception is Dalvik, a register-based JVM implementation that was the virtual machine in Google’s Android until version 4.4. It was an interpreter of intermediate Dalvik bytecode. But then JIT (Just-in-Time compilation, explained in Chapter 4) was introduced in Dalvik’s successor - the Android Runtime (ART). Other examples include BEAM - a virtual machine for Erlang/Elixir, Chakra - the JavaScript execution engine in IE9, Parrot (the Perl 6 virtual machine), and Lua VM (the Lua virtual machine). No one can therefore say that this kind of machine is not popular.

    The Pointer

    So far we have introduced only two memory concepts: static allocation and stack allocation (as part of a stack frame). The concept of a pointer is very general and could be spotted from the very beginning of the computing era - like the previously shown concepts of the instruction pointer (program counter) or the stack pointer. Specific registers dedicated to memory addressing, like index registers, can also be seen as pointers.

    PL/I was a language proposed by IBM in about 1965, intended to be a general proposition for both the scientific and business worlds. Although this goal was not quite achieved, it is an important element of history because it was the first language that introduced the concept of pointers and memory allocation. In fact, Harold Lawson, involved in the development of the PL/I language, was awarded by the IEEE in 2000 for inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high level language. That was exactly the need behind the pointer’s invention - to perform list processing and operate on other more or less complex data structures. The pointer concept was then used during the development of the C language, which evolved from the language B (and its predecessors BCPL and CPL). Only FORTRAN 90, a successor of FORTRAN 77 defined in 1991, introduced dynamic memory allocation (via allocate/deallocate subroutines), the POINTER attribute, pointer assignment, and the NULLIFY statement.

    Pointers are variables in which we store the address of a position in memory. Simply put, a pointer allows us to reference another place in memory by its address. Pointer size is related to the word length mentioned before, and it results from the architecture of the computer. Thus nowadays, we typically deal with 32- or 64-bit-wide pointers. As a pointer is itself just a small region of memory, it can be placed on the stack (for example, as a local variable or function argument) or in a CPU register. Figure 1-8 shows a typical situation where one of the local variables (stored within a function’s activation frame) is a pointer to another memory region with the address Addr.

    Figure 1-8. Local variable of a function being a pointer ptr pointing to the memory under address Addr

    The simple idea of pointers allows us to build sophisticated data structures like linked lists or trees, because data structures in memory can reference each other, creating more complex structures (see Figure 1-9).

    Figure 1-9. Pointers used to build a doubly linked list structure where each element points to its previous and next elements
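
    For illustration, a doubly linked list node like the one in Figure 1-9 is nothing more than a structure holding two pointers besides its data; a minimal C sketch with just two nodes:

    #include <stdio.h>

    struct node {
        int value;
        struct node *prev;   // pointer to the previous element (or NULL)
        struct node *next;   // pointer to the next element (or NULL)
    };

    int main(void)
    {
        struct node first, second;

        first.value  = 1;  first.prev  = NULL;    first.next = &second;
        second.value = 2;  second.prev = &first;  second.next = NULL;

        // Walk the list forward by following the next pointers.
        for (struct node *n = &first; n != NULL; n = n->next)
            printf("%d\n", n->value);

        return 0;
    }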

    Moreover, pointers provide so-called pointer arithmetic. Values can be added to or subtracted from a pointer to reference memory relative to it. For example, the increment operator increases the value of the pointer by the size of the pointed-to object, not by a single byte as one could expect.
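
    A short C sketch of that behavior (the array contents are arbitrary):

    #include <stdio.h>

    int main(void)
    {
        int array[4] = { 10, 20, 30, 40 };
        int *ptr = &array[0];            // ptr holds the address of the first element

        printf("*ptr       = %d\n", *ptr);        // 10 - dereferencing the pointer
        printf("*(ptr + 1) = %d\n", *(ptr + 1));  // 20 - the address grew by sizeof(int),
                                                  // not by a single byte
        ptr++;                                    // increment moves by sizeof(int) bytes
        printf("*ptr       = %d\n", *ptr);        // 20
        return 0;
    }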

    Pointers in high-level languages like Java or C# are often not available or must be explicitly enabled, which makes such code unsafe. Why that is will become clearer when talking about manual memory management using pointers in the next subchapter.

    The Heap

    Eventually, we reach the most important concept in the context of .NET memory management. The heap (also known, though less commonly, as the free store) is an area of memory used for dynamically allocated objects. Free store is a better name because it does not suggest any internal structure but rather a purpose. In fact, one might rightly ask what the relationship is between the heap data structure and the heap itself. The truth is - there is none. While the stack is well organized (it is based on the LIFO data structure concept), the heap is more like a black box that can be asked to provide memory, no matter where it comes from. Hence pool or the mentioned free store would probably be a better name. The heap name was probably used from the beginning in the traditional English sense, meaning a messy place - especially as the opposite of the well-ordered stack space. Historically, ALGOL 68 introduced heap allocation, but this standard was not widely adopted; still, this is probably where the name comes from. The fact is, the true historical origin of this name is now rather unclear.

    The heap is a memory mechanism able to provide a continuous block of memory of a specified size. This operation is called dynamic memory allocation because both the size and the actual location of the memory need not be known at compile time. Since the location of the memory is not known at compile time, dynamically allocated memory must be referenced by a pointer. Hence the pointer and heap concepts are inherently related.

    The address returned by some allocate me X bytes of memory function should obviously be remembered in some pointer for future reference to the created memory block. It can be stored on the stack (see Figure 1-10), on the heap itself, or anywhere else.

                   PTR ptr = allocate(10);

    Figure 1-10. Stack with pointer ptr and a 10-byte-wide block on the heap

    The reverse of the allocation operation is called deallocation, when a given block of memory is returned to the pool of memory for future use. How exactly the heap allocates space of a given size is an implementation detail. There are many possible allocators, and we will learn about some of them soon.

    By allocating and deallocating many blocks, we may end up in a situation where there is not enough free space for a given object, although in total there is enough free space on the heap. Such a situation is called heap fragmentation and may lead to significant inefficiency in memory usage. Figure 1-11 illustrates such a problem, where there is not enough free continuous space for object X. There are many different strategies used by allocators to manage space as optimally as possible to avoid fragmentation (or make good use of it). The effect itself can be reproduced with the toy allocator sketched after Figure 1-11.

    Figure 1-11. Fragmentation - after deleting objects B and D, there is not enough space for a new object X although in total there is enough free space for it
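
    Everything about the sketch below - the 16-cell heap, the block sizes, the naive first-fit strategy - is invented purely for illustration and has nothing to do with any real allocator’s internals; it only reproduces the situation from Figure 1-11:

    #include <stdio.h>
    #include <string.h>

    #define HEAP_SIZE 16
    static char heap[HEAP_SIZE];          // '.' marks a free cell, letters mark used cells

    // Mark 'size' consecutive cells as used by 'name'; return -1 if no hole is big enough.
    static int alloc(char name, int size)
    {
        for (int start = 0; start + size <= HEAP_SIZE; start++)
        {
            int fits = 1;
            for (int i = start; i < start + size; i++)
                if (heap[i] != '.') { fits = 0; break; }
            if (fits) { memset(heap + start, name, size); return start; }
        }
        return -1;
    }

    static void release(char name)
    {
        for (int i = 0; i < HEAP_SIZE; i++)
            if (heap[i] == name) heap[i] = '.';
    }

    int main(void)
    {
        memset(heap, '.', HEAP_SIZE);
        alloc('A', 4); alloc('B', 4); alloc('C', 4); alloc('D', 4);
        release('B'); release('D');      // 8 cells are free in total...
        printf("heap: %.*s\n", HEAP_SIZE, heap);
        // ...but they form two separate 4-cell holes, so a 6-cell object does not fit.
        printf("alloc X(6): %s\n", alloc('X', 6) == -1 ? "failed (fragmentation)" : "ok");
        return 0;
    }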

    It is also worth noting that whether there is a single heap or multiple heap instances within a single process is yet another implementation detail (we will see this when discussing .NET more deeply).

    Let’s make a short summary of the differences between the stack and the heap in Table 1-1.

    Table 1-1

    Comparison of the Stack and the Heap Features

    Despite their differences, most commonly both the stack and the heap are located at opposite ends of the process’s address space. We will return to the detailed stack and heap layout inside the process address space when considering low-level memory management in Chapter 2. Nevertheless, one should remember it is still just an implementation detail. Thanks to the abstractions of value and reference types (which will be introduced in Chapter 4), we should not care where they are created.

    Now let’s move forward to the discussion of manual versus automatic memory management. As Ellis and Stroustrup write in The Annotated C++ Reference Manual:

    C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user.

    Manual Memory Management

    Until now, what we have been looking at was manual memory management. What it means, in particular, is that a developer is responsible for explicitly allocating memory, and then, when it is no longer needed, she should deallocate it. This is real manual work. It’s exactly like the manual gearbox in most European cars. I am from Europe and we are just used to changing gears manually. We must think about whether it is a good time to change now, or whether we should wait a few seconds until the engine speed is high enough. This has one big advantage - we have complete, full control over the car. We are responsible for whether the engine is used optimally or not. And as humans are still much more adaptive to changing conditions, good drivers can do it better than an automatic gearbox. Of course, there is one big disadvantage. Instead of thinking about our main goal - getting from place A to place B - we have to think additionally about changing gears, hundreds or thousands of times during a long trip. This is both time consuming and tiresome. I know some people will say that it is fun and that giving control to the automatic gearbox is boring. I can even agree with them. But still, I quite like how this automotive metaphor relates to memory management.

    When we are talking about explicit memory allocation and deallocation, it is exactly like having a manual gearbox. Instead of thinking about our main goal, which is probably some kind of business goal of our code, we must also think about how to manage the memory of our program. This distracts us from the main goal and takes our valuable attention. Instead of thinking about algorithms, business logic, and domains, we are obliged to think also about when and how much memory we will need. For how long? And who will be responsible for freeing it? Does that sound like business logic? Of course not. Whether that is good or bad is another story.

    The well-known C language was designed by Dennis Ritchie somewhere around the early 1970s and has become one of the most widely used programming languages in the world. The history of how C evolved from ALGOL through the intermediate languages CPL, BCPL, and B is interesting on its own, but in our context it is important that, together with Pascal (a direct descendant of ALGOL), they were the two most popular languages with explicit memory management at the time. Regarding C, I can say without a doubt that a compiler for it has been written for nearly every hardware architecture ever created. I would not be surprised if alien spaceships had their own C compiler on board (probably implementing a TCP/IP stack, as an example of another widely used standard). The influence of this language on other programming languages is huge and hard to overstate. Let’s pause for a moment and take a deeper look at it in the context of memory management. This will allow us to list some of the characteristics of manual memory management.

    Let’s look at the simple example code written in C in Listing 1-4.

    #include <stdio.h>
    #include <stdlib.h>

    void printReport(int* data)

    {

        printf("Report: %d\n", *data);

    }

    int main(void) {

        int *ptr;

        ptr = (int*)malloc(sizeof(int));

        if (ptr == 0)

        {

            printf("ERROR: Out of memory\n");

            return 1;

        }

        *ptr = 25;

        printReport(ptr);

        free(ptr);

        ptr = NULL;

        return 0;

    }

    Listing 1-4

    Sample C program showing manual memory management

    This is, of course, a little exaggerated example, but thanks to it we can illustrate the problem clearly. We can notice that this simple code has in fact only one simple business goal: printing a report. For simplicity, this report consists only of a single integer, but you can imagine it as a more complex structure containing pointers to other data structures and so on. This simple business goal looks overwhelmed by a lot of ceremony code taking care of nothing more than memory. This is manual memory management in its essence.

    Summarizing the above piece of code, besides writing business logic, a developer must:

    allocate a proper amount of memory for the required data using the malloc function.

    cast the returned generic (void*) pointer to the proper pointer type (int*) to indicate that we are pointing to a numerical value (the int type in the case of C).

    remember the pointer to the allocated region of memory in the local pointer variable ptr.

    check whether allocating such an amount of memory succeeded (the returned address will be 0 in case of failure).

    dereference the pointer (access the memory under its address) to store some data (the numerical value 25).

    pass the pointer to the other function, printReport, which dereferences it for its own purposes.

    free the allocated memory when it is no longer needed, using the free function.

    to be safe, mark the pointer with the special NULL value (which is a way of saying that this pointer points to nothing and in fact corresponds to the value 0⁷).

    As we can see, there are a lot of things that we must keep in mind when we manage memory manually. Moreover, each of the above steps can be used mistakenly or forgotten, which can lead to a bunch of serious problems. Going through each of those steps, let’s see what bad things can happen:

    We should know exactly how much memory we need. It is as simple as sizeof(int) in our example, but what if we dealt with much more complex, nested data structures? One can easily imagine a situation in which we allocate too little memory because of some minor error in manually calculating the required size. Later, when we want to write to or read from such a memory region, we will probably end up with a Segmentation Fault error - trying to access memory that has not been allocated by us or has been allocated for another purpose. On the other hand, by a similar mistake we can allocate way too much memory, which leads to memory inefficiency.

    Casting can always be error prone and can introduce really hard-to-diagnose bugs if we accidentally introduce a type mismatch. We would be trying to interpret a pointer of some type as if it were a completely different type, which easily leads to dangerous access violations.

    Remembering the address is an easy thing. But what if we forget to do it? We will have a bunch of memory allocated and no way to free it - we’ve just forgotten its address! This is a direct path to the memory leak problem, as unfreeable memory can grow endlessly over time. Moreover, a pointer can be stored in something more complicated than a local variable. What if we lose a pointer to a complex graph of objects because we freed some structure containing it?

    A single check of whether we were able to allocate the desired amount of memory is not cumbersome. But doing it a hundred times, in each and every function, for sure will be. We will probably decide to omit those checks, but this may lead to undefined behavior in many points of our application, trying to access memory that was not successfully allocated in the first place.

    Dereferencing pointers is always dangerous. No one ever knows what is at the address they point to. Is there still a valid object, or has it already been freed? Is the pointer valid in the first place? Does it point to the proper user-memory address space? Full control over a pointer in languages like C leads to such worries. Manual control over pointers leads to serious security concerns - it is only the programmer who must take care not to expose data beyond the regions that should be available according to the current memory and type model.

    Passing the pointer between functions and threads only multiplies the worries from the previous points in a multithreaded environment.

    We must remember to free the allocated memory. If we omit this step, we get a memory leak. In an example as simple as the one above, it is of course really hard to forget to call the free function. But it is much more problematic in more sophisticated code bases, where ownership of data structures is not so obvious and where pointers to those structures are passed here and there. There is also yet another risk - no one can stop us from freeing memory
