Scientific and technological advances in integrated circuits have allowed the performance of microprocessors to grow exponentially since the late 1960s. However, the imbalance between processor and memory bus capacity has increased in recent years. The increasing on-chip parallelism of multi-core processors has turned the memory subsystem into a key factor for achieving high performance. When two or more processes share the memory subsystem, their execution times typically increase, even at relatively low levels of memory traffic. Current research shows that a throughput increase of up to 40% is possible if the job scheduler can minimize the slowdown caused by memory contention in industrial multi-core systems such as high-performance clusters, datacenters or clouds. In order to optimize the throughput, the job scheduler has to know how much slower a process will execute when co-scheduled on the same server as other processes. Consequently, unless the slowdown is known, or can be fairly well estimated, the scheduling becomes pure guesswork and the performance suffers. The central question addressed in this thesis is how, and to what extent, the slowdown caused by memory traffic interference between processes executing on the same server can be predicted. This thesis presents and evaluates a new slowdown prediction method which estimates how much longer a program will execute when co-scheduled on the same multi-core server as another program. The method measures how external memory traffic affects a program by generating different levels of synthetic memory traffic while observing the change in execution time. Based on the observations, it makes a first-order prediction of how much slowdown the program will experience when exposed to external memory traffic. Experimental results show that the method's predictions correlate well with the real measured slowdowns.
Furthermore, it is shown that scheduling based on the new slowdown prediction method yields a higher throughput than three other techniques suggested for avoiding co-scheduling slowdowns caused by memory contention. Finally, a novel scheme is suggested to avoid some of the worst co-schedules, thus increasing the system throughput.
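As an illustration only, the idea of a first-order prediction can be sketched as a least-squares line fitted to the slowdowns observed at different synthetic traffic levels. This is a minimal sketch under stated assumptions, not the method's actual implementation: the function names, the choice of relative slowdown as the fitted quantity, the linear model and the sample numbers are all hypothetical.

```python
from statistics import mean


def fit_first_order(traffic_levels, exec_times, baseline_time):
    """Fit slowdown = intercept + slope * traffic by ordinary least squares.

    traffic_levels: synthetic memory traffic applied in each run (assumed unit: GB/s)
    exec_times:     measured execution time of the program in each run (seconds)
    baseline_time:  execution time with no synthetic traffic (seconds)
    """
    # Relative slowdown: 0.0 means no slowdown, 0.25 means 25% longer runtime.
    slowdowns = [t / baseline_time - 1.0 for t in exec_times]
    x_mean = mean(traffic_levels)
    y_mean = mean(slowdowns)
    sxx = sum((x - x_mean) ** 2 for x in traffic_levels)
    if sxx == 0:
        raise ValueError("need at least two distinct traffic levels")
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(traffic_levels, slowdowns))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict_slowdown(model, external_traffic):
    """Predict the relative slowdown at a given level of external traffic."""
    intercept, slope = model
    return intercept + slope * external_traffic


# Hypothetical measurements, invented for illustration.
model = fit_first_order([0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0], 10.0)
print(predict_slowdown(model, 2.5))
```

In this sketch, a scheduler would feed the memory traffic generated by a candidate co-runner into `predict_slowdown` to estimate how much longer the profiled program would run.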
The disproportion between processor and memory bus capacities has increased constantly during the last decades. With the introduction of multi-core processors, the memory bus capacity is divided between the simultaneously executing processes (cores). The memory bus capacity directly affects the number of applications that can be executed simultaneously at their full potential. Against this backdrop, it becomes important to estimate how the limitation of the memory bus affects an application's performance. To this end, we introduce a method and a tool for experimental estimation of an application's memory requirement as well as the impact that sharing the memory bus has on execution times. The tool enables approximate black-box profiling of an application's memory bus usage during execution. It executes entirely in user space and does not require access to the application code, only the binary.
When two or more programs are co-scheduled on the same multicore computer, they might experience a slowdown due to the limited off-chip memory bandwidth. According to our measurements, this slowdown does not depend on the total bandwidth use in a simple way. In particular, we observe that a higher memory bandwidth usage does not always lead to a larger slowdown. This means that relying on bandwidth usage as input to a job scheduler might cause non-optimal scheduling of processes on multicore nodes in clusters, clouds, and grids. To guide scheduling decisions, we instead propose a slowdown-based characterization approach. Real slowdowns are complex to measure due to the exponential number of experiments needed. Thus, we present a novel method for estimating the slowdown programs will experience when co-scheduled on the same computer. We evaluate the method by comparing its predictions with real slowdown data and with the often-used method based on memory bandwidth. This study shows that a scheduler relying on slowdown-based categorization makes fewer incorrect co-scheduling choices, and that the negative impact on program execution times is smaller than with a bandwidth-based categorization method.
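To make the scheduling use case concrete, the following sketch picks the pairing of jobs that minimizes the summed predicted slowdown. It is an illustrative brute-force search under stated assumptions, not the scheduler evaluated in the paper: the function name, the per-pair slowdown table and its values are hypothetical, and each pair is assumed to have a single combined slowdown figure.

```python
def best_pairing(jobs, pair_slowdown):
    """Return (total_slowdown, pairs) minimizing the summed predicted slowdown.

    jobs:          list of job names, even length
    pair_slowdown: dict mapping frozenset({a, b}) to the predicted combined
                   slowdown when a and b share a node (hypothetical input)
    """
    if len(jobs) % 2 != 0:
        raise ValueError("number of jobs must be even")
    if not jobs:
        return 0.0, []
    first, rest = jobs[0], jobs[1:]
    best = None
    # Try every partner for the first job, then pair up the rest recursively.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_cost, sub_pairs = best_pairing(remaining, pair_slowdown)
        cost = pair_slowdown[frozenset((first, partner))] + sub_cost
        if best is None or cost < best[0]:
            best = (cost, [(first, partner)] + sub_pairs)
    return best


# Hypothetical predicted slowdowns, invented for illustration.
table = {
    frozenset(("a", "b")): 0.5, frozenset(("c", "d")): 0.5,
    frozenset(("a", "c")): 0.1, frozenset(("b", "d")): 0.1,
    frozenset(("a", "d")): 0.3, frozenset(("b", "c")): 0.3,
}
total, pairs = best_pairing(["a", "b", "c", "d"], table)
print(total, pairs)
```

The exhaustive search is only practical for small job sets; it is used here because it keeps the link between predicted slowdowns and the chosen co-schedule easy to follow.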
The ability to precisely predict how memory contention degrades performance when co-scheduling programs is critical for reaching high performance levels in cluster, grid and cloud environments. In this paper we present an overview and compare the performance of state-of-the-art characterization methods for memory-aware (co-)scheduling. We evaluate the prediction accuracy and co-scheduling performance of four methods: one slowdown-based, two cache-contention-based and one based on memory bandwidth usage. Both our regression analysis and scheduling simulations find that the slowdown-based method, represented by Memgen, performs better than the other methods. The linear correlation coefficient of Memgen's prediction is 0.890. Memgen's preferred schedules reached 99.53% of the obtainable performance on average. The memory bandwidth usage method performed almost as well as the slowdown-based method. Furthermore, while most prior work promotes characterization based on cache miss rate, we found it to be on par with random scheduling of programs and highly unreliable.
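For readers unfamiliar with the metric, a linear correlation coefficient between predicted and measured slowdowns can be computed as in the sketch below. It assumes the standard Pearson definition; the function name and the sample data are hypothetical and do not reproduce the paper's measurements.

```python
from math import sqrt


def pearson_r(predicted, measured):
    """Pearson linear correlation between two equally long sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_m)


# Hypothetical predicted and measured slowdowns, invented for illustration.
predicted = [0.05, 0.10, 0.20, 0.40]
measured = [0.04, 0.12, 0.18, 0.45]
print(pearson_r(predicted, measured))
```

A value close to 1 indicates that the predictions rank and scale the slowdowns much like the measurements do, which is the property a slowdown-based scheduler depends on.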