Publications (10 of 10)
de Blanche, A. & Lundqvist, T. (2018). Node Sharing for Increased Throughput and Shorter Runtimes: an Industrial Co-Scheduling Case Study. In: Trinitis, Carsten; Weidendorfer, Josef (Ed.), Proceedings of the 3rd Workshop on Co-Scheduling of HPC Applications (COSH 2018): Held together with HiPEAC 2018. Paper presented at HiPEAC Workshop on Co-Scheduling of HPC Applications (pp. 15-20).
Node Sharing for Increased Throughput and Shorter Runtimes: an Industrial Co-Scheduling Case Study
2018 (English). In: Proceedings of the 3rd Workshop on Co-Scheduling of HPC Applications (COSH 2018): Held together with HiPEAC 2018 / [ed] Trinitis, Carsten; Weidendorfer, Josef, 2018, p. 15-20. Conference paper, Published paper (Refereed)
Abstract [en]

The allocation of jobs to nodes and cores in industrial clusters is often based on queue-system standard settings, guesses, or perceived fairness between different users and projects. Unfortunately, hard empirical data is often lacking, and jobs are scheduled and co-scheduled for no apparent reason. In this case study, we evaluate the performance impact of co-scheduling jobs using three types of applications and an existing 450+ node cluster at a company doing large-scale parallel industrial simulations. We measure the speedup when co-scheduling two applications together, sharing two nodes, compared to running the applications on separate nodes. Our results and analyses show that by enabling co-scheduling we improve performance on the order of 20% in both throughput and execution times, and improve the execution times even more if the cluster is running at low utilization. We also find that a simple reconfiguration of the number of threads used in one of the applications can lead to a performance increase of 35-48%, showing that there is a potentially large performance gain to be had by changing current practice in industry.

Keywords
Co-scheduling; Cluster; Engineering Simulations; MPI; FEM; Simulation; Scheduling; Multicore; Slowdown; Industrial HPC
National Category
Computer Systems
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-12024 (URN)
10.14459/2018md1428535 (DOI)
Conference
HiPEAC Workshop on Co-Scheduling of HPC Applications
Available from: 2018-01-26 Created: 2018-01-26 Last updated: 2019-01-04. Bibliographically approved
de Blanche, A. & Lundqvist, T. (2017). Disallowing Same-program Co-schedules to Improve Efficiency in Quad-core Servers. In: Clauss, Carsten; Lankes, Stefan; Trinitis, Carsten; Weidendorfer, Josef (Ed.), Proceedings of the Joined Workshops COSH 2017 and VisorHPC 2017. Paper presented at HiPEAC 2017, 2nd COSH Workshop on Co-Scheduling of HPC Applications (pp. 1-7).
Disallowing Same-program Co-schedules to Improve Efficiency in Quad-core Servers
2017 (English). In: Proceedings of the Joined Workshops COSH 2017 and VisorHPC 2017 / [ed] Clauss, Carsten; Lankes, Stefan; Trinitis, Carsten; Weidendorfer, Josef, 2017, p. 1-7. Conference paper, Published paper (Refereed)
Abstract [en]

Programs running on different cores in a multicore server are often forced to share resources like off-chip memory, caches, I/O devices, etc. This resource sharing often leads to degraded performance, a slowdown, for the programs that share the resources. A job scheduler can improve performance by co-scheduling programs that use different resources on the same server. The most common approach to solving this co-scheduling problem has been to make job schedulers resource aware, finding ways to characterize and quantify a program's resource usage. We have earlier suggested a simple, program- and resource-agnostic scheme as a stepping stone to solving this problem: Avoid Terrible Twins, i.e., avoid co-schedules that contain several instances of the same program. This scheme showed promising results when applied to dual-core servers. In this paper, we extend the analysis and evaluation to also cover quad-core servers. We present a probabilistic model and empirical data that show that execution slowdowns get worse as the number of instances of the same program increases. Our scheduling simulations show that if all co-schedules containing multiple instances of the same program are removed, the average slowdown decreases from 54% to 46% and the worst-case slowdown decreases from 173% to 108%.

Keywords
Co-scheduling; Same Process; Scheduling; Allocation; Multicore; Slowdown; Cluster; Cloud
National Category
Computer Systems
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-10620 (URN)
10.14459/2017md1344414 (DOI)
978-3-00-055564-0 (ISBN)
Conference
HiPEAC 2017, 2nd COSH Workshop on Co-Scheduling of HPC Applications
Available from: 2017-01-19 Created: 2017-01-19 Last updated: 2019-01-04. Bibliographically approved
Andersson, H. R., de Blanche, A. & Lundqvist, T. (2017). Flipping the Data Center Network: Increasing East-West Capacity Using Existing Hardware. In: 2017 IEEE 42nd Conference on Local Computer Networks (LCN), 9-12 Oct. 2017. Paper presented at 42nd IEEE Conference on Local Computer Networks, LCN 2017; Singapore; 9 October 2017 through 12 October 2017 (pp. 211-214). IEEE, Article ID 8109355.
Flipping the Data Center Network: Increasing East-West Capacity Using Existing Hardware
2017 (English). In: 2017 IEEE 42nd Conference on Local Computer Networks (LCN), 9-12 Oct. 2017, IEEE, 2017, p. 211-214, article id 8109355. Conference paper, Published paper (Refereed)
Abstract [en]

In today's datacenters, there is an increasing demand for more network traffic capacity. The majority of the increase in traffic is internal to the datacenter, i.e., it flows between different servers within the datacenter. This category of traffic is often referred to as east-west traffic, and traditional hierarchical architectures are not well equipped to handle it. Instead, they are better suited for the north-south traffic between hosts and the Internet. One suggested solution to this capacity problem is to adopt a folded Clos topology, also known as spine-leaf, which often relies on software-defined network (SDN) controllers to manage traffic. This paper shows that it is possible to implement a spine-leaf network using commodity off-the-shelf switches and thus improve the east-west traffic capacity. This can be obtained using low-complexity configuration and edge routing for load balancing, eliminating the need for a centralized SDN controller.
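The edge-routing load balancing mentioned in the abstract can be illustrated with a minimal sketch (an illustrative toy under assumed names, not the paper's actual switch configuration): a leaf switch hashes each flow's 5-tuple to pick one of the equal-cost spine uplinks, so traffic spreads across the fabric without a central controller.

```python
import hashlib

def pick_uplink(flow_5tuple: tuple, num_spines: int) -> int:
    """ECMP-style choice: hash the flow identifier so every packet of a
    flow takes the same spine uplink, while different flows spread out."""
    digest = hashlib.sha256(repr(flow_5tuple).encode()).hexdigest()
    return int(digest, 16) % num_spines

# A hypothetical flow: (src IP, dst IP, protocol, src port, dst port)
flow = ("10.0.1.5", "10.0.2.7", "tcp", 49152, 80)
uplink = pick_uplink(flow, num_spines=4)
assert 0 <= uplink < 4                  # always a valid spine index
assert uplink == pick_uplink(flow, 4)   # stable per flow, no state needed
```

Because the choice is a pure function of the flow identifier, every leaf makes consistent decisions independently, which is what removes the need for centralized coordination.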

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE Conference on Local Computer Networks (LCN), E-ISSN 0742-1303
Keywords
Commodity, Datacenter, Clos, Spine-Leaf, East-West Traffic, Network, Core-Distribution-Access, Edge routing, SDN
National Category
Computer Systems
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-12005 (URN)
10.1109/LCN.2017.92 (DOI)
2-s2.0-85040631515 (Scopus ID)
978-1-5090-6523-3 (ISBN)
978-1-5090-6522-6 (ISBN)
Conference
42nd IEEE Conference on Local Computer Networks, LCN 2017; Singapore; 9 October 2017 through 12 October 2017
Available from: 2018-01-29 Created: 2018-01-29 Last updated: 2019-01-04. Bibliographically approved
Lundmark, E., Persson, C., de Blanche, A. & Lundqvist, T. (2017). Increasing Throughput of Multiprogram HPC Workloads: Evaluating a SMT Co-Scheduling Approach. Paper presented at SC 2017: The International Conference for High Performance Computing, Storage and Analysis (Supercomputing), November 12-17, 2017, Article ID P44.
Increasing Throughput of Multiprogram HPC Workloads: Evaluating a SMT Co-Scheduling Approach
2017 (English). Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Simultaneous Multithreading (SMT) is a technique that allows for more efficient processor utilization by scheduling multiple threads on a single physical core. Previous research has shown an average throughput increase of around 20% with an SMT level of two, i.e., two threads per core. However, a bad combination of threads can actually result in decreased performance. To be conservative, many HPC systems have SMT disabled, thus limiting the number of scheduling slots in the system to one per core. However, for SMT to not hurt performance, we need to determine which threads should share a core. In this poster, we use 30 random SPEC CPU job mixes on a twelve-core Broadwell-based node to study the impact of enabling SMT using two different co-scheduling strategies. The results show that SMT can increase performance, especially when using no-same-program co-scheduling.

Keywords
co-scheduling, SMT, high performance computing, scheduling, hyperthreading, terrible twins
National Category
Computer Engineering
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-11937 (URN)
Conference
SC 2017: The International Conference for High Performance Computing, Storage and Analysis (Supercomputing) November 12-17, 2017
Available from: 2017-12-19 Created: 2017-12-19 Last updated: 2019-01-04. Bibliographically approved
de Blanche, A. & Lundqvist, T. (2017). Initial Formulation of Why Disallowing Same Program Co-schedules Improves Performance (1st ed.). In: Carsten Trinitis, Josef Weidendorfer (Ed.), Co-Scheduling of HPC Applications (pp. 95-113). Netherlands: IOS Press
Initial Formulation of Why Disallowing Same Program Co-schedules Improves Performance
2017 (English). In: Co-Scheduling of HPC Applications / [ed] Carsten Trinitis, Josef Weidendorfer, Netherlands: IOS Press, 2017, 1st ed., p. 95-113. Chapter in book (Refereed)
Abstract [en]

Co-scheduling processes on different cores in the same server might lead to excessive slowdowns if they use the same shared resource, like a memory bus. If possible, processes with a high shared-resource use should be allocated to different server nodes to avoid contention, thus avoiding slowdown. This article proposes the more general principle that twins, i.e., several instances of the same program, should be allocated to different server nodes. The rationale for this is that instances of the same program use the same resources and are more likely to be either low or high resource users. High resource users should obviously not be combined, but, somewhat non-intuitively, it is also shown that low resource users should not be combined either, in order to not miss out on better scheduling opportunities. This is verified using both a probabilistic argument and experiments with ten programs from the NAS parallel benchmark suite running on two different systems. By using the simple rule of forbidding these terrible twins, the average slowdown is shown to decrease from 6.6% to 5.9% for System A and from 9.5% to 8.3% for System B. Furthermore, the worst-case slowdown is lowered from 12.7% to 9.0% and from 19.5% to 13% for systems A and B, respectively. This indicates a considerable improvement, despite the rule being program agnostic and having no information about any program's resource usage or slowdown behavior.
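The forbid-twins rule described above fits in a few lines of code. The sketch below is illustrative only (job names and the pairwise packing model are assumptions; real schedulers work with node and queue state rather than bare tuples): it enumerates candidate co-schedules and rejects any pair that puts two instances of the same program on one node.

```python
from itertools import combinations

def allowed_pairs(jobs, cores_per_node=2):
    """Keep only co-schedules with no 'terrible twins', i.e. no two
    instances of the same program packed onto the same node."""
    return [combo for combo in combinations(jobs, cores_per_node)
            if len({prog for prog, _ in combo}) == len(combo)]

# Hypothetical NAS-style workload: (program name, instance id)
jobs = [("bt", 0), ("bt", 1), ("lu", 0), ("cg", 0)]
pairs = allowed_pairs(jobs)
# 6 possible pairs, minus the one bt/bt twin pair
assert len(pairs) == 5
assert (("bt", 0), ("bt", 1)) not in pairs
```

Note that the filter needs no measurement of resource usage at all, which is the point of the scheme: it only compares program names.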

Place, publisher, year, edition, pages
Netherlands: IOS Press, 2017, Edition: 1
Series
Advances in Parallel Computing, ISSN 0927-5452, E-ISSN 1879-808X; 28
Keywords
Co-scheduling; Scheduling; Allocation; Multicore; Slowdown; Cluster; Cloud
National Category
Computer Systems
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-10619 (URN)
10.3233/978-1-61499-730-6-95 (DOI)
978-1-61499-729-0 (ISBN)
978-1-61499-730-6 (ISBN)
Available from: 2017-01-19 Created: 2017-01-19 Last updated: 2019-01-04. Bibliographically approved
Lundqvist, T., de Blanche, A. & Andersson, H. R. (2017). Thing-to-thing electricity micro payments using blockchain technology. In: Global Internet of Things Summit (GIoTS), 2017: Proceedings of a meeting held 6-9 June 2017, Geneva, Switzerland. Paper presented at 2017 Global Internet of Things Summit, GIoTS 2017; International Conference Centre in Geneva (CICG), Geneva, Switzerland; 6 June 2017 through 9 June 2017. Institute of Electrical and Electronics Engineers (IEEE), Article ID 8016254.
Thing-to-thing electricity micro payments using blockchain technology
2017 (English). In: Global Internet of Things Summit (GIoTS), 2017: Proceedings of a meeting held 6-9 June 2017, Geneva, Switzerland, Institute of Electrical and Electronics Engineers (IEEE), 2017, article id 8016254. Conference paper, Published paper (Refereed)
Abstract [en]

Thing-to-thing payments are a key enabler in the Internet of Things (IoT) era, ubiquitously allowing devices to pay each other for services without any human interaction. Traditional credit card-based systems are not able to handle this new paradigm; blockchain technology, however, is a promising payment candidate in this context. The prominent example of blockchain technology is Bitcoin, with its decentralized structure and ease of account creation. This paper presents a proof-of-concept implementation of a smart cable that connects to a smart socket and, without any human interaction, pays for electricity. We identify several obstacles to the widespread use of bitcoins in thing-to-thing payments. A critical problem is the high transaction fees in the Bitcoin network when doing micro-transactions. To reduce this impact, we present a single-fee micro-payment protocol that aggregates multiple smaller payments incrementally into one larger transaction needing only one transaction fee. The proof of concept shows that trustless, autonomous, and ubiquitous thing-to-thing micro-payments are no longer a future technology.
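The single-fee aggregation idea in the abstract can be sketched as a toy model (not the paper's actual Bitcoin implementation; the class and its parameters are hypothetical). Each micro-payment replaces a pending, not-yet-broadcast transaction with one for the new running total, so only the final settlement transaction pays a network fee.

```python
class MicroPaymentAggregator:
    """Toy sketch: fold many micro-payments into one settlement."""

    def __init__(self, fee: int):
        self.fee = fee    # network fee in satoshi, paid exactly once
        self.total = 0    # running total of aggregated payments
        self.count = 0

    def pay(self, amount: int) -> None:
        # Update the pending (unbroadcast) transaction in place;
        # since nothing hits the network yet, no extra fee accrues.
        self.total += amount
        self.count += 1

    def settle(self) -> int:
        # One broadcast, one fee, regardless of how many payments
        return self.total - self.fee

agg = MicroPaymentAggregator(fee=10_000)
for _ in range(100):      # e.g. 100 small payments for electricity
    agg.pay(5_000)
assert agg.settle() == 490_000   # 100 payments, a single 10k-satoshi fee
```

Paying per micro-transaction would instead have cost 100 fees (1,000,000 satoshi here), dwarfing the payments themselves, which is the obstacle the protocol addresses.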

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Keywords
IoT, Internet of things, Bitcoin, Digital payments, Crypto currency, Smart grid
National Category
Computer Systems
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-11402 (URN)
10.1109/GIOTS.2017.8016254 (DOI)
2-s2.0-85029291023 (Scopus ID)
978-1-5090-5873-0 (ISBN)
Conference
2017 Global Internet of Things Summit, GIoTS 2017; International Conference Centre in Geneva (CICG), Geneva, Switzerland; 6 June 2017 through 9 June 2017
Available from: 2017-08-26 Created: 2017-08-26 Last updated: 2019-01-04. Bibliographically approved
de Blanche, A. & Lundqvist, T. (2016). Terrible Twins: A Simple Scheme to Avoid Bad Co-Schedule. In: Trinitis, Carsten; Weidendorfer, Josef (Ed.), Proceedings of the 1st COSH Workshop on Co-Scheduling of HPC Applications. Paper presented at COSH Workshop on Co-Scheduling of HPC Applications, HiPEAC 2016 (pp. 1-6). München, Vol. 1
Terrible Twins: A Simple Scheme to Avoid Bad Co-Schedule
2016 (English). In: Proceedings of the 1st COSH Workshop on Co-Scheduling of HPC Applications / [ed] Trinitis, Carsten; Weidendorfer, Josef, München, 2016, Vol. 1, p. 1-6. Conference paper, Published paper (Refereed)
Abstract [en]

Co-scheduling processes on different cores in the same server might lead to excessive slowdowns if they use a shared resource, like the memory bus. If possible, processes with a high shared-resource use should be allocated to different server nodes to avoid contention, thus avoiding slowdown. This paper introduces the simple scheme of avoiding to co-schedule twins, i.e., several instances of the same program. The rationale for this is that instances of the same program use the same resources and are more likely to be either low or high resource users. High resource users should obviously not be combined, but, somewhat non-intuitively, it is also shown that low resource users should not be combined either, in order to not miss out on better scheduling opportunities. This is verified using both a statistical argument and experiments with ten programs from the NAS parallel benchmark suite. By using the simple rule of forbidding twins, the average slowdown is shown to decrease from 6.6% to 5.9%, and the worst-case slowdown is lowered from 12.7% to 9.0%, indicating a considerable improvement despite having no information about any program's resource usage or slowdown behavior.

Place, publisher, year, edition, pages
München, 2016
Keywords
Co-scheduling; Scheduling; Allocation; Multicore; Slowdown; Cluster; Cloud
National Category
Computer Engineering
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-9072 (URN)
10.14459/2016md1286952 (DOI)
Conference
COSH Workshop on Co-Scheduling of HPC Applications, HiPEAC 2016
Available from: 2016-02-12 Created: 2016-02-12 Last updated: 2019-01-04. Bibliographically approved
de Blanche, A. & Lundqvist, T. (2015). Addressing characterization methods for memory contention aware co-scheduling. Journal of Supercomputing, 71(4), 1451-1483
Addressing characterization methods for memory contention aware co-scheduling
2015 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 71, no. 4, p. 1451-1483. Article in journal (Refereed), Published
Abstract [en]

The ability to precisely predict how memory contention degrades performance when co-scheduling programs is critical for reaching high performance levels in cluster, grid, and cloud environments. In this paper we present an overview and compare the performance of state-of-the-art characterization methods for memory-aware (co-)scheduling. We evaluate the prediction accuracy and co-scheduling performance of four methods: one slowdown-based, two cache-contention-based, and one based on memory bandwidth usage. Both our regression analysis and scheduling simulations find that the slowdown-based method, represented by Memgen, performs better than the other methods. The linear correlation coefficient of Memgen's prediction is 0.890. Memgen's preferred schedules reached 99.53% of the obtainable performance on average. The memory bandwidth usage method performed almost as well as the slowdown-based method. Furthermore, while most prior work promotes characterization based on cache miss rate, we found it to be on par with random scheduling of programs and highly unreliable.

Keywords
Memory contention, Memory subsystem, Performance measurements, Co-scheduling, Slowdown based scheduling
National Category
Computer Sciences
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-7664 (URN)
10.1007/s11227-014-1374-8 (DOI)
2-s2.0-84939948746 (Scopus ID)
Available from: 2015-06-02 Created: 2015-06-02 Last updated: 2019-01-04. Bibliographically approved
de Blanche, A. & Lundqvist, T. (2014). A methodology for estimating co-scheduling slowdowns due to memory bus contention on multicore nodes. In: Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014. Paper presented at 12th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014; Innsbruck; Austria; 17 February 2014 through 19 February 2014; Code 104419 (pp. 216-223). ACTA Press
A methodology for estimating co-scheduling slowdowns due to memory bus contention on multicore nodes
2014 (English). In: Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014, ACTA Press, 2014, p. 216-223. Conference paper, Published paper (Refereed)
Abstract [en]

When two or more programs are co-scheduled on the same multicore computer they might experience a slowdown due to the limited off-chip memory bandwidth. According to our measurements, this slowdown does not depend on the total bandwidth use in a simple way. One thing we observe is that a higher memory bandwidth usage will not always lead to a larger slowdown. This means that relying on bandwidth usage as input to a job scheduler might cause non-optimal scheduling of processes on multicore nodes in clusters, clouds, and grids. To guide scheduling decisions, we instead propose a slowdown-based characterization approach. Real slowdowns are complex to measure due to the exponential number of experiments needed. Thus, we present a novel method for estimating the slowdown programs will experience when co-scheduled on the same computer. We evaluate the method by comparing its predictions with real slowdown data and with the often-used memory bandwidth based method. This study shows that a scheduler relying on slowdown-based categorization makes fewer incorrect co-scheduling choices, and the negative impact on program execution times is smaller than when using a bandwidth-based categorization method.

Place, publisher, year, edition, pages
ACTA Press, 2014
Keywords
Cluster, cloud, multicore, memory bandwidth, co-scheduling, slowdown
National Category
Computer Sciences
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-6195 (URN)
10.2316/P.2014.811-027 (DOI)
2-s2.0-84898422321 (Scopus ID)
978-0-88986-967-7 (ISBN)
978-0-88986-965-3 (ISBN)
Conference
12th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014; Innsbruck; Austria; 17 February 2014 through 19 February 2014; Code 104419
Available from: 2014-05-07 Created: 2014-04-30 Last updated: 2019-01-04. Bibliographically approved
Amoson, J. & Lundqvist, T. (2012). A light-weight non-hierarchical file system navigation extension. In: Eric Jul (Ed.), Proceedings of the 7th International Workshop on Plan 9. Paper presented at Seventh International Workshop on Plan 9, IWP9, November 14th-16th, 2012 at Bell Labs Ireland (pp. 11-13). Dublin, Ireland
A light-weight non-hierarchical file system navigation extension
2012 (English). In: Proceedings of the 7th International Workshop on Plan 9 / [ed] Eric Jul, Dublin, Ireland, 2012, p. 11-13. Conference paper, Oral presentation only (Other academic)
Abstract [en]

Drawbacks in organising and finding files in hierarchies have led researchers to explore non-hierarchical and search-based file systems, where file identity and belonging is predicated by tagging files to categories. We have implemented a chdir() shell extension enabling navigation to a directory using a search expression. Our extension is light-weight and avoids modifying the file system, to guarantee backwards compatibility for applications relying on normal hierarchical file namespaces.

Place, publisher, year, edition, pages
Dublin, Ireland: , 2012
Keywords
non-hierarchical, file system, applications
National Category
Computer Sciences
Research subject
ENGINEERING, Computer engineering
Identifiers
urn:nbn:se:hv:diva-4835 (URN)
Conference
Seventh International Workshop on Plan 9, IWP9. November 14th – 16th, 2012 at Bell Labs Ireland
Available from: 2012-12-06 Created: 2012-11-27 Last updated: 2018-08-09. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0589-8086
