SRG Seminar: “Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers” by Teng Yu

We present a novel design and implementation of k-means clustering algorithm targeting supercomputers with heterogeneous many-core processors. This work introduces a multi-level parallel partition approach that not only partitions by dataflow and centroid, but also by dimension. Our multi-level ($nkd$) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer.
Our design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 targeting centroids, while maintaining high performance and high scalability, significantly improving the capability of k-means over previous approaches. The evaluation shows our implementation achieves performance of less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids by applying 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
This work is to be presented in the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18).

Event details

  • When: 1st November 2018 13:00 - 14:00
  • Where: Cole 1.33b
  • Series: Systems Seminars Series
  • Format: Seminar, Talk

SRG Seminar: “Efficient Cross-architecture Hardware Virtualisation” by Tom Spink

Virtualisation is a powerful tool used for the isolation, partitioning, and sharing of physical computing resources. Employed heavily in data centres, becoming increasingly popular in industrial settings, and used by home-users for running alternative operating systems, hardware virtualisation has seen a lot of attention from hardware and software developers over the last ten?fifteen years.

From the hardware side, this takes the form of so-called hardware assisted virtualisation, and appears in technologies such as Intel-VT, AMD-V and ARM Virtualization Extensions. However, most forms of hardware virtualisation are typically same-architecture virtualisation, where virtual versions of the host physical machine are created, providing very fast isolated instances of the physical machine, in which entire operating systems can be booted. But, there is a distinct lack of hardware support for cross-architecture virtualisation, where the guest machine architecture is different to the host.

I will talk about my research in this area, and describe the cross-architecture virtualisation hypervisor Captive that can boot unmodified guest operating systems, compiled for one architecture in the virtual machine of another.

I will talk about the challenges of full system simulation (such as memory, instruction, and device emulation), our approaches to this, and how we can efficiently map guest behaviour to host behaviour.

Finally, I will discuss our plans for open-sourcing the hypervisor, the work we are currently doing and what future work we have planned.

Event details

  • When: 11th October 2018 13:00 - 14:00
  • Where: Cole 1.33b
  • Series: Systems Seminars Series
  • Format: Seminar, Talk

The OpenMP and MPI refactoring with ParaFormance – Turkey Alsalkini

Abstract:

The increasing complexity of codes with the growing number of cores that should be utilised make such codes hard to optimise and maintain. In this talk, we present the OpenMP and MPI refactoring implemented in the ParaFormance tool. This tool transforms the sequential code into parallel code able to run on shared memory machines. Further refactoring is implemented to adapt the source code to exploit a larger number of processors on large HPC clusters with message passing support. In addition, the resulting MPI code can be used by developers as a starting point for further optimisation. Both refactorings are preceded by an advanced safety checking which reports concurrency problems and gives hints and suggestions on how to fix them.

Event details

  • When: 17th May 2018 12:00 - 13:00
  • Where: Cole 1.33a
  • Format: Talk

A Type-System for describing System-on-a-Chip Architectures – Jan De Muijnck-Hughes

Title:
A Type-System for describing System-on-a-Chip Architectures

Abstract:
The protocols that describe the interactions between IP Cores on System-on-a-Chip (SoC) architectures are well-documented. These protocols described not only the structural properties of the physical interfaces but also the behaviour of the emanating signals. However, there is a disconnect between the design of SoC architectures, their formal description, and the verification of their implementation in known hardware description languages.

Within the Border Patrol project we are investigating how to capture and reason about the structural and behavioural properties of SoC architectures using state-of-the-art advances in programming language research. Namely, we are investigating using dependent types and session types to capture and reason about hardware communication.

In this talk I will discuss my work in designing a dependent type- system and corresponding language that captures and reasons about the topological structure of a System-on-a-Chip. This language provides correct-by-construction guarantees over:

  • the physical structure of an interaction protocol;
  • the adherence of a component’s interface to a given protocol; and
  • the validity of the specified connections made between components.

We provide these guarantees through the (ab)use of dependent types as presented in Idris; and abuse of indexed monads to reason about resource usage.

Given time I will give an account of how this language enables reasoning about SoC behaviour when considered in conjunction with Session Types.ssion Types.

Event details

  • When: 5th April 2018 12:00 - 13:00
  • Where: Cole 1.33a
  • Format: Talk

Impact Talk: Xelect Ltd

This is Impact talk as a “brown bag lunch” (i.e. you bring your lunch if you wish) and the school will provide cakes.

Xelect Ltd

– a successful spin-out from the University of St Andrews

Xelect was founded 5 years ago by Professor Ian Johnston and Dr Tom Ashton to provide genetic services to the global aquaculture industry. The company built on several decades of research in fish physiology and genetics which was funded by BBSRC through to the stage of commercialisation. Xelect develops genetic selection technology and provides associated laboratory support to breeders of finfish, shellfish and shrimps and to date has served 52 customers in 17 countries. We currently manage broodstock genetics programs for producers in Chile, Scotland (Atlantic salmon), Croatia, Greece (seabass and sea bream), New Zealand (King salmon) and Vietnam (Barramundi). The company started out in incubation space at the Scottish Oceans Institute before moving to independent premises in 2016. Xelect also has sales offices in Puerto Montt, Chile and in Hong Kong. The company employs 12 people, mostly PhDs, and is account managed by Scottish Enterprise. Xelect’s shareholders are the founders, the University, SalmoBreed A/S and the EOS Technology Investment Syndicate.

About Ian Johnston

Ian Johnston is former Chandos Professor, Head of the School of Biology and Director of the Scottish Oceans Institute. He currently works full-time for Xelect but retains a 0 FTE position as Professor of Biology.

Event details

  • When: 8th February 2018 13:00 - 14:00
  • Where: Cole 1.33a
  • Format: Seminar, Talk

Simplifying ARM concurrency – Prof. Susmit Sarkar

ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In
particular, the model was originally non-multicopy-atomic: writes could
become visible to some other threads before becoming visible to all —
but this has not been exploited in production implementations, the
corresponding potential hardware optimisations are thought to have
insufficient benefits in the ARM context, and it gives rise to subtle
complications when combined with other ARMv8 features. The ARMv8
architecture has therefore been revised: it now has a multicopy-atomic
model. It has also been simplified in other respects, including more
straightforward notions of dependency, and the architecture now includes
a formal concurrency model.

This is work presented in POPL this year. I will also present some of
the background and context on relaxed memory which is absent from the
necessarily compressed talk format of POPL.

Event details

  • When: 22nd February 2018 12:00 - 13:00
  • Where: Cole 1.33a
  • Format: Talk

Pireh Pirzada: Sensors in Smart Homes for independent living of elderly people

Title: Sensors in Smart Homes for independent living of elderly people

Abstract: In the UK alone approximately about 3.64 million people aged 65 or above live on their own, and this number is rising. This increases concern of the safety and wellbeing of an ageing population, as growing old often results in reduced capabilities for individuals to perform activities of daily living (ADL), which will soon have a compounded effect on wider societal health care and the economy. Recent technological advances allow logging, monitoring and tracking behaviour can be put to the service of reducing this burden, allowing elderly people to remain in their own homes safely and independently. In this talk, I will describe my MSc dissertation work on a system designed for recording ADL, monitoring, classifying, predicting and alerting concerned people if anything out of regular pattern or life threatening happening occurs using unobtrusive sensors so that their quality of life is not impaired. This development has highlighted a number of areas for extension and improvement which can be further explored in the context of my doctoral research, which I will also outline within this talk.

This dissertation work was completed at the University of Southampton in September 2017 under the supervision of Neil White and Adriana Wilde.

Event details

  • When: 12th December 2017 14:00 - 15:00
  • Where: Cole 1.33b
  • Format: Seminar, Talk

Dr. Vladimir Janjic – Efficient Dynamic Mapping of Parallel Applications to NUMA Architectures by Reinforcement Learning

Title: Efficient Dynamic Mapping of Parallel Applications to NUMA Architectures by Reinforcement Learning

 

Abstract: We present a dynamic framework for mapping threads and data of parallel applications to computational cores/memory nodes of parallel non-uniform memory architecture (NUMA) systems. We use a feedback-based mechanism where the performance of each thread is collected and used to drive the reinforcement-learning policy of assigning affinities of threads/data to CPU cores/memory nodes. The proposed framework can address different optimisation criteria, such as maximum processing speed and minimum speed variance. We demonstrate that we can achieve an improvement of 12% in execution time compared to the default Linux operating system scheduling/mapping of threads under varying availability of resources (e.g. when multiple applications are running on the same system).

Event details

  • When: 7th December 2017 12:00 - 12:00
  • Where: Honey 103 - GFB
  • Format: Talk

Bidirectional-Curious? – Dr. Conor McBride

Type systems are often presented in a declarative style, but with an emphasis on ensuring that there is some sort of type synthesis algorithm. Since Pierce and Turner’s “Local Type Inference” system, however, there has been a small but growing alternative: bidirectional typing, where types are synthesized for variables and elimination forms, but must always be proposed in advance for introduction forms and checked. You can still get away without any type annotations, as long as you write only normal forms. But where’s the fun in that? If you want to write terms that compute, you need to write type annotations at exactly the point where an introduction form collides with its elimination form, showing exactly the type at which computation is happening.

For type systems with some sorts of value dependency, the bidirectional approach seriously cuts down on the amount of annotation required in terms, needed only to achieve type synthesis. We have a real opportunity to reduce clutter and also to give a clearer account of the connections between types and computation.

But it doesn’t stop there. A disciplined approach to the construction of bidirectional type systems makes it easier to get their metatheory right. I’ll show this by reconstructing Martin-Löf’s 1971 type theory (the inconsistent one) in a bidirectional style and show why it has type
preservation, even without normalization.

Event details

  • When: 22nd November 2017 13:00 - 14:30
  • Where: Cole 1.33b
  • Format: Talk