An Academic’s Observations from a Sabbatical at Google

Professor Adam Barker is featured in this month’s Communications of the ACM (CACM), discussing his recent Visiting Faculty appointment at Google. The Viewpoints article summarises his experience working in software engineering on the Borgmaster team and some of the core lessons that can be brought back to academia.

Borg is Google’s cluster management framework, which runs hundreds of thousands of jobs across a number of clusters, each with up to tens of thousands of machines.

SRG Seminar: “Application of Bayesian Nonparametrics in household human activity recognition” by Lei Fang

Event details

  • When: 12th April 2018 13:00 - 14:00
  • Where: Cole 1.33b
  • Series: Systems Seminars Series
  • Format: Seminar

Abstract

In this talk, I will discuss the possibility of using Bayesian nonparametric clustering, namely the Dirichlet Process Mixture model, to solve the human activity recognition problem. In particular, I will discuss how the technique can be useful when activity labels are not annotated and/or activities evolve over time. This initial study builds on existing work that uses directional statistical models (von Mises-Fisher distributions), called the Hierarchical Mixture of Conditional Independent von Mises-Fisher distributions (HMCIvMFs), for detecting and learning unknown events. A learning algorithm based on Markov chain Monte Carlo sampling will be presented, together with some initial experimental results.
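
What makes a Dirichlet Process Mixture attractive when activity labels are unknown is that the number of clusters is not fixed in advance. A minimal sketch of that property, assuming the standard Chinese Restaurant Process representation of the DP prior (this is not the speaker's HMCIvMF-based sampler, and the function name is hypothetical), draws cluster assignments in which each new observation joins an existing cluster in proportion to its size or opens a new cluster with weight `alpha`.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese Restaurant Process prior.

    Hypothetical helper for illustration. Item i joins existing cluster k with
    probability counts[k] / (i + alpha), or opens a new cluster with
    probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n):
        weights = counts + [alpha]  # last slot stands for "open a new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # a previously unseen activity cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(200, alpha=1.0, seed=42)
print(len(counts), "clusters with sizes", counts)
```

In a full Gibbs sampler these prior weights would be multiplied by each cluster's likelihood for the observation (a von Mises-Fisher density would be the natural choice here), which is how new, unlabelled activities can appear as new clusters over time.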

SRG Seminar: “Introduction to Apache Mesos and the DataCenter Operating System” by Matt Jarvis

Event details

  • When: 24th April 2018 13:00 - 14:00
  • Where: Cole 1.33b
  • Series: Systems Seminars Series
  • Format: Seminar

Abstract

Data processing is undergoing a paradigm shift as we move more and more towards real-time processing. Emerging software models such as the SMACK stack are at the forefront of this change, focused on a pipeline processing model, but are also introducing new levels of operational complexity in running multiple complex distributed systems such as Spark, Kafka and Cassandra. In this talk, I’ll introduce both Apache Mesos and DC/OS as a solution to this growing problem, and describe the benefits of running these new kinds of systems for emerging cloud native workloads.
 
Bio
Matt Jarvis is Senior Director of Community and Evangelism at Mesosphere, engaging with the communities around DC/OS and Mesos. Matt has spent more than 15 years building products and services around open source software, on everything from embedded devices to large-scale distributed systems. Most recently he has been focused on the open cloud infrastructure space and on emerging patterns for cloud native applications.

SRG Seminar: “On Engineering Unikernels” by Ward Jaradat

Event details

  • When: 15th March 2018 13:00 - 14:00
  • Where: Cole 1.33b
  • Series: Systems Seminars Series
  • Format: Seminar

We have explored data coordination techniques that permit distributed systems to be constructed by interconnecting services. In such systems, network latency is often a problem. For example, large data volumes might have to be transmitted across the network if computation cannot be co-located with data sources. One solution to this problem is the ability to deploy services in appropriate geographical locations and compose them to create distributed ecosystems. We therefore seek to deploy such services rapidly and to enact and orchestrate them dynamically. However, this goal is hindered by the size of the deployments. Currently, virtual machine appliances that host such services on top of monolithic kernels are very large, and are therefore potentially slow to deploy, since they may need to be transmitted across a network.

Our principles led us to re-engineer the standard software stack around Unikernels, creating self-contained applications that are less bloated and consequently much smaller. Unikernels are compact library operating systems that enable a single application to be statically linked against a simple kernel that manages the underlying resources presented by a hypervisor. In this talk I will present Stardust, a specialised Unikernel that aims to support the deployment of application services based on the Java programming language.

DLS: Functional Foundations for Operating Systems

Event details

  • When: 13th February 2018 09:30 - 15:15
  • Where: Byre Theatre
  • Series: Distinguished Lectures Series, Systems Seminars Series
  • Format: Distinguished lecture

Biography: Dr. Anil Madhavapeddy is a University Lecturer at the Cambridge Computer Laboratory, and a Fellow of Pembroke College where he is Director of Studies for Computer Science. He has worked in industry (NetApp, Citrix, Intel), academia (Cambridge, Imperial, UCLA) and startups (XenSource, Unikernel Systems, Docker) over the past two decades. At Cambridge, he directs the OCaml Labs research group which delves into the intersection of functional programming and systems, and is a maintainer on many open source projects such as OpenBSD, OCaml, Xen and Docker.

Timetable
9:30: Introduction by Professor Saleem Bhatti
9:35: Lecture 1
10:35: Break with tea and coffee
11:15: Lecture 2
12:15: Lunch (not provided)
14:00: Lecture 3
15:00: Close by Professor Simon Dobson

Lecture 1: Rebuilding Operating Systems with Functional Principles
The software stacks that we deploy across computing devices in the world are based on shaky foundations. Millions of lines of C code crammed into monolithic operating system kernels, mixed with layers of scheduling logic, wrapped in a hypervisor, and served with a dose of nominal security checking on the side. In this talk, I will describe an alternative approach to constructing reliable, specialised systems with a familiar developer experience. We will use modular functional programming to build several services such as a secure web server that have no reliance on conventional operating systems, and explain how to express their logic in a high level, functional fashion. By the end of it, everyone in the audience should be able to build their own so-called unikernels!

Lecture 2: The First Billion Real Deployments of Unikernels
Unikernels offer a path to a more sane basis for driving applications on hardware, but will they ever be adopted for real? For the past fifteen years, an intrepid group of adventurers have been developing the MirageOS application stack in the OCaml programming language. Along the way, it has been deployed in many unusual industrial situations that I will describe in this talk, starting with the Docker container stack, then moving on to the Xen hypervisor that drives billions of servers worldwide. I will explain the challenges of using functional programming in industry, but also the rewards of seeing successful deployments quietly working in mission-critical areas of systems software.

Lecture 3: Programming the Next Trillion Embedded Devices
The unikernel approach of compiling highly specialised applications from high-level source code is perfectly suited to programming the trillions of embedded devices that are making their way around the world. However, this raises new challenges from a programming language perspective: how can we run on a spectrum of devices ranging from the very tiny (with just kilobytes of RAM) to specialised hardware? I will describe the new frontier of functional metaprogramming (programs that generate more programs) that we are using to compile a single application to many heterogeneous devices, and a Git-like model to coordinate across thousands of nodes. I will conclude by motivating the need for a next-generation operating system to power exciting new applications, such as augmented and virtual reality, in our situated environments, and to remove the need for constant centralised coordination via the Internet.

“Sensing and topology: some ideas by other people, and an early experiment” by Simon Dobson

Event details

  • When: 30th November 2017 13:00 - 14:00
  • Where: Cole 1.33a
  • Series: Systems Seminars Series
  • Format: Seminar

Abstract
The core problem in many sensing applications is that we’re trying to infer high-resolution information from low-resolution observations, and to keep our trust in this information as the sensors degrade. How can we do this in a principled way? There’s an emerging body of work on using topology to manage both sensing and analytics, and in this talk I try to get a handle on how this might work for some of the problems we’re interested in. I will present an experiment we did to explore these ideas, which highlights some fascinating problems.

SRG Seminar: “Interactional Justice vs. The Paradox of Self-Amendment and the Iron Law of Oligarchy” by Jeremy Pitt

Event details

  • When: 15th November 2017 13:00 - 14:00
  • Where: Cole 1.33a
  • Series: Systems Seminars Series
  • Format: Seminar

Self-organisation and self-governance offer an effective approach to resolving collective action problems in multi-agent systems, such as fair and sustainable resource allocation. Nevertheless, self-governing systems that allow unrestricted and unsupervised self-modification expose themselves to several risks, including Suber’s paradox of self-amendment (rules specify their own amendment) and Michels’ iron law of oligarchy (that the system will inevitably be taken over by a small clique and run for its own benefit, rather than in the collective interest). This talk will present an algorithmic approach to resisting both the paradox and the iron law, based on the idea of interactional justice derived from sociology and from legal and organisational theory. The process of interactional justice operationalised in this talk uses opinion formation over a social network, with respect to a shared set of congruent values, to transform a set of individual, subjective self-assessments into a collective, relative, aggregated assessment.

Using multi-agent simulation, we present some experimental results about detecting and resisting cliques. We conclude with a discussion of some implications concerning institutional reformation and stability, ownership of the means of coordination, and knowledge management processes in ‘democratic’ systems.
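
To make "opinion formation over a social network" concrete, a minimal sketch can use the DeGroot model, a standard textbook model of opinion dynamics in which every agent repeatedly adopts a trust-weighted average of the current opinions. This is an assumption for illustration, not necessarily the formulation used in the talk; the trust matrix and self-assessment values below are hypothetical.

```python
import math

def degroot(trust, opinions, rounds=50):
    """DeGroot opinion dynamics (illustrative, not the talk's exact model).

    trust[i][j] is how much agent i weights agent j's opinion; each row must
    sum to 1. Every round, each agent adopts the trust-weighted average of
    the current opinions.
    """
    for row in trust:
        if not math.isclose(sum(row), 1.0):
            raise ValueError("each row of the trust matrix must sum to 1")
    x = list(opinions)
    for _ in range(rounds):
        x = [sum(w * xj for w, xj in zip(row, x)) for row in trust]
    return x

# Hypothetical example: three agents on a ring, each trusting itself and one neighbour.
trust = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
self_assessments = [0.9, 0.5, 0.1]
print(degroot(trust, self_assessments))  # all three values converge towards 0.5
```

Here the consensus equals the plain average only because every column of the trust matrix also sums to 1; with uneven trust, well-connected agents pull the collective assessment towards their own view, which is the kind of influence a clique could exploit.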

Biography
Jeremy Pitt is Professor of Intelligent and Self-Organising Systems in the Department of Electrical & Electronic Engineering at Imperial College London, where he is also Deputy Head of the Intelligent Systems & Networks Group. His research interests focus on developing formal models of social processes using computational logic, and their application in self-organising multi-agent systems, for example fair and sustainable common-pool resource management in ad hoc and sensor networks. He also has strong interests in human-computer interaction, socio-technical systems, and the social impact of technology; with regard to the latter he has edited two books, This Pervasive Day (IC Press, 2012) and The Computer After Me (IC Press, 2014). He has been an investigator on more than 30 national and European research projects and has published more than 150 articles in journals and conferences. He is a Senior Member of the ACM, a Fellow of the BCS, and a Fellow of the IET; he is also an Associate Editor of ACM Transactions on Autonomous and Adaptive Systems and an Associate Editor of IEEE Technology and Society Magazine.

“Ambient intelligence with sensor networks” by Lucas Amos and “Location, Location, Location: Exploring Amazon EC2 Spot Instance Pricing Across Geographical Regions” by Nnamdi Ekwe-Ekwe

Event details

  • When: 9th November 2017 13:00 - 14:00
  • Where: Cole 1.33a
  • Series: Systems Seminars Series
  • Format: Seminar

Lucas’s abstract

“Indoor environment quality has a significant effect on worker productivity through a complex interplay of factors such as temperature, humidity and levels of Volatile Organic Compounds (VOCs).

In this talk I will discuss my Masters project, which used off-the-shelf sensors and Raspberry Pis to collect environmental readings at one-minute intervals throughout the Computer Science buildings. The prevalence of erroneous readings due to sensor failure, and the strategy used to identify and correct such faults, will be presented. Identifiable correlations between environmental variables, and attempts to model these relationships, will be discussed.

Past studies identifying the ideal environmental conditions for human comfort and productivity allow for the objective assessment of indoor environmental conditions. An adaptation of Frešer’s environment rating system will be presented, showing how VOC levels can be incorporated into assessments of environment quality and how this can be communicated to building users.”
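
One simple way to flag and correct erroneous readings, assuming faults show up as values outside a physically plausible range, is a range check followed by linear interpolation between the nearest valid neighbours. This is a generic sketch, not the strategy used in the project; the function name, bounds and sample values are hypothetical.

```python
def clean_readings(readings, low, high):
    """Replace out-of-range readings by linear interpolation between the
    nearest valid neighbours (hypothetical helper for illustration).

    Readings at the edges of the series copy the nearest valid value; if no
    valid reading exists at all, the faulty value becomes None.
    """
    n = len(readings)
    valid = [low <= r <= high for r in readings]
    cleaned = list(readings)
    for i in range(n):
        if valid[i]:
            continue
        # Nearest valid index to the left and to the right of the faulty reading.
        left = next((j for j in range(i - 1, -1, -1) if valid[j]), None)
        right = next((j for j in range(i + 1, n) if valid[j]), None)
        if left is None and right is None:
            cleaned[i] = None
        elif left is None:
            cleaned[i] = readings[right]
        elif right is None:
            cleaned[i] = readings[left]
        else:
            frac = (i - left) / (right - left)
            cleaned[i] = readings[left] + frac * (readings[right] - readings[left])
    return cleaned

# Hypothetical one-minute temperature readings (degrees C) with two faulty values.
print(clean_readings([21.0, 21.5, -40.0, 22.5, 85.0], low=0.0, high=50.0))
```

Interpolation suits slowly varying quantities such as temperature or humidity sampled every minute; a long run of faults would call for a model-based correction instead.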

Nnamdi’s abstract

“Cloud computing is becoming an almost ubiquitous part of the computing landscape. For many companies today, moving their entire infrastructure and workloads to the cloud reduces complexity and time to deployment, and saves money. Spot Instances, a subset of Amazon’s cloud computing infrastructure (EC2), build on this. They allow a user to bid on spare compute capacity in Amazon’s data centres at heavily discounted prices. If demand increases such that the spot price exceeds the user’s maximum bid, their compute instance is terminated.

In this work, we conduct one of the first detailed analyses of how location affects the overall cost of deployment of a spot instance. We simultaneously examine the reliability of pricing data of a spot instance, and whether a user can be confident that their instance has a low risk of termination.

We analyse spot pricing data across all available Amazon Web Services regions for 60 days on a variety of instance types. We find that location does play a critical role in spot instance pricing and also that pricing differs depending on the granularity of the location – from a more coarse-grained AWS region to a more fine-grained Availability Zone within a region. We relate the pricing differences we find to the price’s stability, confirming whether we can be confident in the bid prices we make.

We conclude by showing that it is entirely possible to run workloads on Spot Instances while achieving both a very low risk of termination and a very low hourly cost.”
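
The termination rule described above can be replayed against a price trace. The sketch below is a simplified model of the bidding scheme in the abstract, assuming hourly prices and that the user pays the current spot price rather than the bid; it also computes a coefficient of variation as one possible stability measure. The zone names and prices are made up for illustration and are not results from the study.

```python
from statistics import mean, pstdev

def simulate_spot(prices, bid):
    """Replay an hourly spot-price trace for one instance (simplified model).

    The instance runs while the spot price stays at or below the bid, paying
    the spot price (not the bid) for each hour, and is terminated the first
    time the price rises above the bid.
    Returns (hours_run, total_cost, terminated).
    """
    cost = 0.0
    for hour, price in enumerate(prices):
        if price > bid:
            return hour, cost, True
        cost += price
    return len(prices), cost, False

def price_stability(prices):
    """Coefficient of variation of a price trace: lower means more stable."""
    return pstdev(prices) / mean(prices)

# Hypothetical hourly prices (USD) for the same instance type in two zones.
traces = {
    "zone-a": [0.031, 0.032, 0.030, 0.045],
    "zone-b": [0.028, 0.029, 0.028, 0.028],
}
for zone, prices in traces.items():
    hours, cost, terminated = simulate_spot(prices, bid=0.040)
    print(zone, hours, round(cost, 3), terminated, round(price_stability(prices), 3))
```

Comparing the same instance type across zones in this way makes the trade-off visible: a cheaper, more stable zone lets a lower bid run longer without termination.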

“A Decentralised Multimodal Integration of Social Signals: A Bio-Inspired Approach” by Esma Benssassi and “Plug and Play Bench: Simplifying Big Data Benchmarking Using Containers” by Sheriffo Ceesay

Event details

  • When: 26th October 2017 13:00 - 14:00
  • Where: Cole 1.33a
  • Series: Systems Seminars Series
  • Format: Seminar

Esma’s abstract

The ability to integrate information from different sensory modalities in a social context is crucial for understanding social cues and for gaining useful social interaction and experience. Recent research has focused on multi-modal integration of social signals from visual, auditory, haptic or physiological data. Different data fusion techniques have been designed and developed; however, the majority have not achieved a significant improvement in accuracy in recognising social cues compared to uni-modal social signal recognition. One possible limitation is that existing approaches lack sufficient capacity to model the various types of interaction between modalities, and have not been able to leverage the advantages of multi-modal signals by treating each of them as complementary to the others. We introduce ideas for creating a decentralised model for social signal integration, inspired by computational models of multi-sensory integration in neuroscience and by the perception of social signals in the human brain.

Sheriffo’s abstract

The recent boom in big data, coupled with the challenges of processing and storing it, gave rise to distributed data processing and storage paradigms such as MapReduce, Spark and NoSQL databases. With the advent of cloud computing, processing and storing such massive datasets on clusters of machines is now readily feasible. However, there are few tools and approaches that users can rely on to gauge and understand the performance of big data applications deployed on local clusters or in the cloud. Researchers have started exploring this area by providing benchmarking suites suitable for big data applications. However, many of these tools are fragmented, complex to deploy and manage, and do not provide transparency with respect to the monetary cost of benchmarking an application.

In this talk, I will present Plug And Play Bench PAPB (https://github.com/sneceesay77/papb): an infrastructure-aware abstraction built to integrate and simplify the process of big data benchmarking. PAPB automates the tedious process of installing, configuring and executing common big data benchmark workloads by containerising the tools and settings based on the underlying cluster deployment framework. Our proof-of-concept implementation uses HiBench as the benchmark suite, HDP as the cluster deployment framework and Azure as the cloud platform. The talk will further illustrate the inclusion of cost metrics based on the underlying Microsoft Azure cloud platform.

SRG Seminar: “Adaptive Multisite Computation Offloading in Mobile Clouds” by Dawand Sulaiman and “Topological Ranking-Based Resource Scheduling for Multi-Accelerator Systems” by Teng Yu

Event details

  • When: 12th October 2017 13:00 - 14:00
  • Where: Cole 1.33b
  • Series: Systems Seminars Series
  • Format: Seminar

Dawand’s abstract

The concept of using cloud-hosted infrastructure to overcome the resource constraints of mobile devices is known as Mobile Cloud Computing (MCC). It allows applications to run partially on the device and partially on a remote cloud instance, thereby overcoming device-specific resource constraints. However, as smartphones and tablets gain more CPU power and longer battery life, the meaning of MCC is gradually changing. Instead of being fully dependent on the cloud, a number of nearby devices can be used to coordinate and distribute content and resources in a decentralised manner; this is known as Mobile Ad hoc Cloud Computing. Mobile devices with less computational power and lower battery life can leverage nearby mobile devices to run resource-intensive applications. Therefore, more efficient and reliable methodologies need to be explored for resource-hungry and real-time mobile applications, such as face recognition, data-intensive and augmented reality applications.
We present a unified framework that allows each mobile device within the shared environment to intelligently offload its computation to other external platforms. For individual mobile devices, it is important to make the offloading decision based on network conditions, the load of other machines, and the device’s own constraints (e.g., mobility and battery). Moreover, to achieve a globally optimal completion time for tasks from all the mobile devices, it is necessary to devise a scheduling solution that schedules offloaded tasks in real time. The offloading decision engine needs to adapt to dynamic changes in both the host device and the connected nearby and remote devices.
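
The per-device offloading decision can be illustrated with a deliberately simple completion-time model: running locally costs only compute time, while offloading adds the time to transfer the task's input and any queueing delay at the remote site. This is a textbook-style sketch of our own, not the framework presented in the talk; it ignores battery and mobility, and the site names, speeds and bandwidths are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    cpu_hz: float                # effective processing speed (cycles per second)
    bandwidth_bps: float = 0.0   # uplink bandwidth to this site; unused when local
    queue_s: float = 0.0         # waiting time caused by the site's current load
    is_local: bool = False

def completion_time(site, task_cycles, input_bits):
    """Estimated time to finish the task on the given site."""
    compute = task_cycles / site.cpu_hz
    if site.is_local:
        return compute  # no data transfer needed
    return site.queue_s + input_bits / site.bandwidth_bps + compute

def choose_site(sites, task_cycles, input_bits):
    """Pick the site (local or remote) with the lowest estimated completion time."""
    return min(sites, key=lambda s: completion_time(s, task_cycles, input_bits))

sites = [
    Site("phone", cpu_hz=1e9, is_local=True),
    Site("nearby-tablet", cpu_hz=2e9, bandwidth_bps=50e6, queue_s=0.5),
    Site("cloud", cpu_hz=10e9, bandwidth_bps=5e6),
]
for input_bits in (8e6, 40e6, 80e6):
    print(input_bits, choose_site(sites, task_cycles=4e9, input_bits=input_bits).name)
```

With the same 4e9-cycle task, the best choice moves from the fast but distant cloud, to the nearby tablet, to staying on the phone as the input grows, which is why the decision has to be re-evaluated as network conditions and remote load change.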

Teng’s abstract

Accelerators are becoming increasingly prevalent in distributed computation. FPGAs have been shown to be fast and power-efficient for particular tasks, yet scheduling on multi-accelerator systems is challenging when workloads vary significantly in granularity, in terms of task size and/or the number of computational units required.
We present a novel approach for dynamically scheduling tasks on networked multi-accelerator systems which maintains high performance, even in the presence of irregular jobs. Our topological ranking-based scheduling allows realistic irregular workloads to be processed while maintaining a significantly higher level of performance than existing schedulers.