Fully funded PhD scholarship: Trustworthy Refactoring Tools for Haskell Programs

Supervisor: Dr Christopher Brown

Proposal and Context

Software is large and complex. Ubiquitous systems, such as weather forecasting, medical imaging, advanced AI and big-data processing, are extremely expensive and time-consuming for software companies to produce. Moreover, they often contain subtle bugs that can have disastrous consequences, are difficult to find, and are difficult or impossible to fix. What developers need are specialised refactoring tools that help them develop these important and complex systems in a safe, semi-automated way: tools that reduce development time and human error, increase productivity, save companies and customers money, and yield robust, safe systems that have been developed in a responsible and trustworthy way.

Refactoring is the process of changing the structure of software without changing what it does: in effect, refactoring helps the programmer re-purpose their code to make it more understandable, accessible, or amenable to further change in the program’s design. Developers often refactor on a daily basis by manually changing code to reflect an API change, re-purposing methods, eliminating duplicated code, renaming variables, parameters and functions, generalising functions, and so on. However, this process is tedious and cumbersome to apply manually: effecting a structural change that could potentially affect millions of lines of code across thousands of files is inevitably error-prone.
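
As a concrete (and deliberately tiny) illustration of the kind of change meant here, the Haskell fragment below shows a "generalise definition" refactoring performed by hand: a hard-coded constant is lifted out as a parameter and every call site is updated. The names are invented purely for illustration.

    -- Illustrative sketch only; the names are invented for this example.
    -- Before: the discount rate is baked into the definition.
    discountedPrice :: Double -> Double
    discountedPrice price = price * 0.9

    -- After "generalise definition": the constant becomes a parameter,
    -- and every call site must be updated to pass the old value.
    discountedPrice' :: Double -> Double -> Double
    discountedPrice' rate price = price * rate

    -- Call site before:  total = sum (map discountedPrice items)
    -- Call site after:   total = sum (map (discountedPrice' 0.9) items)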

Refactoring tool-support provides developers with automated transformations that they can apply to their code base, usually through an existing IDE interface. Such tools apply refactorings across an entire code base in a semi-automatic way: they rely on the user to choose which refactorings to perform and where, but the underlying transformation machinery is automatic. This means that both simple and complex refactorings can be applied, almost instantly, to large code bases comprising thousands of files and millions of lines of code.

However, automated tools are themselves prone to bugs, with the potentially disastrous consequence that a refactoring tool transforms a program into one containing subtle bugs or changed behaviour. Beyond the obvious danger of introducing subtle, difficult-to-spot defects, this erodes developers’ confidence in refactoring tools, and in software tools in general. Furthermore, a refactoring that introduces errors, or that cannot be trusted, forces the programmer to inspect the transformed code, negating the benefit of using an automated tool in the first place. The problem is amplified by the fact that refactoring tools are extremely laborious and difficult to implement, especially for large programming languages such as Haskell. This limits the refactorings deployed in such tools, both in number and in applicability.

To address this, the project will answer the following research questions:

  1. How can we provide an automated approach to implementing refactoring tools, via compositionality and proof search?
  2. How can we make it easy to generate new refactorings that are safe and trusted by the developers who need them?
  3. How can we generate, in an automated way, soundness proofs showing that refactorings are safe and verified? (A minimal sketch of how such a soundness statement might be phrased is given directly below.)
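
To give a feel for question 3, here is a deliberately small, invented sketch (it is not part of HaRe or any existing tool): a refactoring phrased as a function over a toy expression type, together with a QuickCheck property asserting that evaluation is preserved. A soundness proof would establish this property for all expressions, rather than merely testing it on random ones.

    -- Illustrative sketch only; not taken from any existing tool.
    import Test.QuickCheck

    -- A toy expression language: integer literals and addition.
    data Expr = Lit Int | Add Expr Expr
      deriving Show

    -- The observable behaviour of an expression.
    eval :: Expr -> Int
    eval (Lit n)     = n
    eval (Add e1 e2) = eval e1 + eval e2

    -- A "refactoring": constant folding.
    foldConstants :: Expr -> Expr
    foldConstants (Add e1 e2) =
      case (foldConstants e1, foldConstants e2) of
        (Lit a, Lit b) -> Lit (a + b)
        (e1', e2')     -> Add e1' e2'
    foldConstants e = e

    -- Soundness, stated as a testable property:
    -- the refactoring must not change what the program computes.
    prop_sound :: Expr -> Bool
    prop_sound e = eval (foldConstants e) == eval e

    instance Arbitrary Expr where
      arbitrary = sized gen
        where gen 0 = Lit <$> arbitrary
              gen n = oneof [ Lit <$> arbitrary
                            , Add <$> gen (n `div` 2) <*> gen (n `div` 2) ]

    main :: IO ()
    main = quickCheck prop_sound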

Exploratory Ideas

The PhD will be exploratory in nature. Here are some ideas for research directions to investigate as part of the PhD:

  1. Formally characterise a number of refactorings for Haskell and prove properties of their general soundness.
  2. Develop a fully verified refactoring tool that encodes general soundness proofs as part of its implementation, using, e.g., dependent types, for the full Haskell standard (a minimal sketch of this style of type-indexed encoding is given after this list).
  3. Model a fully verified static and functional semantics for Haskell using, e.g., dependent types.
  4. Provide a fully general theory of the formalisation of refactoring tools for, e.g., Haskell.
  5. Provide an automated technique for finding refactorings and proofs of refactorings, via, e.g., proof search and compositionality of sub-proofs.
  6. Evaluate the applicability of the approach on a number of use cases and domains, from a variety of languages, across a number of different verticals.
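
As a minimal sketch of the style of type-indexed encoding gestured at in ideas 2 and 3 (one possible direction, not a prescription), the Haskell fragment below uses a GADT indexed by object-level types. Any transformation of type Expr a -> Expr a is then guaranteed by the host type system to preserve object-level typing, although semantic preservation would still need a separate proof (or, in a dependently typed setting, could be carried in the type as well).

    {-# LANGUAGE GADTs #-}

    -- Illustrative sketch only; not taken from any existing tool.
    -- Expressions indexed by the type of value they compute.
    data Expr a where
      IntLit  :: Int  -> Expr Int
      BoolLit :: Bool -> Expr Bool
      Add     :: Expr Int -> Expr Int -> Expr Int
      If      :: Expr Bool -> Expr a -> Expr a -> Expr a

    -- Well-typed by construction: evaluation cannot go wrong.
    eval :: Expr a -> a
    eval (IntLit n)  = n
    eval (BoolLit b) = b
    eval (Add e1 e2) = eval e1 + eval e2
    eval (If c t e)  = if eval c then eval t else eval e

    -- A refactoring that removes branches on constant conditions.
    -- Its type alone guarantees that object-level typing is preserved.
    simplifyIf :: Expr a -> Expr a
    simplifyIf (If (BoolLit True)  t _) = simplifyIf t
    simplifyIf (If (BoolLit False) _ e) = simplifyIf e
    simplifyIf (If c t e)  = If (simplifyIf c) (simplifyIf t) (simplifyIf e)
    simplifyIf (Add e1 e2) = Add (simplifyIf e1) (simplifyIf e2)
    simplifyIf e           = e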

Supervisor and Background

The PhD project will be supervised by Dr Christopher Brown, a lecturer in the School of Computer Science with over 18 years of experience in refactoring, program transformation and functional programming. Dr Brown contributed to the original HaRe (Haskell Refactorer) system, developed at the University of Kent, and has since developed a number of new refactoring tools for a variety of languages, including Haskell, Erlang and C/C++, for introducing and tuning parallel programs. Dr Brown also works in the field of formal semantics, with prior work on formalising refactorings and on using types to reason about, e.g., extra-functional properties of imperative systems for embedded domains.

The PhD project fits directly with the research vision of the supervisor, who is also working on building refactoring tools for dependently typed languages, such as Idris and Pi-Forall.

The Programming Languages Research Group

The Programming Languages Research Group has a long history in functional programming and type theory. This project would directly fit with the group’s general vision of:

  1. Making programming languages more accessible to experienced and inexperienced programmers alike.
  2. Providing tool-support to make programming more accessible to inexperienced programmers.
  3. Using types to formalise general soundness of programming language properties and semantics.
  4. Providing verified refactoring tooling for functional programs.

To apply

Informal inquiries can be directed to Chris Brown. Formal applications can be made through the School’s postgraduate research portal.

The deadline for applications is 1 March 2023.

Why Homotopy Type Theory (HoTT) matters – Professor Thorsten Altenkirch

Abstract:
Dependent types are a wonderful way to construct correct functional programs and specify interfaces, as Edwin has shown in his nice book on type-driven development featuring a Welsh dragon. But shall we go further, into the esoteric world of homotopy type theory? I will try to motivate this, and I am looking forward to some discussions with people who have a more pragmatic attitude to dependent types.

Event details

  • When: 25th May 2018 11:00 - 12:30
  • Where: Cole 1.33a
  • Format: Seminar

OpenMP and MPI Refactoring with ParaFormance – Turkey Alsalkini

Abstract:

The increasing complexity of codebases, together with the growing number of cores that must be utilised, makes such code hard to optimise and maintain. In this talk, we present the OpenMP and MPI refactorings implemented in the ParaFormance tool. The tool transforms sequential code into parallel code able to run on shared-memory machines. A further refactoring adapts the source code to exploit the larger number of processors on large HPC clusters with message-passing support. In addition, the resulting MPI code can be used by developers as a starting point for further optimisation. Both refactorings are preceded by advanced safety checking, which reports concurrency problems and gives hints and suggestions on how to fix them.

Event details

  • When: 17th May 2018 12:00 - 13:00
  • Where: Cole 1.33a
  • Format: Talk

A Type-System for describing System-on-a-Chip Architectures – Jan De Muijnck-Hughes

Title:
A Type-System for describing System-on-a-Chip Architectures

Abstract:
The protocols that describe the interactions between IP Cores on System-on-a-Chip (SoC) architectures are well documented. These protocols describe not only the structural properties of the physical interfaces but also the behaviour of the emanating signals. However, there is a disconnect between the design of SoC architectures, their formal description, and the verification of their implementation in known hardware description languages.

Within the Border Patrol project, we are investigating how to capture and reason about the structural and behavioural properties of SoC architectures using state-of-the-art advances in programming language research. Specifically, we are investigating the use of dependent types and session types to capture and reason about hardware communication.

In this talk I will discuss my work in designing a dependent type system and corresponding language that captures and reasons about the topological structure of a System-on-a-Chip. This language provides correct-by-construction guarantees over:

  • the physical structure of an interaction protocol;
  • the adherence of a component’s interface to a given protocol; and
  • the validity of the specified connections made between components.

We provide these guarantees through the (ab)use of dependent types as presented in Idris, and the (ab)use of indexed monads to reason about resource usage.

Given time, I will give an account of how this language enables reasoning about SoC behaviour when considered in conjunction with Session Types.

Event details

  • When: 5th April 2018 12:00 - 13:00
  • Where: Cole 1.33a
  • Format: Talk

Diderot: A Parallel Domain-Specific Language for Image Analysis and Visualization – John Reppy

Diderot: A Parallel Domain-Specific Language for Image Analysis and Visualization

Abstract:
The analysis of structure in three-dimensional images is increasingly valuable for biomedical research and computational science. At the same time, the computational burden of processing images is increasing as devices produce images of higher resolution (e.g., typical CT scans have gone from 128^3 to roughly 512^3 resolution). With the latest scanning technologies, it is also more common for the values measured at each sample to be multi-dimensional rather than a single scalar, which further complicates implementing mathematically correct methods.

Diderot is a domain-specific language (DSL) for programming advanced 3D image visualization and analysis algorithms. These algorithms, such as volume rendering, fiber tractography, and particle systems, are naturally defined as computations over continuous tensor fields that are reconstructed from the discrete image data. Diderot combines a high-level mathematical programming notation based on tensor calculus with an abstract bulk-synchronous parallelism model. Diderot is designed both to enable rapid prototyping of new image analysis algorithms and to deliver high performance on a range of parallel platforms.

In this talk, I will give an overview of the design of Diderot and examples of its use. I will then describe aspects of its implementation with a focus on how we translate the notation of tensor calculus to efficient code. I will also briefly discuss the automated techniques we use to validate the correctness of the compilation process.

Diderot is joint work with Gordon Kindlmann, Charisee Chiw, Lamont Samuels, and Nick Seltzer.

Bio:
John Reppy is a Professor of Computer Science and a Senior Fellow of the Computation Institute at the University of Chicago. He received his Ph.D. from Cornell University in 1992 and spent the first eleven years of his career at Bell Labs in Murray Hill, NJ. He has been exploring issues in language design and implementation since the late 1980s, with a focus on higher-order, typed, functional languages. His work includes the invention of Concurrent ML and work on combining object-oriented and functional language features. His current research is on high-level languages for parallel programming, including the Diderot, Manticore, and Nessie projects.

Event details

  • When: 2nd April 2018 13:00 - 14:00
  • Where: Cole 1.33b
  • Format: Seminar

Compositional Coinduction with Sized Types – Dr. Andreas Abel

Abstract:

Formal languages and automata are taught to every computer science student. However, the student will most likely not see the beautiful coalgebraic foundations, which use coinductive reasoning.

In this talk, I recapitulate how infinite tries can represent formal languages (sets of strings). I explain Agda’s coinduction mechanism based on copatterns and sized types, and demonstrate that it allows an elegant representation of the usual language constructions, such as union, concatenation, and Kleene star, with the help of Brzozowski derivatives.
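
For readers unfamiliar with the construction, the sketch below is a rough Haskell analogue of the idea (using lazy infinite tries rather than Agda’s copatterns and sized types, so productivity is not checked by the compiler); the identifiers are my own and are not taken from the talk.

    -- Illustrative sketch only; a Haskell analogue of the construction.
    -- A language as an infinite trie: does it accept the empty word (nu),
    -- and, for each character, what is the Brzozowski derivative (delta)?
    data Lang = Lang { nu :: Bool, delta :: Char -> Lang }

    -- The empty language and the language containing only the empty word.
    empty, eps :: Lang
    empty = Lang False (const empty)
    eps   = Lang True  (const empty)

    -- The language of a single character.
    char :: Char -> Lang
    char c = Lang False (\x -> if x == c then eps else empty)

    -- Union, concatenation and Kleene star, defined corecursively
    -- via the standard derivative equations.
    union :: Lang -> Lang -> Lang
    union k l = Lang (nu k || nu l) (\c -> union (delta k c) (delta l c))

    cat :: Lang -> Lang -> Lang
    cat k l = Lang (nu k && nu l) deriv
      where deriv c | nu k      = union (cat (delta k c) l) (delta l c)
                    | otherwise = cat (delta k c) l

    star :: Lang -> Lang
    star l = Lang True (\c -> cat (delta l c) (star l))

    -- Membership: follow the trie along the word and read off nu.
    accepts :: Lang -> String -> Bool
    accepts l = nu . foldl delta l

    -- Example: (ab)* accepts "abab" but not "aba".
    example :: (Bool, Bool)
    example = let abStar = star (cat (char 'a') (char 'b'))
              in (accepts abStar "abab", accepts abStar "aba")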

Event details

  • When: 16th February 2018 12:00 - 13:00
  • Where: Cole 1.33b
  • Format: Seminar