Fully funded PhD scholarship: Trustworthy Refactoring Tools for Haskell Programs

Supervisor: Dr Christopher Brown

Proposal and Context

Software is large and complex. Ubiquitous systems, such as weather forecasting, medical imaging, advanced AI and big-data processing are extremely expensive and time-consuming for software companies to produce. Moreover, they often comprise many subtle bugs that can have disastrous consequences, are difficult to find, and difficult or impossible to fix. What developers need are specialised software refactoring tools that help them develop these important and complex systems in a safe and semi-automated way, reducing developer time, human error and overall increasing productivity, saving companies and customers money, and providing robust, safe, systems that have been developed in a responsible and trustworthy way.

Refactoring is the process of changing the structure of software without changing what it does: in effect, refactoring is about helping the programmer re-purpose their code to make it more understandable, accessible, or amenable to further change in the program’s design. It is often a process that developers use on a daily basis by manually changing code to reflect an API change, re-purposing methods, eliminating duplicated code, remaining variable parameters and function names, generalising functions, etc.  However, this process is rather tedious and cumbersome to apply manually: effecting a structural change that could potentially affect millions of lines of code across thousands of files is inevitably error – prone.

The advent of refactoring tool-support provides developers with automated transformations that they can apply to their code base, usually through an existing IDE interface. Refactoring tools, on the other hand, provide a way to apply refactorings across an entire code base in a semi-automatic way: they rely on the user to make certain choices about which refactorings to perform but are automatic in their underlying machinery. This automated underlying machinery means that both simple and complex refactorings can be applied to large code bases comprising thousands of files and millions of lines of code instantly.

However, automated tools are prone to bugs. This has the potentially disastrous consequence of a refactoring tool refactoring a program into one that contains subtle bugs or changes in behaviour. Despite the obvious implication that this will refactor software to contain, perhaps, subtle and difficult-to-spot bugs, it also erodes developers’ confidence in using refactoring —and general software tools— in general. Furthermore, a refactoring that introduces errors or is not trustworthy requires the programmer to inspect the transformed code, therefore taking out the benefit of using an automated tool in the first place. The problem is also amplified in that refactoring tools are extremely cumbersome, laborious and difficult in themselves to implement, especially over large programming languages, such as Haskell. This makes the refactorings deployed in such tools limited, both in number and applicability.

What is needed is to answer the following research questions:

  1. How can we provide an automated approach of implementing refactoring tools, via compositionality and proof search?
  2. How can we have the means to generate new refactorings easily that are safe and trusted by the developers that require them?
  3. How can we provide a means to generate soundness proofs that the refactorings are safe and verified, in an automated way?

Exploratory Ideas

The PhD will be exploratory in nature. Here are some ideas for research directions to investigate as part of the PhD:

  1. Formally characterise a number of refactorings for Haskell and prove properties of their general soundness.
  2. Develop a fully verified refactoring tool that encodes general soundness proofs as part of its implementation using e.g., Dependent Types for the full Haskell standard.
  3. Model a fully verified static and functional semantics for Haskell using e.g., Dependent Types.
  4. Provide a fully generalised theory of the formalisation of refactoring tools for e.g., Haskell.
  5. Provide an automated technique to find refactorings and proofs of refactorings, via e.g., proof search and compositionality of sub-proofs.
  6. Evaluate the applicability of the approach on a number of use-cases and domains, from a variety of languages, across a different number of verticals.

Supervisor and Background

The PhD project is to be supervised by Dr Christopher Brown, a lecturer in the school of computer science who has over 18 years of experience working in the field of refactoring, program transformation and functional programming. Dr Brown contributed to the original HaRe (Haskell Refactorer) system, developed at the university of Kent, and has since developed a number of new refactoring tools for a variety of languages, including Haskell, Erlang and C/C++ for introducing and tuning parallel programs. Dr Brown also works in the field of formal semantics, with prior work on formalising refactorings and using types to reason about e.g., extra functional properties of imperative systems for embedded languages.

The PhD project fits directly with the research vision of the supervisor, who is also working on building refactoring tools for dependently typed languages, such as Idris and Pi-Forall.

The Programming Languages Research Group

The Programming Languages Research Group has a long history in functional programming and type theory. This project would directly fit with the group’s general vision of:

  1. Making programming languages more accessible to experienced and inexperienced programmers alike.
  2. Providing tool-support to make programming more accessible to inexperienced programmers.
  3. Using types to formalise general soundness of programming language properties and semantics.
  4. Providing verified refactoring tooling for functional programs.

To apply

Informal inquiries can be directed to Chris Brown. Formal applications can be made through the School’s postgraduate research portal.

The deadline for applications is 1 March 2023.

PhD success for a former graduate

Last month Professor Simon Dobson was invited to be on the PhD examining committee for Indushree Banerjee at TU Delft.

PhD examining committee, paranymphs, and family

She passed with flying colours, for her thesis on ad hoc network protocols for use in disaster recovery situations. The protocol is designed around a very strong model of social justice and equality, working on low-power mobile devices and operating so as to conserve power reserves and device lifetime over the important 48-hour initial period of disaster relief.

Indushree did her MSc in St Andrews ten years ago, which gives us the opportunity for a couple of before-and-after photographs.

Simon and Indushree, MSc graduation 2012 Simon and Indushree, PhD graduation 2022

Neither of them seem to have changed all that much, apart from Simon having gone “Full Gandalf” during lockdown.

Indushree is now doing a postdoc in Delft, focusing on technology applied to  wildlife conservation and ecology. We’re hoping to get her over for a seminar in the new year.

PhD opportunity: Interdisciplinary approaches for improving biodiversity assessments using remote sensors

Funded PhD opportunity.

The project will be heavy on state of the art machine learning with applications to statistical ecology. The project will be jointly supervised by the Schools of Maths and Stats (Alison Johnston, Chris Sutherland) and Computer Science (Kasim and Oggie) and potentially fully funded by CREEM.

Strong applicants with experience with deep learning and strong maths background are particularly welcome to apply.

Phd Scholarships for 2023

Scholarship Description
The School of Computer Science is offering the following types of scholarships for 3.5 years of study in our PhD programme. UK, EU and International students are all eligible for:

• Fully funded scholarships consisting of tuition + stipend
• Tuition-only scholarships

This award is part-funded through the University’s new ‘handsels’ scheme.

Value of Award
• Tuition scholarships cover PhD fees irrespective of country of origin.
• Stipends are valued £17,668 per annum (or the standard UKRI stipend, if it is higher).

Eligibility Criteria
We are looking for highly motivated research students willing to be part of a diverse and supportive research community. Applicants must hold a BSc or MSc in Computer Science or a related area appropriate for their proposed topic of study.

International applications are welcome. We especially encourage female applicants and underrepresented minorities to apply.

Application Deadline
All applications received before 1st February 2023 will be considered for the first round of scholarship eligibility. Later applications will also be considered for scholarships as long as funding remains.

How to Apply

If accepted, every PhD application indicating interest will automatically be considered for these scholarships. There is no need for a separate application.

The best way to win one of our scholarships is to make a robust PhD application. You are strongly encouraged to approach supervisors before formal submission to discuss your project ideas with them.

The School’s main groups are Artificial Intelligence and Symbolic Computation, Computer Systems and Networks, Human-Computer Interaction, and Programming Languages. It is highly recommended that applicants identify potential supervisors in their applications. A list of existing faculty and areas of research can be found at https://www.st-andrews.ac.uk/computer-science/prospective/pgr/supervisors/). All supervisors listed on this page may be contacted directly to discuss possible projects.

Full application instructions can be found at https://www.st-andrews.ac.uk/study/apply/postgraduate/research/.
Inquiries and questions may be directed to pg-admin-cs@st-andrews.ac.uk.

PhD Viva Success: Chawanangwa Lupafya

Please join me in congratulating Chawanangwa Lupafya, who has just passed his PhD viva subject to minor corrections.

Chawanangwa is supervised by Dr Dharini Balasubramaniam.

Special thanks to Dr Ruth Hoffman for serving as internal examiner and  Dr Rami Bahsoon from the University of Birmingham for serving as the external examiner

 

PhD Viva Success: Yasir Alguwaifli

Please join me in congratulating Yasir Alguwaifli, who has just passed his PhD viva subject to minor corrections.

Yasir, who is supervised by Christopher Brown, has provided his thesis abstract below.

Thanks to Özgür Akgün for serving as internal examiner and Prof Christoph Kessler from Linköping University for serving as the external examiner.

Controlling energy consumption has always been a necessity in many computing contexts as the resources that provide said energy is limited, be it a battery supplying power to an Single Board Computer (SBC)/System-on-a-Chip (SoC), an embedded system, a drone, a phone, or another low/limited energy device, or a large cluster of machines that process extensive computations requiring multiple resources, such as a Non-Uniform Memory Access (NUMA) system. The need to accurately predict the energy consumption of such devices is crucial in many fields. Furthermore, different types of languages, e.g. Haskell and C/C++, exhibit different behavioural properties, such as strict vs. lazy evaluation, garbage collection vs. manual memory management, and different parallel runtime behaviours. In addition most software developers do not write software with energy consumption as a goal, this is mostly due to the lack of generalised tooling to help them optimise and predict energy consumption of their software. Therefore, the need to predict energy consumption in a generalised way for different types of languages that do not rely on specific program properties is needed. We construct several statistical models based on parallel benchmarks using regression modelling such as Non-negative Least Squares (NNLS), Random Forests, and Lasso and Elastic-Net Regularized Generalized Linear Models (GLMNET) from two different programming paradigms, namely Haskell and C/C++. Furthermore, the assessment of the statistical models is made over a complete set of benchmarks that behave similarly in both Haskell and C/C++. In addition to assessing the statistical models, we develop meta-heuristic algorithms to predict the energy consumed in parallel benchmarks from Haskell’s Nofib and C/C++’s Princeton Application Repository for Shared-Memory Computers (PARSEC) suites for a range of implementations in PThreads, OpenMP and Intel’s Threading Building Blocks (TBB). The results show that benchmarks with high scalability and performance in parallel execution can have their energy consumption predicted and even optimised by selecting the best configuration for the desired results. We also observe that even in degraded performance benchmarks, high core count execution can still be predicted to the nearest configuration to produce the lowest energy sample. Additionally, the meta-heuristic technique can be employed using a language- and architecture-agnostic approach to energy consumption prediction rather than requiring hand-tuned models for specific architectures and/or benchmarks. Although meta-heuristic sampling provided acceptable levels of accuracy, the combination of the statistical model with the meta-heuristic algorithms proved to be challenging to optimise. Except for low to medium accuracy levels for the Genetic algorithm, combining meta-heuristics demonstrated limited to poor accuracy.