Home » GAP » Group Numbers Reproducibility Project

Group Numbers Reproducibility Project

In January 2016, I’ve asked on the Mathematics Q&A site the question called “Most wanted reproducible results in computational algebra“. I hope that making a list of suggested experiments to reproduce will be useful to those interested in checking them twice ;-). For example, one could submit their findings to a journal like ReScience which “targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible”. If you have any suggestions, please consider posting them as answers to that question. Today I’ve also added my own answer, which I am reproducing below.

I believe that enumeration of finite groups of a given order is definitely among most wanted reproducible experiments. Here “enumeration” means providing complete and non-redundant list of groups, “complete” means that no groups are missing in this list, and “non-redundant” means that groups from this list are non-isomorphic pairwise. Guaranteeing these properties is crucial for results that rely on checking all groups of a given order, or that refer to a particular group by its “catalogue number”.

The most complete collection of groups of certain orders is available in the GAP system via several interconnected packages:

Altogether, this provides some precomputed collections of groups for some orders as well as functions to construct all groups of a given order for some infinite series. None of this packages is actually a mere database which only stores certain groups, since even predominantly database-providing packages also implement algorithms for generic constructions of groups of order p^n for some p and n, and for groups of square-free orders. These packages are closely interconnected: for example, while GrpConst was used to construct some groups from the SmallGroups library, it also uses the SmallGroups library to enumerate groups of some other orders.

It is very important to have such results as much cross-checked and reproducible as possible: even if the database part of the libraries remains unchanged, that does not guarantee that any other changes in GAP and/or this packages will not break the code. Of course, a lot of cross-checks had been done, and this functionality is considered to be very reliable:

  • The group numbers in the SmallGroups library are to a large extent cross-checked, being computed using different approaches and also compared with theoretical results, where available (see [Hans Ulrich Besche, Bettina Eick and Eamonn O’Brien. A MILLENNIUM PROJECT: CONSTRUCTING SMALL GROUPS. Int. J. Algebra Comput. 12, 623 (2002), http://dx.doi.org/10.1142/S0218196702001115], in particular 4.1. Reliability of the data).
  • The Cubefree package was cross-checked against the SmallGroups Library and IRREDSOL package as described at http://www.gap-system.org/Manuals/pkg/cubefree/htm/CHAP002.htm#SECT005.
  • The GAP standard test suite, which is run nightly and is a part of the release preparation workflow, includes tests of ‘ConstructAllGroups’ from the GrpConst package.

But modern tools permit us to do even more, and in particular improve situation for orders where precomputed collections of all groups are not available. Recently I have initiated a “Group numbers reproducibility project” (which was inspired by some questions under the ‘groups-enumeration’ tag here – see in particular this one). This project uses crowdsourcing approach to assemble a database of numbers of isomorphism types of finite groups. In other words, it fills in the table of the values of the function gnu(n), returning the number of isomorphism types of finite groups of order n (so “gnu” stands for the “Group NUmber”). It puts together data from several sources, including values calculated at runtime using the packages listed above and numbers published by the AG Algebra und Diskrete Mathematik group of TU Braunschweig at http://www.icm.tu-bs.de/ag_algebra/software/small/number.html. Furthermore, it accepts reports on new values, not available in any of the above mentioned sources and on recomputation of previously known values. The data are added to the database only after they are replicated by the maintainer. The project also uses two other designations for the submissions: reproduced, when the same result was obtained using another implementation, and agrees with theory, when it corresponds to the theoretically proved result.

In the current version of the database, the value of gnu(n) is available from the computer algebra system level (from GAP locally or remotely, and from any other SCSCP client remotely). Using the version control history, one could also access provenance information (runtime, versions of the software, etc). This could be useful to the researchers interested in producing the list of all groups locally, since they can check if someone else had already attempted to do this and how much time did they wait. Since the beginning of the project, almost 200 new entries has been submitted, replicated and added to the database, which now provides the most complete available table of known values of gnu(n).

Further details could be found in the README.md file at https://github.com/alex-konovalov/gnu. See also my presentation “Computational Algebra meets Open Science: Group Numbers Reproducibility Project“.