Inspect4py and RepoSim: A Knowledge Extraction Framework and a Deep Learning Tool for Python Repositories

Ruth Hoffmann
Monday 27 February 2023

The development of scientific software has resulted in large, complex, and swiftly growing codebases consisting of thousands of source code files, making it difficult to understand, adopt, compare, execute, reproduce, or scale scientific software. Therefore, this project aims to facilitate the production and adoption of reproducible and reusable scientific software by finding solutions for understanding and comparing scientific software repositories, thus accelerating scientific discovery. To achieve this goal, we have developed two software understanding tools:

Inspect4py is a static code analysis framework designed to help developers understand and classify Python software repositories, by analyzing and extracting important information such as functions, classes, methods, documentation, dependencies, call graphs, and control flow graphs. It is useful for understanding, managing and maintaining software repositories.
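To give a flavour of what this kind of static extraction involves, here is a minimal sketch built only on Python's standard `ast` module. It is not Inspect4py's API or implementation; the function name `summarize_source` and the shape of the returned dictionary are assumptions made for illustration, and it covers only functions, classes, docstrings, and imports (no call graphs or control flow graphs).

```python
import ast


def summarize_source(source: str) -> dict:
    """Collect top-level functions, classes, docstrings, and imports.

    Hypothetical helper for illustration only; not part of Inspect4py.
    """
    tree = ast.parse(source)
    summary = {"functions": {}, "classes": {}, "imports": []}
    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    for node in tree.body:
        if isinstance(node, function_types):
            # Map each function name to its docstring (None if absent).
            summary["functions"][node.name] = ast.get_docstring(node)
        elif isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body if isinstance(n, function_types)]
            summary["classes"][node.name] = {
                "doc": ast.get_docstring(node),
                "methods": methods,
            }
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            summary["imports"].append(node.module or ".")
    return summary


EXAMPLE = '''
import os
from math import sqrt

def norm(x, y):
    """Return the Euclidean norm."""
    return sqrt(x * x + y * y)

class Loader:
    """Load files."""
    def read(self, path):
        return os.path.exists(path)
'''

if __name__ == "__main__":
    print(summarize_source(EXAMPLE))
```

Because the source is parsed rather than executed, a repository can be inspected this way without installing its dependencies or running any of its code.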

RepoSim is a tool that determines the similarity between different Python software repositories. It uses inspect4py to extract functions and documentation from each repository and employs deep learning models to calculate repository similarities based on the extracted information. It is useful for detecting code reuse and clones, identifying alternative implementations, or simply exploring related projects.
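One common way to turn per-function information into a repository-level similarity score can be sketched as follows. This is an illustration under stated assumptions, not RepoSim's confirmed method: it assumes each function has already been mapped to an embedding vector by some model, and that a repository is represented by the mean of those vectors and compared with cosine similarity. The names `mean_pool`, `cosine_similarity`, and `repo_similarity` are hypothetical, and the vectors are toy values.

```python
import math


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors into one vector."""
    if not vectors:
        raise ValueError("at least one vector is required")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def repo_similarity(repo_a_vectors, repo_b_vectors):
    """Compare two repositories via their mean-pooled function embeddings.

    Assumption for illustration: mean pooling plus cosine similarity.
    """
    return cosine_similarity(mean_pool(repo_a_vectors), mean_pool(repo_b_vectors))


# Toy 2-dimensional "embeddings"; real model outputs would be much larger.
repo_a = [[1.0, 0.0], [0.0, 1.0]]
repo_b = [[2.0, 2.0]]
repo_c = [[1.0, -1.0]]

if __name__ == "__main__":
    print(repo_similarity(repo_a, repo_b))
    print(repo_similarity(repo_a, repo_c))
```

In this toy setup, `repo_a` and `repo_b` point in the same direction after pooling and so score close to 1, while `repo_a` and `repo_c` are orthogonal and score close to 0.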

In the future, we aim to expand RepoSim's capabilities by incorporating additional metrics for comparing repositories. For instance, repositories could be compared based on README files, call graphs, paper abstracts, or other criteria that users choose to suit their needs.

Keywords

Static Code Analysis, NLP, Machine Learning for Code, Transformers, Python Repositories

Staff

[Rosa Filgueira]{rf208}
