Technical leverage analysis in the Python ecosystem: lessons learned


[Context:] Technical leverage is the ratio between dependencies (other people’s code) and own codes of a software package. It has been shown to be useful to characterize the Java ecosystem and there are also studies on the NPM ecosystem available. [Objective:] By using this metric we aim to analyze the Python ecosystem, how it evolves, and how secure it is, as a developer would perceive it when deciding to adopt or update (or not) a library. [Method:] We collect a dataset of the top 600 Python packages (corresponding to 21,205 versions) and used a number of innovative approaches for its analysis including the use of a two-part statistical model to deal with excess zeros, a mathematical closed formulation to estimate vulnerabilities that we confirm with bootstrapping on the actual dataset. [Results:] Small Python package versions have a median technical leverage of 6.9x their own code, while bigger package versions rely on dependencies code a tenth of their own (median leverage of 0.1). In terms of evolution, Python packages tend to have stable technical leverage through their evolution (once highly leveraged, always leveraged). On security, the chance of getting a safe package version when choosing a package is actually better than previous research has shown based on the ratio of safe package versions in the ecosystem. [Conclusions:] Python packages ship a lot of other people’s code and tend to keep doing so. However, developers will have a good chance to choose a safe package version.