I’m a data scientist who works primarily in R. Recently I’ve been on a path to learn more about Python, which I initially installed via brew. Unfortunately, I almost immediately ran into the following error:
```
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python
installation or OS distribution provider. You can override this, at
the risk of breaking your Python installation or OS, by passing
--break-system-packages.

hint: See PEP 668 for the detailed specification.
```
That error led me down a rabbit hole of learning about Python virtual environments.
Virtual environments let you isolate project-specific dependencies from your system installation of Python. The benefits are 1) if you accidentally screw something up it won’t disrupt the system Python environment and 2) your projects become more reproducible.
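For reference, the workflow the error message itself suggests is short. This is a minimal sketch; the `.venv` directory name is just a common convention, and `requests` stands in for whatever package you actually want:

```
# create an isolated environment in ./.venv
python3 -m venv .venv

# activate it for the current shell session (macOS/Linux)
source .venv/bin/activate

# packages now install into .venv, not the system Python
python -m pip install requests
```

When you're done, `deactivate` returns you to your normal shell.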
Eventually, what started as a simple attempt to set up a new project and practice some Python left me feeling completely at sea. After days of wrestling with pip, poetry, conda, pipenv, and virtualenv, I was left scratching my head about Python’s package management landscape. I can’t help but wonder: what the hell happened?
Coming from R’s relatively straightforward package management ecosystem, with just {renv} handling most of my needs, I was overwhelmed by the sheer number of Python package management tools. While choice is generally good, each one seemed to solve a different piece of the puzzle, and every blog post or tutorial I read recommended a different approach. After spending what felt like weeks exploring various options and dealing with environment conflicts, I eventually decided I just needed to make a choice and pick something that would work well for my data science workflow.
I mean just look at the table I made summarizing my notes!
| Tool | Pros | Cons |
|---|---|---|
| venv | Built into Python 3.3+; zero configuration | Manual dependency management; Python-only focus |
| conda | Cross-platform scientific stack; non-Python dependency support | Large installation size; slower than alternatives |
| micromamba | Lightweight conda alternative; blazing-fast solver | CLI-only experience; smaller community |
| poetry | All-in-one dependency/packaging; modern pyproject.toml workflow | Complex resolver; limited conda integration; history of forcing users toward unwanted version upgrades |
| pipx | Isolated CLI tool management; prevents global pollution | For installing Python commands, not project package dependencies |
| PDM | PEP 582 pioneer; fast dependency resolution | No system library support; smaller ecosystem |
| pip-tools | Lightweight pip enhancement; deterministic locking | Manual workflow; basic features |
| uv | Rust-powered speed; modern dependency resolver | New/unproven; Python-only |
| hatch | Modern project builder; built-in version management | Growing community; less mature plugin system |
| rye | Rust-based toolchain; integrated Python version management | Experimental; rapidly changing API |
| virtualenv | Python 2 compatibility; legacy standard | Redundant with venv; requires separate install |
| setuptools | Traditional packaging; deep customization | No lockfiles; complex setup.py |
| flit | Simple pure-Python packaging; quick PyPI uploads | Limited to basic projects |
| mamba | Conda-compatible speed upgrade; optimized C++ solver | Depends on conda; less mature than conda |
| pixi | Unified Python/R dependency management; built-in task automation; reproducible lockfiles | Early-stage but under active development; requires adopting its workflow |
| pipenv | Combines pip + virtualenv; security-focused dependency locking | Performance issues; less active development |
And I’m certain I missed more than a few.
After exploring these numerous options, I settled on pixi for my data science workflow, and here’s why: I needed a tool that could handle both Python packages and the scientific computing dependencies that often come with data science work. What really sealed the deal was discovering that pixi can handle R packages alongside Python ones, which could be a game-changer for hybrid workflows where I might need both languages.
While conda is the traditional choice here, pixi offers a more modern and streamlined approach. Its `.toml` and `.lock` files ensure reproducibility (similar to {renv} in R), while its ability to handle non-Python dependencies means I don’t have to worry about complex system-level installations or bother with Docker for simple projects. This is particularly valuable when working with packages that require specific system libraries or external dependencies like database drivers, CUDA toolkits, or specialized scientific computing libraries.
The `pixi.toml` configuration seems clean and intuitive and includes both templates and activation scripts. Most importantly, the task automation features let me define common data science workflows (like running Jupyter notebooks or training models) in a way that’s easily shareable with colleagues.
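To make that concrete, here’s the kind of manifest I have in mind. The project name, versions, and task commands are hypothetical examples; the section layout follows pixi’s documented `[project]`/`[dependencies]`/`[tasks]` structure:

```
# hypothetical pixi.toml; names and versions are illustrative
[project]
name = "my-analysis"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-arm64"]

[dependencies]
python = "3.12.*"
pandas = ">=2.0"
jupyterlab = "*"

[tasks]
# shareable, named workflows: `pixi run lab`, `pixi run train`
lab = "jupyter lab"
train = "python scripts/train.py"
```

A colleague who clones the repo can then reproduce the whole environment and run the same tasks without any setup beyond installing pixi itself.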
While pixi is newer than some alternatives, it promises to cover most of my package management needs simply and easily. And its ability to handle both the R and Python ecosystems made it the clear choice for my transition to a multi-language data science workflow. I expect there will still be some friction, but I gotta admit, I’m excited. I love the idea of being able to specify and install the versions of languages, my most-used packages, and any external dependencies all from one place.
I like the idea of being able to open a shell and run something like:

```
pixi add pytorch
```

or

```
pixi add r-tidymodels
```

Have you got any suggestions? What do you use?