5.2. Conda environments

Once you’ve installed Anaconda, when you open a new shell you should see the word (base) on the shell prompt (if not, try typing conda --version to make sure Anaconda installed properly and was set up). The word in parentheses is your active conda environment, by default this is the base environment.

5.2.1. Why Environments?

Virtual environments allow us to install packages into a self-contained location on our computer without affecting any other parts of our system. This is particularly useful is we have multiple projects, each with their own requirements and dependencies. Sometimes the dependencies of different projects can be conflicting, and we want to be careful about installing new packages which could break code we’ve written for another project.

Another good reason for virtual environments is reproducibility. If we develop and run our programs in virtual environments, we know the exact packages which allowed us to properly run the code, and we can later share this environment with collaborators, or be confident our code will run when we return to it a year or many years later.

5.2.2. Creating a New Environment

To make a conda environment, we use the command conda create. We need to specify a name of the environment with -n. We can also specify some packages we want to install right away. Try it in your shell!

conda create -n researchenv python numpy matplotlib jupyter

Aside: if you want to install all of the default Anaconda packages (e.g. the ones available in the base environment), you can type conda create -n <envname> anaconda. We very rarely need all of the packages that come with Anaconda, and it is generally better to use as few dependencies as possible. An environment with only the packages you need has less chances of dependency conflicts and is more reproducible!

Once conda processes the dependencies of the packages you specified. Are you surprised by how many dependencies it takes to make numpy, matplotlib and jupyter work? Good thing conda can keep track of these dependencies for us!

Once you confirm with y, conda will install these packages in a self-contained environment called researchenv.

To activate an environment, we use the conda activate <envname> command. To deactivate an environment, type conda deactivate (no envname needed in deactivate). Try activating your new conda environment and listing the installed packages with conda list.

conda activate researchenv

Check your command prompt to ensure your environment is active (you should see (researchenv) at the beginning of the line).

We can check the packages in the current environment with the conda list command.

conda list

5.2.3. Using an activated environment

If we start Python from an active conda environment, we will have access to the packages installed in that environment. We can test this by running a simple one-line Python script in the command line. Try the following:

python -c "import pandas"

Did you see an error? This is because we haven’t installed pandas in our conda environment. Instead let’s try importing an installed module like numpy:

python -c "import numpy"

No error? That means numpy is available in our environment!

Now let’s try deactivating our environment and see what packages the conda base environment provides.

conda deactivate
conda list

The default conda environment comes with a lot of packages. If you encounter long unwieldy output on the command line, you can always search it by piping (|) it to the grep bash command. Say we want to see if pandas is included in the base environment, we can write:

conda list | grep pandas

Now what happens if we try to import pandas with the base conda environment active?

python -c "import pandas"

No error! This example illustrates how conda environments are self-contained. Our researchenv does not know about any of the packages in base, it only has access to the ones we’ve installed in it. This makes it important to be aware of which environment you have active. If you get errors about packages that don’t exist, make sure you’ve conda activated the right environment.

Say we want to use pandas in our tidy, self-contained researchenv. First we will need to activate our environment then install it!

5.2.4. Installing packages

Now let’s try installing a new package. When you use the conda install command, it installs the package in the currently active environment only. Let’s install pandas in researchenv.

conda activate researchenv
conda install pandas

Once you’ve confirmed and conda is finished working, let’s check our list of installed packages. Remember, we can search for a word by piping output to grep:

conda list | grep pandas

Great! You now know how to make environments, activate them and install new packages!

5.2.5. Finding new packages

Often we don’t know what packages exist or do the things we want them to do. Sometimes a simple Google search about the program you are writing will reveal that many others have had the same problem and have already written a package to solve it!

Say we want to work with GeoSpatial data in Python and some Googling reveals the rasterio package. First we should check if it is available from Anaconda by Googling “conda packages”, or just going straight to this link.

On this page we can search for packages available to be installed by Anaconda (most well-used packages are). Try searching for rasterio in the search box.

My search revealed two top results:

  • conda-forge/rasterio

  • anaconda/rasterio

5.2.6. Anaconda channels

It seems rasterio has two versions of the package available from two different channels of Anaconda. The anaconda channel is the default, and your conda install command will generally install from anaconda. The conda-forge channel is a community initiative to make open-source community packages easier to access. Sometimes packages will only be available here (or in the case of rasterio, we can see that the conda-forge version has many more downloads and is probably the one we want).

We can install packages from a particular channel by using the -c option to conda install. Make sure your researchenv is active, then try the following:

conda install -c conda-forge rasterio

5.2.7. Pip

Sometimes a package is not available for install from any channels of Anaconda, but it may still be available from the default Python package manager, pip.

The pip install command is similar to conda install, and should still install Python packages and their dependencies. The main advantage of using conda install over pip install is that conda is not language-specific. Packages installed with conda can have dependencies not necessarily written in Python, and conda will be able to install and keep track of those non-Python packages for for you. Pip will handle Python dependencies only.

There is no problem with using both conda install and pip install within a conda environment, though it is best practice to try stay consistent and use conda where possible

5.2.8. Installing Packages TL;DR

First find the name of a new package you would like to install (via Google, word of mouth, the Python Package Classifieds in the newspaper, etc.)

  • Search the Anaconda package list for the package

  • if it is only available from anaconda or this version has the most installs, use conda install <name>

  • if it is only available from conda-forge or this version has the most installs, use conda install -c conda-forge <name>

  • if neither is available, use pip install <name>

Now we have the tools we need to find and install packages into Anaconda environments! Feel free to make a new environment whenever you start a new project or want to try installing a new package without knowing if it will conflict with your current packages. Conda environments can help us keep our code packages tidy and reproducible!

In the next section we will learn to make an exact copy of our environment to share with collaborators or store for our future selves to ensure we can reproduce our results.

5.2.9. Sharing environments

Say we want to send our conda environment to a collaborator so that they can run some code that we wrote. Or maybe we simply want a list of the packages we used to get a result to make sure we can use produce the same result in the future.

5.2.10. The environment.yml

Anaconda provides us with a way to build an identical environment using a special configuration file usually called environment.yml. To generate this file, we use the conda env export command. This command simply produces a list of the packages currently installed in our active conda environment. If we pipe this output to environment.yml, voila! We have instructions for Anaconda to reproduce our environment.

conda env export > environment.yml

Now we have an environment.yml file that looks something like:

name: researchenv
channels:
  - conda-forge
  - defaults
dependencies:
  - <package1>=<verion.major.minor>
  - <package2>=<verion.major.minor>

Now we can make a brand new environment using this environment.yml file.

5.2.11. Deleting environments

Let’s delete our researchenv environment with conda remove --all. The --all ensures that all packages and the environment are deleted (rather than simply deleting one package at a time from your environment).

conda remove --name researchenv --all

We can check that the environment is deleted by trying to activate it:

conda activate researchenv

Or we can list all available environments with:

conda env list

5.2.12. Creating environments from a yml

Now we can make an identical environment with the environment.yml file that we saved:

conda env create -f environment.yml
conda activate researchenv
conda list

You should now see all of the files that we previously had in researchenv

5.2.13. Caveats to Sharing Environments

Now that we have an environment.yml file, we should be able to send it to a collaborator, and have them run conda env create -f environment.yml to have a copy of our working environment.

One caveat is that certain packages depend on the operating system that the code is being run on, and Anaconda will not always be able to exactly reproduce the environment on another system. Using an environmnet.yml works best on the same OS that it was created on.

To make our code truly distributable and reproducible on other machines, we get into the realm of developing open source packages and publishing them to be installable with pip or conda. While this is a deep topic outside the scope of this book, there are many resources online for exploring this further if you want to share your awesome package with the community!

Now you know all you need to get started using Anaconda and managing environments for your programming projects. Everything you need to know about Anaconda environments is available here (or like everything else, via Google search).