When you create a new Python 2 or Python 3 notebook, you are actually starting a kernel, the process that executes the code in your notebook. Each programming language has its own kernel. The vanilla Python 2 and 3 kernels do not come with many packages pre-installed (the major exception is PySpark, which you can read more about below). If you would like to use other standard packages (e.g. NumPy, SciPy, Matplotlib, etc.), we recommend installing them into a virtual environment and then creating your own custom kernel, which can then be selected from the “New” button at the top right of your homepage. Note that Python-based kernels in Jupyter also support shell magic commands.
Python packages can be easily installed from within a notebook. For Python 3, place the following lines in a single cell and execute it to build a custom Python 3 kernel with the listed packages installed via pip. It is important to use a single cell, because this ensures that the Python interpreter in your virtual environment is the one selected. Note that the kernel in this example will appear under the name py3-data-science (you will probably want to choose a different name):
%%bash
python3.6 -m virtualenv py3-data-science
source py3-data-science/bin/activate
pip3 install ipykernel numpy scipy matplotlib pandas scikit-learn hdfs
ipython kernel install --user --name=py3-data-science
Running this cell may take several minutes. Once it has finished, confirm that the cell output contains no error messages. You can then refresh your JupyterHub homepage and select your custom kernel from the “New” drop-down menu.
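Once you have opened a notebook with the new kernel, one quick way to confirm that it really runs inside the virtual environment is to inspect the interpreter from Python. The following is a minimal sketch rather than part of the setup above; the helper name in_virtualenv is hypothetical, and it assumes the environment was created with virtualenv or venv.

```python
from __future__ import print_function  # keeps print() consistent on Python 2

import sys


def in_virtualenv():
    """Return True if the current interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

In a notebook that uses the py3-data-science kernel, the interpreter path would be expected to point into the py3-data-science directory rather than at the system Python.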
The process is very similar for Python 2:
%%bash
python2 -m virtualenv py2-data-science
source py2-data-science/bin/activate
pip install ipykernel numpy scipy matplotlib pandas scikit-learn hdfs
ipython kernel install --user --name=py2-data-science
Note that when using a custom kernel, shell commands executed in a cell with the ! prefix, such as !pip3 freeze, are not aware of the virtual environment. A command such as !pip3 install pandas will therefore fail, because users do not have permission to add packages to the system Python installation. If you want to check your installed packages or add packages, you must source your virtual environment in every cell in which you run bash commands. For example, to list the packages installed in your virtual environment, use the following instead of !pip3 freeze:
%%bash
source py3-data-science/bin/activate
pip3 freeze
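As an alternative to sourcing the environment in every bash cell, pip can also be run through the interpreter that backs the current kernel. This is a different technique from the one described above, and is offered only as a sketch: the helper name pip_in_kernel_env is hypothetical, it targets Python 3 kernels, and it assumes pip is installed in the kernel's environment (virtualenv normally installs it).

```python
import subprocess
import sys


def pip_in_kernel_env(*args):
    """Run pip with the same interpreter as the current kernel.

    Returns a (return_code, combined_output) tuple.
    """
    result = subprocess.run(
        # sys.executable is the Python binary running this kernel, so
        # "-m pip" acts on that interpreter's environment.
        [sys.executable, "-m", "pip"] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


# List the packages visible to the current kernel.
return_code, output = pip_in_kernel_env("freeze")
print(output)
```

Because the command is tied to sys.executable, it reports on whichever environment the selected kernel uses; when run from the vanilla kernel it would list the system packages instead.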