Jupyter AI brings generative AI capabilities right into the JupyterLab interface. Having a local AI assistant ensures privacy, reduces latency, and provides offline functionality, making it a powerful tool for developers. In this article, we’ll learn how to set up a local AI coding assistant in JupyterLab using Jupyter AI, Ollama and Hugging Face. By the end of this article, you’ll have a fully functional coding assistant in JupyterLab capable of autocompleting code, fixing errors, creating new notebooks from scratch, and much more, as shown in the screenshot below.
⚠️ Jupyter AI is still under heavy development, so some features may break. As of writing this article, I’ve tested the setup to confirm it works, but expect potential changes as the project evolves. Also, the performance of the assistant depends on the model you select, so make sure you choose one that fits your use case.
First things first — what is Jupyter AI? As the name suggests, Jupyter AI is a JupyterLab extension for generative AI. This powerful tool transforms your standard Jupyter notebooks or JupyterLab environment into a generative AI playground. The best part? It also works seamlessly in environments like Google Colaboratory and Visual Studio Code. This extension does all the heavy lifting, providing access to a variety of model providers (both open and closed source) right within your Jupyter environment.

Setting up the environment involves three main components, plus an optional one:
- JupyterLab
- The Jupyter AI extension
- Ollama (for Local Model Serving)
- [Optional] Hugging Face (for GGUF models)
Honestly, getting the assistant to resolve coding errors is the easy part. What is tricky is ensuring all the installations have been done correctly. It’s therefore essential that you follow each step carefully.
1. Installing the Jupyter AI Extension
It’s recommended to create a new environment specifically for Jupyter AI to keep your existing environment clean and organised. Once that’s done, follow the steps below. Jupyter AI requires JupyterLab 4.x or Jupyter Notebook 7+, so make sure you have the latest version of JupyterLab installed. You can install/upgrade JupyterLab with pip or conda:
# Install JupyterLab 4 using pip
pip install jupyterlab~=4.0
Next, install the Jupyter AI extension as follows.
pip install "jupyter-ai[all]"
This is the easiest method for installation as it includes all provider dependencies (so it supports Hugging Face, Ollama, etc., out of the box). To date, Jupyter AI supports the following model providers:

If you encounter errors during the Jupyter AI installation, manually install Jupyter AI using pip without the [all] optional dependency group. This way you can control which models are available in your Jupyter AI environment. For example, to install Jupyter AI with only added support for Ollama models, use the following:
pip install jupyter-ai langchain-ollama
The required dependencies vary by model provider (see the table above). Next, restart your JupyterLab instance. If you see a chat icon on the left sidebar, everything has been installed correctly. With Jupyter AI, you can chat with models or use inline magic commands directly within your notebooks.
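If the chat icon does not show up, it is worth confirming that the expected packages actually landed in the active environment. A quick check, using only the Python standard library (the distribution names below assume the default pip package names), looks like this:
from importlib.metadata import version

# Confirm JupyterLab and Jupyter AI are installed in the current environment
for pkg in ("jupyterlab", "jupyter-ai"):
    print(pkg, version(pkg))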

2. Setting Up Ollama for Local Models
Now that Jupyter AI is installed, we need to configure it with a model. While Jupyter AI integrates with Hugging Face models directly, some models may not work properly. Instead, Ollama provides a more reliable way to load models locally.
Ollama is a handy tool for running Large Language Models locally. It lets you download pre-configured AI models from its library. Ollama supports all major platforms (macOS, Windows, Linux), so download and install the version for your OS from the official website. After installation, verify that it is set up correctly by running:
ollama --version
------------------------------
ollama version is 0.6.2
Also, ensure that the Ollama server is running, which you can check by calling ollama serve in the terminal:
$ ollama serve
Error: listen tcp 127.0.0.1:11434: bind: address already in use
If the server is already active, you will see an error like the one above, confirming that Ollama is already running and in use.
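If you prefer to verify this from Python, for example from inside a notebook, Ollama also exposes a small REST API on port 11434 by default. A minimal sanity check, assuming the requests package is installed, might look like this:
import requests

# Ollama's local server listens on http://localhost:11434 by default
resp = requests.get("http://localhost:11434/api/version", timeout=5)
print(resp.json())  # e.g. {'version': '0.6.2'} when the server is up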
Option 1: Using Pre-Configured Models
Ollama provides a library of pre-trained models that you can download and run locally. To start using a model, download it using the pull command. For example, to use qwen2.5-coder:1.5b, run:
ollama pull qwen2.5-coder:1.5b
This will download the model in your local environment. To confirm if the model has been downloaded, run:
ollama list
This will list all the models you’ve downloaded and stored locally on your system using Ollama.
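Before wiring the model into Jupyter AI, you can optionally give it a quick smoke test through Ollama's /api/generate endpoint. This is a rough sketch, assuming requests is installed and that the model name matches what ollama list reports:
import requests

payload = {
    "model": "qwen2.5-coder:1.5b",  # must match a name shown by `ollama list`
    "prompt": "Write a Python one-liner that reverses a string.",
    "stream": False,                # return a single JSON object instead of a stream
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
print(resp.json()["response"])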
Option 2: Loading a Custom Model
If the model you need isn’t available in Ollama’s library, you can load a custom model by creating a Modelfile that specifies the model’s source. For detailed instructions on this process, refer to the Ollama Import Documentation.
Option 3: Running GGUF Models directly from Hugging Face
Ollama now supports GGUF models directly from the Hugging Face Hub, including both public and private models. This means that if you want to use a GGUF model straight from the Hugging Face Hub, you can do so without creating a custom Modelfile, as described in Option 2 above.
For example, to load a 4-bit quantized Qwen2.5-Coder-1.5B-Instruct model from Hugging Face:
1. First, enable Ollama under your Local Apps settings.

2. On the model page, choose Ollama from the Use this model dropdown as shown below.

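Alternatively, the same pull can be triggered without the web UI, since Ollama accepts hf.co/<user>/<repo>:<quantization> paths directly. The sketch below uses Ollama's local /api/pull endpoint from Python; the repository name and Q4_K_M tag are assumptions based on Qwen's published GGUF repositories, so substitute the repo and quantization you actually want:
import requests

# Pull a GGUF model straight from the Hugging Face Hub via the local Ollama API.
# The hf.co path and quantization tag below are assumed -- adapt them as needed.
payload = {
    "model": "hf.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF:Q4_K_M",
    "stream": False,  # wait for the download to finish instead of streaming progress
}
resp = requests.post("http://localhost:11434/api/pull", json=payload, timeout=None)
print(resp.json())  # a status message such as {'status': 'success'} on completion
The model should then appear under the same hf.co/... name in ollama list, and that is the name you would give to Jupyter AI later.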
We are almost there. In JupyterLab, open the Jupyter AI chat interface from the sidebar. At the top of the chat panel or in its settings (gear icon), there is a dropdown or field to select the model provider and model ID. Choose Ollama as the provider, and enter the model name exactly as shown by ollama list in the terminal (e.g. qwen2.5-coder:1.5b). Jupyter AI will connect to the local Ollama server and load that model for queries. No API keys are needed since everything runs locally.
- Set the Language model, Embedding model and Inline completions model to the models of your choice.
- Save the settings and return to the chat interface.

This configuration links Jupyter AI to the locally running model via Ollama. Inline completions should be enabled as part of this process, but if they aren’t, you can enable them manually by clicking the Jupyternaut icon, located in the bottom bar of the JupyterLab interface to the left of the Mode indicator (e.g., Mode: Command). This opens a dropdown menu where you can select Enable completions by Jupyternaut to activate the feature.

Once set up, you can use the AI coding assistant for various tasks like code autocompletion, debugging help, and generating new code from scratch. Note that you can interact with the assistant either through the chat sidebar or directly in notebook cells using %%ai magic commands. Let’s look at both approaches.
Coding assistant via Chat interface
This is pretty straightforward. You can simply chat with the model to perform an action. For instance, by selecting code in the notebook, we can ask the model to explain the error and then fix it.

You can also ask the AI to generate code for a task from scratch, just by describing what you need in natural language. Here is a Python function that returns all prime numbers up to a given positive integer N, generated by Jupyternaut.

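For reference, a typical result looks something like the function below; this is a representative sketch rather than the assistant’s verbatim output:
def primes_up_to(n):
    """Return all prime numbers up to and including n."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i in range(2, n + 1) if sieve[i]]

print(primes_up_to(20))  # [2, 3, 5, 7, 11, 13, 17, 19]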
Coding assistant via notebook cell or IPython shell
You can also interact with models directly within a Jupyter notebook. First, load the IPython extension:
%load_ext jupyter_ai_magics
Now, you can use the %%ai cell magic to interact with your chosen language model using a specified prompt. Let’s replicate the example above, but this time within notebook cells.

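In plain text, such a cell looks roughly like the sketch below; the provider and model ID (here assumed to be ollama:qwen2.5-coder:1.5b) must match what %ai list reports in your environment:
%%ai ollama:qwen2.5-coder:1.5b
Write a Python function that returns all prime numbers up to a positive integer N.
The magics also accept a -f / --format flag (for example, -f code) to control how the response is rendered.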
For more details and options, refer to the official documentation.
As you can gauge from this article, Jupyter AI makes it easy to set up a coding assistant, provided you have the right installations and setup in place. I used a relatively small model, but you can choose from a variety of models supported by Ollama or Hugging Face. The key advantage of using a local model is that it enhances privacy, reduces latency, and decreases dependence on proprietary model providers. However, running large models locally with Ollama can be resource-intensive, so make sure you have sufficient RAM. With the rapid pace at which open-source models are improving, you can achieve comparable performance even with these alternatives.