
Using Jupyter Notebooks with Hugo

Race Dorsey
Convert Jupyter notebooks to markdown, and hide or collapse your cells/inputs/outputs.

This site is built with Hugo, a framework for building static websites. One issue, however, is that Hugo does not natively support Jupyter notebook (.ipynb) files. While searching for a way to use Jupyter notebooks with my site, I found and tried many options but ultimately settled on nbconvert.

Want to see what the end result looks like? Head over to my predicting recipe website traffic project.

Things I tried or looked at
#

I tried over half a dozen options before landing on nbconvert. Some I tried and ran into issues with; others were aimed at needs different from mine. While digging through them, though, I realized that several relied on nbconvert under the hood, so using nbconvert directly seemed like the natural choice.

Using nbconvert with Hugo
#

Here are the steps I took to use nbconvert with Hugo:

1. Install nbconvert
#

See installation instructions. Out of the box, nbconvert works well.

jupyter nbconvert --to markdown path/to/notebook.ipynb

This produces a notebook.md, but to make it a page that renders in Hugo we’ll need to do a bit more.

2. Add Frontmatter
#

In your notebook, create a markdown cell as the first cell and add your frontmatter:

---
date: 2024-02-02T04:14:54-08:00
draft: false
title: Example
---
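
Before converting, it can be handy to confirm the notebook really does start with a frontmatter cell. The sketch below is only an illustration: the helper name first_cell_has_frontmatter is hypothetical, and it assumes the first cell contains nothing but the frontmatter block delimited by --- lines, as in the example above.

```python
import json


def first_cell_has_frontmatter(notebook_json: str) -> bool:
    """Return True if the first cell is a markdown cell holding only a '---' block."""
    nb = json.loads(notebook_json)
    cells = nb.get("cells", [])
    if not cells or cells[0].get("cell_type") != "markdown":
        return False
    source = cells[0].get("source", "")
    # A notebook file stores source either as one string or as a list of lines.
    if isinstance(source, list):
        source = "".join(source)
    lines = [line.strip() for line in source.strip().splitlines()]
    # The block must open and close with a '---' line.
    return len(lines) >= 2 and lines[0] == "---" and lines[-1] == "---"


if __name__ == "__main__":
    sample = json.dumps({
        "cells": [{
            "cell_type": "markdown",
            "metadata": {},
            "source": ["---\n", "title: Example\n", "draft: false\n", "---"],
        }]
    })
    print(first_cell_has_frontmatter(sample))
```

For a real notebook you would pass the contents of the .ipynb file, read as text, to the helper.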

3. Check Folder Structure
#

Put your notebook inside the folder where you want your page to live, and rename the notebook to index.ipynb.

This assumes the following folder structure:

path/to/project
└── content
    └── category-name
        └── page-name
            └── index.ipynb <--- your notebook titled 'index'

Now jupyter nbconvert --to markdown path/to/index.ipynb will produce an index.md within your folder.

Run hugo server and your page should render your notebook fully. And since it is in markdown, it will render with the proper colors for your theme (which was one issue I had with some options that relied on HTML output).

Alternatively, you can keep your notebooks in some other folder and use the --output index --output-dir path/to/dir arguments to generate an index.md within your specified directory. The first argument changes the output filename to index.md and the second argument changes the output folder.
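
If you end up calling nbconvert from a script, the two arguments can be assembled in one place. This is a minimal sketch; build_nbconvert_command is a hypothetical helper name, and it only uses the --to, --output, and --output-dir arguments mentioned above.

```python
def build_nbconvert_command(notebook, output_dir=None, output_name="index"):
    """Build the argument list for converting a notebook to markdown."""
    cmd = ["jupyter", "nbconvert", "--to", "markdown", "--output", output_name]
    if output_dir is not None:
        # Only redirect the output folder when one was requested.
        cmd += ["--output-dir", str(output_dir)]
    cmd.append(str(notebook))
    return cmd


if __name__ == "__main__":
    print(build_nbconvert_command("notebooks/analysis.ipynb",
                                  output_dir="content/category-name/page-name"))
```

The returned list can be passed straight to subprocess.run(cmd, check=True).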

Hiding Jupyter Cells, Inputs, and/or Outputs
#

You can now have your Jupyter notebook as a Hugo webpage, but sometimes you do not want all of your cells to be displayed.

1. Create config file
#

path/to/project/python/jupyter/nbconvert_config.py:

c.TagRemovePreprocessor.enabled = True
c.TagRemovePreprocessor.remove_cell_tags = ['hide_cell']
c.TagRemovePreprocessor.remove_input_tags = ['hide_input']
c.TagRemovePreprocessor.remove_all_outputs_tags = ['hide_output']

2. Tag Cells
#

Open your notebook, go to your cells, and add the tags hide_cell, hide_input, or hide_output as appropriate.
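
Tags live in each cell's metadata under a tags list, so they can also be added with a few lines of code instead of through the notebook UI. The sketch below works on the notebook loaded as a plain dictionary; add_tag is a hypothetical helper name.

```python
def add_tag(notebook: dict, cell_index: int, tag: str) -> dict:
    """Add a tag to one cell's metadata, creating the tags list if needed."""
    cell = notebook["cells"][cell_index]
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if tag not in tags:  # avoid duplicate tags
        tags.append(tag)
    return notebook


if __name__ == "__main__":
    nb = {"cells": [{"cell_type": "code", "metadata": {}, "source": "x = 1", "outputs": []}]}
    add_tag(nb, 0, "hide_input")
    print(nb["cells"][0]["metadata"]["tags"])
```

To apply this to a file, load it with json.load, call the helper, and write it back with json.dump.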

3. Run nbconvert with config
#

Use the following command to run nbconvert with your config file:

jupyter nbconvert --to markdown --config path/to/project/python/jupyter/nbconvert_config.py path/to/index.ipynb

Now the cells/inputs/outputs that were tagged will be removed (not collapsed), meaning they will not appear in your markdown at all.
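
To make the effect of the three tags concrete, here is a simplified model that operates on plain dictionaries. It is not nbconvert's implementation, just a rough illustration of what each tag removes; apply_hide_tags is a hypothetical name.

```python
def apply_hide_tags(cells):
    """Return copies of the cells roughly as they appear after tag removal."""
    visible = []
    for cell in cells:
        tags = set(cell.get("metadata", {}).get("tags", []))
        if "hide_cell" in tags:
            continue  # the whole cell is dropped
        shown = dict(cell)  # shallow copy so the input list is left untouched
        if "hide_input" in tags:
            shown["source"] = ""  # input removed, outputs kept
        if "hide_output" in tags and shown.get("cell_type") == "code":
            shown["outputs"] = []  # outputs removed, input kept
        visible.append(shown)
    return visible


if __name__ == "__main__":
    cells = [
        {"cell_type": "code", "metadata": {"tags": ["hide_cell"]}, "source": "setup()", "outputs": []},
        {"cell_type": "code", "metadata": {"tags": ["hide_input"]}, "source": "plot()", "outputs": ["<figure>"]},
        {"cell_type": "code", "metadata": {"tags": ["hide_output"]}, "source": "fit()", "outputs": ["log"]},
    ]
    for cell in apply_hide_tags(cells):
        print(repr(cell["source"]), cell["outputs"])
```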

Alternatively, you can store your config file in .venv/etc/jupyter as jupyter_nbconvert_config.py, where it will be picked up automatically whenever you run nbconvert without the --config argument. Note that this applies even if you are using nbconvert for some other purpose within your environment. If the file is not being picked up, add --debug to your arguments and look for the paths nbconvert checks for the config file.

Collapsing Jupyter Cells, Inputs, and Outputs
#

Removing cells is nice in some cases, but I want to be able to collapse sections of my notebook so they are still there for anyone inclined to read through them. Collapsed sections let you structure a notebook as a report while keeping all the relevant info. This is a little more involved than removing cells/inputs/outputs, but we’ll take it one step at a time. Note that I assume you’re not using the alternative approaches I listed above; if you are, you will need to adjust things slightly.

The issue I ran into is that when you try to handle collapsing inputs/outputs within the preprocessor, nbconvert expects your inputs and outputs to look a certain way and formats them accordingly. Specifically, it formats even our details shortcode, so we get something like this for inputs:

Input collapsed. Click to expand:
print("collapse_input's output")
collapse_input's output

And then the output looks like this:

print("collapse_output")
Output collapsed. Click to expand:
collapse_output

While this works, it isn’t saving any space because the details shortcodes are actually being placed inside code blocks, a peculiarity of how nbconvert interacts with what we’re trying to do. To solve this we will write our own postprocessing script.

1. Enable collapsing sections in markdown with a shortcode.
#

First, we need a way to collapse sections of text. In HTML this is done with the <details> and <summary> tags, but Hugo does not have a way to utilize these within markdown directly. I have written a guide on how to enable collapsing sections of text in Hugo via a shortcode here.

This guide assumes your shortcode can be used like this:

{{< details "Input collapsed:" "closed" "Input expanded:" >}}
Text goes here
{{< /details >}}

Which renders as:

Input collapsed: Text goes here

2. Create collapse_preprocessor.py:
#

Create the following file: path/to/project/python/jupyter/collapse_preprocessor.py:

from nbconvert.preprocessors import Preprocessor
import uuid

class CollapsePreprocessor(Preprocessor):
    def preprocess(self, nb, resources):
        grouped_cells = []
        collapse_group = []

        def generate_cell_id(id_length=8):
            return uuid.uuid4().hex[:id_length]
        
        def append_collapsed():
            # wrap the collapse group in the opening and closing details shortcode.
            count = len(collapse_group)
            label = "1 cell" if count == 1 else f"{count} cells"
            opening = f'{{{{< details "{label} collapsed:" "closed" "{label} expanded:" >}}}}'
            grouped_cells.append({'cell_type': 'markdown', 'id': generate_cell_id(), 'metadata': {'tags': []}, 'source': opening})
            grouped_cells.extend(collapse_group)
            grouped_cells.append({'cell_type': 'markdown', 'id': generate_cell_id(), 'metadata': {'tags': []}, 'source': '{{< /details >}}'})

        for cell in nb.cells:
            # check for cell.id
            if not hasattr(cell, 'id') or cell.id is None:
                cell.id = generate_cell_id()

            # check for collapse_cell tag and add to collapse group
            if 'collapse_cell' in cell.metadata.get('tags', []):
                collapse_group.append(cell)
            else:

                # format and add collapse group to grouped_cells
                if collapse_group:
                    append_collapsed()
                    collapse_group = []

                # collapse input/output
                if cell.cell_type == "code":
                    if 'collapse_input' in cell.metadata.get('tags', []):
                        cell.source = f'{{{{< detailsInput >}}}}\n```python\n{cell.source}\n```\n{{{{< /detailsInput >}}}}'
                    if 'collapse_output' in cell.metadata.get('tags', []):
                        new_outputs = []
                        for output in cell.outputs:
                            if 'text' in output:
                                output['text'] = f'{{{{< detailsOutput >}}}}\n{output["text"]}\n{{{{< /detailsOutput >}}}}'
                            new_outputs.append(output)
                        cell.outputs = new_outputs
                
                # add cell to grouped cells
                grouped_cells.append(cell)

        # format and append last cells
        if collapse_group:
            append_collapsed()

        nb.cells = grouped_cells
        return nb, resources

At a high level, this iterates through each cell in your notebook and checks whether it has the collapse_cell tag. If it does, the cell is grouped with adjacent cells that have the same tag, and the group is wrapped in our details shortcode. It also looks for collapse_input and collapse_output and adds placeholder text that will be picked up in postprocessing. Lastly, to ensure compatibility, it adds a cell id wherever one is not present. This is necessary because the preprocessor adds extra cells for formatting, and without a cell.id nbconvert will throw soft errors that may become hard errors in the future.
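
The grouping step can be tried in isolation, without nbconvert. The sketch below uses itertools.groupby on plain dictionaries rather than the loop in the preprocessor, so it is an alternative illustration of the same idea, not the code above; group_collapsed and summary_label are hypothetical names.

```python
from itertools import groupby


def is_collapsed(cell) -> bool:
    """True if the cell carries the collapse_cell tag."""
    return "collapse_cell" in cell.get("metadata", {}).get("tags", [])


def group_collapsed(cells):
    """Split cells into consecutive runs of (is_collapsed, [cells])."""
    return [(flag, list(run)) for flag, run in groupby(cells, key=is_collapsed)]


def summary_label(count: int) -> str:
    """Build the summary text used in the opening details shortcode."""
    noun = "cell" if count == 1 else "cells"
    return f"{count} {noun} collapsed:"


if __name__ == "__main__":
    tagged = {"metadata": {"tags": ["collapse_cell"]}}
    plain = {"metadata": {}}
    for flag, run in group_collapsed([plain, tagged, tagged, plain, tagged]):
        print(summary_label(len(run)) if flag else f"{len(run)} cell(s) shown as-is")
```

Each run flagged True corresponds to one details block; the runs flagged False pass through unchanged.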

3. Update nbconvert_config.py
#

You will then need to update your nbconvert_config.py:

from python.jupyter.collapse_preprocessor import CollapsePreprocessor

c = get_config()

c.TagRemovePreprocessor.enabled = True
c.TagRemovePreprocessor.remove_cell_tags = ['hide_cell']
c.TagRemovePreprocessor.remove_input_tags = ['hide_input']
c.TagRemovePreprocessor.remove_all_outputs_tags = ['hide_output']
c.MarkdownExporter.preprocessors = [CollapsePreprocessor]

This adds your preprocessor to your config so nbconvert knows to use it. Now when you run your conversion with your config it will utilize the collapse_preprocessor.py.

If you are only planning to use collapse_cell (and not collapsing inputs/outputs) then you can stop here. Using the same command as above, nbconvert will run your preprocessor and group collapsed cells together, which looks like this:

3 cells collapsed:
print("collapse_cell_1")
collapse_cell_1
print("collapse_cell_2")
collapse_cell_2
print("collapse_cell_3")
collapse_cell_3

4. Create collapse_postprocessor.py:
#

We now will be creating a postprocessor to fix some of the formatting issues. Create the following file: path/to/project/python/jupyter/collapse_postprocessor.py:

import sys

class CollapsePostprocessor:
    def __init__(self, filepath):
        self.filepath = filepath

    def process(self):
        try:
            with open(self.filepath, 'r', encoding='utf-8') as file:
                content = file.read()

                # collapsed inputs/outputs
                content = content.replace('```python\n{{< detailsInput >}}', '{{< details "Input collapsed:" "closed" "Input expanded:" >}}')
                content = content.replace('{{< /detailsInput >}}\n```', '{{< /details >}}\n') 
                content = content.replace('    {{< detailsOutput >}}', '{{< details "Output collapsed:" "closed" "Output expanded:" >}}')
                content = content.replace('    {{< /detailsOutput >}}', '{{< /details >}}\n')               

            with open(self.filepath, 'w', encoding='utf-8') as file:
                file.write(content)
        except FileNotFoundError:
            sys.exit(f"Could not locate {self.filepath}")

This postprocessor will search for the beginning and end of the detailsInput and detailsOutput placeholders and then format them as our expected details shortcode.
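
To see the replacements in action without touching any files, the same substitutions can be run on a small string. The sample below assumes nbconvert emits a collapsed input as a python code fence wrapped around the placeholder and the original fence; fix_shortcodes is a hypothetical name, and FENCE is only there to avoid writing literal backticks inside the listing.

```python
FENCE = "`" * 3  # three backticks, built here to keep the listing readable

# The same four replacements the postprocessor applies, in the same order.
REPLACEMENTS = [
    (FENCE + "python\n{{< detailsInput >}}",
     '{{< details "Input collapsed:" "closed" "Input expanded:" >}}'),
    ("{{< /detailsInput >}}\n" + FENCE, "{{< /details >}}\n"),
    ("    {{< detailsOutput >}}",
     '{{< details "Output collapsed:" "closed" "Output expanded:" >}}'),
    ("    {{< /detailsOutput >}}", "{{< /details >}}\n"),
]


def fix_shortcodes(markdown: str) -> str:
    """Turn the placeholder shortcodes into real details shortcodes."""
    for old, new in REPLACEMENTS:
        markdown = markdown.replace(old, new)
    return markdown


if __name__ == "__main__":
    raw = (
        FENCE + "python\n"
        "{{< detailsInput >}}\n"
        + FENCE + "python\n"
        'print("collapse_input")\n'
        + FENCE + "\n"
        "{{< /detailsInput >}}\n"
        + FENCE + "\n"
    )
    print(fix_shortcodes(raw))
```

After the substitutions the shortcode sits outside the code fence, so Hugo can render it as a collapsible section.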

5. Putting it all together:
#

Now we can tie everything together for ease of use. We will create a script that runs nbconvert and then collapse_postprocessor.py.

path/to/project/python/jupyter/ipynb_to_md.py:

import sys
import os.path
import subprocess
from collapse_postprocessor import CollapsePostprocessor

def main():

    # check arguments
    if len(sys.argv) == 1:
        notebook_filepath = input("filepath argument not provided. Please provide: ")
    elif len(sys.argv) == 2:
        notebook_filepath = sys.argv[1]
    elif len(sys.argv) >= 3:
        raise IOError("Invalid # of arguments.  Usage: script.py <filepath/filename.ipynb>")
    
    # check config + pre/post processing filepaths
    directory_path = os.path.dirname(os.path.abspath(__file__))
    config_filepath = os.path.join(directory_path, "nbconvert_config.py")
    preprocessor_filepath = os.path.join(directory_path, "collapse_preprocessor.py")
    postprocessor_filepath = os.path.join(directory_path, "collapse_postprocessor.py")

    for filepath in [notebook_filepath, config_filepath, preprocessor_filepath, postprocessor_filepath]:
        if not os.path.isfile(filepath):
            raise IOError(f"Could not locate {filepath}")
    
    
    # nbconvert w/ preprocessing; check=True stops the script if the conversion fails
    subprocess.run(['jupyter', 'nbconvert', '--to', 'markdown', '--config', config_filepath, '--output', 'index', notebook_filepath], check=True)

    # run postprocessor
    output_directory = os.path.dirname(os.path.abspath(notebook_filepath))
    output_filepath = os.path.join(output_directory, "index.md")
    postprocessor = CollapsePostprocessor(output_filepath)
    postprocessor.process()

if __name__ == "__main__":
    main()

To use: python python/jupyter/ipynb_to_md.py path/to/notebook.ipynb

The script checks that the notebook file exists, along with the config and pre/postprocessing files listed above. It then runs nbconvert, this time specifying the --output index argument so your notebook doesn’t need to be called index.ipynb (it will generate an index.md regardless of the notebook name). Once the markdown conversion is finished, the script runs the postprocessor to fix our shortcode placement. The end result is that our collapsed inputs and outputs will now look like this:

Input collapsed:
print("collapse_input")
collapse_input

And

print("collapse_output")
Output collapsed:
collapse_output

Summary
#

And that should be it! You can now convert your Jupyter notebooks to markdown for use within Hugo or other static site generators. You can use the tags hide_cell, hide_input, and hide_output to keep content out of your markdown. Alternatively, you can collapse cells with collapse_cell (adjacent collapsed cells are grouped) and collapse inputs/outputs with collapse_input and collapse_output. This lets you use notebooks for data science or some other purpose, giving a clean report while still preserving the details.