This site is built with Hugo, a framework for building static websites. One issue, however, is that Hugo does not natively support Jupyter notebook (.ipynb) files. While looking for a way to use Jupyter notebooks with my site, I found and tried many options but ultimately landed on nbconvert.
Want to see what this will look like at the end? Head over to my predicting recipe website traffic project.
Things I tried or looked at #
I tried more than half a dozen options before landing on nbconvert. Some of these were:
- hugo_jupyter, CLI converter
- Academic File Converter, CLI converter
- Hugo Blox, a theme for Hugo
- Shortcodes, such as:
  - using nbconvert to convert .ipynb to HTML and then using an iframe to display it
- Mercury, hosting an interactive notebook
- Quarto, a different framework to build a site or use it within Hugo
Some of these options I tried and ran into issues with, while others did not fit my current needs. While digging through them, though, I realized that several called for nbconvert. Using nbconvert directly seemed like a natural choice after seeing it used in several different solutions.
Using nbconvert with Hugo #
Here are the steps I took to use nbconvert with Hugo:
1. Install nbconvert #
See installation instructions. Out of the box, nbconvert works well.
jupyter nbconvert --to markdown path/to/notebook.ipynb
This produces a notebook.md, but in order to make it a page that renders in Hugo we’ll need to do a bit more.
2. Add Frontmatter #
In your notebook, create a markdown cell as the first cell and add your frontmatter:
---
date: 2024-02-02T04:14:54-08:00
draft: false
title: Example
---
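Because the front matter has to sit in the very first cell for Hugo to pick it up, a quick check before converting can save a confusing render. The sketch below is only an illustration: `has_front_matter` and the in-memory `example` notebook are hypothetical names, and it assumes the standard .ipynb JSON layout where each cell has a `cell_type` and a `source`.

```python
def has_front_matter(notebook: dict) -> bool:
    """Return True if the first cell is markdown wrapped in '---' lines."""
    cells = notebook.get("cells", [])
    if not cells or cells[0].get("cell_type") != "markdown":
        return False
    source = cells[0].get("source", "")
    # The notebook format stores source as one string or as a list of lines.
    if isinstance(source, list):
        source = "".join(source)
    lines = source.strip().splitlines()
    return len(lines) >= 2 and lines[0].strip() == "---" and lines[-1].strip() == "---"


# Hypothetical in-memory notebook mirroring the front matter above.
example = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "---\n",
                "date: 2024-02-02T04:14:54-08:00\n",
                "draft: false\n",
                "title: Example\n",
                "---",
            ],
        },
        {"cell_type": "code", "metadata": {}, "source": "print('hello')", "outputs": []},
    ]
}

print(has_front_matter(example))
```

To check a real file, load it with `json.load` and pass the resulting dict to the function.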
3. Check Folder Structure #
Put your notebook inside the folder where you want your page to be, and rename the notebook to index.ipynb.
This assumes the following folder structure:
path/to/project
└── content
    └── category-name
        └── page-name
            └── index.ipynb <--- your notebook titled 'index'
Now jupyter nbconvert --to markdown path/to/index.ipynb will produce an index.md within your folder.
Run hugo server and your page should render your notebook fully. And since it is in markdown, it will render with the proper colors for your theme (which was one issue I had with some options that relied on HTML output).
Alternatively, you can keep your notebooks in some other folder and use the --output index --output-dir path/to/dir arguments to generate an index.md within your specified directory. The first argument changes the output filename to index.md and the second argument changes the output folder.
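If you end up scripting the conversion, the command can be assembled as a list for `subprocess.run`. This is a minimal sketch, not part of my setup: `build_nbconvert_command` is a hypothetical helper and the example paths are placeholders. It only uses the `--to`, `--output`, and `--output-dir` arguments described above.

```python
def build_nbconvert_command(notebook, output_dir=None, output_name="index"):
    """Assemble the nbconvert call as an argument list (nothing is executed)."""
    cmd = ["jupyter", "nbconvert", "--to", "markdown", "--output", output_name]
    if output_dir is not None:
        # Write the markdown somewhere other than the notebook's own folder.
        cmd += ["--output-dir", str(output_dir)]
    cmd.append(str(notebook))
    return cmd


# Placeholder paths, for illustration only.
cmd = build_nbconvert_command(
    "notebooks/analysis.ipynb",
    output_dir="content/category-name/page-name",
)
print(" ".join(cmd))
# → jupyter nbconvert --to markdown --output index --output-dir content/category-name/page-name notebooks/analysis.ipynb
```

Passing the list to `subprocess.run(cmd, check=True)` would run the conversion and raise if nbconvert fails.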
Hiding Jupyter Cells, Inputs, and/or Outputs #
You can now have your Jupyter notebook as a Hugo webpage, but sometimes you do not want all of your cells to be displayed.
1. Create config file #
path/to/project/python/jupyter/nbconvert_config.py:
c.TagRemovePreprocessor.enabled = True
c.TagRemovePreprocessor.remove_cell_tags = ['hide_cell']
c.TagRemovePreprocessor.remove_input_tags = ['hide_input']
c.TagRemovePreprocessor.remove_all_outputs_tags = ['hide_output']
2. Tag Cells #
Open your notebook and add the tags hide_cell, hide_input, or hide_output to your cells as appropriate.
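Tags live in each cell's metadata inside the .ipynb JSON, under `metadata.tags`, so they can also be added from a script instead of the notebook UI. The following is a rough sketch under that assumption. `add_tag` and the sample notebook are hypothetical names for illustration.

```python
def add_tag(notebook: dict, cell_index: int, tag: str) -> None:
    """Add a tag to one cell's metadata, creating the tags list if needed."""
    cell = notebook["cells"][cell_index]
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if tag not in tags:  # avoid duplicate tags on repeated runs
        tags.append(tag)


# Hypothetical two-cell notebook held in memory.
nb = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "import pandas as pd", "outputs": []},
        {"cell_type": "code", "metadata": {"tags": ["hide_input"]}, "source": "df.plot()", "outputs": []},
    ]
}

add_tag(nb, 0, "hide_cell")
add_tag(nb, 1, "hide_input")  # already present, so nothing changes
print([cell["metadata"]["tags"] for cell in nb["cells"]])
# → [['hide_cell'], ['hide_input']]
```

For a real notebook you would load the file with `json.load`, call `add_tag`, and write it back with `json.dump`.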
3. Run nbconvert with config #
Use the following command to run nbconvert with your config file:
jupyter nbconvert --to markdown --config path/to/project/python/jupyter/nbconvert_config.py path/to/index.ipynb
Now the cells/inputs/outputs that were tagged will be removed (not collapsed), meaning they will not be shown in your markdown.
Alternatively, you can store your config file in .venv/etc/jupyter as jupyter_nbconvert_config.py, which will be picked up automatically when you use nbconvert without the --config argument. Note this will occur even if you are using nbconvert for some other purpose within your environment. If the file is not being picked up, add --debug to your arguments and look for the paths nbconvert is checking for the config file.
Collapsing Jupyter Cells, Inputs, and Outputs #
Removing cells is nice in some cases, but I want to be able to collapse sections of my notebook so they are still there for anyone inclined to read through them. Collapsed sections let you structure your notebook as a report that still contains all the relevant info. This is a little more involved than removing cells/inputs/outputs, but we’ll take it one step at a time. Note that I assume you’re not using the alternative approaches I listed above; if you are, you will need to adjust things slightly.
The issue I ran into is that when trying to handle collapsing inputs/outputs within the preprocessor, nbconvert expects your inputs and outputs to look a certain way and formats them accordingly. Specifically, it will format even our details shortcode, so we get something like this for inputs:
Input collapsed. Click to expand:
print("collapse_input's output")
collapse_input's output
And then the output looks like this:
print("collapse_output")
Output collapsed. Click to expand:
collapse_output
While this works, it isn’t saving any space because the details shortcodes are actually being put inside of code blocks, which is just a peculiarity of how nbconvert interacts with what we’re trying to do. To solve this we will write our own postprocessing script.
1. Enable collapsing sections in markdown with a shortcode. #
First, we need a way to collapse sections of text. In HTML this is done with the <details> and <summary> tags, but Hugo does not have a way to use these within markdown directly. I have written a guide on how to enable collapsing sections of text in Hugo via a shortcode here.
This guide assumes your shortcode can be enabled with
{{< details summary="Input collapsed:" altSummary="Input expanded:" >}}
Text goes here
{{< /details >}}
Which renders as:
Text goes here
Input collapsed:
UPDATE: Hugo has added a details shortcode in version 0.140.0. This guide assumes you can pass an altSummary to the details shortcode that toggles the display when clicked. If you are using the default details shortcode then the code provided will need to be modified to remove references to altSummary.
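If you would rather keep the code in this guide unchanged and use Hugo's default details shortcode, one option is to strip the altSummary attribute from the generated markdown as a final step. This is only a sketch of that idea, not something from my own setup: `strip_alt_summary` is a hypothetical helper, and it assumes the default shortcode accepts the remaining summary parameter.

```python
import re

# Matches the attribute plus the whitespace in front of it.
ALT_SUMMARY = re.compile(r'\s+altSummary="[^"]*"')


def strip_alt_summary(markdown: str) -> str:
    """Remove every altSummary="..." attribute from shortcode calls."""
    return ALT_SUMMARY.sub("", markdown)


line = '{{< details summary="Input collapsed:" altSummary="Input expanded:" >}}'
print(strip_alt_summary(line))
# → {{< details summary="Input collapsed:" >}}
```

Running this over the contents of index.md after conversion would leave the closing shortcode tags untouched, since they carry no attributes.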
2. Create collapse_preprocessor.py: #
Create the following file:
path/to/project/python/jupyter/collapse_preprocessor.py:
from nbconvert.preprocessors import Preprocessor
import uuid


class CollapsePreprocessor(Preprocessor):
    def preprocess(self, nb, resources):
        grouped_cells = []
        collapse_group = []

        def generate_cell_id(id_length=8):
            return uuid.uuid4().hex[:id_length]

        def append_collapsed():
            # add details shorttag to beginning and end of collapse group.
            if len(collapse_group) == 1:
                grouped_cells.append({'cell_type': 'markdown', 'id': generate_cell_id(), 'metadata': {'tags': []}, 'source': f'{{{{< details summary="1 cell collapsed:" altSummary="1 cell expanded:" >}}}}'})
            else:
                grouped_cells.append({'cell_type': 'markdown', 'id': generate_cell_id(), 'metadata': {'tags': []}, 'source': f'{{{{< details summary="{len(collapse_group)} cells collapsed:" altSummary="{len(collapse_group)} cells expanded:" >}}}}'})
            for c_cell in collapse_group:
                grouped_cells.append(c_cell)
            grouped_cells.append({'cell_type': 'markdown', 'id': generate_cell_id(), 'metadata': {'tags': []}, 'source': '{{< /details >}}'})

        for cell in nb.cells:
            # check for cell.id
            if not hasattr(cell, 'id') or cell.id is None:
                cell.id = generate_cell_id()
            # check for collapse_cell tag and add to collapse group
            if 'collapse_cell' in cell.metadata.get('tags', []):
                collapse_group.append(cell)
            else:
                # format and add collapse group to grouped_cells
                if collapse_group:
                    append_collapsed()
                    collapse_group = []
                # collapse input/output
                if cell.cell_type == "code":
                    if 'collapse_input' in cell.metadata.get('tags', []):
                        cell.source = f'{{{{< detailsInput >}}}}\n```python\n{cell.source}\n```\n{{{{< /detailsInput >}}}}'
                    if 'collapse_output' in cell.metadata.get('tags', []):
                        new_outputs = []
                        for output in cell.outputs:
                            if 'text' in output:
                                output['text'] = f'{{{{< detailsOutput >}}}}\n{output["text"]}\n{{{{< /detailsOutput >}}}}'
                            new_outputs.append(output)
                        cell.outputs = new_outputs
                # add cell to grouped cells
                grouped_cells.append(cell)
        # format and append last cells
        if collapse_group:
            append_collapsed()
        nb.cells = grouped_cells
        return nb, resources
At a high level this iterates through each cell in your notebook and checks whether it has the collapse_cell tag; if it does, it groups the cell with adjacent cells that have the same tag and wraps them in our details shortcode. It also looks for collapse_input and collapse_output and adds placeholder text that will be picked up in postprocessing. Lastly, to ensure compatibility it adds a cell id if one is not present. This is necessary because the preprocessor adds extra cells for formatting, and without a cell.id nbconvert will throw soft errors that may become hard errors in the future.
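The grouping step is the part that is easiest to get wrong, so here is the same idea reduced to plain dicts and `itertools.groupby`, with no nbconvert involved. Treat it as an illustrative sketch rather than a replacement for the preprocessor: `is_collapsed`, `group_cells`, and the sample cells are hypothetical, and it returns a flat list of strings instead of notebook cells.

```python
from itertools import groupby


def is_collapsed(cell: dict) -> bool:
    return "collapse_cell" in cell.get("metadata", {}).get("tags", [])


def group_cells(cells: list) -> list:
    """Wrap each run of adjacent collapse_cell cells in one details shortcode."""
    out = []
    for collapsed, run in groupby(cells, key=is_collapsed):
        run = list(run)
        if collapsed:
            label = "1 cell" if len(run) == 1 else f"{len(run)} cells"
            out.append(f'{{{{< details summary="{label} collapsed:" altSummary="{label} expanded:" >}}}}')
        out.extend(cell["source"] for cell in run)
        if collapsed:
            out.append("{{< /details >}}")
    return out


# Hypothetical cells: the middle two are tagged and adjacent.
cells = [
    {"source": "a", "metadata": {}},
    {"source": "b", "metadata": {"tags": ["collapse_cell"]}},
    {"source": "c", "metadata": {"tags": ["collapse_cell"]}},
    {"source": "d", "metadata": {}},
]

for line in group_cells(cells):
    print(line)
```

The two tagged cells end up inside a single "2 cells collapsed:" block, while the untagged cells before and after pass through unchanged.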
3. Update nbconvert_config.py #
You will then need to update your nbconvert_config.py:
from python.jupyter.collapse_preprocessor import CollapsePreprocessor
c = get_config()
c.TagRemovePreprocessor.enabled = True
c.TagRemovePreprocessor.remove_cell_tags = ['hide_cell']
c.TagRemovePreprocessor.remove_input_tags = ['hide_input']
c.TagRemovePreprocessor.remove_all_outputs_tags = ['hide_output']
c.MarkdownExporter.preprocessors = [CollapsePreprocessor]
This adds your preprocessor to your config so nbconvert knows to use it. Now when you run your conversion with your config it will use collapse_preprocessor.py.
If you are only planning to use collapse_cell (and not collapsing inputs/outputs) then you can stop here. Using the same command as above, nbconvert will use your preprocessing script and group collapsed cells together, to look like this:
3 cells collapsed:
print("collapse_cell_1")
collapse_cell_1
print("collapse_cell_2")
collapse_cell_2
print("collapse_cell_3")
collapse_cell_3
4. Create collapse_postprocessor.py: #
We now will be creating a postprocessor to fix some of the formatting issues. Create the following file:
path/to/project/python/jupyter/collapse_postprocessor.py:
import sys


class CollapsePostprocessor:
    def __init__(self, filepath):
        self.filepath = filepath

    def process(self):
        try:
            with open(self.filepath, 'r', encoding='utf-8') as file:
                content = file.read()
            # collapsed inputs/outputs
            content = content.replace('```python\n{{< detailsInput >}}', '{{< details summary="Input collapsed:" altSummary="Input expanded:" >}}')
            content = content.replace('{{< /detailsInput >}}\n```', '{{< /details >}}\n')
            content = content.replace(' {{< detailsOutput >}}', '{{< details summary="Output collapsed:" altSummary="Output expanded:" >}}')
            content = content.replace(' {{< /detailsOutput >}}', '{{< /details >}}\n')
            with open(self.filepath, 'w', encoding='utf-8') as file:
                file.write(content)
        except FileNotFoundError:
            sys.exit(f"Could not locate {self.filepath}")
This postprocessor will search for the beginning and end of the detailsInput and detailsOutput placeholders and then format them as our expected details shortcode.
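To see what the input replacements do without touching a file, the sketch below applies the same two `str.replace` calls to a small string. The sample text is my assumption of what the exporter emits for a collapse_input cell (the placeholder wrapped in an outer code fence), and `fix_collapsed_input` is a hypothetical name. The fence is built from a `FENCE` constant only to keep the example readable.

```python
FENCE = "`" * 3  # three backticks, built programmatically for readability
OPEN_INPUT = '{{< details summary="Input collapsed:" altSummary="Input expanded:" >}}'
CLOSE_DETAILS = "{{< /details >}}\n"


def fix_collapsed_input(markdown: str) -> str:
    """Swap the detailsInput placeholders (and the outer fence) for the shortcode."""
    markdown = markdown.replace(FENCE + "python\n{{< detailsInput >}}", OPEN_INPUT)
    markdown = markdown.replace("{{< /detailsInput >}}\n" + FENCE, CLOSE_DETAILS)
    return markdown


# Assumed shape of the exporter output for one collapse_input cell.
sample = (
    FENCE + "python\n"
    "{{< detailsInput >}}\n"
    + FENCE + "python\n"
    'print("collapse_input")\n'
    + FENCE + "\n"
    "{{< /detailsInput >}}\n"
    + FENCE + "\n"
)

print(fix_collapsed_input(sample))
```

After the replacements the outer fence is gone and the inner code block sits between the opening and closing details shortcodes, which is what lets Hugo collapse it.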
5. Putting it all together: #
Now we can tie everything together for ease of use. We will be creating a script to execute nbconvert and collapse_postprocessor.py.
path/to/project/python/jupyter/ipynb_to_md.py:
import sys
import os.path
import subprocess
from collapse_postprocessor import CollapsePostprocessor


def main():
    # check arguments
    if len(sys.argv) == 1:
        notebook_filepath = input("filepath argument not provided. Please provide: ")
    elif len(sys.argv) == 2:
        notebook_filepath = sys.argv[1]
    elif len(sys.argv) >= 3:
        raise IOError("Invalid # of arguments. Usage: script.py <filepath/filename.ipynb>")
    # check config + pre/post processing filepaths
    directory_path = os.path.dirname(os.path.abspath(__file__))
    config_filepath = os.path.join(directory_path, "nbconvert_config.py")
    preprocessor_filepath = os.path.join(directory_path, "collapse_preprocessor.py")
    postprocessor_filepath = os.path.join(directory_path, "collapse_postprocessor.py")
    for filepath in [notebook_filepath, config_filepath, preprocessor_filepath, postprocessor_filepath]:
        if not os.path.isfile(filepath):
            raise IOError(f"Could not locate {filepath}")
    # nbconvert w/ preprocessing
    subprocess.run(['jupyter', 'nbconvert', '--to', 'markdown', '--config', config_filepath, '--output', 'index', notebook_filepath])
    # run postprocessor
    output_directory = os.path.dirname(os.path.abspath(notebook_filepath))
    output_filepath = os.path.join(output_directory, "index.md")
    postprocessor = CollapsePostprocessor(output_filepath)
    postprocessor.process()


if __name__ == "__main__":
    main()
To use: python python/jupyter/ipynb_to_md.py path/to/notebook.ipynb
The script will check that the notebook file exists, and it will also check for the config and pre/postprocessor files listed above. It then runs nbconvert, this time specifying the --output index argument so your notebook doesn’t need to be called index.ipynb (it will generate an index.md regardless of the notebook name). Once the markdown conversion is finished, the script runs the postprocessor to fix our shortcode placement. The end result is that our input and output collapsed cells will now look like this:
Input collapsed:
print("collapse_input")
collapse_input
And
print("collapse_output")
Output collapsed:
collapse_output
Summary #
And that should be it! You will now be able to convert your Jupyter notebooks to markdown for use within Hugo or other static websites. You can also use the tags hide_cell, hide_input, and hide_output to prevent these from appearing in your markdown. Finally, you can instead collapse cells with collapse_cell (adjacent collapsed cells are grouped), and collapse inputs/outputs with collapse_input and collapse_output. This will let you use notebooks for data science or some other purpose, giving a clean report but still preserving the details.