Description
The Jupyter notebook file format supports Markdown cells, which can contain links. Currently, we extract links from notebooks using our plaintext extractor, which can lead to false positives. For an example, see the discussion in #1658.
There is a crate, nbformat, which would allow us to extract the Markdown cells from a Jupyter file (.ipynb). This way, we could use a proper Markdown parser for link extraction.
If anyone wants to contribute, take a look at the Markdown extractor. The new "notebook extractor" would look quite similar. It would use nbformat to get all notebook cells and then call the Markdown extractor to find all links.
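To make the proposed flow concrete, here is a rough sketch of the two steps (collect the Markdown cells, then hand their text to a Markdown link extractor). It is written in Python with only the standard library so it stays short and runnable. It is not the Rust implementation and does not use the nbformat crate. The function names `markdown_cells` and `extract_links` are hypothetical, the regex is only a stand-in for the real Markdown extractor, and the sketch assumes the version 4 notebook layout, where cells live in a top-level "cells" array.

```python
import json
import re


def markdown_cells(notebook_json: str) -> list[str]:
    """Return the source text of every Markdown cell in a v4 .ipynb document."""
    notebook = json.loads(notebook_json)
    texts = []
    for cell in notebook.get("cells", []):
        # Skip code and raw cells; only Markdown cells are parsed for links.
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format stores `source` as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        texts.append(source)
    return texts


# Stand-in for a real Markdown parser: matches inline links like [text](url).
INLINE_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extract_links(notebook_json: str) -> list[str]:
    """Collect link targets from the Markdown cells only."""
    links = []
    for text in markdown_cells(notebook_json):
        links.extend(INLINE_LINK.findall(text))
    return links


# A tiny notebook with one Markdown cell and one code cell.
EXAMPLE = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["See [docs](https://example.com/docs)\n", "for details."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["url = 'https://example.com/not-a-link'"],
        },
    ],
})

print(extract_links(EXAMPLE))
```

Because the code cell is never passed to the link extractor, the URL inside the string literal is not reported, which is the kind of false positive the plaintext extractor can produce today.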
Help wanted! Comment here if you want to give it a shot or send in a pull request. ✌