How to Convert PPTX to Markdown

This took a while to figure out. I found three ways to do it, and unfortunately none of them is perfect to work with. But that’s what you get when you pay $0 for this stuff, right?

1/ If you like the visuals in your slides but don’t care about structure

This is what you need to do on macOS on an M-series MacBook:

% git clone https://github.com/ptsefton/pptx_to_md.git
% cd pptx_to_md
% poetry install
% poetry shell
% arch -arm64 brew install imagemagick

Okay, you’re doing great.

Next, make a directory and put a representative PPTX and its PDF export in there.

% mkdir ppts
% cp <your fav ppt> ppts
% cp <your fav ppt converted to pdf> ppts
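If you don’t have a PDF export handy, one option is LibreOffice’s headless converter. This is not part of the original workflow, so treat it as an assumption. The minimal Python sketch below assumes LibreOffice is installed and its `soffice` binary is on your PATH. On macOS it lives inside the app bundle, e.g. /Applications/LibreOffice.app/Contents/MacOS/soffice. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_pdf_command(pptx: Path, outdir: Path) -> list[str]:
    """Build a LibreOffice headless call that exports `pptx` as a PDF into `outdir`."""
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", str(outdir), str(pptx)]


def export_pdf(pptx: Path, outdir: Path) -> Path:
    """Export the deck and return the PDF path (LibreOffice reuses the PPTX file stem)."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice's soffice binary is not on PATH")
    subprocess.run(soffice_pdf_command(pptx, outdir), check=True)
    return outdir / f"{pptx.stem}.pdf"
```

Usage: `export_pdf(Path("ppts/cxreportgrid_excerpt.pptx"), Path("ppts"))` should leave cxreportgrid_excerpt.pdf next to the PPTX. LibreOffice’s rendering can differ from PowerPoint’s, so an export from PowerPoint itself may look closer to your slides.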

I’m using an excerpt of one of the past CX reports.

Then you can run the Python script:

% python pptx2md.py --pdf ppts/cxreportgrid_excerpt.pptx

What happens next is nothing short of miraculous. You now have an index.md in ppts/cxreportgrid_excerpt that is a Markdown conversion of the PPTX and includes snapshots of the slides taken from the PDF.

Then you can open it up in VS Code to preview it.

% cd ppts/cxreportgrid_excerpt
% code .

Then press cmd-shift-p to bring up the command palette and run “Markdown: Open Preview” (or just press cmd-shift-v) to open the default Markdown previewer.

You’ll see the information extracted. In some cases it’s only in the image’s alt text, and that’s the information you can now work with.
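Because some of the useful text ends up only in image alt text, a small script can pull it out of index.md for further processing. This is a sketch. It assumes the alt text appears either as Markdown `![alt](src)` or as an HTML `<img alt="...">`, since I haven’t checked which form pptx_to_md emits. The function names are hypothetical.

```python
import re
from pathlib import Path

# Inline Markdown images: ![alt text](target)
MD_IMAGE_RE = re.compile(r"!\[([^\]]*)\]\([^)]*\)")
# HTML images with a double-quoted alt attribute: <img ... alt="alt text" ...>
HTML_IMG_ALT_RE = re.compile(r'<img\b[^>]*?\balt="([^"]*)"', re.IGNORECASE)


def image_alt_texts(markdown: str) -> list[str]:
    """Collect non-empty alt text from Markdown and HTML images, in document order."""
    hits = [(m.start(), m.group(1)) for m in MD_IMAGE_RE.finditer(markdown)]
    hits += [(m.start(), m.group(1)) for m in HTML_IMG_ALT_RE.finditer(markdown)]
    return [alt.strip() for _, alt in sorted(hits) if alt.strip()]


def alt_texts_from_file(path: Path) -> list[str]:
    return image_alt_texts(path.read_text(encoding="utf-8"))
```

For example, `alt_texts_from_file(Path("ppts/cxreportgrid_excerpt/index.md"))` returns the alt strings in the order they appear on the slides.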

Note that the hierarchy of the slide doesn’t get extracted automagically. That’s the bummer of this method.

2/ If you want to get a little more structure out of your PPT

This is a different repo that’s more recently updated:

% git clone https://github.com/ssine/pptx2md.git
% cd pptx2md
% pip install pptx2md

When we run it on a simple PPT saved in the ppts subdirectory as samplepreso.pptx, we get the bones nicely extracted. But when we try it on our more complex CX report PPTX, it fails.
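To rerun pptx2md over every deck in ppts and see which ones fail (as the CX report did), you can wrap the CLI from Python. This is a sketch. It assumes the `pptx2md` command installed by `pip install pptx2md` is on your PATH, takes the input file as a positional argument, and accepts `-o` for the output path. That matches its README but may change between versions, so check `pptx2md --help`. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def pptx2md_command(pptx: Path, output: Optional[Path] = None) -> list[str]:
    """Build the CLI call; `-o` (output Markdown path) is assumed from the README."""
    cmd = ["pptx2md", str(pptx)]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


def convert_all(folder: Path) -> dict[str, int]:
    """Run pptx2md on each .pptx in `folder`; map file name -> exit code (0 = success)."""
    if shutil.which("pptx2md") is None:
        raise RuntimeError("pptx2md not found on PATH; try `pip install pptx2md`")
    results = {}
    for deck in sorted(folder.glob("*.pptx")):
        # Write each deck to its own .md so the runs don't overwrite one another
        proc = subprocess.run(
            pptx2md_command(deck, deck.with_suffix(".md")),
            capture_output=True,
            text=True,
        )
        results[deck.name] = proc.returncode
        if proc.returncode != 0:
            # Show the tail of stderr (usually a Python traceback) for failed decks
            print(f"{deck.name} failed:\n{proc.stderr[-2000:]}")
    return results
```

Usage: `convert_all(Path("ppts"))`. A crash inside the tool surfaces as a nonzero exit code, so those entries are most likely the decks it choked on.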

The good news is that it really tried to extract the images. I guess it was a little too complex.

Let’s cross our fingers and instead add an image to our simpler PPT.

Okay, that’s pretty cool, because it actually worked. Does it get the alt text in there? No. Darn. It also doesn’t render the title and subtitle as ‘#’ and ‘##’.

3/ If you just want the text, ma’am, pandoc can do the conversion

Just go ahead and install pandoc with Homebrew:

% brew install pandoc

Unfortunately, pandoc can’t read PPT files, so you need to convert your deck to … RTF first. Yeah. That sounds bad. And nope: no image alt text in here either.
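Once you have the RTF, the pandoc step itself is a single command. Here it is wrapped in Python as a sketch. It assumes your pandoc build includes the RTF reader (`pandoc --list-input-formats` should list `rtf`). `gfm` (GitHub-flavored Markdown) is just one choice of output format, and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_rtf_to_markdown_command(rtf: Path, md: Path) -> list[str]:
    """Build a pandoc call that reads RTF and writes GitHub-flavored Markdown."""
    return ["pandoc", "--from", "rtf", "--to", "gfm", "--output", str(md), str(rtf)]


def rtf_to_markdown(rtf: Path) -> Path:
    """Convert `rtf` to a .md file next to it and return the new path."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found; install it with `brew install pandoc`")
    md = rtf.with_suffix(".md")
    subprocess.run(pandoc_rtf_to_markdown_command(rtf, md), check=True)
    return md
```

Expect plain text and basic formatting only; as noted above, image alt text doesn’t survive the RTF step.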

Lastly: Want to write and read a PPT instead?

You can use python-pptx, a package I haven’t dabbled with yet: https://python-pptx.readthedocs.io/en/latest/