This took a while to figure out. I found three ways to do it. Unfortunately, none of them is perfect to work with. But that’s what you get when you pay $0 for this stuff, right?
1/ If you like the visuals in your slides but don’t care about structure
This is what you need to do on macOS on an M-series MacBook:
% git clone https://github.com/ptsefton/pptx_to_md.git
% cd pptx_to_md
% poetry install
% poetry shell
% arch -arm64 brew install imagemagick
Okay, you’re doing great.
Next, make a directory and put a representative PPT and its PDF export in there.
% mkdir ppts
% cp <your fav ppt> ppts
% cp <your fav ppt converted to pdf> ppts
I’m using an excerpt of one of the past CX reports.
Then you can run the Python script:
% python pptx2md.py --pdf ppts/cxreportgrid_excerpt.pptx
What happens next is nothing short of miraculous. You now have an index.md that is a Markdown conversion of the PPTX, including snapshots taken from the PDF.
Then you can open it up in VS Code to preview it.
% cd ppts/cxreportgrid_excerpt
% code .
Then press Cmd-Shift-P to bring up the command palette and open the default Markdown previewer.
You’ll see the information extracted — in some cases it’s within the alt tag of the image — and that’s the information you can now work with.
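If you want that alt text as data rather than just eyeballing it in the preview, here is a minimal sketch that pulls it out of the generated Markdown with a regular expression. It assumes the images are written in the standard `![alt](path)` form; the function name `image_alt_texts` and the path in the usage lines are hypothetical, so adjust them to whatever your run produced.

```python
import re
from pathlib import Path

# Matches Markdown images of the form ![alt text](path "optional title").
IMAGE_PATTERN = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)[^)]*\)")


def image_alt_texts(markdown: str) -> list[tuple[str, str]]:
    """Return (image path, alt text) pairs for images with non-empty alt text."""
    pairs = []
    for match in IMAGE_PATTERN.finditer(markdown):
        alt = match.group("alt").strip()
        if alt:
            pairs.append((match.group("src"), alt))
    return pairs


if __name__ == "__main__":
    # Hypothetical location of the converted deck; skipped if it does not exist.
    index = Path("ppts/cxreportgrid_excerpt/index.md")
    if index.exists():
        for src, alt in image_alt_texts(index.read_text(encoding="utf-8")):
            print(f"{src}: {alt}")
```

Images whose alt text is empty are skipped, so the result only lists slides where something was actually extracted.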
Note that the hierarchy of the slide doesn’t get pulled out automagically. That’s the bummer of this method.
2/ You want to get a little more structure out of your ppt
This is a different repo that’s more recently updated:
% git clone https://github.com/ssine/pptx2md.git
% cd pptx2md
% pip install pptx2md
When we run it on a simple PPT that looks like this:
And it’s in a subdirectory ppts as
We get the bones nicely extracted. But when I try it on our more complex cxreport PPTX, it fails.
The good news is that it really tried to extract the images. I guess it was a little too complex.
Let’s instead cross our fingers and add an image to our simpler PPT:
Okay, that’s pretty cool because it actually worked. Does it get the alt tags in there? No. Darn. Also, it doesn’t render the title vs. subtitle as ‘#’ vs. ‘##’.
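Both missing pieces are sitting in the file itself, since a PPTX is a zip archive of XML. The sketch below is not how pptx2md works; it is a standard-library-only illustration that reads each slide's XML, maps title placeholders to ‘#’ and subtitle placeholders to ‘##’, and picks up image alt text from the `descr` attribute. The function names and the "Image alt text" output format are my own choices, and it assumes that sorting `slideN.xml` by number matches the presentation order, which is not guaranteed for decks whose slides were reordered.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
SHAPE_TAG = f"{{{NS['p']}}}sp"
PICTURE_TAG = f"{{{NS['p']}}}pic"
# Placeholder types mapped to Markdown heading prefixes.
HEADING_PREFIX = {"title": "#", "ctrTitle": "#", "subTitle": "##"}
SLIDE_NAME = re.compile(r"ppt/slides/slide(\d+)\.xml")


def shape_text(shape: ET.Element) -> str:
    """Join the text runs of each paragraph, then join paragraphs with spaces."""
    paragraphs = []
    for para in shape.iterfind(".//a:p", NS):
        runs = "".join(t.text or "" for t in para.iterfind(".//a:t", NS)).strip()
        if runs:
            paragraphs.append(runs)
    return " ".join(paragraphs)


def slide_to_markdown(slide_xml: bytes) -> list[str]:
    """Emit heading lines for titles/subtitles and a line per image alt text."""
    lines = []
    for element in ET.fromstring(slide_xml).iter():
        if element.tag == SHAPE_TAG:
            placeholder = element.find("p:nvSpPr/p:nvPr/p:ph", NS)
            ph_type = placeholder.get("type") if placeholder is not None else None
            prefix = HEADING_PREFIX.get(ph_type)
            text = shape_text(element)
            if prefix and text:
                lines.append(f"{prefix} {text}")
        elif element.tag == PICTURE_TAG:
            props = element.find("p:nvPicPr/p:cNvPr", NS)
            alt = props.get("descr", "").strip() if props is not None else ""
            if alt:
                lines.append(f"*Image alt text:* {alt}")
    return lines


def pptx_outline(path) -> list[str]:
    """Walk slides in filename order (an assumption) and collect their lines."""
    with zipfile.ZipFile(path) as archive:
        matches = (SLIDE_NAME.fullmatch(name) for name in archive.namelist())
        slides = sorted((int(m.group(1)), m.group(0)) for m in matches if m)
        lines = []
        for _, name in slides:
            lines.extend(slide_to_markdown(archive.read(name)))
    return lines
```

Calling `pptx_outline` with the path to your own deck returns a list of Markdown lines. Body text and bullets are deliberately ignored here to keep the sketch short.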
3/ Using pandoc you can do a conversion but just the text, ma’am.
Just go ahead and install pandoc from brew:
% brew install pandoc
Unfortunately, pandoc can’t read PPT files, so you need to convert yours to … RTF first. Yeah, that sounds bad. And nope, there’s no image alt information in here either.
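Once you have the RTF export, the pandoc step can be scripted. Here is a minimal sketch, assuming a pandoc version recent enough to include the RTF reader and an RTF file you have already produced some other way. The helper names `pandoc_command` and `rtf_to_markdown` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(rtf_path, md_path) -> list[str]:
    """Build the pandoc invocation that converts RTF to Markdown."""
    return [
        "pandoc", str(rtf_path),
        "--from", "rtf",
        "--to", "markdown",
        "--output", str(md_path),
    ]


def rtf_to_markdown(rtf_path) -> Path:
    """Convert an RTF file to a .md file next to it and return the new path."""
    rtf_path = Path(rtf_path)
    md_path = rtf_path.with_suffix(".md")
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH; try `brew install pandoc`")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(pandoc_command(rtf_path, md_path), check=True)
    return md_path
```

Calling `rtf_to_markdown` on your RTF export writes a Markdown file with the same base name in the same directory.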
Lastly: Want to write and read a PPT instead?
You can use this package that I haven’t dabbled with yet: https://python-pptx.readthedocs.io/en/latest/