Text Summarizers in Python

To become a better data scientist, I thought I should learn a little more about what’s out there in the NLP universe. In the 80s I took a Natural Language course with MIT Professor Robert Berwick … is he out there? Let me see. Wow, he’s still out there. Cool! Oh yes, I recall when this cool book came out.

After fighting with a variety of Python 2/3 issues and package managers (ick), I finally got a cool piece of code called “sumy” by Mišo Belica to work.

I ran it on “Why WordPress?”, a piece I wrote about why WordPress is an important thing to consider right now. Here is what came out with different algorithms applied (--length=5 asks for a five-sentence summary):

(myenv) sumy luhn --length=5 --file=w.txt

Starting in 2003, a simple solution to make building a website easier evolved into the most common and beloved technology today: WordPress. WordPress is an example of how a focus on simplicity set the course for over a quarter of sites that we know as the World Wide Web today. The breakthrough moment was when the inventors gave away all of WordPress’ underlying code to the rest of the world for free. The complete freedom of WordPress has come with the responsibility to know the underlying code “under the hood” of your site. That way, not only does WordPress remain affordable, it remains your powerful Do-It-Yourself tool for the Internet.
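Curious what the luhn option is actually doing, I sketched a simplified version of Luhn’s classic approach in plain Python. This is my own illustration, not sumy’s code: sentences are split with a naive regex, the tiny STOP_WORDS list and max_gap=4 are assumptions, and “significant” words are simply non-stop words that appear more than once. Each sentence is scored by its best cluster of significant words (count squared over cluster length), and the top sentences come back in their original order.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real tool uses a full per-language list.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "have", "in", "is", "it", "its", "of", "on", "or", "that", "the", "to",
    "was", "we", "what", "with", "you", "your",
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def luhn_score(words, significant, max_gap=4):
    """Best cluster score in one sentence: (significant words)^2 / cluster length.

    A cluster starts and ends on significant words and allows at most
    `max_gap` insignificant words between consecutive significant ones.
    """
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1  # gap too wide: begin a new cluster
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def luhn_summarize(text, length=5):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for ws in tokens for w in ws if w not in STOP_WORDS)
    # Significant words: non-stop words that occur more than once in the text.
    significant = {w for w, c in freq.items() if c > 1}
    scores = [luhn_score(ws, significant) for ws in tokens]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:length])  # restore original document order
    return [sentences[i] for i in keep]
```

Calling luhn_summarize on the contents of w.txt with length=5 has the same shape as the sumy luhn call above, though sumy’s own tokenizer, stemmer, and stop words mean its picks can differ from this sketch.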

(myenv) JMmbp001:~ maeda$ sumy kl --length=5 --file=w.txt

WordPress is an example of how a focus on simplicity set the course for over a quarter of sites that we know as the World Wide Web today. As a result, WordPress has become the best long-term investment for your time because of its loyal following and its community’s shared determination. But times have changed and the biggest tech companies of the world, and also new upstarts, have been looking to control your participation on the Internet. Over time, they have built sophisticated mousetraps to capture you in their comfortable microworlds, and ultimately limit what you can do within their controlled confines. So today, the WordPress community is hard at work designing coding-free approaches to getting your website going that do not sacrifice any of the power and flexibility of the WordPress ethos.
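The kl option refers to KL-Sum, which, as I understand it, greedily adds sentences so the summary’s word distribution stays as close as possible to the whole document’s, measured by Kullback-Leibler divergence. Here is a rough pure-Python sketch of that idea, not sumy’s implementation. The assumptions are mine: divergence is measured as KL(document || summary), both distributions get additive smoothing (alpha=0.01) so no probability is zero, and stop words are not removed.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def kl_divergence(p_counts, q_counts, vocab, alpha=0.01):
    """KL(P || Q) over `vocab`; additive smoothing keeps every probability > 0."""
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def kl_summarize(text, length=5):
    """Greedily add the sentence that brings the summary's word distribution
    closest (lowest KL divergence) to the whole document's."""
    sentences = split_sentences(text)
    tokens = [Counter(tokenize(s)) for s in sentences]
    doc_counts = sum(tokens, Counter())
    vocab = set(doc_counts)
    chosen, summary_counts = [], Counter()
    while len(chosen) < min(length, len(sentences)):
        remaining = [i for i in range(len(sentences)) if i not in chosen]
        # Try each remaining sentence and keep the one that minimizes divergence.
        best = min(remaining,
                   key=lambda i: kl_divergence(doc_counts, summary_counts + tokens[i], vocab))
        chosen.append(best)
        summary_counts += tokens[best]
    return [sentences[i] for i in sorted(chosen)]
```

The greedy loop keeps this cheap: each step just scores every remaining sentence once against the summary built so far.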

My favorite is the one-line summary with the ‘lex-rank’ algo:

(myenv) JMmbp001:~ maeda$ sumy lex-rank --length=1 --file=w.txt

There will never be any limitations to what you can build with WordPress — the community fights for your freedom.
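LexRank treats sentences as nodes in a graph, links pairs whose IDF-weighted cosine similarity clears a threshold, and ranks them with a PageRank-style power iteration, so the winner is the sentence most “central” to the rest. Below is a minimal pure-Python sketch under my own assumptions (threshold 0.1, damping 0.85, and a smoothed IDF where each sentence counts as one document); sumy’s implementation differs in its details.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def idf_cosine(a, b, idf):
    """IDF-modified cosine similarity between two term-count Counters."""
    num = sum(a[w] * b[w] * idf[w] ** 2 for w in a if w in b)
    norm_a = math.sqrt(sum((a[w] * idf[w]) ** 2 for w in a))
    norm_b = math.sqrt(sum((b[w] * idf[w]) ** 2 for w in b))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(tokens, threshold=0.1, damping=0.85, iters=100, tol=1e-9):
    """Centrality score per sentence; the scores sum to 1."""
    n = len(tokens)
    if n == 0:
        return []
    tfs = [Counter(ws) for ws in tokens]
    # Each sentence counts as one "document" for IDF (smoothed so it stays > 0).
    df = Counter(w for tf in tfs for w in tf)
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    # Unweighted edge i -> j when similarity clears the threshold (self-loops included).
    adj = [[1.0 if idf_cosine(tfs[i], tfs[j], idf) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Sentences with no edges (e.g. no tokens) spread their score uniformly.
        dangling = sum(scores[i] for i in range(n) if degree[i] == 0)
        new = []
        for j in range(n):
            flow = sum(scores[i] * adj[i][j] / degree[i]
                       for i in range(n) if degree[i])
            new.append((1 - damping) / n + damping * (flow + dangling / n))
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def lexrank_summarize(text, length=1):
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = lexrank_scores([tokenize(s) for s in sentences])
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:length])]
```

With length=1 this returns the single most central sentence, the same kind of one-line summary as the lex-rank call above.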