Maybe bs4 + newspaper3k rolled into one? But still, what's the gap?

adbarba · on Aug 15, 2023

Regarding content extraction, it's more accurate than newspaper3k (especially for languages other than English), and it extracts more information: metadata, text, and comments. It works out of the box in most cases, so there's no need to write a dedicated scraper for a given website, which saves time. If you only care about 2-3 websites and are willing to write and maintain scraping scripts, then bs4/lxml/whatever is also fine.

It also features functions and a command-line interface to collect data on your own (say, finding recent news via feeds). So in the end it's not merely about text extraction but also about text discovery.
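To make the "discovery via feeds" idea concrete, here is a minimal stdlib-only sketch of the first step: pulling article URLs out of an RSS 2.0 or Atom feed so they can be handed to an extractor afterwards. This is not the tool's own API (the comment doesn't name its functions). `feed_links` is a hypothetical helper, and fetching the feed over HTTP is left out on purpose.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace; RSS 2.0 elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_links(feed_xml: str) -> list[str]:
    """Hypothetical helper: return article URLs found in an RSS 2.0 or Atom feed."""
    root = ET.fromstring(feed_xml)
    links = []
    # RSS 2.0: <rss><channel><item><link>URL</link></item></channel></rss>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    # Atom: <feed><entry><link href="URL"/></entry></feed>
    for entry in root.iter(f"{ATOM_NS}entry"):
        for link in entry.findall(f"{ATOM_NS}link"):
            # A missing rel means "alternate" (the article page); skip rel="self" etc.
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                links.append(link.get("href"))
    # Drop duplicates while keeping feed order
    return list(dict.fromkeys(links))


sample = """<rss version="2.0"><channel>
<item><title>A</title><link>https://example.org/a</link></item>
<item><title>B</title><link>https://example.org/b</link></item>
</channel></rss>"""
print(feed_links(sample))
```

The URLs it returns are what you would then feed to the extraction step, one page at a time.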