LunaSea 4 months ago | on: Minifying HTML for GPT-4o: Remove all the HTML tag...

Well, given that "simply" extracting the core text of an article is a task where most solutions (rule-based, visual, traditional classifiers, and LLMs) rarely score above 0.8 in precision on datasets covering a variety of websites and/or multilingual pages, I would consider that not too bad.
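
For anyone wondering what a precision of 0.8 could mean here, below is a rough sketch assuming a token-level definition: the share of extracted tokens that also occur in a hand-labelled reference text. Benchmarks differ in how they score extraction, so the `token_precision` helper and the sample strings are purely illustrative assumptions, not the metric of any particular dataset.

```python
from collections import Counter


def token_precision(extracted: str, reference: str) -> float:
    """Fraction of extracted tokens that also appear in the reference text.

    Assumption: whitespace tokenization and case-insensitive matching.
    """
    extracted_tokens = extracted.lower().split()
    if not extracted_tokens:
        return 0.0
    # Multiset intersection: a repeated token only counts as often
    # as it occurs in the reference.
    overlap = Counter(extracted_tokens) & Counter(reference.lower().split())
    return sum(overlap.values()) / len(extracted_tokens)


# Hypothetical example: the extractor leaked two navigation words
# ("home", "login") and missed the last word of the article.
reference = "the quick brown fox jumps over the lazy dog"
extracted = "home login the quick brown fox jumps over the lazy"

print(token_precision(extracted, reference))  # 8 of 10 tokens match -> 0.8
```

Under this definition, two stray boilerplate words in a ten-token extraction are already enough to land at 0.8, which gives a feel for how easily real pages with menus, footers and cookie banners drag the score down.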