
Author here. Nice to see the package on HN's front page this morning, and thanks for the kind words! I just created an account to participate in the discussion and will try to answer your questions.



I’ve been using this package and like it a lot.

One problem I’d like to solve is getting past cookie pop-ups when scraping a website. I haven’t found a satisfactory packaged solution for this. It’s clearly a tough problem in general, but I wondered whether people have found good libraries to help with it. I’ve heard of solutions involving Playwright etc.


Thanks! Here is what I put together in the docs. Basically, you can preprocess, render, or filter the web pages with the software of your choice and then pass the result to trafilatura: https://trafilatura.readthedocs.io/en/latest/troubleshooting...
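A minimal sketch of that "preprocess, then hand off to trafilatura" idea, in Python. It is not trafilatura's own method. The helpers `strip_cookie_banners`, `render_with_playwright` and `extract_without_banners` are hypothetical, and so is the heuristic of dropping any element whose `id` or `class` contains "cookie", "consent" or "gdpr". The only real library calls are `trafilatura.extract` and Playwright's sync API, both imported lazily so the filter itself needs only the standard library.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on a stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical heuristic: substrings often seen in consent-banner ids/classes.
BANNER_MARKERS = ("cookie", "consent", "gdpr")


class BannerStripper(HTMLParser):
    """Re-serialises HTML, dropping any element whose id/class matches a
    marker, together with everything nested inside it."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.out = []
        self.skip_stack = []  # open tags inside the banner being dropped

    def _is_banner(self, attrs):
        for name, value in attrs:
            if name in ("id", "class") and value:
                if any(m in value.lower() for m in BANNER_MARKERS):
                    return True
        return False

    def handle_starttag(self, tag, attrs):
        if self.skip_stack:
            if tag not in VOID_TAGS:
                self.skip_stack.append(tag)
            return
        if self._is_banner(attrs):
            if tag not in VOID_TAGS:
                self.skip_stack.append(tag)  # start skipping this subtree
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: never changes nesting.
        if not self.skip_stack and not self._is_banner(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_stack:
            # Pop up to the matching tag, so unclosed <p> etc. are tolerated.
            if tag in self.skip_stack:
                while self.skip_stack.pop() != tag:
                    pass
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_stack:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_stack:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_stack:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_stack:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>


def strip_cookie_banners(html: str) -> str:
    """Return the HTML with banner-like elements removed."""
    parser = BannerStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def render_with_playwright(url: str) -> str:
    """Return the DOM after JavaScript has run.
    Needs `pip install playwright` and `playwright install chromium`."""
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
    return html


def extract_without_banners(html: str):
    """Filter the markup, then let trafilatura do the actual extraction."""
    import trafilatura  # needs `pip install trafilatura`
    return trafilatura.extract(strip_cookie_banners(html))
```

Usage would be something like `extract_without_banners(render_with_playwright(url))`. Consent walls that hide the content until a button is clicked still need an extra browser step (for example, clicking the "accept" button in Playwright before calling `page.content()`), which this sketch does not attempt.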



