kwadhwa's comments

Hey, Kshitij from Rockset here.

You are correct that Rockset is doing text extraction from PDFs, but the main value here is that you can join this data with other data sets in JSON, CSV, XLS, or Parquet format using SQL, without doing any ETL.


Hey, Kshitij from Rockset here.

With Rockset you can avoid ETL when it comes to extracting and manipulating the data. Also, the main value here is that you can join this data with other data sets in JSON, CSV, XLS, or Parquet format using SQL to help with analysis.


Maybe you could add modules for extracting and manipulating data from popular sources, such as the most popular social media sites, Amazon, Craigslist, eBay, and the main search engines.

There are many people who want usable data from such sources. And your service wouldn't be doing any scraping, so you'd probably be OK legally. But IANAL, so do check.

