Of course, you're right, but we didn't know that at the time. It turns out there...

zenogais · on Sept 22, 2014

This actually sounds like a great fit for Cassandra and Hadoop. You could use Cassandra's built-in TimeUUID as the primary key and have your data pre-sorted on disk in time order. This would make big queries, using something like Hadoop, very efficient.
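
To make the TimeUUID point concrete, here is a minimal Python sketch of why a version-1 (time-based) UUID can act as a time-ordered key. It only uses the standard `uuid` module and does not talk to Cassandra; the helper names `make_timeuuid`, `timeuuid_to_unix_seconds`, and `sort_by_time` are made up for illustration. In Cassandra itself, the on-disk ordering would presumably come from using the TimeUUID as a clustering column within a partition.

```python
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch,
# expressed in 100-nanosecond intervals.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def make_timeuuid(unix_seconds: int, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version-1 UUID for a given Unix timestamp (hypothetical helper)."""
    ts = int(unix_seconds * 10_000_000) + GREGORIAN_TO_UNIX_100NS
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000          # set version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def timeuuid_to_unix_seconds(u: uuid.UUID) -> float:
    """Recover the embedded timestamp as Unix seconds."""
    return (u.time - GREGORIAN_TO_UNIX_100NS) / 1e7


def sort_by_time(ids):
    """Order version-1 UUIDs by their embedded timestamp."""
    return sorted(ids, key=lambda u: u.time)
```

Note that the sort key is the embedded timestamp (`u.time`), not the UUID's string form: the low bits of the timestamp come first in the textual layout, so plain string ordering would not follow time.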

jhh · on Sept 22, 2014

In my opinion, that's not necessary if it works fast enough on a relational database. I think this article has some valuable insight on the topic: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html

zenogais · on Sept 23, 2014

You'll get no argument from me here. Cassandra + Hadoop only make sense at a certain scale.

imaginenore · on Sept 22, 2014

Yeah, that's exactly what I suggested in the other comment. If you can get away with sharding, it's the easiest solution, and the developers/DBAs are relatively easy to find.

Glad it worked out for you guys.

jwegan · on Sept 22, 2014

FYI, for people using less than 1 TB of data, Vertica does have a free community edition. I used Vertica at my last job and it was blazingly fast compared to Hive (like 5x to 10x faster).

michaelcampbell · on Sept 23, 2014

Is that 1TB compressed or "raw"?

jeremymcanally · on Sept 23, 2014

For future reference, it really sounds like InfluxDB might be a perfect fit for you. I've been trying to find a reason to use it myself, but we don't do a lot of time-series stuff at work (right now at least).

nl · on Sept 23, 2014

Isn't this almost exactly the use case Cassandra is designed for?