I really wish they would straighten out their documentation. I've only looked a few times, but the descriptions of features like replica sets seem riddled with version-specific caveats. I think they should organize it the way the MySQL site does, e.g. view the 5.0 documentation here, the 5.1 documentation here, etc.
The way it is now, I couldn't tell you right away what version supports what just by glancing at the docs for a minute.
Thanks for mentioning this. We're just now starting on a massive reorganization and rewrite of the docs. Look for some solid initial progress by mid-October.
I definitely concur with the suggestion to organize by release cycle (or perhaps to follow the PHP convention of showing deprecation and version-support information for each call).
I recently ran into issues with replica sets myself: I found it hard to locate documentation on using user-based authentication. It eventually boiled down to finding the necessary info on the Master-Slave page, which is where I learned to do a db.addUser() on the slave's local db (rough sketch below). All in all, I'm very much looking forward to a rewrite of the documentation.
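For anyone who hits the same wall, this is roughly what that boils down to in the shell (the credentials here are placeholders):

    // connected to the slave with the mongo shell
    var localDb = db.getSiblingDB("local");
    localDb.addUser("repl_user", "repl_pass"); // placeholder name/password
    // (and, if I remember right, a matching user in the master's local db)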
And I hope they spend some resources on making MongoDB a first-class citizen for Django / Rails projects, like a complete drop-in replacement for MySQL. It's already pretty close. Life would be so good if you could use MongoDB as the default for any project and all gems/apps would just work with it. Goodbye migrations, hello autosharding.
I don't really think that's desirable. Trying to make a document database pretend it's a relational database may work "ok" for simple things, but when you have code that expects to do things that are practical in SQL (think: joins, etc.), it'll all end in tears.
I think the point was that you should be able to flip a switch and get mongo in a fully functional Rails stack just as easily as you flip a switch and get mysql (sqlite being the default).
FWIW, I agree shoehorning mongo into a RDBMS role is a bad idea, but at the same time, devs who don't understand SQL are shoehorning all kinds of horrendous code into their ActiveRecord apps anyway. I've long criticized many NoSQL advocates (the extreme type who say SQL is dead) as simply being ignorant of the value of SQL and throwing the baby out with the bathwater. As much as I stand by that sentiment, it doesn't mean Mongo doesn't have a viable use case as a primary data store, and if you know what you're doing you shouldn't have to wrestle with Rails to make proper use of Mongo.
>I think the point was that you should be able to flip a switch and get mongo in a fully functional Rails stack just as easily as you flip a switch and get mysql (sqlite being the default).
Mongoid, the premier MongoDB Rails adapter, has fully functioning model generators, and its API is built on ActiveModel, the same as Rails' own ActiveRecord. Thus it is fully compatible out of the box with the majority of other Rails components, such as form builders and authentication systems.
It doesn't really get any more 'flip a switch' than that, even in the world of Rails. Since 3.0, Rails has been decoupled to the extent that other db adapters etc. exist pretty much on an even footing with the Rails defaults.
EDIT: apparently the MongoMapper adapter also uses ActiveModel these days.
I don't have facts to back up my opinion; it's formed from memories of articles I've read.
I believe CouchDB is a better choice for very large data sets because of its design.
+ CouchDB uses a Map Reduce design that I believe would scale better over very large data sets.
+ CouchDB always stores data in a consistent state on disk. You can literally pull the plug on the server at any time and the data will never be inconsistent.
MongoDB is geared for performance and is a great bridge between a relational database and a high-performance NoSQL database, but I don't recall that its strength is handling large datasets.
Comparing map/reduce in Mongo and Couch is really apples and oranges. They are designed to do two different things, i.e. data processing vs. building views.
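To make the contrast concrete (the collection, view, and field names here are made up): in Couch, the map/reduce lives in a design document and defines a persistent, incrementally updated view, while in Mongo, mapReduce is a command you run against a collection to crunch data into an output collection.

    // CouchDB: a view in a design doc; the map/reduce builds an index
    {
      "_id": "_design/stats",
      "views": {
        "count_by_type": {
          "map": "function (doc) { emit(doc.type, 1); }",
          "reduce": "_count"
        }
      }
    }

    // MongoDB shell: mapReduce as a one-off data-processing job
    db.events.mapReduce(
      function () { emit(this.type, 1); },                  // map
      function (key, values) { return Array.sum(values); }, // reduce
      { out: "event_counts" }                               // results land in a collection
    );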
Mongo is designed from the ground up to deal with large datasets. Take a look at their sharding architecture.
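Turning on sharding is only a couple of commands against mongos; the db, collection, and shard key below are just placeholders:

    // in the mongo shell, connected to mongos
    var admin = db.getSiblingDB("admin");
    admin.runCommand({ enablesharding: "mydb" });
    admin.runCommand({ shardcollection: "mydb.events", key: { user_id: 1 } });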
I guess it depends on what you consider "very large". If you're talking about multi-petabyte, then I'd probably use HDFS, but otherwise MongoDB might fit. I hear craigslist uses MongoDB to store their data going back to 1997, which is a fair amount of data, I believe.
CouchDB is better'ish for larger datasets, but not for arbitrary scaling. MapReduce in CouchDB requires dumb full-scans if you're not just refreshing an existing view.
Arbitrarily large data is the exclusive domain of hadoop/hypertable/cassandra AFAIK atm.
To be fair, CouchDB is very explicit that to get any sort of performance, everything must be a view. "Ad-hoc queries" (i.e. queries that are written on the fly instead of uploaded as a view) are clearly stated as being "for development only".
Where CouchDB really falls flat is for write-heavy applications. The default configuration in CouchDB is to not reindex a view until it has been read. When a read occurs, any new data in a view that was added since the last read must be re-indexed by executing the map/reduce functions on that data. If you're writing frequently to CouchDB but not reading a lot (as in a data warehouse) the first query you run is going to be extremely slow, since it will need to run map/reduce on a lot of new data. CouchDB doesn't distribute work to multiple nodes like Hadoop, and I've found even simple reduce functions to slow down re-indexing by a factor of 10. I think CouchDB has settings now to update the index on commit, or you could always run a cron job to regularly query the view and force a reindex, but it's still going to be slow.
BigCouch (https://cloudant.com/#!/solutions/bigcouch) might be a good choice for data warehousing, since it advertises full compatibility with the CouchDB API but offers distributed map/reduce like Hadoop/Hive/etc. I haven't used it though.
Couch is definitely a lot more honest about their limitations than mongo or riak, but my experiences make me hesitant to recommend it to anyone not intimately familiar with those limitations.
This is part of what we are addressing with Couchbase Server, an autosharding rebalancing Couch fronted by memcached. For K/V read and write we measure microsecond latency.
We are currently optimizing the views for cluster access, but the design goal is to offer at least the query performance CouchDB offers on small datasets, even on very large clusters.
I would not use it in future projects myself, because my company is currently using it in several products in several different ways, and it has been nothing but headaches, problems, etc.
The theory behind the thing is great. In reality, it's buggy and not fun to work with.