Sad to see. We were one of their early paying customers in 2009/2010 (we got a great deal). Performance was fantastic over very large datasets, but the bugs, storage requirements (very expensive SANs) and query limitations became big problems for us. We moved away after a year.
Just recently one of our team has been looking at them again (following a strong benchmark being posted at mysqlperformanceblog). So I went and checked out the forums and saw it was pretty much in a similar state to when I last used it four years ago.
Sad to see they couldn't make it work. The team was always really friendly and quick to help with issues - good luck in the future.
Honest question: is MongoDB actually scalable? I keep reading about how it is not for serious business[1][2][3][4]. Who is using Mongo at scale, what sort of loads are they seeing, and is the level of problems on par with other solutions? As a developer I like Mongo's API, but I would worry about using it given what has been written about it.
I agree with you; in our experience Mongo is not scalable.
We use Mongo in our company (60GB per db.stats().fileSize), both for relational data, and for what we call "fat data" (just to avoid calling these mere gigabytes "big data" -- those are just lots of statistical numbers, more like OLAP cubes).
After many problems and sad surprises, we came to the conclusion that we'd have been better off using PostgreSQL (or any other SQL database) for the relational data and some NoSQL store for the "fat data". Or just PostgreSQL alone would do better, because Mongo's feature set is a subset of PostgreSQL's feature set - Mongo just renamed the features and pretends they're something new.
"fat data" (just to avoid calling these mere gigabytes "big data"
Mere GBs of data is just a normal database.
I have a rule that should be more widespread:
If you can put the database in RAM on an x86 server, it's not "big data" by any stretch. Beyond that it becomes more complex, but for starters let's consider whether the indexes fit in RAM.
If the indexes/hashes for your data cannot fit in RAM on a commodity x86 server, then you can probably consider that you have "big data".
So, currently it's possible to buy Supermicro systems that take 6TB of RAM (just a normal QPI link) without getting into any of the exotic SSI systems (like the SGI UV 2000).
We should also avoid talking about the physical plant requirements for "big data", since it's possible to put over 350TB of storage in 4U with products from Nexsan, JetStor, etc. That is over 3PB per rack...
So, you can call your data set "big data" if the indexes are > 6TB or the actual data set is > 3PB. These numbers will change next year when new machines/storage arrive.
I find it easier to think about it from a data "usage" angle: if you can query anything you like while running the database on a single machine and get your answer within 24hrs, your data is not "big".
I used to see 24hr OLAP cube runs and no one ever called that "big data". It's entirely a question of scale in my mind: these days you can buy truly gigantic, phenomenally powerful servers, but if your data needs multiple servers dividing the load in order to perform queries in a timely manner, then you start talking about big data.
Yes, we call it "fat data" because we take a different, "big-datey" approach to it. E.g. it's not relational -- we don't JOIN it with anything (we intentionally designed it so that we stay scalable while growing), and it would fit a NoSQL data store perfectly (just don't confuse NoSQL with MongoDB :)
After working at Google, I (along with another teammate) kinda "feel" what big data is: it's more about the approach, mindset and toolset used to work with it. I agree with you and @techdragon that if you can fit the data onto one machine it's probably not really big. But one can also work with 1GB of data using a big-data approach, which is what we call "fat data". When we grow out of a single machine we won't need to rewrite our project.
Nevertheless, none of this stops our sales team from saying "big data" and "cloud" in every other sentence :)
Or just PostgreSQL alone would do better, because Mongo's feature set is a subset of PostgreSQL's feature set.
Just a slight warning on this, because I was mildly burnt by hearing this and assuming it was true.
For background, I'm a long time happy Postgres user.
Recently I decided to use it for a new project instead of Mongo and to utilise the new JSON support.
Turns out it is great for CRD apps, not so much for CRUD.
The current release version of PostgreSQL (9.3) has no capability to update parts of JSON stored as the JSON datatype (i.e. you have to read the entire JSON blob, change it and write it back)[1].
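In other words, on 9.3 the whole cycle lives in application code. A rough sketch of what that looks like with psycopg2 - the docs(id, body json) table and its column names are made up, and the FOR UPDATE lock is optional but stops two writers clobbering each other:

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")   # assumed connection string
    with conn:
        with conn.cursor() as cur:
            # lock the row so a concurrent writer can't overwrite our change
            cur.execute("SELECT body FROM docs WHERE id = %s FOR UPDATE", (42,))
            body = cur.fetchone()[0]
            doc = json.loads(body) if isinstance(body, str) else body
            doc["status"] = "archived"       # the "partial" update happens here, in Python
            cur.execute("UPDATE docs SET body = %s WHERE id = %s",
                        (json.dumps(doc), 42))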
Some updates are done in place, some are not. To give a specific example - integer updates are always done in place. The most common reason a document has to be rewritten is that its updated size is too big to fit into its previous space. Then it has to be moved.
Could be, but that article just says that the update is "in place", which may mean that just the changed int is written, or it may be the entire document.
But it's in-the-box API functionality... since it's a single record, there's minimal locking involved. Depending on your load that can be very important.
For that matter, imho doing a locked partial update via something like _.assign would be fine in Postgres. It depends on how you really need to use your data... and how it fits into that.
If you have a lot of recursion in your data, it may be better suited to SQL... if you have a lot of data gathered around a group of objects/documents, a document DB like Mongo/Rethink/ElasticSearch may be best... if you really need key/value lookups, then Cassandra is hard to beat.
For that matter, having data duplicated/replicated to multiple types of DB servers is entirely reasonable. Your management UI can interact with an SQL datastore, and on save, you also save to Mongo (rough sketch below).
That was the interim step I chose in migrating our data structures... the queries that run against Mongo work great, there are three servers in the replica set for a relatively small data set, and it is really nice.
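Something like this, roughly - hypothetical table/collection names, psycopg2 + pymongo assumed, with the SQL copy as the point of authority:

    import json
    import psycopg2
    from pymongo import MongoClient

    pg = psycopg2.connect("dbname=app")
    listings = MongoClient("mongodb://localhost:27017")["app"]["listings"]

    def save_listing(listing):
        with pg, pg.cursor() as cur:          # relational copy commits first
            cur.execute("UPDATE listings SET data = %s WHERE id = %s",
                        (json.dumps(listing), listing["id"]))
            if cur.rowcount == 0:
                cur.execute("INSERT INTO listings (id, data) VALUES (%s, %s)",
                            (listing["id"], json.dumps(listing)))
        # then mirror into the document store for the read-heavy queries
        listings.replace_one({"_id": listing["id"]}, listing, upsert=True)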
Right. It's a confusing space because even though couch/riak/mongo are all "document databases" their storage engines are very different. You can't usefully generalize from couch's append-only b-tree to mongodb's mmap'd data files.
You're right. I meant something slightly different: that we'd be better off storing the JSON the conventional way, as values in schema-full columns. For example, the object {foo: {bar: [1, 2], a: 'b'}} would become three columns: foo.bar.0=1, foo.bar.1=2, and foo.a='b'. In a medium-sized project you'll write a DAO layer to convert the DBMS's view of the data to the application code's view anyway, for various purposes. It would even be better, because column order in SQL doesn't matter (but in Mongo it does for searching, as a document is essentially just a BSON string), and column names won't be repeated in every document and take up space.
Of course, if you need to store unstructured data where you don't know the structure in advance, this won't work for you. But for your own data, better to let the DBMS maintain the schema instead of maintaining it in application code (reinventing the wheel).
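To make the flattening concrete, here's a toy version of that DAO-layer conversion (illustrative only; real code would also track column types):

    def flatten(value, prefix=""):
        """Turn nested dicts/lists into dotted column names, one per leaf value."""
        columns = {}
        if isinstance(value, dict):
            for key, sub in value.items():
                columns.update(flatten(sub, prefix + key + "."))
        elif isinstance(value, list):
            for idx, sub in enumerate(value):
                columns.update(flatten(sub, prefix + str(idx) + "."))
        else:
            columns[prefix[:-1]] = value      # strip the trailing dot
        return columns

    print(flatten({"foo": {"bar": [1, 2], "a": "b"}}))
    # {'foo.bar.0': 1, 'foo.bar.1': 2, 'foo.a': 'b'}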
Side note, regarding updates of particular fields in objects: a NoSQL datastore must provide some "compare-and-set" functionality to avoid race conditions during updates. The PostgreSQL way is to use row-level locking in transactions, but MongoDB locks the whole collection (well, several months ago it locked the whole database, so it's an improvement :). They kinda offer findAndUpdate() for "compare-and-set", but see my other comment below on why it doesn't work.
Great info, thanks! We are using MongoDB for single-server installations and it fits our needs pretty well. Our biggest gripe is that disk space is not freed when collections are dropped, but we can live with it. Other things are just minor inconveniences which plague other DBs too. In our experience MongoDB "just works" - but then again, we don't use its scaling capabilities.
That said, we are looking around to see if there is an even better document-oriented DB available, and PostgreSQL looks interesting (with its JSON). Haven't had the chance to try it yet though. Another interesting option is OrientDB (having a graph database would be beneficial - but only for a small part of our system). Does anyone have experience with other document-oriented stores? (primarily single-node usage)
OrientDB is the most interesting alternative to MongoDB, with support for multi-master replication and relationships: you can decide to embed or link documents.
This fits with my experience as well, with (possibly) a larger data set.
MongoDB is actually pretty good as a document store, where you can accept soft commits, and are dealing with non-relational data that doesn't need to be aggregated.
When you start needing more than just a loose pile of documents, or live in a world where you really need ACID, Mongo has this nasty habit of falling down on the floor and twitching.
Those problems are solvable, but it doesn't just happen "out of the box".
I wonder what you get over exporting a file system over WebDAV, if you only do "document storage" of JSON documents? After all, filesystems are made for storing a hierarchy of files... some indexes for searching? A more convenient REST API?
The API is much more convenient, and last time I checked, WebDAV didn't provide the same wealth of query options or multi-node replication. Plus, I can get solid commercial support for MongoDB, which matters a lot if you're using it to power a business.
Multi-node replication for WebDAV would be done at the file system level (e.g. GFS or whatnot). By query options, do you mean some kind of join, for example? Obviously anything beyond plain id-based get/set (by file path/URL) isn't possible with "just" WebDAV. But if you already limit yourself to looking at single documents (i.e. JSON files), you could always fetch doc1, find the id/path of doc2 in doc1 and fetch that. I'm not saying this sounds like a sane DB strategy; I'm just not convinced that doing the same thing with a slightly easier API sounds like a sane strategy either.
Like all technologies, MongoDB is a tool and you need to pick the right tool for the right job - or in this case, the right DB for the right workload. We run hundreds of TB of MongoDB with hundreds of millions of Mongo ops. Does it have challenges? For sure. Does that mean you can't scale it? No. Here's a talk specifically about scaling MongoDB: http://engineering.objectrocket.com/2014/03/10/scaling-mongo...
Right, but we're disappointed that MongoDB markets itself as fitting into the NoSQL niche, while it doesn't. If it honestly declared its shortcomings, we'd have nothing against it; we'd love it.
For example, it doesn't support the "compare-and-set" functionality that is crucial for NoSQL to avoid transactions/locks. Their best suggestion is to use findAndUpdate() with the full object as the "find" part. And it works (though slowly) when you have a static schema. But over time you'll want to change your objects, and findAndUpdate() won't find them anymore. Grief. Also, the order of fields in a nested JSON object matters for findAndUpdate(), so have a happy time debugging why it doesn't find some objects anymore.
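A common workaround is to compare-and-set against an explicit version counter instead of the whole previous document, so schema changes and field order stop mattering. A rough sketch with pymongo, with hypothetical collection and field names:

    from pymongo import MongoClient, ReturnDocument

    accounts = MongoClient()["app"]["accounts"]

    def cas_update(doc_id, expected_version, changes):
        # succeeds only if nobody bumped the version since we read the document
        return accounts.find_one_and_update(
            {"_id": doc_id, "version": expected_version},
            {"$set": changes, "$inc": {"version": 1}},
            return_document=ReturnDocument.AFTER,
        )

    updated = cas_update("acct-1", expected_version=7, changes={"plan": "pro"})
    if updated is None:
        pass  # lost the race: re-read the document and retry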
No, and even modest write loads will cause immense pain -- there was a single lock per db when I last used them. Plus lots of the standard tricks for getting performance out of dbs don't work for them. I'd recommend staying away unless your performance needs are very moderate and you must have unstructured tables.
There is still a single lock per db. It also uses huge amounts of space (around double that of the same data as JSON), which puts increased pressure on memory. It also has no usable compaction, and it can be rather dumb with indexes. (note)
The main mongo solution for things is to run lots of copies of it across many machines each with small chunks of data. In theory what is then a pain point on bigger systems becomes lots of lesser pains on small systems.
(note) every criticism is answered with how the next release will improve things. Sometimes that happens.
Seconded; we've had a very positive experience with it so far. We're running around 1.5TB in it (and growing) and it's working pretty decently.
Our biggest problems with it have been CPU saturation due to compression (solvable with sharding) and oplog size (due to supporting ACID; supposedly much better in the upcoming release), but both of those are surmountable. In exchange we get massively better disk usage characteristics, no global locks, ACID compliance, transactions, and generally better performance. It's not perfect, but it solved a lot of our problems.
My experience with MongoDB has been terrible. Apart from simple look-ups I don't think it's meant for much data wrangling. Joins across different collections are harder to do. I see the best use case for Mongo as data dumps.
It's pretty good as a document store... partial updates to documents, as well as indexing, work well. Setting it up for replica sets with auto failover is much easier than, say, PostgreSQL, as is the API (especially geo searches - there's a quick sketch after this comment). It's run well for most of my own uses, though I do keep an eye on RethinkDB as well as ElasticSearch and Cassandra.
RethinkDB really needs to get auto hot failover and geo searches worked out; geo is on the table for the next release iirc, and failover the one after that.
Cassandra is great for key/value searches, but falls down for range queries.
ElasticSearch is pretty awesome in its own right, but not perfect either.
PostgreSQL has a lot to offer as well. 9.4 should be pretty sweet, and if they get automagic failover in the community versions worked out, I'm totally in.
It really just depends on what your workload is... MongoDB offers a lot of DB-like scenarios against indexes in a single collection, a clean set of interfaces, and a fairly responsive system overall. There have been growing pains, and problems... the same can be said of any database.
To each their own, it really just depends on your needs, and for that matter how far out your project's release is, vs. how long you need to support it.
Right now, I'm replacing an over-normalized SQL database structure that is pretty rancid. Most of the data fits so much better with a document DB it isn't funny. When I did the first parts, I had issues with geo searches in similar alternatives, and that has been a deal breaker for a lot of the options.
You don't use a document store if you need joins... you're better off either duplicating the data or using separate on-demand queries... odds are your data isn't shaped right and you should have used a structured database, or you aren't thinking about the problem right.
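For reference, the geo-search setup mentioned above is only a few lines with pymongo (collection name, coordinates and distance are made-up values):

    from pymongo import MongoClient, GEOSPHERE

    places = MongoClient()["app"]["places"]
    places.create_index([("loc", GEOSPHERE)])    # 2dsphere index on a GeoJSON field

    places.insert_one({"name": "cafe",
                       "loc": {"type": "Point", "coordinates": [-122.4194, 37.7749]}})

    # everything within ~2km of a point, nearest first
    nearby = places.find({"loc": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [-122.42, 37.77]},
        "$maxDistance": 2000,   # meters
    }}})
    for doc in nearby:
        print(doc["name"])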
> Setting it up for replica sets with auto failover is much easier than say PostgreSQL
MongoDB replica sets are for availability, not for consistency. Even with a write concern of majority, you can still get inconsistency. Without heavy load you might never see this race condition.
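To illustrate: a majority write concern is just an option on the collection handle (a pymongo sketch below, with a made-up replica set name), and even then the caveats above about consistency and rollbacks still apply:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://rs0-a,rs0-b,rs0-c/?replicaSet=rs0")
    events = client["app"]["events"].with_options(
        write_concern=WriteConcern(w="majority", wtimeout=5000))

    # acknowledged only once a majority of the set has the write,
    # but reads from secondaries can still observe stale or rolled-back data
    events.insert_one({"type": "signup"})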
Again, I'd say it depends on the data... either by shape or need. I also wouldn't use NoSQL for highly structured and relational data. For example, for a classifieds site, absolutely yes to NoSQL... for comment threads, I'd favor SQL.
If you need certain reporting: does it have to be real time, is "close to real time" okay, and what are the other needs? I find that sometimes duplicating data (with one point of authority) is better than using one or the other.
MongoDB is great... for logging stuff, or quick prototyping. It's not that fast on writes, but pretty fast on reads.
In my opinion, what people usually really want is an RDBMS + a full-text search engine like ElasticSearch. But again, one needs to set these things up.
MongoDB didn't have aggregation features in the past, and its map/reduce feature is not that good. But again, the product is still young; maybe it will get better.
Of course, you're right, but we didn't know that at the time. It turns out there's nothing we were doing that really warranted using a data warehouse (for that is what InfiniDB was really suited for). We were collecting a lot of numerical, time-series data, bulk loading it and then performing summarised queries periodically.
Mongo, Cassandra, etc are not good fits for this. Vertica was very expensive. In the end we went with a sharded and partitioned MySQL setup (partitioning really is great if you use it well). It's worked very well.
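For the curious, a rough sketch of the kind of time-range partitioning such a setup relies on (table/column names and the monthly boundaries are invented; assumes mysql-connector-python):

    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="metrics")
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE samples (
            sensor_id INT NOT NULL,
            ts DATETIME NOT NULL,
            value DOUBLE,
            PRIMARY KEY (sensor_id, ts)
        )
        PARTITION BY RANGE (TO_DAYS(ts)) (
            PARTITION p2014_09 VALUES LESS THAN (TO_DAYS('2014-10-01')),
            PARTITION p2014_10 VALUES LESS THAN (TO_DAYS('2014-11-01')),
            PARTITION pmax     VALUES LESS THAN MAXVALUE
        )""")
    # summary queries constrained on ts only touch the relevant partitions
    cur.execute("SELECT sensor_id, AVG(value) FROM samples "
                "WHERE ts >= '2014-10-01' AND ts < '2014-11-01' GROUP BY sensor_id")
    rows = cur.fetchall()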
This actually sounds like a great fit for Cassandra and Hadoop. You could use Cassandra's built-in TimeUUID as the primary key and have your data pre-sorted on disk in time-series order. This would make big queries, using something like Hadoop, very efficient.
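Presumably something like this (a sketch with the DataStax Python driver; keyspace and table names are hypothetical) - the timeuuid clustering column keeps each partition's rows sorted by time on disk:

    import time
    from cassandra.cluster import Cluster
    from cassandra.util import uuid_from_time

    session = Cluster(["127.0.0.1"]).connect("telemetry")
    session.execute("""
        CREATE TABLE IF NOT EXISTS samples (
            sensor_id text,
            ts        timeuuid,
            value     double,
            PRIMARY KEY (sensor_id, ts)
        ) WITH CLUSTERING ORDER BY (ts DESC)""")

    session.execute("INSERT INTO samples (sensor_id, ts, value) VALUES (%s, %s, %s)",
                    ("sensor-1", uuid_from_time(time.time()), 42.0))

    # a single-partition range scan comes back already ordered by time
    rows = session.execute(
        "SELECT ts, value FROM samples WHERE sensor_id = %s LIMIT 100", ("sensor-1",))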
Yeah, that's exactly what I suggested in the other comment. If you can get away with sharding, it's the easiest solution, and the developers/DBAs are relatively easy to find.
FYI, for people using less than 1 TB of data, Vertica does have a free community edition. I used Vertica at my last job and it was blazingly fast compared to Hive (like 5x to 10x faster).
For future reference, it really sounds like InfluxDB might be a perfect fit for you. I've been trying to find a reason to use it myself, but we don't do a lot of time-series stuff at work (right now at least).
My bet is what really took the wind out of InfiniDB's sails was Amazon's Redshift. It's new, shiny, ridiculously easy to use, has a strong pedigree and a fairly low initial capital outlay.
Not really. Redshift is nice, don't get me wrong, but we did not run into many people who were choosing Redshift over InfiniDB. The major players out there still run everything on premises and behind their firewall.
btw, InfiniDB was originally going to be the backend to Redshift. Let's just say the previous executive team "screwed that up".
You can even join together data from Hive, Cassandra, MySQL, PostgreSQL, Kafka, etc., all in one query. We don't have a connector for Mongo yet, but contributions are welcome!
The Cassandra connector was actually an external contribution. We don't use it at Facebook.
Vertica has a community tier, free up to 1TB of source data, and a 3 node limit.
SQL is not part of the Cassandra or Mongo feature sets. They have certain analytic possibilities, but not if you want to use SQL (window functions for example), or most of the BI client tools associated with data warehousing.
Having actually been in a company like that, it's mostly that and bad overall management - at one point we had a ratio of two (useless...) PMs for one dev. Devs underpaid, PMs overpaid (as expert freelancers, of course, or with bonuses), huge expenses made for stuff that makes no sense (branding), contracts negotiated so badly that they cost the company more than they bring in...
Of course after a while this stops, since all this mismanagement corrupts the products too, and pretty soon you're left with a huge black hole for money that doesn't produce anything meaningful.
I've been playing with Infobright Community Edition but also evaluated InfiniDB. I found InfiniDB was hardly compressing my data at all, whereas Infobright was utterly jamming it down - factors of over 300x even for small datasets of ~2m rows.
I don't know what the differences are that produce that, but when it comes to storing as much crap as I was looking at, I was willing to design around the limitations that Infobright CE has (i.e. no insert/update queries) rather than deal with the massive extra disk cost. I currently have 223m rows sitting in Infobright and it's taking about 38MB.
I really hope that the OSS project takes off and that InfiniDB sees some better compression implemented, similar to Infobright. The extra features that InfiniDB has over Infobright CE (not only insert/update but also a multi-threaded infile loader, for example) would convert me if only the compression were better.
I'm all new to this though so if there's some good reason why they differ so greatly I'm all ears and would love to know. Maybe I screwed something up in the configuration? I'm not sure.
Either way, it's sad to see them go. Columnstore databases fill a really useful niche that I can only see growing over time as more and more operational data is collected by industry.
Phenomenal. Some of the columns are binary (sparse, too - 99% false sort of thing), another is an integer from -1 to 8, some are floats (decimal degrees for GPS) and one is a datetime. 38MB is what is reported when I use the following query, which I found somewhere:
    SELECT table_schema, sum(data_length + index_length) / 1024 / 1024 'Data Base Size in MB', TABLE_COMMENT
    FROM information_schema.TABLES
    GROUP BY table_schema
You're right though - the cost per row sounds really low, I might try to find it via other means tomorrow and report back.
Yup, just confirmed that the raw folder size of the database is 39.8MB, and my query reports 39MB (not 38). It is also 222.78M rows. Note that MySQL won't run the query if there is a space between sum and the bracket.
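Back-of-the-envelope, that works out to well under a byte per row:

    rows = 222.78e6
    size_bytes = 39.8 * 1024 * 1024
    print(size_bytes / rows)   # ~0.19 bytes per row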
I can only assume that IB does some sort of differential compression to get such a small file size, and that it's an artifact of the machine data I'm using. But that's what I think the next big wave of data will be - stuff generated by data loggers on machines and equipment and analysed to scrape out incremental improvements in efficiency, reliability, etc. at previously relatively low-tech industrial corporations.
If your database has specialised compression, or keeps most data in main memory compressed (I can think of at least one that does), then this approach won't work.
For almost everything else though, putting compression in the filesystem layer is better.
InfiniDB is/was a great idea. The unfortunate bit was just what a Rube Goldberg machine of a data store it was. I spent a good few days just getting everything provisioned from our automation.
It looks like most of the successful MPP analytic databases are based on PostgreSQL (e.g. Greenplum, Redshift). It's sad to see that InfiniDB could not make MySQL work for them reliably.
I don't think it's fair to characterise this as a MySQL vs Postgres situation. Having used InfiniDB quite a bit I am familiar with how it related to MySQL. It basically only used MySQL as a frontend (meaning as an application you could use the mysql libs and query protocol); everything beyond that was InfiniDB's own stuff.
> It's sad to see that InfiniDB could not make MySQL work for them reliably.
Integrating with MySQL at this level would probably only make sense if the team was already familiar with the internals; otherwise, PostgreSQL's code base would be a much cleaner choice.
The team knew MySQL internals very well. It is not so much that MySQL did not work out; it was very nice for people who needed to grow out of MySQL to jump over to InfiniDB. MySQL going from independent to Sun to Oracle changed a lot about how friendly it is to do these kinds of interactions.
I imagine the PostgreSQL players in the MPP space, such as CitusDB, have done very similar things to what we did with MySQL. And it is not that InfiniDB could not move away from MySQL, but that would be a lateral move, and it would have to be funded for no advancement in benefit.
Let me clear up a few things here for all the speculation. I was an architect at InfiniDB who came on in Nov 2013 to build out the Enterprise Manager, which was meant to alleviate many of the provisioning, management and monitoring woes that customers were experiencing and to help modernize those aspects. The first beta offering of it came in early July; unfortunately the ship had sailed, so to speak. So I know first hand how and why things did not work out here. As with all things, take your lessons learned and move on. Success is not the path of learning; failure is (see survivorship bias).
Some notes:
Labeling InfiniDB as MySQL+ is a gross underestimation of what it does. MySQL is used as the front end query parser, and that is about it. Everything else behind it was custom written, and that is where the power is.
As with all DB technologies, your use case is the primary thing that determines your mileage. Comparing InfiniDB to MongoDB is one of the first signs to me that you don't fully comprehend the differences between database architectures. For the use cases that InfiniDB was made for, we routinely performed faster on a smaller footprint. Using InfiniDB as a document store can be done, but that is not what it was made for.
What people call "big datasets" is relative. Some think 500GB is a lot; some think 5TB is a lot. Coming from a telecommunications monitoring background, I will appreciate your dataset when we are talking TBs a day of churn per monitoring point, with hundreds or thousands of monitoring points. The size of the dataset you are working with and your use case for analysis are the two most important things in determining the technology stack. InfiniDB operated at these higher-end scales very efficiently. There is a reason Impala was a primary comparison, and we would usually operate on a fraction of the hardware it needed.
Best technology does not always win. See InfiniDB.
Decisions made by previous executive teams years in the past can set a course that sometimes cannot be corrected (not efficiently or without a lot of money).
Patents are worth their weight in gold.
Being open source is great for the community, but it is a challenge for a business to build consistent revenue. There were many big projects running InfiniDB with the open-source version but not contributing to revenue. Even if they did sign up for support, you need custom feature development and other big-ticket items to make an impact. Or you have to build a large customer base paying for support, and that takes time. With the multiple iterations of adapting the technology to different architectures over the years, it was hard to retain those customers consistently. Also, many customers will pay for support for their rollout or initial deployment, but when the project is done, they feel they are adequate enough to live with open source only.
Just because a company raised $X in a given month does not mean all that money is slated for going forward from that month. On top of that, payroll is not cheap, and you would be surprised how quickly you can burn through money keeping the lights on. For those of you who think people at startups are working for pennies on the dollar, I would advise you it is not the case. And if you are one of those people, I wish you the best; odds are there are other reasons why you are doing so. Why would good engineers work at a discount? Equity? There is not enough of the pie to go around to make that sustainable. Most startups pay competitive market salaries.
InfiniDB was at a junction where it was time to go for it or go home, and that is exactly what happened in 2014. The marketplace for data solutions - with Hadoop rising, other MPP vendors consolidating, and bigger players entering the field - was very competitive, and the time to swing for the fences was now, versus treading water and hoping.
Even with the stars aligned and everything else, all you have done in a company is weight the odds of it succeeding; nothing is guaranteed.
I really enjoyed my time with InfiniDB and the team there. I really do feel it's a missed opportunity, with some decisions that could have been changed several years ago. Not securing patents, and probably choosing MySQL as a frontend, are some of those.
Side note: a core group of us from InfiniDB have landed at Tune, a company that appreciates the technology of InfiniDB and what it can offer for their solutions. We look forward to this new opportunity and what we can provide to the ad and mobile analytics space.
I've been a party to the 8-figure sales of more than one company and stand by what I said. I wasn't at Arbor when it sold (for what I assume to be mid-high 9-figures), but I know a lot about how unserious patents were there as well: not a real factor.
Most importantly: the "door-to-door" time from initiating a patent application to bringing it to bear in a legal dispute is something on the order of a decade.
Incidentally: I didn't take your analogy seriously; I just used it as a hook to disagree with you. I'm not an anti-patent zealot. I've just worked in startups for ~20 years and have come to the conclusion that patents are a total waste of time for software companies.
I agree that there are a lot of patents that are not factors and are generally frivolous. I would argue though that at Arbor the patents/algorithms around classification of flows and DoS detection are a cornerstone of the business being competitive and protected, and should someone else step into their space, they were positioned well for staying competitive. Before InfiniDB I was a Principal Engineer at Tektronix Communications, which acquired Arbor, so I know them very well too (nice company, and it sounded nice to work at too).
btw, don't get me wrong - I'm not saying that if they had their patents they would have been successful. It's just one cog in the whole machine. And the timeline I am referring to: there are things from 5-6 years ago that could have been filed before others were doing them, and that would have provided a nice differentiator in the market. It would have helped, but it was not the sole reason.