How FarmVille Scales to Harvest 75 Million Players a Month (highscalability.com)
60 points by z8000 on Feb 8, 2010 | 23 comments



They use cache components to scale their writes, since writes dominate their workload. I assume this means that instead of hitting disk for every write, the data is updated in the cache and asynchronously saved to disk. Really, this pattern should be generalized, in some sort of memcached extension or fork.
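
Sketching the general write-behind idea (nothing FarmVille-specific; the `persist` callable below is a stand-in for whatever actually writes to the database):

    import threading
    import time

    class WriteBehindCache:
        """Minimal write-behind cache sketch: updates hit memory immediately
        and are flushed to durable storage in periodic batches."""

        def __init__(self, persist, flush_interval=1.0):
            self._data = {}          # current values, served to readers
            self._dirty = set()      # keys modified since the last flush
            self._lock = threading.Lock()
            self._persist = persist  # callable(key, value) that writes to the database
            self._interval = flush_interval
            threading.Thread(target=self._flush_loop, daemon=True).start()

        def get(self, key):
            with self._lock:
                return self._data.get(key)

        def set(self, key, value):
            with self._lock:
                self._data[key] = value
                self._dirty.add(key)   # the disk write is deferred, not done here

        def _flush_loop(self):
            while True:
                time.sleep(self._interval)
                with self._lock:
                    batch = [(k, self._data[k]) for k in self._dirty]
                    self._dirty.clear()
                for key, value in batch:
                    self._persist(key, value)  # e.g. an UPDATE against MySQL

The trade-off is that a crash loses at most flush_interval seconds of buffered updates.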


MySQL has this in the form of the startup option innodb_txn_commit. It performs all writes to memory and flushes them to disk in one-second batches. It's a good way to scale up to thousands of writes per second per machine.


Nice, but when you're scaling to such a large size, you probably want to go NoSQL, unless you use MySQL as a key-value store (which I've heard some people do).

Another thing: many applications can get away with flushing to disk much less often, say every 30 seconds or even longer if appropriate. If a machine fails, you then lose 30 seconds' worth of updates stored in that machine's RAM, which may be fine. For some applications, you could probably stand losing even a few minutes of updates on a failure, while gaining a huge reduction in disk load.


Yes, you could buffer writes. Tokyo Cabinet does this, for instance, as does Redis. There was an article on the MySQL Performance Blog about Tokyo Cabinet and how fsync()ing at 1 Hz didn't adversely affect performance; there are other issues with TC, but that's not my point.

I suppose a system where data could be lost if buffered in RAM for up to 30s could replicate to other hosts in the cluster, with the (perhaps naïve) intuition that the probability of more than one host going down within the same 30s interval is low -- "someone's got the data somewhere!". Of course, if they are on the same source of power... :)

One could use a consistent hashing scheme as employed in Riak and others for read operations in such a setup and thus be a bit more fault-tolerant (hand waving a bit).
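
Roughly what I have in mind, as a toy sketch rather than how Riak actually does it (the virtual-node count and MD5 are arbitrary choices here):

    import hashlib
    from bisect import bisect, insort

    class HashRing:
        """Toy consistent-hash ring: each node gets several virtual points, and a
        key is owned by the first n distinct nodes clockwise from its hash."""

        def __init__(self, nodes, vnodes=64):
            self._ring = []  # sorted list of (hash, node)
            for node in nodes:
                for i in range(vnodes):
                    insort(self._ring, (self._hash(f"{node}:{i}"), node))

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def preference_list(self, key, n=2):
            """Return the n distinct nodes responsible for key (primary + replicas)."""
            start = bisect(self._ring, (self._hash(key),))
            owners = []
            for i in range(len(self._ring)):
                node = self._ring[(start + i) % len(self._ring)][1]
                if node not in owners:
                    owners.append(node)
                if len(owners) == n:
                    break
            return owners

    ring = HashRing(["host-a", "host-b", "host-c"])
    print(ring.preference_list("node1page1"))  # buffer the write on both of these hosts

If one owner dies, the other nodes in the key's preference list still hold copies of the buffered (at most 30s-stale) writes.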

This is without any hard-earned experience with this other than reading... Thoughts?


I have no hard-earned experience with this myself. I've just been thinking about it a lot because I have a potentially write-intensive application in the works, and I'm anticipating performance issues with writes (but I should probably find out whether it actually is an issue). My goal is to squeeze as much performance as I can out of as few machines as possible - I'm poor.

To explain my use case.. I'm basically storing a graph in MongoDB, which is a document/object based database. Reads would consist of enumerating a node's edges. Writes would be adding nodes/edges and updating edge weights. It would perform best on reads if I just had one contiguous object for each node, holding information about all nodes connected to it, as well as edge weights. But since some nodes may have too many connected nodes, they are divided in the database by the page they would show up on in the application, as ranked by edge weight. So an object with id 'node1page1' holds information about the top 10 ranked connecting nodes.

Now, if that made any sense, the problem: Edge weights change according to votes from users. So when a weight changes, ranks change, and I've structured things in such a way that the computation of ranks is basically pushed to the write rather than the read. So when someone checks 'node1page2' and votes a node on page 2 up, the corresponding edge weight changes and may require that node's information to be shifted to 'node1page1', while the lowest edge on 'node1page1' is to be shifted to 'node1page2'.
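
Roughly, in modern pymongo terms (the collection and field names like `pages` and `edges` are just illustrative, not my actual schema, and the cross-page swap is elided):

    from pymongo import MongoClient

    db = MongoClient()["graph"]

    # Each page document holds the next 10 edges of a node, ranked by weight, e.g.:
    # {"_id": "node1page1", "node": "node1", "page": 1,
    #  "edges": [{"to": "node7", "weight": 42}, ...]}

    def read_page(node, page):
        """Reads are a single lookup on a precomputed page document."""
        return db.pages.find_one({"_id": f"{node}page{page}"})

    def vote_up(node, page, target, delta=1):
        """Writes do the expensive part: bump the weight, re-rank, and shift
        edges across the page boundary if the order changed."""
        doc = db.pages.find_one({"_id": f"{node}page{page}"})
        for edge in doc["edges"]:
            if edge["to"] == target:
                edge["weight"] += delta
        doc["edges"].sort(key=lambda e: e["weight"], reverse=True)
        db.pages.replace_one({"_id": doc["_id"]}, doc)
        # If the top edge now outranks the bottom edge of the previous page,
        # swap them between the two documents (omitted here for brevity).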

So what's happened here is that while I've made reads fast by requiring just a lookup, writes become complicated. And here's where the buffering of writes comes in. Perhaps I'm overthinking all this, but like I said, I'm poor.

Replication in memory would definitely be useful, especially for a larger cluster with some spare capacity. You'd probably have good power redundancy when running a large cluster, so single-machine failures would be much more common than a total cluster failure, I'd think (no hard data here, but it seems reasonable enough). So I don't think it's all that naive to assume that, in general, more than one host won't go down within 30s.

Even if the whole cluster does go down, you should only be doing this with low-value data. In my application's case, losing 30 seconds of votes is no big deal at all, I'd be much more concerned with getting the machine(s) back up.


I'm in the same low-cost-seeking mode as you, self-funding a teeny "startup" for iPhone services (along with 1000s of other developers) including multiplayer gaming (along with 10s of other developers).

While I can get good dedicated server deals, they are still too expensive for me. So I'm diving in deep to learn how to create an almost pure scale-out solution based on a larger (than dedicated) number of VPS instances. Something like

    inexpensive+commodity+more ≥ expensive+powerful+fewer
I really want to be able to have a script that watches load and dynamically adds more capacity. I'm not a fan of calling this "cloud"-based, but the ideas are pretty similar.
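
Something like this naive control loop, where `get_cluster_load`, `provision_vps` and `destroy_vps` are placeholders for whatever the VPS provider's API actually offers:

    import time

    HIGH_WATER = 0.75    # add capacity above this average load
    LOW_WATER = 0.25     # remove capacity below this average load
    CHECK_INTERVAL = 60  # seconds between checks

    def autoscale(get_cluster_load, provision_vps, destroy_vps, min_nodes=2):
        """Naive control loop: poll cluster load, add or remove one VPS at a time."""
        nodes = min_nodes
        while True:
            load = get_cluster_load()   # e.g. average 1-minute load divided by cores
            if load > HIGH_WATER:
                provision_vps()         # provider API call, assumed to exist
                nodes += 1
            elif load < LOW_WATER and nodes > min_nodes:
                destroy_vps()           # scale back down to save money
                nodes -= 1
            time.sleep(CHECK_INTERVAL)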

Good luck with your project! BTW, have you looked at Neo4j (which is a graph database that seems to get high marks)?


Good luck to you too! You've got an interesting approach to scaling, hope it works out (cheaply).

Yes, I've looked at Neo4j, and I would've tried that if it weren't for this (from their home page):

Neo4j is released under a dual free software/commercial license model (which basically means that it's "open source" but if you're interested in using it commercially, then you must buy a commercial license).

Thus MongoDB. I just heard of Riak the other day, and might check that out too since it has the concept of links built in. I'm also still early enough in the work that I can afford to switch.


I just googled for that option and I couldn't find it. What's the option supposed to be?


Good catch. It's actually innodb_flush_log_at_trx_commit.

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.htm...
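
For reference, it can go in my.cnf or (I believe) be set at runtime; a rough sketch with MySQL Connector/Python, with placeholder connection details:

    import mysql.connector  # assuming the MySQL Connector/Python driver

    # 1 (default): write and fsync the log at every commit -- safest, slowest.
    # 2: write the log at every commit, fsync it roughly once per second.
    # 0: write and fsync roughly once per second -- the "1-second batch"
    #    behaviour described above; up to ~1s of commits can be lost on a crash.
    conn = mysql.connector.connect(host="localhost", user="root", password="...")  # placeholders
    cur = conn.cursor()
    cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 2")  # needs SUPER; or set it in my.cnf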


Redis does this.


Which is great. But I think there's a need for something more generalized, just as memcached provides for caching. Then you could use any database behind it.


I wish there was more data offered!

Request-response, XHR, long-polling, "COMET" (blarg, I hate that moniker) when talking to the backend?


Isn't FarmVille a Flash game? If so, it presumably uses Flash XML sockets instead of Comet-style browser hacks.


We use the AMF protocol, as XML is slow to parse on the client.


Thanks for sharing that. What are you using on the backend to handle all of those AMF connections? I'm not intimate with AMF but it seems there are at least 10 server implementations that support AMF.


Don't forget WebSocket ;)


True, but I don't think development versions of browsers have 75MM users at this time. Maybe in 2020 when HTML5 is ratified. :)


FWIW, close to 10% of Mibbit users are already on WebSocket. It's supported in the latest non-dev Chrome, which the majority of Chrome users are running, since the browser has a proper update facility.

Ignore things at your peril, although you're probably right. People playing FarmVille are unlikely to be early adopters.


I'm betting the majority of FarmVille players are on work computers... instead of solitaire or Yahoo games, there's FarmVille!


idk, I think a ton of bored housewives play it. It'd be interesting to see demographics.


With 75m monthly actives, I'm pretty sure almost every demographic is covered.


He said that they're using LAMP. I'd like to hear more about how they deal with schema updates and how they model their data in general.


As far as I know they... don't. MySQL is used more as a persistent store for bits, not as a relational database.
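
If so, the shape is presumably something like a single opaque blob per player, serialized by the application; a guess at what that looks like (not Zynga's actual schema):

    import json
    import mysql.connector  # driver choice is arbitrary here

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="...", database="game")  # placeholders
    cur = conn.cursor()

    # One blob per player: the table schema never changes, only the
    # application-serialized payload inside it does.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS farms (
            user_id BIGINT PRIMARY KEY,
            state   MEDIUMBLOB NOT NULL  -- serialized game state, e.g. JSON or AMF
        )
    """)

    def save_farm(user_id, farm):
        cur.execute("REPLACE INTO farms (user_id, state) VALUES (%s, %s)",
                    (user_id, json.dumps(farm).encode()))
        conn.commit()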



