First, it uses CMake to build. For a long time Google projects seemed pretty anti-CMake (using gyp, plain Makefiles or autotools, for example), so it's nice to see them using CMake. IMO it's the best build tool, though all build tools generate various levels of hate :-)
Second, it's another Google project that generates good developer docs from source code using doxygen and markdown. These docs look good on GitHub directly (https://github.com/google/flatbuffers/tree/master/docs/sourc...) since they are markdown, and even better on the dedicated site, which has custom CSS.
If I were to write a C++ library, I'd definitely copy these two approaches.
Thanks! I guess it's because we're a game development group inside Google, who are more externally focused than most Google engineers. We wanted to ensure the library is attractive to outside developers, hence CMake and other choices (like not having any dependencies).
In short, if I understood correctly, they first used JSON. Then:
"In the last six months, we have transitioned most of Facebook on Android to use FlatBuffers as the storage format. Some performance improvement numbers include:
Story load time from disk cache is reduced from 35 ms to 4 ms per story.
Transient memory allocations are reduced by 75 percent."
This is indeed a very interesting encoding. Part of the speedup comes from the binary encoding, and the rest from using offset tables to make random access cheap. Offset tables also allow modifying arbitrary data in the structure without re-encoding everything. This comes at the price of less compact data.
It looks like I could update my YABE encoding, which is a straight binary encoding of JSON.
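To make the offset-table point concrete, here is a minimal C++ sketch of the general idea under stated assumptions. The layout, `Build`, `FieldPos` and the payload helpers are hypothetical illustrations, not the real FlatBuffers wire format or API: a header lists where each field starts, so reading field i is one table lookup, and a fixed-size field can be overwritten in place without re-encoding the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Toy layout (hypothetical, NOT the real FlatBuffers wire format):
//   [u32 field_count][u32 offset_0 ... offset_{n-1}][field payloads ...]
// Offsets are absolute positions in the buffer; native byte order is assumed.
using Buffer = std::vector<uint8_t>;

uint32_t ReadU32(const Buffer& buf, std::size_t pos) {
  assert(pos + 4 <= buf.size());
  uint32_t v;
  std::memcpy(&v, buf.data() + pos, sizeof v);
  return v;
}

void WriteU32(Buffer& buf, std::size_t pos, uint32_t v) {
  assert(pos + 4 <= buf.size());
  std::memcpy(buf.data() + pos, &v, sizeof v);
}

Buffer U32Payload(uint32_t v) {
  Buffer b(4);
  WriteU32(b, 0, v);
  return b;
}

// Strings are stored as [u32 length][bytes].
Buffer StringPayload(const std::string& s) {
  Buffer b = U32Payload(static_cast<uint32_t>(s.size()));
  b.insert(b.end(), s.begin(), s.end());
  return b;
}

// Reserves the offset table, then appends each payload and records where it starts.
Buffer Build(const std::vector<Buffer>& fields) {
  Buffer buf(4 + 4 * fields.size());
  WriteU32(buf, 0, static_cast<uint32_t>(fields.size()));
  for (std::size_t i = 0; i < fields.size(); ++i) {
    WriteU32(buf, 4 + 4 * i, static_cast<uint32_t>(buf.size()));
    buf.insert(buf.end(), fields[i].begin(), fields[i].end());
  }
  return buf;
}

// O(1) random access: one table lookup, no parsing of the preceding fields.
std::size_t FieldPos(const Buffer& buf, std::size_t i) {
  assert(i < ReadU32(buf, 0));
  return ReadU32(buf, 4 + 4 * i);
}

std::string ReadString(const Buffer& buf, std::size_t i) {
  const std::size_t pos = FieldPos(buf, i);
  const uint32_t len = ReadU32(buf, pos);
  return std::string(reinterpret_cast<const char*>(buf.data() + pos + 4), len);
}
```

For example, after `Buffer buf = Build({StringPayload("Alice"), U32Payload(41)});`, calling `WriteU32(buf, FieldPos(buf, 1), 42)` rewrites four bytes and leaves the string untouched. The header is the "less compact" cost: two fields already carry 12 bytes of bookkeeping, and a variable-size field such as the string cannot grow in place in this toy.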
For those who are totally new to FlatBuffers, here is a good video by Colt from Google. It was made primarily for game developers, but technically any app can use it, at least any app that isn't relying on a networking library with its own JSON implementation.
He also mentioned FlatBuffers in his recent talk at an Android meetup in San Francisco, and we debated its benefits again. It has a learning curve but is well worth it: https://www.youtube.com/watch?v=iQTxMkSJ1dQ
OK, so if I understand this correctly, a binary format where you can seek to arbitrary positions to read only the data you currently need to display is a huge performance win over a big blob of text you have to read and fully parse. As someone who has had to uninstall Facebook from my Android phone due to its poor performance, excuse me if I'm bewildered that they didn't realize this years ago (FlatBuffers isn't the first binary serialization format) and write it that way in the first place!
Facebook (the company) values getting any working software out there over getting optimized software out. This is intentional, but it leads to bad experiences for people like you.
There's a lot more to it than just being binary. The big deal is that it can be accessed in-place while still being portable and forwards/backwards compatible. For example, Protocol Buffers is binary too, but requires unpacking, causing lots of object allocation, etc.
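A minimal C++ sketch of how "accessed in-place" and "forwards/backwards compatible" can coexist, assuming a simplified per-table slot list (loosely inspired by the vtable idea; `WriteTable`, `ReadField` and the layout are hypothetical, not FlatBuffers' actual encoding or generated code). The reader pulls one field straight out of the bytes with no intermediate objects, and a slot the writer never knew about, or chose to skip, simply yields a default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy format (hypothetical):
//   [u16 slot_count][u16 offset per slot][u32 values ...]
// An offset of 0 means "field not written". Native byte order is assumed.
using Buffer = std::vector<uint8_t>;

// Writer: values[i] == nullptr leaves slot i absent.
Buffer WriteTable(const std::vector<const uint32_t*>& values) {
  const uint16_t slots = static_cast<uint16_t>(values.size());
  Buffer buf(2 + 2 * values.size(), 0);
  std::memcpy(buf.data(), &slots, 2);
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] == nullptr) continue;  // absent: offset stays 0
    const uint16_t off = static_cast<uint16_t>(buf.size());
    std::memcpy(buf.data() + 2 + 2 * i, &off, 2);
    const uint8_t* p = reinterpret_cast<const uint8_t*>(values[i]);
    buf.insert(buf.end(), p, p + 4);
  }
  return buf;
}

// Reader: reads one slot in place, returning `def` if the buffer was written
// with an older schema (slot beyond the table) or the field was omitted.
uint32_t ReadField(const Buffer& buf, uint16_t slot, uint32_t def) {
  assert(buf.size() >= 2);
  uint16_t slots;
  std::memcpy(&slots, buf.data(), 2);
  if (slot >= slots) return def;  // unknown to the writer: older schema
  uint16_t off;
  std::memcpy(&off, buf.data() + 2 + 2 * slot, 2);
  if (off == 0) return def;       // writer skipped this field
  assert(off + 4u <= buf.size());
  uint32_t v;
  std::memcpy(&v, buf.data() + off, 4);
  return v;
}
```

The compatibility comes from the reader tolerating a shorter slot list: a "v2" reader asking an old buffer for slot 2 gets the default, and a "v1" reader never asks for slots it doesn't know about, so newer buffers still work for it. Contrast this with unpacking every message into objects before any field can be read.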
I didn't like the writeup, because I can't tell how the layout works from his bad description of it. The author makes the classic technical-writing mistake of assuming the reader already knows what the writer is talking about.
I ran across Cap'n Proto (https://capnproto.org/) a while back and at a glance it seems to do something similar. Is that true? Are you familiar with the tradeoffs between the two?
They are missing the second major reason why FlatBuffers is awesome: cache locality.
It's incredibly hard to lay out objects in memory with Java, but if you don't mind the lookup cost on ByteBuffers, FlatBuffers is a great way to structure data in the pattern you access it.
All this stuff is pretty old hat to game dev people, but it's nice to see mainstream dev start caring a bit more about performance.
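As a rough C++ analogue of laying data out in the order you access it (the comment is about Java ByteBuffers, so this is a swapped-in illustration, and `Story`, `PackStories` and `TotalLikes` are hypothetical names): fixed-size records packed back to back in one buffer let a scan walk memory sequentially instead of chasing a pointer per object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-size record, packed contiguously in a single byte buffer.
struct Story {
  uint64_t id;
  uint32_t likes;
  uint32_t comments;
};
static_assert(sizeof(Story) == 16, "this sketch assumes no padding");

std::vector<uint8_t> PackStories(const std::vector<Story>& stories) {
  std::vector<uint8_t> buf(stories.size() * sizeof(Story));
  if (!stories.empty()) std::memcpy(buf.data(), stories.data(), buf.size());
  return buf;
}

// Reads only the `likes` field of each record, in place, with a fixed stride.
uint64_t TotalLikes(const std::vector<uint8_t>& buf) {
  assert(buf.size() % sizeof(Story) == 0);
  uint64_t total = 0;
  for (std::size_t pos = 0; pos < buf.size(); pos += sizeof(Story)) {
    uint32_t likes;
    std::memcpy(&likes, buf.data() + pos + offsetof(Story, likes), sizeof likes);
    total += likes;
  }
  return total;
}
```

This only shows the layout idea; how much it helps depends on the access pattern, and in Java each read through a ByteBuffer carries the lookup cost mentioned above.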
Good point. Lookup in Java is a bit slower than in the language FlatBuffers was originally designed for (C++), but being able to bypass Java's object allocation may well make up for that.
In general, objects allocated in sequence will be consecutive in Eden. After GC moves stuff around, objects referred to by object X often follow object X itself. YMMV.
Nice to see Facebook acknowledge contributions from Google. It's a departure from the typical not-invented-here mentality prevalent at large tech companies.
What does the iOS Facebook app use? It couldn't be Core Data, given that they say they cannot normalize the data ahead of time. The iOS app is nice and snappy, at least on my phone, though it does take some time to load.
How much does it add to binary size? Take an app with 100 objects and a few thousand fields in total, and you have several megabytes consumed by protobuf alone, with 3 MB source files if you put it all in one proto file.
That actually makes me wonder if storing things that way might be a great way to go... like just reading 58283/photo.jpg or 58283/name.txt or whatever. Of course, you then have to pay the price of decoding whatever you get from the network into these files.
JSON parsing is a major problem on Android. When I first ran into 35 ms parse times, I was sure I was doing something wrong. Nope. The latest phones can get down to 4-5 ms JIT'd, but iOS is still an order of magnitude or two faster, which completely changes your architecture decisions.
Why this can't be integrated natively to make it fast, I've never understood. It should have landed in Android a long time ago.
A bit surprising but I'm guessing it's the multi-architecture fat binary problem they don't want to solve? Until recently debugging native plus Java was very difficult, and most of the UI stuff is in Java not native.
As far as I'm aware, Dropbox uses C++ shared between iOS and Android for all the model/network bits, so other big players do it.
Seems to be native versus JVM. My guess is it's a combo of string processing, many temp objects, etc with too many interpreted versus JIT'd code paths.
FlatBuffers targets C++ and Go[1], so it does not appear to require a JIT. It appears the necessary code is generated statically even for Java and C#, to take advantage of strong typing.
Makes sense, but do people use it for the conventional web? If there's such a big efficiency gain, why isn't there more adoption?
Because JSON is the "default" (everybody knows it, everything has tooling for it) and most people don't care about efficiency on that level until they have to.