It would be nice if the protocol were published. I understand it's just msgpack and zeromq, but exception handling and message IDs escape me until I dig into the source (or find the wire format in the PyCon notes). I could definitely see myself (or someone else) creating a compatible, low-level C implementation of both a server and a client.
I had a bunch of questions about why it had "Zero" in the name, and why I should use this instead of just building RPC on ØMQ (zeromq). Then I went to the GitHub page.
Currently using 0MQ to set up our cluster... it handles concurrency, and the guide has plenty of thorough examples (not to mention it's humorous, and it has examples in every mainstream language: C, C++, Python, Java, Ruby, Haskell, etc.).
The documentation is pretty sparse. A few questions:
- How does it handle concurrency? Is each procedure run in a Python thread? Process? Gevent thread?
- How does it handle faults and recovery?
- Do you differentiate errors on the RPC layer from those on the procedure layer? For example, what if the RPC call I make does an RPC call that times out?
On a philosophical note, I'm not sure more RPC layers are what the world needs. The transparency a library like this gives you is really a big lie unless you are willing to accept that any call can now be an RPC one. Otherwise it greatly affects the correctness of your program (method M can now fail with some network failure).
Faults are handled differently depending on whether they are ZeroRPC-layer errors or application errors. To maintain the integrity of the connection itself, there is a heartbeat system independent of any given request. There is also an optional timeout that can be set for a given call's response. Application-level errors are propagated as "RemoteError" exceptions in the python interface. In order to collect more info about remote errors, there is also support for ZeroRPC[0] in Raven[1], the Sentry[2] client (Disqus' error logging system). In any failure case, both sides of the connection are notified and given an opportunity to clean up after themselves.
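To make the two layers concrete, here is a minimal sketch in plain Python. It does not use the real library: every class and function name below (including the local `RemoteError` stand-in) is hypothetical and only mirrors the distinction described above between RPC-layer failures and application errors.

```python
class RPCLayerError(Exception):
    """Hypothetical base class: the call itself failed (timeout, lost peer)."""


class CallTimeout(RPCLayerError):
    """No response arrived within the per-call timeout."""


class PeerLost(RPCLayerError):
    """Heartbeats stopped, so the connection is considered dead."""


class RemoteError(Exception):
    """Stand-in: the remote procedure ran and raised an error of its own."""

    def __init__(self, name, message):
        super().__init__(f"{name}: {message}")
        self.name = name
        self.message = message


def classify(exc):
    """Return which layer a failure belongs to."""
    if isinstance(exc, RPCLayerError):
        return "rpc-layer"
    if isinstance(exc, RemoteError):
        return "application"
    return "local"


def call_safely(func, *args):
    """Invoke func and return (result, layer) instead of raising."""
    try:
        return func(*args), None
    except Exception as exc:
        return None, classify(exc)
```

The point of the split is that a caller can treat "the network let me down" differently from "the remote code rejected my input" without parsing error strings.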
As to the philosophical concerns, I do agree on some level: RPC in general (and ZeroRPC in particular) are powerful weapons that need to be treated with care. That being said, there are a number of cases that are greatly simplified by the higher-level abstractions and more-robust error handling ZeroRPC provides.
> That being said, there are a number of cases that are greatly simplified by the higher-level abstractions and more-robust error handling ZeroRPC provides.
IMO, c.call("hello", "foo", "bar") would be better than c.hello("foo", "bar"). I think the latter gives too much of an illusion of a local call.
You can use the alternative syntax: c('hello', 'foo', 'bar')
Usually, "c" will be called "logger_service" or "metrics_service", thus reminding you that you are talking to a remote service, but it's obviously just a convention...
Yes, I know, people don't like convention. We have had cases where this illusion of a local call led us to do bad shit, this is true.
I think it's the trade-off of any abstraction. More power at your fingertips means easier use for good... or bad.
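For anyone curious how one object can offer both call styles discussed above, here is a minimal sketch. It is not the library's implementation: `RemoteProxy`, `fake_send` and the string it returns are all made up for illustration, and `send` stands in for the actual network round trip.

```python
class RemoteProxy:
    """Hypothetical client proxy; `send` stands in for the network round trip."""

    def __init__(self, send):
        self._send = send

    def __call__(self, method, *args):
        # Explicit style: c('hello', 'foo', 'bar')
        return self._send(method, args)

    def __getattr__(self, method):
        # Attribute style: c.hello('foo', 'bar') becomes a call to 'hello'.
        # Only reached for names that are not real attributes of the proxy.
        if method.startswith("_"):
            raise AttributeError(method)
        return lambda *args: self(method, *args)


def fake_send(method, args):
    """Pretend remote end: just echoes what would be sent over the wire."""
    return f"{method}({', '.join(args)})"


# Naming the proxy after the service is the convention mentioned above.
logger_service = RemoteProxy(fake_send)
```

Both `logger_service.hello("foo", "bar")` and `logger_service("hello", "foo", "bar")` go through the same `_send` path, so the difference really is only syntax.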
Thanks for the detailed criticism and the kind words.
All procedures are run in their own gevent thread.
Yes, we differentiate errors on the RPC layer - there is a builtin heartbeat system so timeouts can be detected even when the procedure returns an infinite stream (think logs or system metrics). There is no built-in retry mechanism.
On the philosophy question: we discourage "pretending" that a call is local. That led to the demise of the original RPC and, as you point out, leads to weak systems. It remains the job of the developer to be aware of the network boundary and design accordingly.
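A heartbeat that is independent of any single request can be sketched roughly like this. This is an illustration only: `HeartbeatMonitor`, its parameters and the "two missed intervals" rule are assumptions, not the library's actual behavior. Time is passed in explicitly so the logic is easy to follow and test.

```python
class HeartbeatMonitor:
    """Hypothetical tracker: reports the peer as lost after a long silence."""

    def __init__(self, interval, max_missed=2):
        self.interval = interval      # expected seconds between heartbeats
        self.max_missed = max_missed  # tolerated number of missed intervals
        self.last_seen = None

    def beat(self, now):
        """Record that a heartbeat arrived at time `now`."""
        self.last_seen = now

    def is_lost(self, now):
        """True once the silence exceeds interval * max_missed seconds."""
        if self.last_seen is None:
            return False  # still waiting for the first heartbeat
        return now - self.last_seen > self.interval * self.max_missed
```

Because the check depends only on when the last heartbeat arrived, it works the same whether the procedure returns one value or streams results forever.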
Interesting, because it seems the opposite to me. The example client code makes an RPC call look exactly like a regular Python call. If you believe (which I do) that the syntax of something implies semantics of that something, this would mean you are trying to tell people an RPC call is the same as a local call.
> we differentiate errors on the RPC layer
What does an error look like if my RPC call times out? What does it look like if my RPC does an RPC call that times out?
The website says: "Built-in heartbeats and timeouts detect and recover from failed requests." What does the "and recover" part mean?
Syntactically ZeroRPC calls look mostly the same as local calls, but semantically we don't hide erroneous states like you might expect a typically opaque RPC layer to do.
If an RPC call times out, you get a special exception that you can check against. Same with heartbeat failures.
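Since a timeout surfaces as an exception you can check against, and there is no built-in retry, a caller could layer its own policy on top. Here is a minimal sketch; `CallTimedOut` and `call_with_retry` are hypothetical names, not part of the library, and retrying like this is only safe when the remote procedure is idempotent.

```python
class CallTimedOut(Exception):
    """Hypothetical stand-in for the library's timeout exception."""


def call_with_retry(call, attempts=3, on_retry=None):
    """Run `call()`; retry on timeout, re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except CallTimedOut as exc:
            last_error = exc
            if on_retry is not None:
                on_retry(attempt, exc)  # e.g. log it or sleep before retrying
    raise last_error
```

Any other exception type propagates immediately, so an application error is never silently retried.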
"Recover" might be bad copy; what we mean is that ZeroRPC cleans up pending results. It is up to you to figure out what to do to fully recover. You have to develop with this in mind, but the trade-off is that failures are not hidden from you, and you have full flexibility in deciding what to do in failed states.
My point is that syntax implies semantics. c.hello(..) implies a local call and the semantics that go with it. But the semantics are actually different, which is bad.
It's such a pity that there is no Ruby support yet. It would help a lot with our current project, where we have a site written in Ruby/Rails and backend services in Python. We currently use Redis as a broker between the two. At the same time, we've been looking at zeromq as a possible, simpler alternative.
Can developers share some info about their current roadmap? What languages will be supported next?
We did python and node.js first because we use them a lot internally. There's no firm roadmap for future implementations, but we're hoping once the protocol is well-documented, implementations can start growing organically.
No, it would be an additional option. ZeroMQ will always be supported - we use it enormously at dotCloud. Another transport we plan on supporting is websocket.
Each transport has its advantages. Zmq is incredibly efficient, very flexible and leaves you in full control of your network topology, but it requires a firewalled trusted network, and doesn't provide ready-to-use broker or naming facilities.
Redis can be used as a very efficient broker and discovery component, and can be more easily included in your web stack, but it limits your flexibility and becomes a potential single point of failure.
The good news is, once zerorpc supports multiple transports you can start with one and swap it out for another without changing your code.
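As a rough picture of what "swap the transport without changing your code" could look like, here is a sketch in plain Python. None of this is the library's design: `Transport`, `DirectTransport`, `BrokeredTransport` and `shout` are hypothetical, and both transports are in-memory stand-ins rather than real zeromq or Redis clients.

```python
from typing import Protocol


class Transport(Protocol):
    """Hypothetical interface: send request bytes, get response bytes back."""

    def request(self, payload: bytes) -> bytes: ...


class DirectTransport:
    """Stands in for a point-to-point transport (think a zeromq socket)."""

    def __init__(self, handler):
        self._handler = handler

    def request(self, payload):
        return self._handler(payload)


class BrokeredTransport:
    """Stands in for a broker-based transport: services are found by name."""

    def __init__(self, registry, service_name):
        self._registry = registry
        self._service_name = service_name

    def request(self, payload):
        return self._registry[self._service_name](payload)


def upper_handler(payload):
    """Toy service: upper-cases whatever it receives."""
    return payload.upper()


def shout(transport, text):
    """Application code: depends only on the Transport interface."""
    return transport.request(text.encode()).decode()
```

`shout` gives the same result with either transport, which is the property the comment above is describing.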
We started writing zerorpc in 2010. We did test msgpack-rpc at the time, but it was barely a proof-of-concept.
A few key differences: zerorpc supports zeromq (including pub-sub and push-pull topologies), streaming responses, and has a bunch of great instrumentation built on top of it.
Somewhat of a side-note, but interestingly enough, MessagePack's author cited ZeroRPC as a good example of MessagePack's use case: https://gist.github.com/2908191
One of my clients has been looking at implementing back-end communications between processes, including transaction handoffs... using RPC calls. We had been looking at using dnode, but I've noticed zeroRPC before as well. Is there anyone out there with experience using both, or at least programming for both, who has any input?
I did not play with dnode long enough to judge it. On the other hand, I can give some details about zerorpc that might help you to determine if it will be a better fit or not.
zerorpc supports:
heartbeat: on remote loss, any pending action is canceled (either on server or client side), and your application is notified properly. You can disable the heartbeat. You can also require that a client wait for a server to come to life before checking for a heartbeat.
timeout: you can specify a timeout for how long a request should take (independently of any heartbeat).
streaming: one call, and a stream as a result. It serves two purposes:
- transferring data sets that would not fit in memory, as well as reducing the transfer latency for big data.
- push/pull stream to get events whenever they come. The client effectively "subscribes".
Note that it's up to you to decide what to do when a client can't consume the stream fast enough, so you can do push/pull style (server blocks for client), pub/sub style (server discards messages), punish style (server shits on the client ;)).
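The first two policies for a slow consumer can be illustrated with a bounded buffer from the Python standard library. This is only a sketch of the idea, not how zerorpc implements streaming; `push_pull_send` and `pub_sub_send` are hypothetical helper names.

```python
import queue


def push_pull_send(buffer, item, timeout=None):
    """Push/pull style: block until the consumer makes room.

    With a timeout, queue.Full is raised if no room appears in time.
    """
    buffer.put(item, timeout=timeout)


def pub_sub_send(buffer, item):
    """Pub/sub style: drop the message if the consumer is behind.

    Returns True if the item was queued, False if it was discarded.
    """
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False
```

The buffer size (`queue.Queue(maxsize=...)`) decides how far the client may fall behind before either policy kicks in.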
Both implementations use asynchronous I/O. The difference is simply the API they expose: each implementation uses the best API available. With Node the dominant programming model is callback-based, so that's what is used. Python has the advantage of a strong and well-supported coroutine library (gevent), so Python developers aren't used to putting up with callback spaghetti.
tl;dr zerorpc is agnostic and each implementation uses the best tool for the job.
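To show how the same asynchronous operation can be exposed in both styles, here is a small sketch. It uses `asyncio` from the standard library as a stand-in for gevent (an assumption made to keep the example dependency-free), and `fetch_greeting` / `fetch_greeting_cb` are made-up names.

```python
import asyncio


async def fetch_greeting(name):
    """Coroutine style: the caller simply awaits the result."""
    await asyncio.sleep(0)  # stands in for non-blocking network I/O
    return f"hello {name}"


def fetch_greeting_cb(name, callback):
    """Callback style: the outcome is delivered as callback(error, result).

    Must be called while an event loop is running; returns the task.
    """
    async def runner():
        try:
            result = await fetch_greeting(name)
        except Exception as exc:
            callback(exc, None)
        else:
            callback(None, result)

    return asyncio.get_running_loop().create_task(runner())
```

The I/O underneath is identical; only the shape of the API the caller sees is different.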
Looks very convenient. Are there any plans for other language support (in particular on the client side)? Ruby and PHP client libraries would be useful.