I think this is largely accurate, but one thing the article misses is that SOAP was never as interoperable as it was claimed to be. If you had a Microsoft client and a Microsoft server, things would work great, but as soon as you started mixing vendor implementations, you quickly ended up in configuration hell. For simple SOAP messaging you could usually get things working with enough elbow grease, but with any of the WS-* protocols you were practically doomed. I lost days, if not weeks, of work to failed interop of WS-Security across two different vendors (both Java, and both with the code under my control).
I spent several years integrating Java/Linux/Unix/Microsoft/etc and I never ceased to be amazed at how many companies/programmers thought it required different code to make them interoperable. I had very few issues with SOAP interaction between systems because they are just text files.
For all intents and purposes, SOAP is a formatted pipe connection. The only issues I ever ran into were related to things like Java incorrectly implementing RSA encryption, or Microsoft adding piles of complexity that nobody used or needed.
There are still several major banks/insurance companies out there whose payroll transaction systems run on the SOAP implementations I put in place 10 or more years ago.
Now I see people claiming REST is better (and in many ways, they are correct), but I also see that large numbers of programmers don't really grasp REST any better than they did SOAP.
> I had very few issues with SOAP interaction between systems because they are just text files.
Did you write parsers for the messages by hand every time?
I did a bit of work with SOAP, and found interoperability to be a problem. Encapsulation and signatures both spring to mind. But then, I was using tooling on both sides. I shudder to think what dealing with SOAP messages manually would have been like.
Why would I write manual parsing code every time? There already are/were XML parsers in every language I needed to integrate. SOAP is just XML. XML is easy. I've moved to REST implementations, but I never quite got why anyone thought SOAP itself was anything complicated.
Microsoft's 'WS' implementations were a mess of SOAP, but that's a whole different story.
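To illustrate what I mean by "just XML", here's a minimal sketch in Python. The envelope namespace is the real SOAP 1.1 one, but the payload element and the urn:example namespace are invented for illustration:

    import xml.etree.ElementTree as ET

    # A hand-written SOAP 1.1 envelope; GetBalance and the urn:example
    # namespace are hypothetical.
    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <GetBalance xmlns="urn:example:payroll">
          <accountId>1001</accountId>
        </GetBalance>
      </soap:Body>
    </soap:Envelope>"""

    root = ET.fromstring(envelope)
    account_id = root.find(
        "{http://schemas.xmlsoap.org/soap/envelope/}Body/"
        "{urn:example:payroll}GetBalance/"
        "{urn:example:payroll}accountId")
    print(account_id.text)  # 1001

Nothing magic: a stdlib parser and a couple of namespace-qualified lookups.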
WCF is a mess and poorly documented. I have several significant implementations of WCF I support now and I really often wonder WTF Microsoft was thinking.
I once spent a day or two looking at WCF in Reflector. Interestingly, while the API makes it look like a tiered, flexible architecture it is not. Instead, at the bottom (!) layer you basically have case distinctions, depending on the layers that were configured on top.
Oh god, I have to deal with a SOAP service from a Perl client... Try dumping out a SOAP::Lite data structure once, and you will be wondering what in god's name the SOAP::NotLite one would have to look like.
Off topic question here. Does an archive.org article get cached so that when it's linked to it comes up faster? Or does it display at the same speed of a random archive.org page?
That page came up pretty fast for me; I don't know if that is random or because other people are hitting the link from HN.
The entire thing hid itself behind a "don't worry about how that works, the tools and libs are supposed to do that for you" philosophy, and then those tools would fail and break so often it was almost comical. You would then end up debugging the entrails of a SOAP exchange, which is one of the most terrible things I've seen.
I feel like I spend a troubling amount of time as a software developer dealing with and/or avoiding solutions that are much more complicated than the problems they're trying to solve.
Tell me about it... I've been searching for a C++ library that will encode x86 instructions from syntax that looks moderately similar to assembler code. Not a disassembler, not a JIT toolkit, not a special cross-platform IR that gets compiled to real instructions. Just an instruction encoder with a moderately pretty syntax. No luck so far.
Also, XED does not seem to be open source. It ships with a bunch of headers that say "Intel Open Source License" at the top, but there is a binary library instead of source files.
Yes, I don't need its decoding capability and I don't think its syntax is pretty, but if I ever decide to write my own library I'll probably wrap XED in some C++ sugar.
I'm not sure I understand your requirements but might DynASM be useful? It's one component of the JIT library behind LuaJIT but many people use it for run-time code generation completely outside a JIT setting.
DynASM looks cool but its preprocessing step and fancy C integration definitely place it outside the description "Just an instruction encoder with a moderately pretty syntax."
I guess it's not fair to say that XED and DynASM are "much more complicated than the problems they're trying to solve." They are much more complicated than the problem I'm trying to solve. But I am surprised that there is no minimal X86 encoder with nice C++ syntax out there.
The fact that SOAP is not reasonably supported in a browser was a huge reason it fell down. When SOAP first came on the scene, complicated AJAX-based applications were not typical. But as more and more JSON/HTTP APIs emerged, in conjunction with browsers becoming more powerful and rich apps being built in them, the meaninglessness and complexity of SOAP became ever more apparent.
Also, another major issue with SOAP is that almost all of the popular tools would generate classes based on a WSDL. This creates a toolchain issue that is readily solved by someone experienced, but can really suck if you are new to the idea of generated code working its way into your project.
Much worse was the scenario of WSDL versioning in conjunction with these class-generating tools. If the API never broke backwards compatibility, you were OK: you could just use the newer class representations, counting on the service and tools to deal with null fields appropriately (not always true, unfortunately). But if version bumps of an API/WSDL broke backwards compatibility... then you had to keep separate class hierarchies for different versions of the WSDL; what an intense headache. Contrast that to REST APIs on the web: without a formal schema or class-centric tooling, client libraries would often let the author stuff in the params themselves and let the serializer build the body from those params. Yes, your API is not as formal, but that's a small price for the flexibility it buys you in dealing with one-off versioning or interop issues.
As someone else mentioned, some platforms couldn't interop with others (differences in simple stuff like nullable fields or primitives existed all over; total nightmare), and some features of WSDLs didn't translate well to certain languages.
A great WSDL author would know to make their WSDL as simple as possible, because they had spent time working with various language toolchains and knew the limitations out there. But that's asking way too much; it's not realistic to expect someone to spend all their time trying to understand all the ways someone in language XYZ might use their WSDL.
Turns out the "definition language" (WSDL) and the presumption that it would be simpler to deal with than just showing people the messages that needed to be placed on the wire... was the broken assumption.
We wanted automagic serialization and de-serialization, and we wanted every language-based construct (linked lists, arrays of structures that contain arrays) to be supported. Different tool vendors built that differently, hence the interop problems.
In REST/JSON, we don't have an IDL. We don't have a JSON Schema (yet! thank god) We generate human-readable documentation containing examples, for the definition. And developers build to those examples. There's nothing - no runtime or static tool - that gets in the way of developers serializing their data into JSON, the way they need to.
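In Python, for instance, the whole "serialization layer" is a single call; a sketch with made-up field names:

    import json

    # Build whatever structure you need and serialize it directly;
    # nothing generated, nothing in the way.
    payload = {"id": 42, "accounts": [1001, 1002]}
    wire = json.dumps(payload)
    restored = json.loads(wire)
    assert restored == payload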
My observation of the rise of JSON/HTTP was that it was a thing happening in the dynamic language communities, and that only came into the Java world when the battle was already over.
I suspect the dynamic language communities preferred JSON/HTTP so strongly because it was so much easier to use: they didn't have big corporate backers releasing SOAP tooling, but it's trivial to parse JSON straight into structures that are more or less native to the language (arrays of objects in JavaScript, lists of dicts in Python, arrays of hashes in Ruby).
Java did have heavy-duty SOAP tooling, and didn't (and doesn't) have a natural way to represent JSON's objects, so it was much less of a win there.
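A sketch of what "trivial" means here, in Python (the document is invented):

    import json

    # JSON lands directly in native types: objects become dicts,
    # arrays become lists. No generated bindings required.
    doc = '[{"name": "alice", "accounts": [1, 2]}, {"name": "bob", "accounts": []}]'
    users = json.loads(doc)
    print(users[0]["accounts"])  # [1, 2]

The Java equivalent needs a third-party library and lands in something like List<Map<String, Object>>, which nobody enjoys working with.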
The Python version is the sort of thing you find yourself writing in Python anyway (at least, if you write tabloid-level Python the way I do); it's reasonably idiomatic. The Java version is cumbersome and feels very un-Javaish.
Bear in mind that with SOAP, in Java, by that point you would have mangled the data into realistic-looking objects, and would write something like customer.getAccounts().get(0).getId() (a hypothetical chain of generated accessors).
I think SOAP has lost in a sense that it's no longer very popular around many web developers.
But has it really lost? I still see it used a lot in large corporate and high tech environments.
I was never a big fan of SOAP myself, but that's also partly because when I started, the tooling wasn't great and I got mostly a negative user-experience. But this is many years ago.
My understanding is that these days it's quite good, and it has a pretty good user experience. Furthermore, everything is backed by XML schemas, which makes it easy to create, and to require, strictly valid XML being sent back and forth.
While some equivalents exist in REST-ish APIs, they're arguably not very common, whereas with SOAP it's pretty much a free, built-in feature that everybody uses.
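For what it's worth, the validation step is only a few lines these days; a sketch using lxml, with hypothetical file names:

    from lxml import etree

    # Validate a message against the service's published XML schema.
    schema = etree.XMLSchema(etree.parse("service.xsd"))
    doc = etree.parse("response.xml")
    if not schema.validate(doc):
        print(schema.error_log)  # pinpoints exactly which elements are invalid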
Modern web API design is great for getting things up and running fast, but it's not amazing for rigorously robust design.
I don't really like the "SOAP is bloated and over-engineered" narrative. It is uninteresting now, and was uninteresting 10 years ago.
I think it's much more interesting to learn about the things that SOAP did get right, and see how we can apply some of those lessons in our APIs.
I work on an SOA platform for a very large software company. We offer both REST and SOAP versions of our RPCs. In my experience, SOAP services are okay plug-and-play if you own both the provider and the consumer. But consuming external APIs can be nightmarish, especially with SOAP 1.2. SOAP exists in our organization for legacy reasons but we recommend all new consumers use RESTful services.
Our mobile clients like REST because it's faster and easier to consume from outside the .NET platform than SOAP. Even within the .NET platform, a client can get up and running consuming a "RESTful" service in fewer lines of code than a SOAPy one. Not only SOAP tooling, but HTTP client tooling in general, is a lot easier to use these days.
WSDLs are cool because they are a fairly standard metadata format. Our clients like them because they can perform data validation with proxy/firewall systems before the request ever enters the corporate network. I've yet to see a concrete benefit from that kind of strictness, though. IMO, SOAP schema validation is so brittle that it causes more problems than it fixes.
Of course, you can also use non-SOAP document description formats like RAML for describing your RESTful services. It's not like SOAP offers anything you can't get for REST. But once you start adding all that on top of your REST API stack, you might as well use SOAP 1.1. (Never use SOAP 1.2).
I don't think the main problem with SOAP is that it's over-engineered, although it is. The issue is that the supposed benefit of "interoperability" was never really realized. It's supposedly protocol-agnostic, but nobody cares. It's supposedly interoperable, but everyone implements it differently.
It's too confusing, I think. The effort it takes to try to understand what's going on is too great. You can explain our REST service infrastructure to a developer in a few hours. I don't know how long it would take to explain the SOAP infrastructure, because I don't think anybody understands the whole thing.
If you are trying to communicate with a vendor that uses SOAP 1.2, it almost always seems to come down to guesswork. What set of properties do I need to specify before they accept my request? The WSDL gives you an object schema but it's not sufficient for the masses of WS-* extensions that you might have to support. Meanwhile, the vendor just exposes the WSDL and assumes that's sufficient "documentation" for clients and you don't need any examples or explanations or other info. I find the best way to approach this is to open Soap UI and start messing with settings until requests are accepted. There's no point asking the vendor for documentation because they probably don't know themselves.
Also, speaking from the perspective of a framework developer, I mainly start to pay attention when things stop working or clients are having problems. SOAP 1.2 doesn't generate more problems, but the problems it does have are disproportionately harder to solve. No matter how good your tooling is, it's not perfect, and it's a lot easier to look "under the covers" for RESTful services than for SOAP services. Of course that could also be attributed to the hellish nature of WCF, and not SOAP 1.2 per se.
> ...it almost always seems to come down to guesswork. Meanwhile, the vendor just exposes the WSDL and assumes that's sufficient "documentation" for clients and you don't need any examples or explanations or other info. I find the best way to approach this is to open Soap UI and start messing with settings until requests are accepted. There's no point asking the vendor for documentation because they probably don't know themselves.
Having just dealt with a SOAP API last week, I can relate to this so much. I'm glad that I'm not the only one feeling the pain here.
> But has it really lost? I still see it used a lot in large corporate and high tech environments.
That's because of inertia -- a lot of those companies invested big in SOAP in the early to mid-2000s. They're not going to throw away an investment that big just because it's not cool anymore.
But don't mistake that inertia for SOAP being popular. It's just that so much money has been sunk into it that enterprise users are resigned to being stuck with it for a while.
My guess is that more and more people are coming to the realization that XML is great for fluid text and document formatting, but too ambiguous to serve as a data structure format.
XML nodes and children are just too slippery. I've actually argued with enterprise developers about why serving the following format was a terrible idea.
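It was roughly this shape (the element names here are hypothetical stand-ins), and the problem is what a naive parser does with it:

    import xml.etree.ElementTree as ET

    doc = """<user>
      <id>42</id>
      <account>1001</account>
      <account>1002</account>
    </user>"""

    root = ET.fromstring(doc)
    # The obvious element->dict mapping silently keeps only the last
    # <account>, dropping 1001 without any error:
    naive = {child.tag: child.text for child in root}
    print(naive)  # {'id': '42', 'account': '1002'}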
Objects are edge-labeled graphs. XML documents are node-labelled trees with more than one type of leaf. The ambiguity is intrinsic and ought to have been obvious long ago.
Let me count the ways that's underspecified and can go wrong...
(The actual problem I tried to solve was unrelated to this. The fields are in latin1 while the document itself is in utf8, with attributes specifying content type. It all validates apparently.)
I've followed the rule that if the property is of the underlying data representation, use children, and if it's a (meta-ish) property of the XML implementation, or some contextual annotation, use an attribute.
My rule is that things are almost never attributes. :-)
There is less flexibility there. Escaping seems harder, and you can't have sub-elements or lists. Avoiding attributes makes things more verbose, but then that's XML for you.
Both ways still leave the structure ambiguous. Should <user> be mapped to an associative array of mixed objects, or to a fixed object using a custom mapper to shove the account objects into their own sub-array?
I don't understand either, and would benefit from a clear statement. Is there a universally agreed upon better way? Are those criticizing it presuming the particular use case of schema-less data deserialization and saying that it doesn't fully specify the object?
In JSON the server would have returned account as an array with 2 elements. In XML you are left to figure that out for yourself. It's hardly clear that there could be several accounts but never several ids.
(Yes, you can wrap it with an <accounts> node and make your data extremely verbose.)
What about attributes? Should we ever expect them on the user? What if there's an attribute id as well, is it the same? (Oh, just download a schema to tell you what to expect! Argh...)
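Compare the JSON version, where the cardinality is right there in the data:

    {"id": 42, "account": [1001, 1002]}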
Request:

    GET /customers/43456 HTTP/1.1
    Host: www.example.org

Response:

    HTTP/1.1 200 OK
    Content-Type: text/xml; charset=utf-8

    <customer>Foobar Quux, inc</customer>
Seems fine to me. In fact, having named elements without the requirement for a dictionary makes parsing straight to an object-representation (without an intermediate property-list) much easier.
It never equates XML with SOAP. The point is that even while SOAP was being pushed by Microsoft, JSON was gaining mindshare among API developers, and API design was moving away from complex XML documents.
Using a simpler format for object serialization into XML is definitely an option, and it's a perfectly fine middle-ground between SOAPy verbosity and JSON compactness, but IMO it doesn't really have a lot to offer over a JSON version of the same API. Many API providers support both formats using content-type detection.
In my opinion, WS-* lost not because of the format or verbosity. WS-* lost because the concept is about the continuation of RPC, where you expose custom methods that can be invoked from a client. When state is altered in custom ways, complexity goes through the roof - to the point where you need a DSL (WSDL) to define these methods and how to interact with them.
REST simplified things because the concept is about limiting the method invocation to the bare minimum and always have an explicit understanding of the state that is being changed or communicated.
The XML/JSON question is not as important as the method/resource question.
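To make the contrast concrete, here is the same operation both ways (hypothetical endpoints; the SOAP envelope around the payload is elided):

    POST /OrderService HTTP/1.1
    Content-Type: text/xml; charset=utf-8
    SOAPAction: "urn:example#CancelOrder"

    <CancelOrder><orderId>42</orderId></CancelOrder>

versus

    DELETE /orders/42 HTTP/1.1

In the first, every state change is a new custom method to document and implement; in the second, the method set is fixed and only the resources vary.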
The whole baked-in type system also adds a lot of complexity. On top of that you get the complexity that is XML itself (namespaces, while theoretically a good idea, are pure insanity).
Hold on to this, "tldr: Simplicity and utility trump large corporate backing."
When your manager tells you that your API needs to handle something "just in case", when your customer asks, "But what if I want to use this API with mumblefratz?", and when you look at your code and say, "Gee, if I make this a variable I could handle any special edge case by encapsulating its specialness in the variable..." That is exactly when the over-engineering creeps in.
I've had the "fortune" to work with SOAP as well (a Java implementation), and I never could shake the feeling that it was designed with the intention of being obscure. It seems you could employ an entire batch of consultants just to deal with defining, implementing, and changing SOAP APIs (kind of a misnomer, I know). The whole change from SOAP 1.1 using the SOAPAction header to SOAP 1.2 using an action parameter on the Content-Type seems downright misleading. Or downloading WSDL files to automatically create Java classes: sounds great in theory, but now your program has additional obscure classes with very weird syntax, and you have to read through numerous reference guides just to know how you can modify the header of an outgoing request. Such a waste of time.
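For the curious, the 1.1-to-1.2 change looks like this on the wire (service path and action URN hypothetical):

SOAP 1.1:

    POST /service HTTP/1.1
    Content-Type: text/xml; charset=utf-8
    SOAPAction: "urn:example#GetBalance"

SOAP 1.2:

    POST /service HTTP/1.1
    Content-Type: application/soap+xml; charset=utf-8; action="urn:example#GetBalance"

Same information, moved from a dedicated header into a media-type parameter, so everything that matched on SOAPAction quietly broke.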
In my experience, it wasn't that. Larger companies put value in supporting standards, because it is an important checkbox in their marketing, and SOAP is definitely a standard.
One nice example of how MS really made things more complex with SOAP is the MSN Messenger Protocol (MSNP), which started out quite simple and sane, then got extremely SOAPy in the later versions; compare protocol interactions pre-SOAP:
I think this is a great example of SOAP's failing. While it is simple, it looks difficult. As you scroll through all that XML your eyes begin to glaze and you don't even notice the small section at the bottom. Despite being the most important, the section is at the bottom, has weird grammar, and feels like an afterthought.
>Here the WSDL and XML schemas for the web service descripted here, you can use them to generate a web service binding for your programming language.
I wouldn't consider requiring a ton of mostly redundant and useless information in messages to be "simple" at all (each namespace needs a whole damn URL every time it's referenced? Does the parser use it, or even care that the entire URL is correct?). Nor the XML parsing, though that isn't so bad in comparison to all that redundantly redundant redundancy. WSDL/XSD is yet another layer on top of this mess of complexity. I suppose you're referring to the fact that it looks simple if everything goes according to plan, but when it doesn't, it's anything but simple. Trying to get any kind of performance out of it is another counterpoint to "simplicity"... This is all from the experience of someone who has worked with XMLA-based OLAP systems.
With JSON, if the data returned by the server doesn't exactly match the documentation, it's the client's problem.
With SOAP, if the data returned by the server doesn't exactly match the documentation, the server is broken. The client, reading the WSDL file which documents the API, will report an error and refuse to continue. This means bug reports the server provider cannot make go away by the usual defect-denial techniques.
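You can see the asymmetry from the client side; a Python sketch with invented field names:

    import json

    # A JSON client happily accepts a response missing a documented field;
    # the breakage only surfaces later, in the client's own code.
    data = json.loads('{"name": "Foobar Quux, inc"}')
    balance = data.get("balance")  # silently None

A WSDL-driven SOAP client would refuse to deserialize that response at all, which points the finger straight back at the server.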
SOAP is past its peak but far from dead. In the Microsoft ecosystem, SOAP is nearly transparent, except for a bit of fiddling - you publish your WSDL from your definitions, I consume it to generate an API class on my end, and after maybe ten minutes on either side, we don't even think about SOAP again.
It's funny that the joke starts with a dig at CORBA. When I worked on a CORBA-based system a decade ago, I thought I'd never see something more over-designed. But this whole document-oriented web services thing makes me feel like: https://screen.yahoo.com/unfrozen-cave-man-lawyer-1-22341242....
Sure, SOAP's complexity is a problem, but the issue with "RESTful" APIs is that they're so loosely defined, almost all the way to the other extreme. Being able to generate code and have a type-checked API, one as easy to use as any other local API, was a very nice feature we've now lost. I wish there were a better compromise.
I think we'll see more and more schema related projects for JSON-based API (like swagger). It will be interesting if they can compose in a way that keeps the complexity down, as opposed to what happened with SOAP.
* Originally, Google Protocol Buffers. These are still in wide use; unfortunately, they never published/blessed an official RPC stack.
* Apache Thrift, aka Facebook's answer to Protocol Buffers.
* Cap'n Proto, by one of the Protocol Buffers authors, now has an RPC standard!
After having used Protocol Buffers extensively, I could never go back to untyped APIs. I'd at least use Apache Thrift for everything. Hopefully Cap'n Proto gets more languages supported for its RPC soon.
Protocol Buffers is great. Thrift generates much nicer C++ code but performs slower and wants you to do everything inside their networking world. I'm excited for Cap'n Proto. It's a good time to be doing binary IPC.
I'm still uncertain about melding the message format and RPC mechanics though. Part of me feels like network communications is just too big of a deal to abstract away into something that looks like a normal procedure call. But hey, even if that's always true for big systems, it probably won't be for smaller apps.
SOAP web services are still frequently used in a lot of large corporations, where technology evolves a little slower and Microsoft products are still very popular.
Being a Rails fan myself, I have always enjoyed using HTTP Web Services and interacting with tech companies' Restful APIs so easily.
This post provided a clear, side-by-side comparison of the technologies including the historical context of SOAP, which I found really useful and enlightening. If I could give you some Bitcoin for writing it, I would. Have a new Twitter follower instead.
It's hard to find succinct, objective technology articles in this day and age. Most reviews I find are either unconstructively biased ("dynamic typing sucks!") or too long and convoluted to be effective.
I always liked XMLRPC. It seemed really simple. At least in Python they simply mapped to function calls and all of the XML stuff was handled for you. It took minutes to get an API set up with that from any program.
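The modern (Python 3) spelling of that idiom is still about this short:

    from xmlrpc.server import SimpleXMLRPCServer

    def add(a, b):
        return a + b

    # Expose a function over XML-RPC; all the XML is handled for you.
    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(add)
    server.serve_forever()

And on the client side:

    from xmlrpc.client import ServerProxy

    proxy = ServerProxy("http://localhost:8000/")
    print(proxy.add(2, 3))  # 5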
I still have to use SOAP more often than I'd like to these days. It's a pain testing and dealing with all the different tools you have to use just to stand up a basic request.
A lot of vendors I've worked with will publish REST web services for public data but as soon as any private information needs to be accessed they use SOAP. I don't know if they just don't trust the tools to make REST secure or what but it's disappointing for sure.
Also the people who wrote the SOAP code are long gone which makes support for them even more difficult when I'm trying to get bugs fixed. REST server code has not been anywhere as bad to deal with.
SOAP is still common in healthcare as many IHE profiles rely on SOAP for Cross-community document/data exchange. They are a pain to work with if you don't have a toolset that handles it well.
Two things, I think, did SOAP in:
1) JSON coming up to replace XML as a serialization technology, and
2) Ruby on Rails (etc.)
I remember even serious pressure from within the XML community in the form of XML-RPC, and the joke that the entire spec for XML-RPC was smaller than the table of contents for the SOAP spec. SOAP was tough. It was confusing to use (fwiw, I had the best luck with Perl's SOAP::Lite) and difficult enough to implement that devs I knew simply abandoned it.
The new Finnish government digital infrastructure is going to be built using the Estonian X-Road (well, next year). And yes, it's SOAP-only, so I guess either the Finns are pretty crazy or SOAP is making a comeback. Go figure.
You can't argue that PUT, DELETE, and POST are all usable from a plain browser, and once a REST API starts going hog-wild with headers, even GET doesn't really work either.
The fact that REST didn't provide some way for all methods to be invoked from a simple browser was, to me, a step towards the same inaccessibility that plagued SOAP.
Rails-style frameworks work around this with a method-override parameter: a POST containing _method=delete will be interpreted as a DELETE.
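On the wire, the override is just a form field (path hypothetical):

    POST /articles/42 HTTP/1.1
    Content-Type: application/x-www-form-urlencoded

    _method=delete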
It seems faintly silly; if you're exposing an API to the browser, why use methods that you can't use directly and then have to go to the effort of emulating them, when you could just use supported methods directly? Is the use of the proper methods really that important?