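[Recommended] RPC is bad?
My idol Steve Vinoski, in a reply on the mailing list, accidentally leaked the session he is preparing for Erlang eXchange, so we get an early look. Steve is a heavyweight of the CORBA world, which makes him well qualified to say that RPC is bad. The post is also beautifully written, with very little padding, so I won't spoil its flavor by paraphrasing it.
For easier reading, the content of the mail is reproduced below: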
Well, if you had time you could dig through my various IEEE Internet Computing columns from the past 6 years and find many reasons listed there. For example, “RPC Under Fire” (note that it’s a PDF) from the Sep/Oct 2005 issue lists a number of problems:
Also, pretty much any of my columns that cover REST to any degree would also contain mentions of RPC’s shortcomings. All the columns can be found here:
But if you don’t have the time or energy, the fundamental problem is that RPC tries to make a distributed invocation look like a local one. This can’t work because the failure modes in distributed systems are quite different from those in local systems, so you find yourself having to introduce more and more infrastructure that tries to hide all the hard details and problems that lurk beneath. That’s how we got Apollo NCS and Sun RPC and DCE and CORBA and DSOM and DCOM and EJB and SOAP and JAX-RPC, to name a few off the top of my head, each better than what came before in some ways but worse in other ways, especially footprint and complexity. But it’s all for naught because no amount of infrastructure can ever hide those problems of distribution. Network partitions are real, timeouts are real, remote host and service crashes are real, the need for piecemeal system upgrade and handling version differences between systems is real, etc. The distributed systems programmer *must* deal with these and other issues because they affect different applications very differently; no amount of hiding or abstraction can make these problems disappear. As I said about such systems in a recent column:
“The layers of complexity required to maintain the resulting leaky illusion of local/remote transparency are reminiscent of the convoluted equations that pre-Copernican astronomers used to explain how the Sun and other planets revolved around the Earth.” (from “Serendipitous Reuse”)
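To make the difference in failure modes concrete, here is a minimal Python sketch (an illustration added to this post, not part of Steve's mail). The names `local_add`, `remote_add` and `demo` are hypothetical, and the "protocol" is just a line of text over a raw TCP socket on the loopback interface. The point is only that the remote version has outcomes the local one never has: the peer may be gone, or may accept the request and never answer.

```python
import socket


def local_add(a, b):
    # A local call either returns or raises in-process; there is no
    # "the request was sent but nobody answered" outcome.
    return a + b


def remote_add(addr, a, b, timeout=0.2):
    """Hypothetical remote 'add' over a raw TCP socket.

    Returns ("ok", result), ("timeout", None) or ("unreachable", None).
    """
    try:
        with socket.create_connection(addr, timeout=timeout) as conn:
            conn.sendall(f"{a} {b}\n".encode())
            reply = conn.recv(64)  # blocks until data arrives or the timeout fires
            if not reply:
                return ("unreachable", None)  # peer closed without answering
            return ("ok", int(reply.decode().strip()))
    except socket.timeout:
        return ("timeout", None)      # did the peer do the work? we cannot know
    except OSError:
        return ("unreachable", None)  # refused, reset, no route, ...


def demo():
    # A listener that never accepts or replies stands in for a hung service.
    silent = socket.socket()
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)
    addr = silent.getsockname()
    hung = remote_add(addr, 1, 2)
    # Closing the listener stands in for a crashed service.
    silent.close()
    dead = remote_add(addr, 1, 2)
    return local_add(1, 2), hung, dead
```

The caller of `remote_add` has to decide what a timeout means for its own application (retry, fail over, give up), which is exactly the decision an RPC layer cannot make on its behalf.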
RPC systems in C++, Java, etc. also tend to introduce higher degrees of coupling than one would like in a distributed system. Typically you have some sort of IDL that’s used to generate stubs/proxies/skeletons: code that turns the local calls into remote ones, which nobody wants to write or maintain by hand. The IDL is often simple, but the generated code is usually not. That code is normally compiled into each app in the system. Change the IDL and you have to regenerate the code, recompile it, and then retest and redeploy your apps, and you typically have to do that atomically, either all apps or none, because versioning is not accounted for. In an already-deployed production system, it can be pretty hard to do atomic upgrades across the entire system. Overall, this approach makes for brittle, tightly-coupled systems.
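The coupling described above can be sketched in a few lines of Python (again an illustration of mine, not from the mail). `stub_decode_v1` imitates generated unmarshalling code whose wire layout is fixed at build time; the layouts `V1_FORMAT` and `V2_FORMAT` and the order fields are invented for the example. `tolerant_decode` shows one commonly used alternative, a self-describing message where the reader ignores fields it does not know; this is not something the mail proposes.

```python
import json
import struct

# Hypothetical wire layouts: v1 is (order id: uint32, quantity: uint16),
# v2 appends a one-byte priority field.
V1_FORMAT = "!IH"
V2_FORMAT = "!IHB"


def stub_decode_v1(payload: bytes) -> dict:
    # Imitates generated stub code: the layout is baked in at build time,
    # so any change on the sender side breaks this receiver.
    order_id, qty = struct.unpack(V1_FORMAT, payload)
    return {"order_id": order_id, "qty": qty}


def tolerant_decode(payload: bytes) -> dict:
    # Self-describing message: read the fields we know, ignore the rest.
    msg = json.loads(payload)
    return {"order_id": msg["order_id"], "qty": msg["qty"]}


def demo():
    # A v2 sender talks to a receiver that was built against v1.
    v2_binary = struct.pack(V2_FORMAT, 42, 3, 1)
    try:
        stub_decode_v1(v2_binary)
        stub_survived = True
    except struct.error:
        stub_survived = False  # length mismatch: the old stub cannot cope
    v2_json = json.dumps({"order_id": 42, "qty": 3, "priority": 1}).encode()
    return stub_survived, tolerant_decode(v2_json)
```

With the fixed layout, sender and receiver must be upgraded together; with the tolerant reader, the old receiver keeps working while the new field rolls out.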
Such systems also have problems with impedance mismatch between the IDL and whatever languages you’re translating it to. If the IDL is minimal so that it can be used with a wide variety of programming languages, it means advanced features of well-stocked languages like Java and C++ can’t be used. OTOH if you make the IDL more powerful so that it’s closer to such languages, then translating it to C or other more basic languages becomes quite difficult. On top of all that, no matter how you design the IDL type system, all the types won’t (indeed, can’t) map cleanly into every desired programming language. This turns into the need for non-idiomatic programming in one or more of the supported languages, and developers using those languages tend to complain about that. If you turn the whole process around by using a programming language like Java for your RPC IDL in an attempt to avoid the mismatch problems, you find it works only for that language, and that translating that language into other languages is quite difficult.
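One small example of such a mismatch, chosen by me rather than taken from the mail: an IDL type for an unsigned 64-bit integer has no direct counterpart among Java's primitive types, which only offer a signed 64-bit `long`. The Python sketch below uses the standard `struct` module to show what happens when the same eight bytes are read back as a signed value; the function name is hypothetical.

```python
import struct


def unsigned64_read_as_signed64(value: int) -> int:
    """Marshal an unsigned 64-bit value, then read the bytes back as signed."""
    wire = struct.pack("!Q", value)      # how an "unsigned" sender writes it
    return struct.unpack("!q", wire)[0]  # how a signed-only receiver reads it


def demo():
    small = unsigned64_read_as_signed64(7)
    huge = unsigned64_read_as_signed64(2**64 - 1)
    return small, huge
```

Small values survive the trip, but the largest unsigned value comes back as -1, so code in the signed-only language has to carry around non-idiomatic conversions to recover the intended number.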
There’s also the need with these systems to have the same or similar infrastructure on both ends of the wire. Earlier posters to this thread complained about this, for example, when they mentioned having to have CORBA ORBs underneath all their participating applications. If you can’t get the exact same infrastructure under all endpoints, then you need to use interoperable infrastructure, which obviously relies on interoperability standards. These, unfortunately, are often problematic as well. CORBA interoperability, for example, eventually became pretty good, but it took about a decade. CORBA started out with no interoperability protocol at all (in fact, it originally specified no network protocol at all), and then we suffered with interop problems for a few years once IIOP came along and both the protocol itself and implementations of it matured.
Ultimately, RPC is a leaky abstraction. It can’t hide what it tries to hide, and because of that, it can easily make the overall problem more difficult to deal with by adding a lot of accidental complexity.
In my previous message I specifically mentioned Erlang as having gotten it right. I believe that to be true not only because the handling of distribution is effectively built in and dealt with directly, but also because Erlang makes no attempt to hide those hard problems from the developer. Rather, it makes them known to the developer by providing facilities for dealing with timeouts, failures, versioning, etc. I think what Erlang gives us goes a very long way and is well beyond anything I’ve experienced before. Erlang really doesn’t provide RPC according to the strict definition of the term, BTW, because remote calls don’t actually look like local ones.
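For readers who have not used Erlang, the style described above (send a message, then wait for a reply with an explicit timeout) can be approximated in Python with queues and a thread. This is only a rough analogy written for this post, not Erlang semantics and not code from the mail; `call`, `echo_server` and the `"drop"` request are hypothetical names.

```python
import queue
import threading


def call(server_inbox, request, timeout):
    """Send a request message, then wait for a reply or give up after `timeout`."""
    reply_box = queue.Queue()
    server_inbox.put((reply_box, request))
    try:
        return ("ok", reply_box.get(timeout=timeout))
    except queue.Empty:
        # The failure is visible to the caller, who decides what to do next.
        return ("error", "timeout")


def echo_server(inbox):
    while True:
        item = inbox.get()
        if item is None:        # shutdown signal
            break
        reply_box, request = item
        if request == "drop":   # simulate a lost message or a hung peer
            continue
        reply_box.put(request.upper())


def demo():
    inbox = queue.Queue()
    worker = threading.Thread(target=echo_server, args=(inbox,), daemon=True)
    worker.start()
    good = call(inbox, "ping", timeout=1.0)
    lost = call(inbox, "drop", timeout=0.1)
    inbox.put(None)
    worker.join()
    return good, lost
```

Nothing here pretends to be a local function call: the request is a message, the reply may never come, and the timeout is part of the call site.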
(BTW, this is the kind of stuff I’ll be talking about at Erlang eXchange next month.)


Comments
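The fundamental problem with RPC is that it tries to make a distributed invocation look like a local one. Because the failure modes of distributed systems differ greatly from those of local systems, you keep adding more and more infrastructure to hide the underlying details and problems. That is how we ended up with Apollo NCS, Sun RPC, DCE, CORBA, DSOM, DCOM, EJB, SOAP and JAX-RPC (to name only the ones that come to mind), each a little better than its predecessor in some respects but worse in others, especially footprint and complexity. But it all adds up to nothing, because no amount of infrastructure can hide these problems of distribution: network partitions are real, timeouts are real, remote host and service crashes are real, and so is the need for piecemeal system upgrades and for handling version differences between systems. The distributed systems programmer must deal with these and other issues because they affect different applications differently; no amount of hiding or abstraction can make them disappear.
As I said in a recent column:
The layers of complexity required to maintain the leaky illusion of local/remote transparency are reminiscent of the convoluted equations that pre-Copernican astronomers used to explain how the Sun and other planets revolved around the Earth.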
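RPC systems in C++, Java and similar languages also tend to introduce a higher degree of coupling than a distributed system should have. Typically an interface definition language (IDL) is used to generate stubs/proxies/skeletons, the code that turns local calls into remote ones, which nobody wants to write or maintain by hand. Change the IDL and you have to regenerate the code, recompile, retest and redeploy your apps, and you have to do so atomically, all apps or none, because versioning is not accounted for. In a production system that is already deployed, an atomic upgrade across the entire system is very hard work. Overall, this approach produces brittle, tightly-coupled systems.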
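These systems also suffer from an impedance mismatch between the IDL and the languages it is translated into. If the IDL is kept minimal so that many programming languages can use it, the advanced features of languages such as C++ and Java cannot be used. If you make the IDL more powerful so that it is closer to those languages, translating it into C or other more basic languages becomes very difficult. On top of that, however you design the IDL type system, not all of its types will (in fact, they cannot) map cleanly into every target programming language. The result is non-idiomatic programming in one or more of the supported languages, and developers using those languages complain about it. If you instead use a specific language such as Java as your RPC IDL to avoid the mismatch, it works only for that one language, and translating it into other languages remains extremely hard.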
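These systems also need the same or similar infrastructure on both ends of the wire. Earlier posters in this thread complained about this, for example when they mentioned having to put CORBA ORBs under all their applications. If you cannot get exactly the same infrastructure under every endpoint, you need interoperable infrastructure built on interoperability standards. Unfortunately, these are often problematic too. CORBA interoperability eventually became very good, but it took about ten years. CORBA started out with no interoperability protocol at all (in fact, it originally specified no network protocol at all), and we put up with interop problems for several years after IIOP appeared, until both the protocol and its implementations matured.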
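Ultimately, RPC is a leaky abstraction. It cannot hide what it tries to hide, and because of that it can easily make the whole problem harder to deal with by adding accidental complexity.
In my previous message I mentioned that Erlang got it right. I believe so not only because it handles distribution directly and effectively as something built in, but also because Erlang does not hide these hard problems from the developer. It makes the programmer aware of them by providing facilities for dealing with timeouts, failures and versioning. Strictly speaking, Erlang does not provide RPC, because remote calls do not actually look like local ones.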
Nicely translated.
oh yeah. well done, PINKDAWN!