Archive

Archive for September, 2008

Erlang's Basic Principles

September 30th, 2008 :: jackyz

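The renowned Robert Virding recently released a document (still a work in progress) titled "The Erlang Rationale" (in Chinese, 《Erlang 基本原理》). As the name suggests, it sets out to describe some important features of Erlang that have long lacked documentation, together with the rationale behind them. Note that this is not yet another article about OTP; it focuses on Erlang itself, the language and the system. In other words, it is very basic, and very important.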

One major point I hope to show here is that most of the features of Erlang, both the language and the system, are not isolated properties or were developed in isolation. They were designed to all interact with each other. For example: processes, process communication, distribution and error handling are all based on common principles which allow them to interact more or less seamlessly with each other; pattern matching, which is ubiquitous, is always the same irrespective of where it is used and is the only way to bind variables.

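A document like this is extremely valuable, and translating it would be just as valuable. Come on, everyone, let's get translating!
Local download: [The Erlang Rationale]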

news

Some Facts About SMP Erlang

September 17th, 2008 :: jackyz

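ICFP2008 is about to open, and good material is pouring out. Below is the main content of the SMP Erlang topic that Kenneth of the Erlang/OTP team prepared for ICFP2008. From it we can learn about some of SMP Erlang's inner workings, strategy, and outlook; it also gives quite valuable guidance on working around SMP Erlang's shortcomings and potential performance pitfalls.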

Here are some short facts about how the Erlang SMP implementation works and how it relates to performance and scalability.

A more detailed description of how multi-core support works, and of the future plans, will be available in a couple of weeks. I plan to include some of this in my presentation at the ICFP2008 Erlang Workshop in Victoria, BC, on September 27.

The Erlang VM without SMP support has 1 scheduler, which runs in the main process thread. The scheduler picks runnable Erlang processes and IO-jobs from the run-queue, and there is no need to lock data structures since there is only one thread accessing them.

The Erlang VM with SMP support can have 1 to many schedulers which are run in 1 thread each. The schedulers pick runnable Erlang processes and IO-jobs from one common run-queue. In the SMP VM all shared data structures are protected with locks, the run-queue is one example of a data structure protected with locks.
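To make the shared run-queue concrete, here is a minimal structural sketch in Python (not Erlang, and not how BEAM is implemented): several "scheduler" threads pull jobs from one common queue, and every pop goes through the same lock. The names `CommonRunQueue`, `scheduler` and `run` are hypothetical. Python's GIL means this models the locking structure only, not real parallel speed-up.

```python
import threading
from collections import deque


class CommonRunQueue:
    """One run-queue shared by all schedulers; every access takes the same lock."""

    def __init__(self, jobs):
        self._jobs = deque(jobs)
        self._lock = threading.Lock()

    def pop(self):
        # All scheduler threads contend for this single lock.
        with self._lock:
            return self._jobs.popleft() if self._jobs else None


def scheduler(rq, sched_id, log, log_lock):
    # Each scheduler runs in its own OS thread and keeps picking
    # runnable jobs until the shared queue is empty.
    while True:
        job = rq.pop()
        if job is None:
            return
        result = job()
        with log_lock:
            log.append((sched_id, result))


def run(jobs, num_schedulers):
    """Run jobs on num_schedulers threads; num_schedulers=1 mimics the non-SMP VM."""
    rq = CommonRunQueue(jobs)
    log, log_lock = [], threading.Lock()
    threads = [threading.Thread(target=scheduler, args=(rq, i, log, log_lock))
               for i in range(num_schedulers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    jobs = [lambda n=n: n * n for n in range(10)]
    print(sorted(result for _, result in run(jobs, num_schedulers=4)))
```

Which scheduler ends up running which job is decided at run time by whoever grabs the lock first, which is the same "first available scheduler" behaviour the text describes further down.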

From OTP R12B, the SMP version of the VM is started by default if the OS reports more than 1 CPU (or core), with as many schedulers as there are CPUs or cores.

You can see what was chosen on the first line printed by the "erl" command, e.g.:

Erlang (BEAM) emulator version 5.6.4 [source] [smp:4] [asynch-threads:0] …..

The "[smp:4]" above shows that the SMP VM is running, with 4 schedulers.
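If you want to check this from a script, a small Python sketch can pull the scheduler count out of the banner line. The function name `smp_schedulers` is hypothetical, and the pattern assumes the banner format quoted above; newer releases print extra fields (e.g. "[smp:4:4]"), so the regex deliberately matches only the first number.

```python
import re


def smp_schedulers(banner: str):
    """Return the scheduler count from an erl banner line, or None if there is no [smp:N] tag."""
    # Match "[smp:4]" as well as the longer "[smp:4:4]" form.
    m = re.search(r"\[smp:(\d+)", banner)
    return int(m.group(1)) if m else None


banner = "Erlang (BEAM) emulator version 5.6.4 [source] [smp:4] [asynch-threads:0]"
print(smp_schedulers(banner))
```

A missing tag (None) means the non-SMP VM was started.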

The default behaviour can be overridden with "-smp [enable|disable|auto]" (auto is the default). If smp is set to enable or auto, the number of schedulers can be set with "+S Number", where Number is the number of schedulers (1..1024).

Note! There is normally nothing to gain from running with more schedulers than the number of CPUs or cores.
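As a sketch of how these options combine, the Python helper below builds an argument list for starting erl, enforcing the rules stated above (three -smp values, +S only with enable or auto, Number in 1..1024). The helper `erl_args` is hypothetical, and the flags are as described for R12B in this text; newer OTP releases may treat `-smp` differently.

```python
def erl_args(smp="auto", schedulers=None):
    """Build an erl command line with the SMP options described above (R12B-era flags)."""
    if smp not in ("enable", "disable", "auto"):
        raise ValueError("smp must be 'enable', 'disable' or 'auto'")
    args = ["erl", "-smp", smp]
    if schedulers is not None:
        # +S only makes sense when the SMP VM may be used.
        if smp == "disable":
            raise ValueError("+S only applies when smp is 'enable' or 'auto'")
        if not 1 <= schedulers <= 1024:
            raise ValueError("number of schedulers must be in 1..1024")
        args += ["+S", str(schedulers)]
    return args


print(erl_args("enable", 4))
```

The list could be passed to `subprocess.run`. Following the note above, `schedulers` would normally not exceed the number of cores.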

Note 2! On some operating systems, the CPUs or cores a process may use can be restricted with commands; on Linux, for example, the "taskset" command does this. Currently the Erlang VM only detects the number of available CPUs or cores and does not take the mask set by "taskset" into account. Because of this it can happen, and has happened, that e.g. only 2 cores are used even though the Erlang VM runs with 4 schedulers. It is the OS that imposes this limit, because the OS does take the "taskset" mask into account.
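The mismatch is easy to observe from any process. In Python, `os.cpu_count()` reports the CPUs in the system, while `os.sched_getaffinity(0)` (Linux and some other Unix systems) returns the set the process is allowed to run on, which is what "taskset" restricts. The helper name `cpu_counts` is hypothetical.

```python
import os


def cpu_counts():
    """Return (CPUs reported by the OS, CPUs this process may actually run on)."""
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Honours an affinity mask such as the one set by `taskset`.
        usable = len(os.sched_getaffinity(0))
    else:
        # Platform without an affinity query: assume no mask is in effect.
        usable = total
    return total, usable


if __name__ == "__main__":
    total, usable = cpu_counts()
    print(f"OS reports {total} CPUs; affinity mask allows {usable}")
```

Running it as `taskset -c 0,1 python3 script.py` on a 4-core Linux machine should report 4 and 2. A VM that sizes its scheduler count from the first number only is the situation described above.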

The schedulers in the Erlang VM each run on their own OS thread, and it is the OS that decides whether the threads are executed on different cores. Normally the OS does this just fine and also keeps each thread on the same core throughout execution.

The Erlang processes will be run by different schedulers, because each one is picked from the common run-queue by whichever scheduler becomes available first.

Performance and scalability
————————————

- The SMP VM with only one scheduler is slightly slower than the non-SMP VM. The SMP VM needs to use all the locks internally, but as long as there are no lock conflicts the overhead caused by locking is not significant (it is the lock conflicts that take time). This explains why, in some cases, it can be more efficient to run several SMP VMs with one scheduler each instead of one SMP VM with several schedulers. Of course, running several VMs requires that the application can run as many parallel tasks with no, or very little, communication between them.

- Whether a program scales well with the SMP VM over many cores depends very much on the characteristics of the program. Some programs scale linearly up to 8 and even 16 cores, while others barely scale at all, even on 2 cores. This might sound bad, but in practice many real programs scale well on the numbers of cores that are common on the market today; see below.

- Real telecom products supporting a massive number of simultaneously ongoing "calls", represented as one or several Erlang processes per core, have shown very good scalability on dual- and quad-core processors. Note that these products were written in the normal Erlang style long before the SMP VM and multi-core processors were available, and they could benefit from the Erlang SMP VM without changes, and even without recompiling the code.

SMP performance is continually improved
——————————————————

The SMP implementation is continually being improved to get better performance and scalability. In each service release (R12B-1, 2, 3, 4, 5, …, R13B etc.) you will find new optimizations.

Some known bottlenecks
———————————

- The single common run-queue will become a dominant bottleneck as the number of CPUs or cores increases. This will be visible from 4 cores and upwards, but 4 cores will probably still give OK performance for many applications. We are working on a solution with one run-queue per scheduler, which is the most important improvement right now.
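To illustrate why one run-queue per scheduler helps, here is a single-threaded Python model. Each scheduler normally touches only its own queue, so there is nothing to contend on in the common case. The round-robin placement and the "steal from the longest queue" fallback are assumptions made for this sketch, not a description of the actual R13B design; `PerSchedulerQueues` is a hypothetical name.

```python
from collections import deque


class PerSchedulerQueues:
    """Hypothetical sketch: one run-queue per scheduler plus simple work stealing."""

    def __init__(self, num_schedulers):
        self.queues = [deque() for _ in range(num_schedulers)]
        self._next = 0

    def spawn(self, proc):
        # Spread new processes round-robin over the schedulers' own queues.
        self.queues[self._next].append(proc)
        self._next = (self._next + 1) % len(self.queues)

    def pick(self, sched_id):
        own = self.queues[sched_id]
        if own:
            # Common case: only this scheduler's queue is touched. In a threaded
            # VM each queue would still need its own lock, but it is rarely contended.
            return own.popleft()
        # Own queue is empty: steal from the back of the longest queue.
        victim = max(self.queues, key=len)
        return victim.pop() if victim else None


rq = PerSchedulerQueues(2)
for p in ["a", "b", "c"]:
    rq.spawn(p)
print(rq.pick(1), rq.pick(1), rq.pick(0), rq.pick(0))
```

With one shared queue, every pick would go through the same lock; here that only happens when a scheduler runs out of its own work.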

- ETS tables involve locking. Before R12B-4 there were 2 locks involved in every access to an ETS table, but in R12B-4 the locking of the meta-table was optimized to reduce the conflicts significantly (as mentioned earlier, it is the conflicts that are expensive). If many Erlang processes access the same table, there will be a lot of lock conflicts, causing bad performance, especially if these processes spend the majority of their work accessing ETS tables. The locking is at table level, not at record level. Note! This will affect Mnesia as well, since Mnesia is a heavy user of ETS tables.
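The difference between table-level and finer-grained locking can be sketched in Python. `TableLockedStore` mirrors the behaviour described above: one lock for the whole table, so any two concurrent accesses conflict. `StripedStore` is a hypothetical alternative (lock striping, where keys hash to one of N locks) shown only for contrast. It is not how ETS is implemented.

```python
import threading


class TableLockedStore:
    """Every read or write takes one lock for the whole table (table-level locking)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def insert(self, key, value):
        with self._lock:
            self._data[key] = value

    def lookup(self, key):
        with self._lock:
            return self._data.get(key)


class StripedStore:
    """Hypothetical finer-grained alternative: each key hashes to one of N locked buckets."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _slot(self, key):
        return hash(key) % len(self._locks)

    def insert(self, key, value):
        i = self._slot(key)
        with self._locks[i]:
            # Only writers to the same stripe conflict with each other.
            self._buckets[i][key] = value

    def lookup(self, key):
        i = self._slot(key)
        with self._locks[i]:
            return self._buckets[i].get(key)
```

Both classes behave the same from the caller's side. The difference is how many concurrent accesses end up waiting on the same lock, and that is exactly where the text locates the cost.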

Our strategy with SMP
—————————–

Right from the beginning, when we started implementing the SMP VM, we decided on the strategy: "First make it work, then measure, then optimize". We have followed this strategy consistently ever since the first stable, working SMP VM, which we released in May 2006 (R11B).

There are more known things to improve, and we address them one by one, taking first the one we think gives the most performance per implementation effort, and so on.

We are putting most focus on getting consistently better scaling on many cores (more than 4).

Best in class
—————–

Even though there are a number of known bottlenecks, the SMP system already has good overall performance and scalability, and I believe we are best in class when it comes to letting the programmer utilize multi-core machines in an easy, productive way.

by Kenneth, Erlang/OTP team, Ericsson

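Calling on the experts out there to contribute a translation of this piece.

update: I came back from a few days away and had to exclaim, "The Erlang community really does have a lot of experts!" There are actually two Chinese translations (both quite good; download them and take a look):

luoyi's: http://www.luoyilinux.cn/erlang_smp_zh.pdf
ShiningRay's: http://shiningray.cn/some-facts-about-erlang-and-smp.html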

misc

package or not? it’s a problem

September 8th, 2008 :: jackyz

There has been a thread on the Erlang mailing list these past few days:

Time to update programming rules?

7.7 Module names

Erlang has a flat module structure (i.e. there are not modules within modules). Often, however, we might like to simulate the effect of a hierarchical module structure. This can be done with sets of related modules having the same module prefix.

If, for example, an ISDN handler is implemented using five different and related modules, these modules should be given names such as:
isdn_init
isdn_partb
isdn_…

We have packages! http://www.erlang.se/publications/packages.html

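Yes, this really is an issue that looks almost deliberately designed to confuse. On the one hand, Erlang allows module names like abc.bcd.cde; on the other, Erlang's programming rules document recommends flat module names. For a beginner it is hard not to get dizzy.

Namespaces, or packages, could not be more familiar to anyone with a Java background. Their benefit is "freedom of naming". In the old days, naming things was a real headache: you worried about collisions, yet wanted names to be concise, and often scratched your head without getting anywhere. In the Java world this becomes remarkably easy: create your own com.company.application package, and from then on you can name things however you like. You can even call something String and it generally won't collide (generally, that is; there are still special cases). And with Eclipse's powerful refactoring, you even get "the freedom to change your mind at any time": thought of an even cooler name? Just rename it. One command and everything is updated automatically. The benefits go on and on, and all the credit belongs to the great namespace.

In my own experience, however, although Erlang packages already exist (in rudimentary form), they are still far from mature. One example from the replies hits the nail on the head.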

We can, and should, do MUCH MUCH better.

For example, if a module is going to be referred to elsewhere as a.b.c, and if we compile a module M by invoking “erlc M.erl”, then erlc a.b.c.erl should work. (It doesn’t.) Certainly if a module announces itself as “-module(a.b.c)” then compiling it should not produce a simple c.beam. (It does.)

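As things stand, Erlang packages are still just "a hack that allows dots in module names", a long way from "full package support". Erlang/OTP's own source tree is not organized "by package" either, but "by application". On the package question, the best practice (so far) is still to "create an application directory" and use flat module names inside it, i.e. names of the form app_sys_module, to avoid naming collisions.

A package mechanism is crucial (or at least highly beneficial) to a thriving ecosystem around a language; the CEAN project may well have been born out of exactly this concern. Applications would no longer need to worry about possible naming collisions with one another. That favours code reuse and makes it easier to get the compounding effect of "applications referencing one another" (for example, mochiweb and couchdb reusing the same json module), instead of today's "standalone application + runtime" model. I hope the Erlang team will pay attention to this problem and improve it.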
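The mismatch complained about in the quoted reply can be spelled out with three small Python helpers (all hypothetical names). The assumption, based on the packages proposal linked above, is that a module a.b.c is looked up as a/b/c.beam below a code path directory. The reply says that erlc instead emits a plain c.beam, and the flat convention sidesteps the whole issue.

```python
def package_beam_path(module: str) -> str:
    """Where the package scheme (assumed) expects the object file for a dotted module name."""
    return "/".join(module.split(".")) + ".beam"


def erlc_output_name(module: str) -> str:
    """What the quoted reply reports erlc producing for -module(a.b.c): only the last segment."""
    return module.split(".")[-1] + ".beam"


def flat_name(app: str, *parts: str) -> str:
    """The flat app_sys_module naming convention, e.g. isdn_init."""
    return "_".join((app,) + parts)


print(package_beam_path("a.b.c"), erlc_output_name("a.b.c"), flat_name("isdn", "init"))
```

The first two results differ for any dotted name, and that gap is what makes packages awkward to use in practice.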

misc

Google Chrome: A Brand-New Browser

September 2nd, 2008 :: jackyz

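The latest example of something going from "rumour" to "headline news" is Google's browser project, Google Chrome; see [here].

Reportedly, this browser is open source.
Reportedly, it is based on WebKit.
Reportedly, it has a built-in task manager.
Reportedly, its JavaScript VM is called V8.
Reportedly, its tabs sit on top.
Reportedly, it is released on September 2 (yes, that's today).

What are you waiting for? Go [join the fun]?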

misc