
Very good comparison, stuff that most other blogs don't talk much about -- scheduling, fault isolation, garbage collection strategies. I guess they don't because other frameworks/languages don't provide that. It usually stays at syntax level with obligatory mention of generics.

Fault isolation and pauseless garbage collection are very important in some contexts. Often the need for them becomes apparent the second time around, after one version of the system has been plagued by bugs from large shared mutable state, or by strange, unpredictable response times in a highly concurrent system.

Do you pay in terms of raw CPU performance by copying messages and keeping private heaps per lightweight process? Yes, you do. There are no magic unicorns behind the scenes. That is the trade-off you make to get fault isolation and soft realtime properties. But keep this in mind: one of the biggest slowdowns a system incurs is when it goes from working to crashing and not working.

Also, no matter how strong the static compile-time checking is, your system will still crash in production. It is usually a hard bug that has been lurking around, not a simple "I thought it was an int but got a string"; those are caught early on. It will probably be something subtle and hard to reproduce. Can your system tolerate that kind of crash gracefully? Sometimes you need that.

In the end, it is good for both systems to be around. If you need fault tolerance, optimization for low-latency responses, supervision strategies, or built-in inter-node clustering (node = OS process or instance of a running Erlang BEAM VM), you cannot easily get that any other way.

Now, one could think of trying to replicate some of the things Erlang provides. Like, say, building static analysis tools to check whether goroutines end up accessing shared state. Or, say, devising a channel-based supervision tree. Heck, if you don't have too many concurrency contexts (processes, goroutines), you can always fall back on OS process-based isolation and use IPC (plus ZMQ, for example) as a mailbox. But then again, Erlang provides all of that in one package.
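
To make the supervision idea concrete, here is a minimal sketch in Go of a channel-based supervisor that restarts a crashing worker. It is an illustration only: `worker`, `runOnce`, `supervise` and the `outcome` type are hypothetical names, the crash is simulated with a panic, and the restart limit only loosely imitates an Erlang restart strategy.

```go
package main

import "fmt"

// outcome is what a worker goroutine reports back to its supervisor.
type outcome struct {
	result string
	err    error
}

// worker is a hypothetical task. It panics on its first two attempts
// to simulate a crash, then succeeds.
func worker(attempt int) string {
	if attempt < 3 {
		panic(fmt.Sprintf("simulated crash on attempt %d", attempt))
	}
	return fmt.Sprintf("ok after %d attempts", attempt)
}

// runOnce runs the worker in its own goroutine and reports either a
// result or, if the worker panicked, an error on the returned channel.
func runOnce(attempt int) <-chan outcome {
	done := make(chan outcome, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- outcome{err: fmt.Errorf("worker died: %v", r)}
			}
		}()
		done <- outcome{result: worker(attempt)}
	}()
	return done
}

// supervise starts the worker and restarts it after a crash, giving up
// once maxRestarts restarts have been used.
func supervise(maxRestarts int) (string, error) {
	for attempt := 1; attempt <= maxRestarts+1; attempt++ {
		out := <-runOnce(attempt)
		if out.err == nil {
			return out.result, nil
		}
		fmt.Println("supervisor:", out.err)
	}
	return "", fmt.Errorf("giving up after %d restarts", maxRestarts)
}

func main() {
	res, err := supervise(5)
	fmt.Println(res, err)
}
```

Note what this does not give you: `recover` only catches panics, the supervisor cannot kill a worker that is stuck, and nothing stops the worker from corrupting state it shares with other goroutines.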



> Like, say, building static analysis tools to check whether goroutines end up accessing shared state

I believe this is what the Go race detector does: http://blog.golang.org/race-detector
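
For anyone who hasn't used it, a small sketch of the kind of bug it reports. The function names `racyCount` and `safeCount` are made up for this example. Building or running with the `-race` flag (for example `go run -race` or `go test -race`) instruments the program, and the unsynchronized increment below is reported when it actually executes.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines without any
// synchronization. This is a data race, so the result may be less than n.
func racyCount(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // unsynchronized access to shared state
		}()
	}
	wg.Wait()
	return count
}

// safeCount does the same work but guards the counter with a mutex.
func safeCount(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println("racy:", racyCount(1000))
	fmt.Println("safe:", safeCount(1000))
}
```

The detector only sees races on code paths that run, so coverage depends on your tests and workload.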


That's not static analysis though.


Is it easier to replicate Erlang using the Actor model? Or can it be done with CSP?

Also, what is needed to make a soft realtime language/runtime?


Good questions.

> Is it easier to replicate Erlang using the Actor model? Or can it be done with CSP?

There is overlap between the two. CSP is _usually_ synchronous and the Actor model is asynchronous. CSP is focused on channels: you send messages to a channel and the other side receives them. A channel has an identity. In Erlang you send a message to a process (its main concurrency context). A process has a process ID used to identify it. It is like you having an address at your house and me sending a letter to you.

Goroutines don't have identity. You can't easily send a message to a goroutine, kill a goroutine, or see whether it died so you can start another one, and so on.
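
A rough sketch of what that means in practice: in Go, the closest thing to an address is a channel you hand out yourself. The `message` and `mailbox` types and the `spawn` and `call` helpers below are hypothetical names for this example, not anything from the standard library.

```go
package main

import "fmt"

// message is a hypothetical request: a payload plus a channel to reply on.
type message struct {
	text  string
	reply chan<- string
}

// mailbox plays the role of a process ID: whoever holds it can send
// messages to the goroutine behind it.
type mailbox chan<- message

// spawn starts a goroutine that owns an inbox and returns the send side.
func spawn(name string) mailbox {
	// Buffered, so sends do not block until the buffer fills up.
	inbox := make(chan message, 16)
	go func() {
		for m := range inbox {
			m.reply <- name + " got: " + m.text
		}
	}()
	return inbox
}

// call sends a message to the mailbox and waits for the reply.
func call(to mailbox, text string) string {
	reply := make(chan string, 1)
	to <- message{text: text, reply: reply}
	return <-reply
}

func main() {
	alice := spawn("alice")
	fmt.Println(call(alice, "hello"))
}
```

The goroutine itself is still anonymous here. The channel is the identity, and nothing in this sketch lets you kill the goroutine or find out that it died.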

> Also, what is needed to make a soft realtime language/runtime?

Quite simply, you need one part of the system to not block other parts from making progress (returning a result). Imagine you serve a page to one user, but because some other user has sent some input that takes a long time to process, the first user doesn't get a response and has to wait.

Or say you have some data structures and a common heap shared between concurrency contexts. A garbage collector might have to stop all goroutines in order to check whether it can collect any garbage, and that introduces a pause. The Azul JVM for Java is the only VM with a shared heap that has a concurrent, pauseless garbage collector. It is very impressive; look up how it works. In Erlang it is easy. Process heaps are private to each process, so they can be collected independently without getting in the way.


> Java Azul JVM

Oh I agree 100% here, I had a Java version of a project that went from 15k qps to 60k qps just by switching to the Azul JVM.

That said I still ended up crushing that with Nginx/LuaJIT and I didn't need a proprietary JVM that wouldn't work on some systems because of kernel modules it needed to install etc.


"Channel has an identity". Wikipedia (https://en.wikipedia.org/wiki/Communicating_sequential_proce...) says "CSP processes are anonymous", but it doesn't say anything about the channels. So how can a channel have an identity? Why is that useful? If the channel is a type declaration, why do I need something else? This is something that actors have (a PID, because you send "emails") that I don't see the need for in CSP.


> In Erlang it is easy. Processes heaps are private to each process so they can be collected independently without getting in the way.

But that comes at the expense of throughput, and it doesn't help with shared state (ETS)


> Do you pay in terms of raw CPU performance by copying messages and keeping private heaps per lightweight process?

This can also reduce cache misses ping-ponging between CPU cores, including those incurred by a single overarching GC.


The problem is that in most systems you cannot completely avoid shared state. We usually move the responsibility to manage shared state into the DBMS, but the DBMS has to be written in some programming language as well.

As the amount of available RAM grows, in-memory architectures become more desirable, especially for analytics workloads. But how do you do in-memory analytics without shared state? It just doesn't scale.

What we need is a way to make access to shared state explicit, not the default. We also need pauseless garbage collection or optional garbage collection so that large amounts of shared in-memory data don't cause latency problems.

Pauseless garbage collection conditioned on never using shared state is not good enough for the kind of systems I'm talking about.


This is exactly what ETS in Erlang provides :)


As I understand it, ETS is itself an in-memory database that comes with its own data model, but it doesn't let me build one out of my own data structures and algorithms, which is what I had in mind.

Also, I think all data has to be copied in and out of ETS. It cannot be referenced in place, which is going to slow things down quite a bit. (Correct me if I'm wrong. I have no first-hand experience with ETS.)


Cache misses between disparate CPU cores will also slow you down. Sharing memory is not without its efficiency drawbacks as well.



