Showing posts from February, 2014 - Full service hosting

I needed a special infrastructure configuration for testing a new architecture for high performance computing requirements. This architecture isn't offered through Amazon's Elastic Cloud, Google's Cloud, or Microsoft Azure.'s team took the time to understand our exact requirements and offered to build what we needed on their infrastructure at no additional cost. This team knows how to please. They've already earned my business. I was planning to move my personal hosting and business to Linode, however now I'll be moving to

They offer more value for the $$$ with virtual servers starting at $3.65/month which include the following features:
InfiniBand QDR 40Gb interconnect between servers
SSDs
1Gb/s port
1000 GB / 1TB outbound transfer included
Full root admin - Linux or Windows
Dedicated IP
Free nightly backup
No commitments, no contract, no setup fee
Redundant Tier-1 internet connections with automatic failover
Redundant power
Redundant HVAC
REST A…

Performance Metrics

From WhatsApp scalability talk


pmcstat - processor hardware perf counters
dtrace
kernel lock-counting
gprof
fprof w/ & w/o cpu_timestamp
BEAM lock-counting (invaluable)
Contention was the most significant issue
Backported TSC-based kernel timecounter: gettimofday(2) calls became much less expensive
Backported igb network driver: had an issue with MSI-X queue stalls
sysctl tuning: raised obvious limits (e.g. kern.ipc.maxsockets) and set net.inet.tcp.tcphashsize=524288

BEAM is the Erlang VM - there is a lot of other info on that


AsynchronousServerSocketChannel - Channels of this type are safe for use by multiple concurrent threads though at most one accept operation can be outstanding at any time. If a thread initiates an accept operation before a previous accept operation has completed then an AcceptPendingException will be thrown.

AsynchronousChannelGroup specifies the thread pool that runs the async operation callbacks. If no group is specified, a default group is used.

Pass a custom ThreadFactory to AsynchronousChannelGroup; otherwise all the threads get generic names, and it is not clear what they are for without inspecting the stack. Here are options for a custom ThreadFactory.


class Consumer implements Runnable .....

class SocketAcceptHandler implements CompletionHandler .....

AsynchronousServerSocketChannel socket =
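The snippets in this excerpt are cut off, so here is a separate minimal sketch of the same idea rather than the original code. It assumes a hypothetical NamedThreadFactory helper, an "accept-io" name prefix, and a handler that just closes each connection; only the java.nio.channels and java.util.concurrent calls are real JDK API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: names threads "<prefix>-<n>" so they are identifiable in thread dumps.
class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
        t.setDaemon(true); // do not keep the JVM alive just for I/O callbacks
        return t;
    }
}

class AcceptLoopSketch {
    // Opens a server channel whose callbacks run on a fixed pool of named threads.
    static AsynchronousServerSocketChannel open(int port, int threads) throws IOException {
        AsynchronousChannelGroup group = AsynchronousChannelGroup.withFixedThreadPool(
                threads, new NamedThreadFactory("accept-io"));
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open(group).bind(new InetSocketAddress(port));
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void attachment) {
                // Re-arm only after the previous accept has completed:
                // at most one accept may be outstanding at any time.
                server.accept(null, this);
                try {
                    client.close(); // placeholder for real per-connection work
                } catch (IOException ignored) {
                }
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                // Reached when the channel is closed or the accept fails.
            }
        });
        return server;
    }

    public static void main(String[] args) throws Exception {
        AsynchronousServerSocketChannel server = open(0, 2); // port 0: let the OS pick
        System.out.println("listening on " + server.getLocalAddress());
        server.close();
    }
}
```

Re-arming the accept inside completed() avoids AcceptPendingException, because the next accept is only issued once the previous one has finished.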


Unsafe - very fast serialization

Paxos Consensus Protocol vs 2PC & 3PC

Paxos is used by Google's Chubby and FoundationDB; Apache ZooKeeper uses Zab, a closely related protocol

Majority Consensus

If the leader is relatively stable, phase 1 becomes unnecessary. Thus, it is possible to skip phase 1 for future instances of the protocol with the same leader.
Multi-Paxos reduces the failure-free message delay (proposal to learning) from 4 delays to 2 delays.
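
To make the two phases concrete, here is a minimal single-decree sketch, assuming in-process acceptors and no message loss. The Acceptor and PaxosSketch names and the ballot numbering are made up for illustration and do not come from any particular implementation. A stable Multi-Paxos leader would run the prepare loop once and then issue only the accept loop for later instances.

```java
import java.util.List;

// Hypothetical in-memory acceptor for one Paxos instance.
class Acceptor {
    // Reply to a prepare request (phase 1b).
    static final class Promise {
        final boolean ok;
        final long acceptedBallot;
        final String acceptedValue;

        Promise(boolean ok, long acceptedBallot, String acceptedValue) {
            this.ok = ok;
            this.acceptedBallot = acceptedBallot;
            this.acceptedValue = acceptedValue;
        }
    }

    private long promised = -1;        // highest ballot promised so far
    private long acceptedBallot = -1;  // ballot of the accepted value, if any
    private String acceptedValue;      // null until something is accepted

    // Phase 1: promise to ignore lower ballots and report any accepted value.
    synchronized Promise prepare(long ballot) {
        boolean ok = ballot > promised;
        if (ok) {
            promised = ballot;
        }
        return new Promise(ok, acceptedBallot, acceptedValue);
    }

    // Phase 2: accept unless a higher ballot has been promised.
    synchronized boolean accept(long ballot, String value) {
        if (ballot < promised) {
            return false;
        }
        promised = ballot;
        acceptedBallot = ballot;
        acceptedValue = value;
        return true;
    }
}

class PaxosSketch {
    // Runs both phases; returns the chosen value, or null without a majority.
    static String propose(List<Acceptor> acceptors, long ballot, String value) {
        int majority = acceptors.size() / 2 + 1;
        int promises = 0;
        long highestSeen = -1;
        String candidate = value;
        for (Acceptor a : acceptors) {
            Acceptor.Promise p = a.prepare(ballot);
            if (!p.ok) {
                continue;
            }
            promises++;
            // A previously accepted value must be adopted instead of our own.
            if (p.acceptedValue != null && p.acceptedBallot > highestSeen) {
                highestSeen = p.acceptedBallot;
                candidate = p.acceptedValue;
            }
        }
        if (promises < majority) {
            return null;
        }
        int accepts = 0;
        for (Acceptor a : acceptors) {
            if (a.accept(ballot, candidate)) {
                accepts++;
            }
        }
        return accepts >= majority ? candidate : null;
    }
}
```

With three acceptors, a second proposer using a higher ballot and a different value still ends up choosing the first value, which is the safety property the majority rule buys.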


The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some cohorts will never resolve their transactions: After a cohort has sent an agreement message to the coordinator, it will block until a commit or rollback is received.
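
The blocking window can be sketched as a small cohort state machine. This is an illustration only: the Cohort class, its state names, and its methods are hypothetical, and there is no real coordinator or network here.

```java
// Hypothetical 2PC cohort; only models the states, not messaging or durability.
class Cohort {
    enum State { WORKING, PREPARED, COMMITTED, ABORTED }

    private State state = State.WORKING;

    // Phase 1: vote. After voting yes the cohort may no longer decide alone.
    boolean voteOnPrepare(boolean canCommit) {
        if (state != State.WORKING) {
            throw new IllegalStateException("already voted");
        }
        state = canCommit ? State.PREPARED : State.ABORTED;
        return canCommit;
    }

    // Phase 2: apply the coordinator's decision.
    void onDecision(boolean commit) {
        if (state == State.PREPARED) {
            state = commit ? State.COMMITTED : State.ABORTED;
        }
    }

    // Returns true if the cohort is resolved after a coordinator timeout,
    // false if it is still blocked waiting for the decision.
    boolean onCoordinatorTimeout() {
        if (state == State.WORKING) {
            state = State.ABORTED; // safe: no yes vote has been sent yet
            return true;
        }
        return state != State.PREPARED;
    }

    State state() {
        return state;
    }
}
```

A cohort that times out before voting can abort on its own, but one in PREPARED has to keep its locks and wait, which is exactly the blocking behaviour described above.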


The main disadvantage to this algorithm is that it cannot recover in the event the network is segmented in any manner. The original 3PC algorithm assumes a fail-stop model, where processes fail by crashing and crashes can be accurately detected, and does not work with network parti…

How much does thread context cost

Threaded vs. Evented servers

HPC Infiniband 40/56Gb vs 10GbE

Processors now have many hyper-threaded cores and lots of memory and cache. Standard high-performance disk technology has lazy-write caches and battery backup for reliability. Disks are striped/parallelized to keep them from being performance bottlenecks. This means network I/O for Internet traffic, replication, caching, and disk access will typically be the most substantial bottleneck. 40/56Gb IB at ~$5.6/Gb/s is currently cheaper than 10GbE at ~$11.5/Gb/s, giving it better value. You can also run TCP/IP over IB (IPoIB). There is a 40GbE switch for ~$208/Gb and even 100GbE, but nothing much available in 40Gb and nothing I could find in 100Gb. You can't get the 40/56Gb/s bandwidth or lower latency of InfiniBand RDMA on 10GbE. RDMA supposedly exists on 10GbE, but after looking on Intel's site, I only found one card supporting it.

Here is a paper which compares the performance of a custom key/value store, memcached, an…

Ring-buffer for high performance, reduced contention, parallel processing - LMAX Disruptor

GIT Disruptor
Martin Fowler's write up

Example code for v3.0

Queues are wrong for inter-process communication
SEDA, Actor mechanisms bottleneck on contention
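
As a rough illustration of why a pre-allocated ring avoids queue contention, here is a single-producer/single-consumer ring buffer sketch. It is not the LMAX Disruptor API: the SpscRing class and its methods are hypothetical, and it assumes exactly one producer thread and one consumer thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical SPSC ring buffer in the spirit of the Disruptor, not its API.
class SpscRing<T> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next sequence to read, written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // next sequence to write, written only by the producer

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1; // power-of-two size lets us mask instead of using modulo
    }

    // Producer side: returns false when the ring is full.
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store publishes the slot to the consumer
        return true;
    }

    // Consumer side: returns null when the ring is empty.
    @SuppressWarnings("unchecked")
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int i = (int) (h & mask);
        T value = (T) slots[i];
        slots[i] = null; // let the element be garbage collected
        head.lazySet(h + 1); // ordered store frees the slot for the producer
        return value;
    }
}
```

Each counter has a single writer, so there are no locks and no compare-and-swap loops; the two threads only observe each other's sequence numbers.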

Mechanical Sympathy - know how to drive the systems to get the most out of them
DRAM not getting faster but is getting cheaper
BW to memory increasing
GHz race is over, CPUs aren't getting faster
bigger caches, more cores

Networks getting faster
Standard 10 GigE can use RDMA to bypass the kernel and transfer userspace memory between machines in under 10 µs
Java 7 SDP
Use RDMA to replicate to another node for HA/DR

Mechanical Disks have great sequential access/streaming
SSD not much better for sequential access and single threaded
Great for multi-threaded random access
Disk controller is the limit: the standard SSD interface is not very fast, new PCIe is much faster
Fusion IO card very fast

10 GigE can go from a process on one system to a process on another system in tens of µs

move data between cores in L3 cache for best performance

bytes read in cach…

Distributed monitoring

Ganglia - Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.

Can be embedded

Screen Casting options


Quicktime Player - free, already installed
Screenflow $100 - used by Lifehacker, simultaneously records screen, audio, and camera
Camtasia $99
SnapzProX $69
Linux
Kdenlive screen + Audacity for audio
recordMyDesktop
Wink
Byzanz - records to gif which auto-plays

Load Balancing

Round-Robin DNS


Apache Kafka Java
Apache ActiveMQ Java JMS
RedHat HornetQ Java JMS
ZeroMQ 0MQ
Kestrel JVM Scala Twitter

Grid Computing

Open Grid Scheduler/Grid Engine
SLURM: A Highly Scalable Resource Manager

Univa took over Sun Grid Engine

Free web diagramming tool


Saves to Google Drive or Dropbox
No multi-user collaboration support