Showing posts from February, 2014

Atlantic.net - Full-service hosting

I needed a special infrastructure configuration for testing a new architecture with high-performance computing requirements. This architecture isn't offered through Amazon's Elastic Compute Cloud, Google's Cloud, or Microsoft Azure. Atlantic.net's team took the time to understand our exact requirements and offered to build what we needed on their infrastructure at no additional cost. This team knows how to please, and they've already earned my business. I was planning to move my personal and business hosting to Linode; however, now I'll be moving to Atlantic.net. They offer more value for the money, with virtual servers starting at $3.65/month that include the following features:

- InfiniBand QDR 40Gb interconnect between servers
- SSDs
- 1Gb/s port
- 1000 GB / 1TB outbound transfer included
- Full root admin - Linux or Windows
- Dedicated IP
- Free nightly backup
- No commitments, no contract, no setup fee
- Redundant Tier-1 internet connections with automatic failover
- Redundant …

Performance Metrics

From the WhatsApp scalability talk.
Slides: http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf
Talk: http://vimeo.com/44312354

- pmcstat - processor hardware perf counters
- dtrace, kernel lock-counting, gprof
- fprof with & without cpu_timestamp
- BEAM lock-counting (invaluable) - contention was the most significant issue
- FreeBSD: backported the TSC-based kernel timecounter, making gettimeofday(2) calls much less expensive
- Backported the igb network driver, which had an issue with MSI-X queue stalls
- sysctl tuning: obvious limits (e.g. kern.ipc.maxsockets), net.inet.tcp.tcphashsize=524288
- BEAM is the Erlang VM - lots of other info on that
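As a rough sketch of where that tuning lives on FreeBSD: both of these are boot-time tunables, so they would typically go in /boot/loader.conf rather than being set at runtime. The maxsockets value below is an assumed example; only the tcphashsize value comes from the talk.

    # /boot/loader.conf -- boot-time kernel tunables (illustrative)
    kern.ipc.maxsockets="2400000"       # raise the socket ceiling well above the default (assumed value)
    net.inet.tcp.tcphashsize="524288"   # larger TCP connection hash table (value from the talk)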

AsyncIO

AsynchronousServerSocketChannel - channels of this type are safe for use by multiple concurrent threads, though at most one accept operation can be outstanding at any time. If a thread initiates an accept operation before a previous accept operation has completed, an AcceptPendingException will be thrown.

AsynchronousChannelGroup specifies the thread pool that manages the async operation callbacks; if no group is specified, a default group is used. You need to pass a custom ThreadFactory to the AsynchronousChannelGroup, or all the threads will have generic names and it won't be clear what they are for without inspecting the stack. Here are options for a custom ThreadFactory.

Server/Consumer outline:

    class Consumer implements Runnable .....
    class SocketAcceptHandler implements CompletionHandler .....
    AsynchronousServerSocketChannel socket =
        AsynchronousServerSocketChannel
            .open(AsynchronousChannelGroup.withThreadPool(Executors.newFixedThreadPool(1)));
    socket.setOption(Standard…
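A minimal, self-contained sketch of the same pattern (the class name, port, and thread names are my own choices, not from the original post):

    import java.net.InetSocketAddress;
    import java.nio.channels.*;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadFactory;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    public class AsyncAcceptServer {
        public static void main(String[] args) throws Exception {
            // Custom ThreadFactory so the callback threads get meaningful names.
            ThreadFactory named = new ThreadFactory() {
                private final AtomicInteger n = new AtomicInteger();
                public Thread newThread(Runnable r) {
                    return new Thread(r, "accept-pool-" + n.incrementAndGet());
                }
            };
            AsynchronousChannelGroup group = AsynchronousChannelGroup
                .withThreadPool(Executors.newFixedThreadPool(1, named));

            final AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open(group)
                    .bind(new InetSocketAddress(9000));

            // Only one accept may be outstanding, so re-arm it from the handler.
            server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
                public void completed(AsynchronousSocketChannel ch, Void att) {
                    server.accept(null, this);   // accept the next connection
                    // ... hand ch off to a Consumer here ...
                }
                public void failed(Throwable exc, Void att) { exc.printStackTrace(); }
            });

            group.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        }
    }

Without the named factory, the callback threads show up with generic "pool-N-thread-N" style names in stack dumps.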

Unsafe - very fast serialization

Copy directly from an object's bytes into a DirectByteBuffer.
http://java.dzone.com/articles/fast-java-file-serialization
http://mishadoff.github.io/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/
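A minimal sketch of the technique: read primitive fields straight out of an object with sun.misc.Unsafe and write them into a direct buffer, with no intermediate byte[] or ObjectOutputStream. The Point class and 16-byte layout are my own illustration; a real serializer would also handle alignment, endianness, and object graphs.

    import java.lang.reflect.Field;
    import java.nio.ByteBuffer;
    import sun.misc.Unsafe;

    public class UnsafeFieldCopy {
        private static final Unsafe UNSAFE;
        static {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                UNSAFE = (Unsafe) f.get(null);
            } catch (Exception e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        static class Point { long x, y; }

        public static void main(String[] args) throws Exception {
            Point p = new Point();
            p.x = 42; p.y = 7;

            // Field offsets only need to be computed once and can be cached.
            long xOff = UNSAFE.objectFieldOffset(Point.class.getDeclaredField("x"));
            long yOff = UNSAFE.objectFieldOffset(Point.class.getDeclaredField("y"));

            // Copy the raw field values into a DirectByteBuffer.
            ByteBuffer buf = ByteBuffer.allocateDirect(16);
            buf.putLong(UNSAFE.getLong(p, xOff));
            buf.putLong(UNSAFE.getLong(p, yOff));
            buf.flip();
            System.out.println(buf.getLong() + ", " + buf.getLong());   // 42, 7
        }
    }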

PAXOS Consensus Protocol vs 2PC & 3PC

Paxos is used by Google's Chubby, Apache ZooKeeper, and FoundationDB.
Majority consensus: http://the-paper-trail.org/blog/consensus-protocols-paxos/

If the leader is relatively stable, phase 1 becomes unnecessary, so it is possible to skip phase 1 for future instances of the protocol with the same leader. Multi-Paxos reduces the failure-free message delay (proposal to learning) from 4 delays to 2 delays.

2PC: The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some cohorts will never resolve their transactions: after a cohort has sent an agreement message to the coordinator, it will block until a commit or rollback is received.

3PC: The main disadvantage of this algorithm is that it cannot recover if the network is segmented in any manner. The original 3PC algorithm assumes a fail-stop model, where processes fail by crashing and crashes can be accurately detected, and d…
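To make the 2PC blocking problem concrete, here is a toy sketch (class and method names are mine; this is not a real protocol implementation). The point is the cohort's unconditional wait after voting: if the coordinator crashes before broadcasting, decision.take() blocks forever.

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    enum Vote { COMMIT, ABORT }

    class Cohort {
        final BlockingQueue<Vote> decision = new LinkedBlockingQueue<>();

        Vote prepare() { return Vote.COMMIT; }          // phase 1: vote

        void awaitOutcome() throws InterruptedException {
            // After voting COMMIT, the cohort can only wait for the coordinator.
            Vote outcome = decision.take();
            System.out.println("cohort applies " + outcome);
        }
    }

    class Coordinator {
        void runTransaction(List<Cohort> cohorts) {
            boolean allYes = cohorts.stream()
                .allMatch(c -> c.prepare() == Vote.COMMIT);
            Vote outcome = allYes ? Vote.COMMIT : Vote.ABORT;   // phase 2: decide
            for (Cohort c : cohorts) c.decision.add(outcome);   // broadcast decision
        }
    }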

How much does a thread context switch cost

http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

Threaded vs. Evented servers

http://mmcgrana.github.io/2010/07/threaded-vs-evented-servers.html

HPC InfiniBand 40/56Gb vs 10GbE

http://www.mellanox.com/related-docs/case_studies/CS_Atlantic.Net.pdf

Processors now have many hyper-threaded cores and lots of memory and cache. Standard high-performance disk technology has lazy-write caches and battery backup for reliability. Disks are striped/parallelized to alleviate them as performance bottlenecks. This means network I/O for Internet traffic, replication, caching, and disk access will typically be the most substantial bottleneck. 40/56Gb IB at ~$5.6/Gb/s is currently cheaper than 10GbE at ~$11.5/Gb/s, giving it better value. You can also run TCP/IP over IB (IPoIB). There is a 40GbE switch for ~$208/Gb and even 100GbE switches, but not much is available in 40GbE adapters and I could find nothing in 100GbE. You can't get the 40/56Gb/s bandwidth or the lower latency of InfiniBand RDMA on 10GbE. RDMA supposedly exists on 10GbE, but after looking on Intel's site, I found only one card supporting it. Here is a paper which compares the performance of a custom key/value store, memcached, …
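To make the $/Gb/s figures concrete (the adapter prices here are assumptions back-calculated from the quoted ratios, not sourced prices): a 56Gb/s IB adapter at about $315 works out to 315 / 56 ≈ $5.6 per Gb/s, while a 10GbE adapter at about $115 works out to 115 / 10 = $11.5 per Gb/s, roughly twice the cost per unit of bandwidth.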

Ring-buffer for high performance, reduced contention, parallel processing - LMAX Disruptor

LMAX GIT Disruptor
Martin Fowler's write-up
Example code for v3.0

- Queues are wrong for inter-process communication; SEDA and Actor mechanisms bottleneck on contention
- Mechanical Sympathy - know how to drive the systems to get the most out of them
- DRAM is not getting faster, but it is getting cheaper, and bandwidth to memory is increasing
- The GHz race is over; CPUs aren't getting faster - bigger caches, more cores
- Networks are getting faster; standard 10GbE with RDMA can bypass the kernel and transfer userspace memory between machines in sub-10 us (Java 7 SDP, j-zerocopy)
- Use RDMA to HA/DR replicate to another node
- Mechanical disks have great sequential access/streaming; SSDs are not much better for sequential, single-threaded access, but great for multi-threaded random access
- The disk controller is limited; the standard SSD interface is not very fast, while the new PCIe interface is much faster (Fusion-io cards are very fast)
- 10GbE can go from a process on one system to a process on another system in tens of us; move data between cores in the L3 cache for be…
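A minimal sketch of the Disruptor's pre-allocated ring buffer and claim/publish cycle, assuming a late-3.x Disruptor (the ThreadFactory constructor; earlier 3.0 releases took an Executor). Event and class names are mine.

    import com.lmax.disruptor.RingBuffer;
    import com.lmax.disruptor.dsl.Disruptor;
    import java.util.concurrent.Executors;

    public class DisruptorSketch {
        static class ValueEvent { long value; }   // mutable, pre-allocated slot

        public static void main(String[] args) {
            Disruptor<ValueEvent> disruptor = new Disruptor<>(
                ValueEvent::new, 1024, Executors.defaultThreadFactory());
            disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println("consumed " + event.value));
            disruptor.start();

            RingBuffer<ValueEvent> ring = disruptor.getRingBuffer();
            long seq = ring.next();           // claim the next slot (no locks)
            try {
                ring.get(seq).value = 42;     // write into the pre-allocated event
            } finally {
                ring.publish(seq);            // make it visible to consumers
            }
        }
    }

No objects are allocated per message; producers and consumers coordinate through sequence counters rather than locks, which is where the reduced contention comes from.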

Distributed monitoring

Ganglia - a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. Can be embedded…

Screencasting options

Mac
- QuickTime Player - free, already installed
- ScreenFlow $100 - used by Lifehacker; simultaneously records screen, audio, and camera
- Camtasia $99
- Snapz Pro X $69

Linux
- Kdenlive screen capture + Audacity for audio
- recordMyDesktop
- Wink
- Byzanz - records to a GIF which auto-plays

Load Balancing

GSLB techniques: round-robin DNS, anycast, redirects, cookies, etc.
http://www.tenereillo.com/GSLBPageOfShame.htm
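As a tiny illustration of the round-robin idea on that list (a sketch with invented names, independent of any DNS server):

    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    // Hands back pool members in rotation -- the same policy round-robin DNS
    // applies when it rotates A records across queries.
    class RoundRobin {
        private final List<String> servers;
        private final AtomicLong counter = new AtomicLong();

        RoundRobin(List<String> servers) { this.servers = servers; }

        String next() {
            // floorMod keeps the index non-negative even if the counter wraps
            return servers.get((int) Math.floorMod(
                counter.getAndIncrement(), (long) servers.size()));
        }
    }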

Queuing

- Apache Kafka (Java)
- Apache ActiveMQ (Java, JMS)
- Red Hat HornetQ (Java, JMS)
- ZeroMQ (0MQ)
- Kestrel (JVM/Scala, Twitter)
- Redis

Grid Computing

- Open Grid Scheduler/Grid Engine: http://gridscheduler.sourceforge.net/
- SLURM: A Highly Scalable Resource Manager: https://computing.llnl.gov/linux/slurm/
- Univa took over Sun Grid Engine: http://gridengine.org/blog/

Free web diagramming tool

Diagramly - saves to Google Drive or Dropbox; no multi-user collaboration support.