Posts

Showing posts from 2014

Java 8 Dynamic Filtering with Lambdas

Dynamic Filtering Gist

REST API design

REST API design REST Best Practices REST tools and frameworks API Transformer Best Practices

Java Runnable & Serializable

Java Runnable & Serializable
    Runnable r = (Runnable & Serializable) () -> System.out.println("Serializable!");
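A minimal round-trip sketch of what the intersection cast buys you: the lambda can be written with standard Java serialization and read back, while a plain lambda is rejected. It uses `Supplier<String>` instead of `Runnable` only so the result can be checked; the class and method names are hypothetical. Keep in mind that a serialized lambda refers back to the class that created it, so that class must be on the classpath when deserializing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

final class SerializableLambdaDemo {

    // Write an object with standard Java serialization.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Read back an object written by serialize().
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // The intersection cast makes javac emit a serializable lambda, so it survives a round trip.
    @SuppressWarnings("unchecked")
    static String roundTrip() {
        Supplier<String> s = (Supplier<String> & Serializable) () -> "Serializable!";
        try {
            Supplier<String> copy = (Supplier<String>) deserialize(serialize(s));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Without the cast, writing the lambda fails with NotSerializableException.
    static boolean plainLambdaIsRejected() {
        Runnable plain = () -> System.out.println("not serializable");
        try {
            serialize(plain);
            return false;
        } catch (NotSerializableException expected) {
            return true;
        } catch (IOException other) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
        System.out.println(plainLambdaIsRejected());
    }
}
```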

Google Angular vs Google Polymer

Article: Angular is a complete framework for building webapps, whereas Polymer is a library for creating Web Components. Those components, however, can then be used to build a webapp. Can Angular and Polymer be used together? Yes! You can use Polymer custom elements inside of an Angular app. Web components are just regular DOM elements, so they have attributes, emit events, and can contain child elements. What about the future of both projects? Angular and Polymer will remain separate projects with their own goals. That said, Angular has announced they'll eventually move to use the Web Components APIs in their underlying architecture. For Angular 2.0, Web Components will work seamlessly within Angular apps and directives, and components written in Angular will export to Web Components to be used by Polymer or other libraries.

Aeron Messaging 6M msgs/sec

Aeron: Do We Really Need Another Messaging System?
At its core Aeron is a replicated persistent log of messages. And through a very conscious design process messages are wait-free and zero-copy along the entire path from publication to reception. This means latency is very good and very predictable.
It's not a full featured messaging product in the way you may be used to, like Kafka. Aeron does not persist messages, it doesn't support guaranteed delivery, nor clustering, nor does it support topics. Aeron won't know if a client has crashed and be able to sync it back up from history or initialize a new client from history.
Aeron - Martin Thompson
Aeron - github
Design Principles
Garbage free in steady state running
Apply Smart Batching in the message path
Wait-free algorithms in the message path
Non-blocking IO in the message path
No exceptional cases in the message path
Apply the Single Writer Principle
Prefer unshared state
Avoid unnecessary data copies

High Throughput

Martin Thompson HDR Histogram Batching for efficiency Observable state machines without locking - histogram state changes
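Smart batching, as described in Martin Thompson's material, means the consumer blocks for the first item and then takes whatever else is already waiting, so batch size grows with load and a lone message is never held back waiting for a batch to fill. The sketch below is a simplified illustration using a `LinkedBlockingQueue` and `drainTo`; the class and method names are hypothetical, and production systems such as Aeron use lock-free buffers rather than a JDK queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SmartBatcher {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;

    SmartBatcher(int maxBatch) {
        this.maxBatch = maxBatch;
    }

    void publish(String message) {
        queue.add(message);
    }

    // Block for the first message, then grab whatever else is already queued (up to maxBatch),
    // so under load one expensive operation (e.g. a write or a flush) covers many messages,
    // while under light load a single message is handled without waiting for a batch to fill.
    List<String> nextBatch() throws InterruptedException {
        List<String> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());
        queue.drainTo(batch, maxBatch - 1);
        return batch;
    }

    // Publish `messages` items up front and report the size of each batch the consumer sees.
    static List<Integer> demoBatchSizes(int messages, int maxBatch) {
        SmartBatcher batcher = new SmartBatcher(maxBatch);
        for (int i = 0; i < messages; i++) {
            batcher.publish("msg-" + i);
        }
        List<Integer> sizes = new ArrayList<>();
        try {
            while (!batcher.queue.isEmpty()) {
                sizes.add(batcher.nextBatch().size());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sizes;
    }
}
```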

Asynchronous retry pattern

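Library
    import java.net.SocketException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import com.nurkiewicz.asyncretry.AsyncRetryExecutor;
    import com.nurkiewicz.asyncretry.RetryExecutor;

    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    RetryExecutor executor = new AsyncRetryExecutor(scheduler).
        retryOn(SocketException.class).
        withExponentialBackoff(500, 2).     //500ms times 2 after each retry
        withMaxDelay(10_000).               //10 seconds
        withUniformJitter().                //add between +/- 100 ms randomly
        withMaxRetries(20);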
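To see what that configuration does, here is a hand-rolled sketch of the same policy (exponential backoff, a delay cap, uniform jitter, a retry limit) built only on `ScheduledExecutorService` and `CompletableFuture`. It is not the async-retry library's implementation; the `BackoffRetry` class is hypothetical, and unlike `retryOn(SocketException.class)` it retries on any `RuntimeException`, because a `Supplier` cannot throw checked exceptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BackoffRetry {
    private final ScheduledExecutorService scheduler;
    private final long initialDelayMs;
    private final double multiplier;
    private final long maxDelayMs;
    private final long jitterMs;
    private final int maxRetries;

    BackoffRetry(ScheduledExecutorService scheduler, long initialDelayMs, double multiplier,
                 long maxDelayMs, long jitterMs, int maxRetries) {
        this.scheduler = scheduler;
        this.initialDelayMs = initialDelayMs;
        this.multiplier = multiplier;
        this.maxDelayMs = maxDelayMs;
        this.jitterMs = jitterMs;
        this.maxRetries = maxRetries;
    }

    // Run the task on the scheduler; the returned future completes with the first success
    // or with the last failure once maxRetries retries have been used up.
    <T> CompletableFuture<T> submit(Supplier<T> task) {
        CompletableFuture<T> result = new CompletableFuture<>();
        scheduler.execute(() -> attempt(task, 0, Math.min(initialDelayMs, maxDelayMs), result));
        return result;
    }

    private <T> void attempt(Supplier<T> task, int retries, long delayMs, CompletableFuture<T> result) {
        try {
            result.complete(task.get());
        } catch (RuntimeException e) {
            if (retries >= maxRetries) {
                result.completeExceptionally(e);
                return;
            }
            // Uniform jitter in [-jitterMs, +jitterMs], never letting the wait drop below zero.
            long jitter = ThreadLocalRandom.current().nextLong(-jitterMs, jitterMs + 1);
            long waitMs = Math.max(0, delayMs + jitter);
            // Exponential backoff: multiply the delay after each failure, capped at maxDelayMs.
            long nextDelayMs = Math.min((long) (delayMs * multiplier), maxDelayMs);
            scheduler.schedule(() -> attempt(task, retries + 1, nextDelayMs, result),
                    waitMs, TimeUnit.MILLISECONDS);
        }
    }
}
```

With the settings from the post this would be `new BackoffRetry(scheduler, 500, 2.0, 10_000, 100, 20)`. Rescheduling on the scheduler instead of sleeping keeps the calling thread free, which is the point of an asynchronous retry.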

Java 8 concurrent asynchronous lambda

IntelliJ recommends lambda alternatives where possible.
    CompletableFuture
        .supplyAsync(() -> validateRequest(id, headers))
        .thenAccept(validId -> processRequest(validId, req))   // a lambda parameter may not reuse the local name id
        .exceptionally(t -> handleException(t, req));
supplyAsync returns a new CompletableFuture that is asynchronously completed by a task running in the ForkJoinPool.commonPool().
Thread Runnable:
    Runnable r = () -> {...};
    Thread t = new Thread(r);
    t.start();
    new Thread( () -> {...} ).start();
Predicate - boolean expression:
    while (!acctStates.values().stream().allMatch(s -> s.getState().equals(AcctState.OPEN)))
    {
      Thread.sleep(100);
    }
Local method call:
    getSomeCollection().forEach( this::processElement );
Stream, Filter, Map, to new Collection:
    Collection acctStates =
        accts.values().stream().filter(acct -> curUser.equals(acct.getOwner()))
            .map(acct -> acctStates.g
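A self-contained version of the CompletableFuture pipeline, runnable without the surrounding service code. `validateRequest` and `processRequest` here are hypothetical one-argument stand-ins for the methods in the snippet, and `thenApply` is used instead of `thenAccept` so the result can be returned and checked. The `exceptionally` handler unwraps the `CompletionException` that earlier-stage failures arrive in.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class AsyncRequestPipeline {

    // Hypothetical validation step: returns the id when valid, otherwise throws.
    static String validateRequest(String id) {
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("missing id");
        }
        return id;
    }

    // Hypothetical processing step.
    static String processRequest(String validId) {
        return "processed " + validId;
    }

    static String handle(String id) {
        return CompletableFuture
                .supplyAsync(() -> validateRequest(id))           // runs in ForkJoinPool.commonPool()
                .thenApply(validId -> processRequest(validId))   // parameter name must differ from the local id
                .exceptionally(t -> {
                    // Failures from earlier stages arrive wrapped in CompletionException.
                    Throwable cause = (t instanceof CompletionException && t.getCause() != null)
                            ? t.getCause() : t;
                    return "rejected: " + cause.getMessage();
                })
                .join();                                          // block only for this demo
    }

    public static void main(String[] args) {
        System.out.println(handle("42"));
        System.out.println(handle(""));
    }
}
```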

wget and Install Oracle JDK 8 on VPS

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u25-b17/jdk-8u25-linux-x64.tar.gz

Quorum Service Failure Tolerance

For zero-downtime rolling updates, services need to maintain a quorum while nodes are being added and removed. When a node is added, the total member count and quorum size must be increased to avoid a potential split brain in a network partitioning scenario if a failure or extended delay were to occur during a rolling update. A complex service with many nodes may be continuously in a state of rolling updates to upgrade OS or foundation software, add security patches, or deploy application bug fixes or new features. Assume we want a service to always be able to tolerate two node failures before halting for reliability/availability. From the table below, the minimum is 5 initial member nodes. With fewer than 5 nodes, two node failures cause service updates to halt in order to guarantee consistency. When performing a rolling update a new node is added. When the new node is synchronized and brought into the quorum, the total member node count increases by one. This increases the n
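The table itself did not survive, but the arithmetic behind it can be sketched, assuming a simple majority quorum of floor(n/2) + 1 members. `QuorumMath` is a hypothetical helper that prints quorum size and tolerated failures per member count.

```java
final class QuorumMath {

    // Majority quorum: strictly more than half of the members must agree.
    static int quorumSize(int members) {
        return members / 2 + 1;
    }

    // Members that can fail while the survivors can still form a majority.
    static int toleratedFailures(int members) {
        return members - quorumSize(members);
    }

    public static void main(String[] args) {
        System.out.println("members quorum tolerated");
        for (int n = 1; n <= 7; n++) {
            System.out.printf("%7d %6d %9d%n", n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

With these formulas 5 members give a quorum of 3 and tolerate 2 failures. Adding a sixth node during a rolling update raises the quorum to 4 while tolerance stays at 2; if the quorum were left at 3, a partition into two halves of 3 could let both sides believe they hold a majority, which is the split brain the post warns about.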

Ubuntu gnome classic Add to launcher, menu, toolbar

I prefer the gnome classic interface for ease of use and productivity. I like the tiny launcher bar and utilities at the top and the tiny open-apps bar at the bottom. Unfortunately, adding back gnome classic isn't the same as the original: you can't just right-click to add new launchers anymore. Instead use Super (Windows key) + Alt + right mouse click. I can't stand the current Ubuntu Unity interface. Apparently I'm odd in thinking that a window's menus don't belong divorced from the window itself. Yes, I can't stand the OSX interface either; if OSX supported highlight-to-copy and middle-click-to-paste in more than just the terminal window, I might be a bit more tolerant.
Rearrange launcher icons with: Alt + middle mouse click and drag
Remove a launcher icon with: Super (Windows key) + Alt + right mouse click, then Remove From Panel

Mount remote filesystem via ssh: sshfs

Ubuntu doc
Install sshfs:
    sudo apt-get install sshfs
Add user to fuse group:
    sudo adduser troy fuse
Update config to allow other users to access the remotely mounted directory:
    sudo gedit /etc/fuse.conf
    uncomment user_allow_other
Create mount point:
    sudo mkdir /media/troy/test1
Mount:
    sudo sshfs -o allow_other -o ServerAliveInterval=120 troy@111.222.33.123:/home/troy /media/troy/test1/
Unmount:
    sudo umount /media/troy/test1/
The following did not work:
    sudo fusermount -u /media/troy/test1
    fusermount: entry for /media/troy/test1 not found in /etc/mtab

Fonts for readability

Fonts for readability

Java ProcessBuilder.exec and hanging processes and silent failures

When using ProcessBuilder to start consul and then other applications, processes were silently failing and hanging. DO NOT USE bash -i with consul. The -i is needed for some other process execs to initialize the environment. The following command appears to complete properly, but after processBuilder.start(), subsequent processes silently hang and fail for some unknown reason. Execution always worked properly from Eclipse, but never from the command line.
Works from Eclipse, but from the command line causes subsequent ProcessBuilder silent failures:
    String[] commandArray = {"/bin/bash", "-i", "-c", "/opt/consul/consul agent -server -bootstrap -data-dir /tmp/consul -ui-dir /opt/consul/consul-ui/dist/"};
Works from Eclipse and the command line, no subsequent ProcessBuilder failures:
    String[] commandArray = {"/bin/bash", "-c", "/opt/consul/consul agent -server -bootstrap -data-dir /tmp/consul -ui-dir /opt/consul/consul-ui/dist/"};
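The post leaves the root cause open, so this is not a diagnosis. A separate, commonly cited reason for hung ProcessBuilder children is unread stdout/stderr filling the OS pipe buffer. A defensive sketch for short-lived commands: merge stderr into stdout, close the child's stdin, drain all output, then wait for the exit code. `ProcessRunner` is a hypothetical helper, and the test assumes `/bin/sh` exists.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ProcessRunner {

    // Start a command with stderr merged into stdout, drain the output completely so the child
    // can never block on a full pipe buffer, then wait for it to exit.
    static String run(String... command) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);
            Process p = pb.start();
            p.getOutputStream().close(); // the child gets no stdin; reads see EOF instead of blocking
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            int exit = p.waitFor();
            String output = out.toString("UTF-8");
            if (exit != 0) {
                throw new IllegalStateException("exit code " + exit + ": " + output);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Non-interactive shell invocation (no -i), like the working consul command.
        System.out.print(run("/bin/sh", "-c", "echo hello"));
    }
}
```

For a long-running server such as consul, draining on the calling thread would block forever; there it is simpler to point the output at a file with `redirectOutput(File)` (or, on Java 9+, `ProcessBuilder.Redirect.DISCARD`) so nothing can back up.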

nginx - installation

Ubuntu 12.04: sudo apt-get install nginx will install everything, but might not be the latest version.
To get the latest:
Download the latest version from http://wiki.nginx.org/Install
Install missing dependencies:
    sudo apt-get install zlib1g-dev
    sudo apt-get install libpcre3 libpcre3-dev
Build and install nginx:
    ./configure --prefix=/opt/nginx
    make
    sudo make install
Set up some permissions (optional):
    cd /opt
    sudo chown -R root:opt nginx
    sudo chmod -R g+w nginx
    cd /opt/nginx
Start/stop/reload configuration:
    sbin/nginx -c conf/nginx.conf
    sbin/nginx -s stop
    sbin/nginx -s reload

nginx - persistent back-end server connections for performance

TCP open/close is expensive and limits throughput. Clients like browsers that can utilize persistent connections (keepalive) do not present a problem. Low-power IoT/M2M devices can't use persistent connections unless they go through a gateway/proxy. NGINX has built-in support for handling the client TCP open/close and proxying the requests over a pool of persistent back-end server connections, similar to Apache's AJP but supposedly more efficient. The nginx configuration was tested while watching "sudo tcpdump -i 6.lo -nnvvXS port 8181" to verify that nginx indeed does not repeatedly reestablish connections between nginx and the back-end server. Per the nginx documentation, upstream keepalive for HTTP also requires proxy_http_version 1.1 and a cleared Connection header in the proxying location.
    http {
        ....
        upstream foo_backend {
            server localhost:8181;
            # keep up to 10 idle upstream connections cached in each worker process
            keepalive 10;
        }
        server {
            listen       8080;
            server_name  localhost;
            # serve anything from nginx_home/html

Ansible

AND/OR
    - name: stop app
      shell: pidof java | xargs -r kill -9 || /bin/true
      when: pid.stdout != "" and (appUpdated|changed or configUpdated|changed)
Overriding args to roles
    roles:
      - { role: app_user, name: Ian    }
      - { role: app_user, name: Terry  }
      - { role: app_user, name: Graham }
      - { role: app_user, name: John   }
Conditional roles
      - { role: hazelcast, when: target_env=='dev' } # hazelcast only required if providing service
      - { role: nginx, when: target_env=='dev' }
      - bootstrap # unconditional role
Role Dependencies

File read/write in single line no libraries

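Simple single line of Java without any library dependencies to read a file into a String and write a String to a file (java.nio.file.Files, java.nio.file.Paths, java.nio.charset.StandardCharsets). Passing the charset explicitly avoids depending on the platform default encoding.
Read file into String:
    new String(Files.readAllBytes(Paths.get(filename)), StandardCharsets.UTF_8);
Write String to file:
    Files.write(Paths.get(filename), someString.getBytes(StandardCharsets.UTF_8));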
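The two one-liners wrapped in a small round-trip check through a temporary file, as a sketch; `FileOneLiners` is a hypothetical name. `Files.write` creates the file or truncates an existing one by default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileOneLiners {

    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    static void write(Path file, String text) throws IOException {
        // Default options: CREATE, TRUNCATE_EXISTING, WRITE.
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Write then read back through a temporary file that is deleted afterwards.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("one-liner", ".txt");
            try {
                write(tmp, text);
                return read(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On Java 11 and later, `Files.readString(path)` and `Files.writeString(path, text)` do the same with UTF-8 as the default.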

Java hostname

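  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.io.InputStream;

  private static String getHostname()
  {
    try (InputStream in = Runtime.getRuntime().exec("hostname").getInputStream())
    {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] b = new byte[256];
      int n;
      while ((n = in.read(b)) != -1) //read() may return fewer bytes than requested, so loop until EOF
      {
        out.write(b, 0, n);
      }
      return out.toString().trim(); //drop the trailing newline printed by hostname
    }
    catch (IOException e)
    {
      String message = "Error reading hostname";
      Log.error(message);
      throw new RuntimeException(message, e);
    }
  }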
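An alternative that avoids spawning a process at all is to ask the JDK via `InetAddress.getLocalHost().getHostName()`. This is a sketch, not a drop-in replacement: `getLocalHost()` also resolves the name to an address, so it can throw `UnknownHostException` on a misconfigured `/etc/hosts`, and the fallback to the `HOSTNAME` environment variable is an assumption (shells often export it, but it is not guaranteed). `Hostname` is a hypothetical class name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class Hostname {

    // Ask the JDK for the local host name instead of spawning the `hostname` command.
    static String get() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // The name could not be resolved to an address; HOSTNAME is often exported by
            // shells but is not guaranteed to be present for a Java process.
            String env = System.getenv("HOSTNAME");
            return env != null && !env.isEmpty() ? env : "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(get());
    }
}
```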

Image/Environment build tools

Vagrant - creates VM dev/ops environments, e.g., VBox, VMware
Packer - creates cloud instance images using other tools like chef, puppet, ansible, etc.
Ansible - executes playbooks to install dependencies and configure OSs for specific purposes
Terraform - creates and manages environments across disparate IaaS providers, resolving dependencies and the order of creation and changes

Load Generators

Cloud Locust Storm Forger App Apache JMeter HP LoadRunner ApacheBench Grinder Gatling

Inexpensive Hosting Options

Digital Ocean Atlantic.net

Low Latency Web

Low Latency Web
The following kernel parameters were changed to increase the number of ephemeral ports, reduce TIME_WAIT, and increase the allowed listen backlog:
    echo "2048 64512" > /proc/sys/net/ipv4/ip_local_port_range
    echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle
    echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
    echo "10" > /proc/sys/net/ipv4/tcp_fin_timeout
    echo "65536" > /proc/sys/net/core/somaxconn
    echo "65536" > /proc/sys/net/ipv4/tcp_max_syn_backlog
The load generator is wrk, a scalable HTTP benchmarking tool:
    wrk -t 10 -c N -r 10m http://localhost:8080/index.html
where N = number of connections.
Apache Benchmarking tool

Service Discovery and Configuration

Consul
Start consul on localhost with UI:
    consul agent -server -ui-dir /opt/consul/consul-ui -bootstrap-expect 1 -data-dir /tmp/consul
Then browse to http://localhost:8500/ui/dist/#/dc1/services

SSH login notification

Use SNORT
Use /etc/profile
Use the pam_notify module
Use auditd:
    auditctl -A exit,always -S connect
    auditctl -A exit,always -S accept
Monitor /var/log/auth.log

Linux monitoring

htop - interactive process viewer
atop - interactive load monitor
top - interactive task monitor
ss -s - connection counts and states

12 million concurrent connections - stock linux

http://mrotaru.wordpress.com/2013/06/20/12-million-concurrent-connections-with-migratorydata-websocket-server/

NoRouteToHostException: Cannot assign requested address

On the socket, call s.setReuseAddress(true) before it is bound/connected.
Check and raise the maximum number of file handles (ulimit).
As root (sudo su):
    echo "1" >/proc/sys/net/ipv4/tcp_tw_reuse
    echo "1" >/proc/sys/net/ipv4/tcp_tw_recycle
Other OS settings
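A sketch of setting `SO_REUSEADDR` on a client socket. The `Socket` Javadoc says the option must be enabled before the socket is bound, so the code creates an unconnected `Socket`, sets the option, and only then connects. The class name and the 5 s timeout are hypothetical choices. SO_REUSEADDR mainly helps when binding to a local address still held in TIME_WAIT; exhausting ephemeral ports is usually addressed by the kernel settings above and by reusing connections.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class ReuseAddressClient {

    // SO_REUSEADDR only takes effect if it is set before the socket is bound,
    // so create an unconnected Socket, set the option, then connect.
    static Socket open(InetSocketAddress remote) throws IOException {
        Socket s = new Socket();
        try {
            s.setReuseAddress(true);
            s.connect(remote, 5_000); // 5 s connect timeout, an arbitrary choice
            return s;
        } catch (IOException e) {
            s.close();
            throw e;
        }
    }

    // Loopback demo: a listening ServerSocket completes the TCP handshake from its backlog,
    // so the client connects even though accept() is never called.
    static boolean demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = open(new InetSocketAddress(loopback, server.getLocalPort()))) {
            return client.isConnected() && client.getReuseAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```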

Map lat/long

http://universimmedia.pagesperso-orange.fr/geo/loc.htm

HTML/CSS/JavaScript editor

http://jsbin.com/aqajoy/11/edit

Gradle

HowTo
Use the gradle daemon to make builds faster, since gradle doesn't have to spin up a JVM every time:
    alias gradle='gradle --daemon'

Vertx Polyglot server - Java, JavaScript, Ruby, Python, Groovy, Scala

http://vertx.io
Vertx appears to outperform node.js by 2x or more for basic socket handling and also when serving small static pages.
RESTful service

Shark SQL for Hadoop, HBASE, Cassandra

Shark (SQL) uses Apache Spark which can read from hadoop, HBASE, Cassandra

Ansible and Salt

Ansible and Salt are frameworks that let you automate various system tasks. The biggest advantage that they have relative to other solutions like Chef and Puppet is that they are capable of handling not only the initial setup and provisioning of a server, but also application deployment and command execution.

Ephemeral ports

By default Ubuntu only uses ports 32768-61000 for outgoing connections (on systems with more than 128 MB of memory).
/usr/src/linux/Documentation/networking/ip-sysctl.txt
    ip_local_port_range - 2 INTEGERS
        Defines the local port range that is used by TCP and UDP to
        choose the local port. The first number is the first, the
        second the last local port number. Default value depends on
        amount of memory available on the system:
        > 128Mb 32768-61000
        < 128Mb 1024-4999 or even less.
        This number defines number of active connections, which this
        system can issue simultaneously to systems not supporting
        TCP extensions (timestamps). With tcp_tw_recycle enabled
        (i.e. by default) range 1024-4999 is enough to issue up to
        2000 connections per second to systems supporting timestamps.
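Since each outgoing connection to the same remote IP and port needs its own local port, the size of this range is a rough upper bound on simultaneous connections to one destination (ports sitting in TIME_WAIT also count against it). A small sketch that parses the "first last" pair from `/proc/sys/net/ipv4/ip_local_port_range` and counts the ports; `EphemeralPorts` is a hypothetical name and `main` assumes a Linux `/proc`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class EphemeralPorts {

    // Parse the "first last" pair found in ip_local_port_range (whitespace separated)
    // and return how many local ports the range contains, inclusive of both ends.
    static int portCount(String rangeLine) {
        String[] parts = rangeLine.trim().split("\\s+");
        int first = Integer.parseInt(parts[0]);
        int last = Integer.parseInt(parts[1]);
        return last - first + 1;
    }

    public static void main(String[] args) throws IOException {
        String line = new String(
                Files.readAllBytes(Paths.get("/proc/sys/net/ipv4/ip_local_port_range")),
                StandardCharsets.US_ASCII);
        System.out.println(portCount(line) + " ephemeral ports");
    }
}
```

For the default 32768-61000 range that is 28233 ports; the "2048 64512" setting used in the Low Latency Web post raises it to 62465.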

On-prem cloud management like AWS

OpenStack Eucalyptus

Java REST service framework - DropWizard

DropWizard - lots of boilerplate; node.js would be far simpler

Open Cloud - Real-time Application Server (Rhino)

Open Cloud Rhino

Share Ubuntu drive on Mac

HowTo

Caching in-process vs distributed

Distributed caching is dead In-memory data grid

Node.js performance tips

LinkedIn node.js performance tips

Single Writer Design - Mechanical Sympathy

Mechanical Sympathy
There is also a really nice benefit when working on architectures such as x86/x64, which at the hardware level have a memory model whereby load/store memory operations have preserved order: memory barriers are not required if you adhere strictly to the single writer principle. On x86/x64 "loads can be re-ordered with older stores" according to the memory model, so memory barriers are required when multiple threads mutate the same data across cores. The single writer principle avoids this issue because it never has to deal with writing the latest version of a data item that may have been written by another thread and is currently in the store buffer of another core.
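Java code cannot lean on x86 ordering directly, but the JDK offers a close equivalent of the single writer principle: when exactly one thread ever writes a value, it needs no CAS loop or lock, and it can publish with `AtomicLong.lazySet`, an ordered store that is cheaper than a full volatile write. A minimal sketch under that single-writer assumption; `SingleWriterCounter` is hypothetical, and calling `increment()` from two threads would lose updates.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SingleWriterCounter {
    private final AtomicLong value = new AtomicLong();

    // Must only ever be called from the one owning writer thread.
    void increment() {
        // With a single writer, read-then-write cannot lose updates, so no CAS loop is needed;
        // lazySet publishes with an ordered (release) store rather than a full volatile write.
        value.lazySet(value.get() + 1);
    }

    // Safe from any thread.
    long read() {
        return value.get();
    }

    // One writer increments; the calling thread reads concurrently and checks values never go backwards.
    static boolean demo(int increments) {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                counter.increment();
            }
        }, "single-writer");
        writer.start();
        boolean monotonic = true;
        long last = 0;
        while (writer.isAlive()) {
            long now = counter.read();
            if (now < last) {
                monotonic = false;
            }
            last = now;
        }
        try {
            writer.join(); // join gives a happens-before edge, so the final value is visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return monotonic && counter.read() == increments;
    }
}
```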

Intel 800Gbps interconnects

http://arstechnica.com/information-technology/2014/03/intels-800gbps-cables-headed-to-cloud-data-centers-and-supercomputers/

Mellanox OFED and Messaging Accelerator (VMA) RDMA/offloading

Mellanox OpenFabrics Enterprise Distribution (OFED) Messaging Accelerator (VMA) ConnectX-3 Virtual Protocol Interconnect VMA Tuning tips

RDMA zero-copy

It seems Socket Direct Protocol (SDP) was declared obsolete/deprecated by the OFA. Alternatives are rsocket, which I could only find referenced in the IBM Java SDK (JSOR). Other alternatives include VMA, iWarp, and RoCE, all supported in OFED.
VMA White Paper

Atlantic.net performance

uname -a
    ubuntu01 3.2.0-23-generic #36-Ubuntu SMP x86_64
cat /proc/cpuinfo | grep model\ name | uniq
    Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
cat /proc/cpuinfo
    cpu MHz : 1200.000
    cache size : 15360 KB
    cpu cores : 6
ping between 1 GbE ports ~0.267 ms
ping between 40Gb IB ports ~0.250 ms
ibping between 40Gb IB ports ~0.305 ms
iperf between 1 GbE ports
    root@ubuntu01:~# iperf -c 209.208.8.163 -P4
    ------------------------------------------------------------
    Client connecting to 209.208.8.163, TCP port 5001
    TCP window size: 23.5 KByte (default)
    ------------------------------------------------------------
    [ 6] local 209.208.8.162 port 52414 connected with 209.208.8.163 port 5001
    [ 3] local 209.208.8.162 port 52412 connected with 209.208.8.163 port 5001
    [ 4] local 209.208.8.162 port 52413 connected with 209.208.8.163 port 5001
    [ 5] local 209.208.8.162 port 52415 connected with 209.208.8.163 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 5] 0.0-10.0 sec 377 MBytes

Infiniband configuration and testing

Add the Infiniband modules to /etc/modules: either vi or echo/append them to /etc/modules, and modprobe them to load them dynamically without a restart.
    ### Set the IB modules they may wish to use (these are just some of the available modules, but should get them started):
    IBMOD="ib_umad mlx4_core mlx4_ib mlx4_en ib_ipoib ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core ib_addr"
Load the modules (now and during next boot):
    for i in $IBMOD ; do echo $i >> /etc/modules; modprobe $i; done
Install opensm:
    sudo apt-get install opensm
Start opensm:
    service opensm start
Check it's running:
    ps aux | grep opensm
Determine the IB card model:
    lspci -d 15b3:
Query IB interface status:
    ibstat
    ibstatus
    ibhosts
    ibswitches
    iblinkinfo
Install other software:
    ibverbs-utils libibcm1 librdmacm1 libaio1
Add/configure interface: vi /etc/network/interfaces and add
    auto ib0
    iface ib0 inet static
         address

Atlantic.net - Full service hosting

I needed a special infrastructure configuration for testing a new architecture for high performance computing requirements. This architecture isn't offered through Amazon's Elastic Cloud, Google's Cloud, or Microsoft Azure. Atlantic.net's team took the time to understand our exact requirements and offered to build what we needed on their infrastructure at no additional cost. This team knows how to please. They've already earned my business. I was planning to move my personal hosting and business to Linode; however, now I'll be moving to Atlantic.net. They offer more value for the $$$ with virtual servers starting at $3.65/month, which include the following features:
Infiniband QDR 40Gb interconnect between servers
SSDs
1Gb/s port
1000 GB / 1TB outbound transfer included
Full root admin - Linux or Windows
Dedicated IP
Free nightly backup
No commitments, no contract, no setup fee
Redundant Tier-1 internet connections with automatic failover
Redunda

Performance Metrics

From the WhatsApp scalability talk
Slides http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf
Talk http://vimeo.com/44312354
pmcstat - processor hardware perf counters
dtrace
kernel lock-counting
gprof
fprof w/ & w/o cpu_timestamp
BEAM lock-counting (invaluable)
contention was the most significant issue
FreeBSD
backported TSC-based kernel timecounter; gettimeofday(2) calls much less expensive
backported igb network driver; had issue with MSI-X queue stalls
sysctl tuning of obvious limits (e.g. kern.ipc.maxsockets)
net.inet.tcp.tcphashsize=524288
BEAM is the Erlang VM; lots of other info on that

AsyncIO

AsynchronousServerSocketChannel - channels of this type are safe for use by multiple concurrent threads, though at most one accept operation can be outstanding at any time. If a thread initiates an accept operation before a previous accept operation has completed, an AcceptPendingException will be thrown.
AsynchronousChannelGroup specifies the thread pool that handles the async operation callbacks; if none is specified, a default system-wide group is used.
You need to pass a custom ThreadFactory to the AsynchronousChannelGroup's thread pool, or all the threads will have generic names and it won't be clear what they are for without inspecting the stack. Here are options for a custom ThreadFactory.
Server/Consumer
    class Consumer implements Runnable .....
    class SocketAcceptHandler implements CompletionHandler .....
    AsynchronousServerSocketChannel socket =
        AsynchronousServerSocketChannel
            .open(AsynchronousChannelGroup.withThreadPool(Executors.newFixedThreadPool(1)));
    socket.setOption(Standard
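A sketch tying the two notes together: a named `ThreadFactory` for the group's pool, and an accept `CompletionHandler` that re-arms `accept` only inside `completed`, so there is never more than one outstanding accept. `NamedAsyncAcceptor` and its methods are hypothetical, and the demo simply accepts a few loopback connections and closes them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedAsyncAcceptor {

    // ThreadFactory that gives pool threads a recognizable name in thread dumps.
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // Accept `expected` loopback connections and return how many were accepted within 5 s.
    static int acceptConnections(int expected) {
        ExecutorService pool = Executors.newFixedThreadPool(1, named("socket-accept"));
        try {
            AsynchronousChannelGroup group = AsynchronousChannelGroup.withThreadPool(pool);
            InetAddress loopback = InetAddress.getLoopbackAddress();
            try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open(group)
                    .bind(new InetSocketAddress(loopback, 0))) {
                int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
                CountDownLatch accepted = new CountDownLatch(expected);
                server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
                    @Override
                    public void completed(AsynchronousSocketChannel channel, Void attachment) {
                        // Re-arm only now: a second outstanding accept would throw AcceptPendingException.
                        server.accept(null, this);
                        try {
                            channel.close();
                        } catch (IOException ignored) {
                            // nothing useful to do in this demo
                        }
                        accepted.countDown();
                    }

                    @Override
                    public void failed(Throwable exc, Void attachment) {
                        // Expected once the server channel is closed at the end of the demo.
                    }
                });
                for (int i = 0; i < expected; i++) {
                    new Socket(loopback, port).close(); // connect, then hang up
                }
                accepted.await(5, TimeUnit.SECONDS);
                return expected - (int) accepted.getCount();
            } finally {
                group.shutdownNow();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In a thread dump the callback thread then shows up as `socket-accept-1` instead of a generic pool name.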

Unsafe - very fast serialization

Copy directly from an object's bytes into a DirectByteBuffer http://java.dzone.com/articles/fast-java-file-serialization http://mishadoff.github.io/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/

PAXOS Consensus Protocol vs 2PC & 3PC

PAXOS is used by Google's Chubby, Apache Zookeeper, and FoundationDB
Majority Consensus
http://the-paper-trail.org/blog/consensus-protocols-paxos/
If the leader is relatively stable, phase 1 becomes unnecessary. Thus, it is possible to skip phase 1 for future instances of the protocol with the same leader. Multi-Paxos reduces the failure-free message delay (proposal to learning) from 4 delays to 2 delays.
2PC
The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some cohorts will never resolve their transactions: after a cohort has sent an agreement message to the coordinator, it will block until a commit or rollback is received.
3PC
The main disadvantage to this algorithm is that it cannot recover in the event the network is segmented in any manner. The original 3PC algorithm assumes a fail-stop model, where processes fail by crashing and crashes can be accurately detected, and d

How much does thread context cost

http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

Threaded vs. Evented servers

http://mmcgrana.github.io/2010/07/threaded-vs-evented-servers.html

HPC Infiniband 40/56Gb vs 10GbE

http://www.mellanox.com/related-docs/case_studies/CS_Atlantic.Net.pdf
Processors now have many hyper-threaded cores and lots of memory and cache. Standard high performance disk technology has lazy-write caches and battery backup for reliability. Disks are striped/parallelized to keep them from being performance bottlenecks. This means network I/O for Internet traffic, replication, caching, and disk access will typically be the most substantial bottleneck. 40/56Gb IB at ~$5.6 per Gb/s is currently cheaper than 10GbE at ~$11.5 per Gb/s, giving it better value. You can also run TCP/IP over IB (IPoIB). There is a 40GbE switch for ~$208/Gb and even 100GbE, but nothing much available in 40Gb and nothing I could find in 100Gb. You can't get the 40/56Gb/s bandwidth or lower latency of InfiniBand RDMA on 10GbE. RDMA supposedly exists on 10GbE, but after looking on Intel's site, I only found one card supporting it. Here is a paper which compares the performance of a custom key/value store, memcached,

Ring-buffer for high performance, reduced contention, parallel processing - LMAX Disruptor

LMAX GIT Disruptor
Martin Fowler's write up
Example code for v3.0
Queues are wrong for inter-process communication
SEDA and Actor mechanisms bottleneck on contention
Mechanical Sympathy - know how to drive the systems to get the most out of them
DRAM is not getting faster, but is getting cheaper
Bandwidth to memory is increasing
The GHz race is over; CPUs aren't getting faster
Bigger caches, more cores
Networks are getting faster
Standard 10 GigE can use RDMA to bypass the kernel and transfer userspace memory between systems in under 10 us
Java 7 SDP
j-zerocopy
Use RDMA to replicate to another node for HA/DR
Mechanical disks have great sequential access/streaming
SSDs are not much better for sequential, single-threaded access, but great for multi-threaded random access
Disk controller is limited
Standard SSD interface not very fast; new PCIe much faster
Fusion IO card very fast
10 GigE can do process on one system to process on another system in 10s of us
move data between cores in L3 cache for be
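To make the ring-buffer idea concrete, here is a simplified single-producer/single-consumer ring. It is not the Disruptor's API; it only illustrates a few ideas the Disruptor is known for: a preallocated array, power-of-two capacity so `sequence & mask` replaces a modulo, and sequence counters that are each written by exactly one thread. `SpscRingBuffer` is hypothetical; the real Disruptor adds cache-line padding, wait strategies, batching consumers and multi-producer support, none of which is shown.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SpscRingBuffer<E> {
    private final Object[] slots;
    private final int mask;                            // capacity - 1; capacity is a power of two
    private final AtomicLong head = new AtomicLong();  // next sequence to read, written only by the consumer
    private final AtomicLong tail = new AtomicLong();  // next sequence to write, written only by the producer

    SpscRingBuffer(int capacity) {
        if (capacity < 1 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer thread only. Returns false when the ring is full.
    boolean offer(E e) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = e;   // sequence & mask replaces a modulo
        tail.lazySet(t + 1);           // publish only after the slot is written
        return true;
    }

    // Consumer thread only. Returns null when the ring is empty.
    @SuppressWarnings("unchecked")
    E poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        E e = (E) slots[index];
        slots[index] = null;           // drop the reference so the element can be collected
        head.lazySet(h + 1);           // hand the slot back to the producer
        return e;
    }

    // Move 0..count-1 from a producer thread to the calling (consumer) thread and sum them.
    static long transfer(int count) {
        SpscRingBuffer<Integer> ring = new SpscRingBuffer<>(1024);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                while (!ring.offer(i)) {
                    Thread.yield();    // full: back off briefly and retry
                }
            }
        }, "producer");
        producer.start();
        long sum = 0;
        int received = 0;
        while (received < count) {
            Integer v = ring.poll();
            if (v == null) {
                Thread.yield();        // empty: back off briefly and retry
                continue;
            }
            sum += v;
            received++;
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }
}
```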

Distributed monitoring

Ganglia  - Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. Can be embedded

Screen Casting options

MAC
Quicktime Player - free, already installed
Screenflow $100 - used by Lifehacker; simultaneously records screen, audio, and camera
Camtasia $99
SnapzProX $69
Linux
Kdenlive screen + Audacity for audio
recordMyDesktop
Wink
Byzanz - records to gif which auto-plays

Load Balancing

GSLB Round-Robin DNS Anycast Redirects Cookies etc. http://www.tenereillo.com/GSLBPageOfShame.htm

Queuing

Apache Kafka - Java
Apache ActiveMQ - Java, JMS
RedHat HornetQ - Java, JMS
ZeroMQ (0MQ)
Kestrel - JVM, Scala, Twitter
Redis

Grid Computing

Open Grid Scheduler/Grid Engine http://gridscheduler.sourceforge.net/ SLURM: A Highly Scalable Resource Manager https://computing.llnl.gov/linux/slurm/ Univa took over Sun Grid Engine http://gridengine.org/blog/

Free web diagramming tool

Diagramly Saves to Google Drive or Dropbox No multi-user collaboration support

Google Compute Engine

Remote control of Google Compute via JavaScript library: Auth, List instances, Create instance (insert API), Delete instance. Google APIs Console, Help, Keys, OAuth playground, apis-explorer

Amazon DynamoDB

AWS Tips I Wish I'd Known Before I Started
DynamoDB is a NoSQL schema-less store. Each entity can have different attributes.
Total item/entity size 64 KB (UTF-8); names and values count towards the size
Maximum tables 256/region/account; can request a limit increase
No practical limit on table size, in bytes or items
Attributes are name=value pairs and can be single-valued or multi-valued sets
Attribute types: string, binary, number (up to 38 digits of precision, between 10^-128 and 10^+126), or sets; no empty values or sets
Idempotent conditional updates: only applied if conditions are met, safe to retry in the event of failure to acknowledge
Atomic counters: increment/decrement, not idempotent; use conditional updates for idempotency
Consistent reads consume more resources, take more time, and cost more money (avoid them when possible)
Must create indexes when tables are created; can't update, add, or delete them later
Hash key distributes data among resources
Range key

Serialization Benchmarking

https://github.com/eishay/jvm-serializers/wiki
Kryo is the fastest and produces the smallest output.
json/fastjson/databind looks like one of the fastest JSON serializers
https://github.com/alibaba/fastjson/wiki
https://github.com/alibaba/fastjson
http://mvnrepository.com/artifact/com.alibaba/fastjson