Creating a Good Developer Experience

One common aspect of building in and for a Service Oriented system is the idea of “Standardizing Developer Experience”.

What is Developer Experience?

A Developer in a Service Oriented project has to deal with multiple concerns, each vital to the success of the Project and essential to managing the chaos of a Distributed system.

Let us look at some of these concerns :-

1. How does a Developer decide on the size of a Service?

2. How does a Developer ensure that she is building a consistent set of API interfaces to her Service?

3. How does a Developer provide API mocks or stubs for other developers to use when consuming her Service?

4. How does a Developer provide consistent documentation for the Service APIs that she developed?

5. How does a Developer help prevent failures for others as a result of using the Service she built?

6. How does a Developer ensure she is providing Health check capability in the Service? (a minimal sketch follows this list)

7. How does a Developer ensure the Service is ready for instrumentation and monitoring?

8. How does a Developer know how to publish her Service?

9. How is a Developer able to provide consistent test cases and test coverage for the Service?

10. How does a Developer get access to test the Service she developed along with the other dependent Services?

11. How is a Developer able to deploy the Service anywhere, be it a private data center or a public cloud service?

12. How is a Developer able to consistently provide Authentication and Authorization capability in her Service?

13. How does a Developer manage and support the variety of Consumers / API Clients?

14. How can a Developer reliably make a change to code and ensure that this change does not affect other dependent Services?

15. How can a Developer ensure that a change can be moved to the Production environment reliably, without major human intervention? :)

16. How does a Developer ensure that the Service she built is easy to debug?

17. How does a Developer know how much capacity the Service needs to run?

18. How does a Developer know how to scale the Service horizontally and independently?

19. How does a Developer reliably use the good patterns for a Distributed system, like Circuit Breakers and Timeouts, when developing a Service?

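To make one of these concerns concrete, here is a minimal sketch of a health check (concern 6). It uses the Codahale / DropWizard HealthCheck abstraction purely as an illustration; the class name and the dependency it pings are hypothetical.

```java
import com.codahale.metrics.health.HealthCheck;

public class OrderServiceHealthCheck extends HealthCheck {

    /** Hypothetical dependency the Service must reach in order to be considered healthy. */
    public interface OrderDatabase {
        boolean ping();
    }

    private final OrderDatabase database;

    public OrderServiceHealthCheck(OrderDatabase database) {
        this.database = database;
    }

    @Override
    protected Result check() {
        // Reporting dependency reachability is usually the minimum a health endpoint should do.
        return database.ping()
                ? Result.healthy()
                : Result.unhealthy("cannot reach the order database");
    }
}
```

A standard template can register such checks automatically, so that every Service exposes the same health endpoint without each Developer re-inventing it.
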
All the concerns above are essential for creating a Good Developer experience. A Good Developer experience ensures better code quality, reliable and stress-free changes to a system, and independent Teams.

As we develop the POC for demonstrating Service Design and Domain Isolation, it is essential for us to also show that this isn’t possible without creating a Good Developer experience.

The goal is not just to create an experience, but Standardize it.

What do I mean by Standardizing?

Standardizing means making all these concerns part of a standard template. Think of Email templates: they help create consistent emails for all recipients. Similarly, standard templates for the Developer experience avoid mismatches amongst developers and reduce the burden of managing all these concerns individually. This ensures that the Developer can quickly and reliably focus on building the actual Logic, instead of managing this mess.

The good part of this template is that it can be shared with anyone. Internal teams or Partners, anyone can have access to this template and build reliable code with certainty.

But, a Good Developer experience cannot prevent Bad code :) It can help Good developers become better, and build the consistent, standard system required for a successful Service Oriented product.

And creating a good Developer experience is NOT expensive or time consuming. It is easy with the wealth of tools available. All we need to do is build this Template from the various tools available and bring them all together.

Revolution in a Large Enterprise

In a few weeks, I will be consulting for a large organization that is in the midst of a revolution. A revolution that strives to induce new energy and innovation into the organization while dealing with its legacy and traditional mindsets. In fact, the major revelation as a part of this is the idea that the organization is no longer a traditional brick-and-mortar company. Instead, like its predecessors and competitors, it is destined to become a Platform company. It is destined to become a Technology company with deep roots in traditional business. It is finally destined to treat IT not just as a cost-center or a support function, but as a key ingredient for its future.

The bigger challenge that surrounds this revolution is maintaining a delicate balance between the current short-term targets and the bigger, bolder bets that the organization needs to make in the times ahead.

For a while now, the majority of activities surrounding this revolution have been tactical rather than strategic, partly because of the pressing need to have the house in order, and to hit targets that have missed their timelines multiple times.

Although the organization understands its position in the market, it still has miles to go in order to gain a significant presence in the new business areas into which it wishes to expand.

I, for the most part, will concentrate my energy on creating an engine of Software and Technology innovation that provides a standard boilerplate for all existing and new projects.

As I plan to own these activities, I have been looking around for ideas, inspiration and practices. Partly, this will help me build up my arsenal for how I would approach solving this technical riddle for the organization.

My research and curiosity lead me down multiple different lanes, each with its own ideology and practices.

One of my first involvements was the Lean Startup movement by Eric Ries, inspired by Steve Blank's Customer Development Methodology. I borrowed some ideas around Lean Startup, and the MVP (Minimum Viable Product), in previous projects, and found them deeply useful for reducing noise and creating essential focus when building projects. However, a common trail of disappointment was people constantly withdrawing from adopting these ideas in the mainstream enterprise, primarily due to fear of the unknown or the new. Where people did get interested, they got deeply disappointed as they faced practical challenges and resistance from other teams.

I found interesting work by Telefonica in adopting the Lean Startup ideology to create very useful and innovative new services for its customers. The Lean Enterprise book by Trevor Owens and Obie Fernandez was also a fantastic investment for me, helping me navigate the particulars of what it takes to adopt Lean Startup in large organizations. I was able to resonate with the entrepreneurial framework that the book introduces for innovating and making enterprises go Lean.

My next stop was IDEO, and its Human Centred Design Thinking. I was introduced to this Design Led thinking a couple of years back, thanks to a project I did while participating in the Acumen-IDEO Course with some friends. My experience with the course and the project was extremely exhilarating, and I found it to be an excellent framework for innovating and creating demonstrable results in a short amount of time. Although the focus of that course was Social Innovation, the learning from the project hinted that these methods are just as useful for solving problems in a large enterprise.

Steve Blank, however, had a very interesting take on his blog about the commonalities and differences between Lean Startup and Human Centred Design Thinking. He advocated that both these processes are indeed useful for large enterprises, but that Lean Startup is more about getting there first and then iterating, while Design Thinking is more about getting it right.

I then looked at some essential readings that my friends and mentors recommended. The Goal by Eliyahu Goldratt, and The Phoenix Project by Gene Kim, Kevin Behr and George Spafford, were my first stops. Both are highly recommended readings, and I kept going back to them at each step of my research. I have a laundry list of things to do, thanks to these two books. I am still amazed at how relevant and similar Operations Research and Lean Manufacturing are to modern IT and Technology development. I still have multiple bookmarks and To-Dos pending till date, even after going over the books a couple of times.

After having looked at ideas from Lean Startup in Enterprises, IDEO's Human Centred Design Thinking, DevOps and Lean Manufacturing, I focussed on my other favorite topics – Cognitive Psychology and Behavioural Economics.

Cognitive Psychology and Behavioural Economics have been good casual reading for me for some time, thanks to writers like Steven Pinker, Dan Ariely and Daniel Kahneman. One of the aspects that stuck with me was how deeply we underestimate and undervalue the study of human psychology as an essential ingredient for creating a technology-rich and innovative organization. After making some notes on the science behind how we think, perceive and decide, it was quite evident that the more we understand about what's inside, the better we could build things and structures outside. More than ever, it provided me some interesting ideas on asking the right questions, introspecting, and creating opportunities for the adoption of new technologies. I had, in the past, found that teams have considerable trouble taking risks, making bold bets and having faith in doing something innovative. One of my mentors rightly said that the problem in most cases is not the technology, but the humans who develop and use it.

After spending multiple weeks and months on this, I am still without the exact answer to my original quest. Though I am aware that the answers will not be evident immediately, I am glad to have raised some key questions, and to have pointers to experiment with and evaluate.

In this quest, I have created a new GitHub project that will in part be a collection of my notes, questions and inferences. The other part will be the toolsets that I build and use as I work towards creating a Technology Platform for this organization.


What to look out for when building Microservices

Sam Newman has a great presentation, currently listed on Vimeo, about "Practical considerations for Microservices Architectures". This is from the 2014 edition of the JavaZone conference in Oslo. I found the talk valuable for first-movers who are planning or in the middle of building / transitioning to a Microservices based architecture.

I wanted to put together the important points covered in the talk as a checklist for myself when building Microservices, and hopefully this is useful for others as well.

A Summary of all the important points covered in the talk :-

  1. To understand what composes a Microservice, it's important to know about the Bounded Context as a way to define the Service Boundary. Eric Evans has fantastic coverage of this in his book "Domain Driven Design".
  2. Microservices have to do with more than just technology and a new architecture style. They have to do with how organizations and teams are formed. A knowledge of Conway's Law, and its implications, is useful for understanding the "people" part of this paradigm shift.
  3. The main goal for Microservices is to improve the speed of innovation by allowing heterogeneous technology choices, and building agility.
  4. It's important to standardize the gaps between the services, instead of worrying about what goes inside a service. Common things to standardize between the services :-
      1. Interfaces – Restful over JSON for example.
      2. Monitoring – Application, Infrastructure
      3. Deployment and Testing
      4. Safety practices – making sure a service does not fail others in a system
  5. If everybody “owns” a service, then “nobody” owns the service. Make independent teams that are accountable for services. Having shared responsibility leads to reduced accountability. Assign a set of services to a team, let them own it, and let them be responsible for decisions to build, operate and maintain.
  6. For shared services, it would be advisable to rotate ownership among teams, like a “rotating custodian”. Hence, these kinds of services should have a clear custodian model.
  7. Strong coupling, as always, is bad. It's advisable to avoid shared databases and serialization protocols for communicating across services. Instead, use lightweight open protocols like REST, which are resource oriented.
  8. A good tip for breaking down existing functionality into services : “Separate Databases before separating services”. This is a good rule of thumb when isolating services, and a good model when breaking down monolithic systems.
  9. When thinking about designing the service behaviour and interface definition, a good piece of advice is to adopt a “Consumer first” approach – this means thinking through the various types of consumers who would use the service, and what they could use it for. Planning for API Documentation and a Developer Test Console is also greatly advised. Tools like Swagger are useful in this context (see the annotated-resource sketch after this list).
  10. Monitoring is an essential need, not just a last-minute thing, in the Microservices world. Investing in Monitoring across all layers and constituents of a Microservices architecture is as important as developing the services in the first place. A good start is to think about how different monitoring information could be collected, aggregated and visualized. Sam recommends the Logstash and Kibana stack from the Open Source world for log monitoring and analysis. Yammer Metrics or Netflix's Servo are good picks if you are interested in program counters. Graphite with statsd also stood out as a good pick when thinking about monitoring.
  11. Synthetic transactions, or Semantic monitoring, is an interesting way to check the health status of a Production system. This could mean, for example, periodically running tests in an e-commerce system that create an order, then cancel and return it, to check whether everything is working fine. We need to be careful in picking these end-to-end tests to ensure we don't do anything destructive while verifying the end-to-end flow from time to time. One other tool which could be useful here is Mountebank, which allows quick development and testing of service stubs, useful when building a service that needs to be tested in isolation.
  12. Using a correlation id that is generated and passed along to all upstream and downstream systems via logs is a good step towards debugging issues (a minimal filter sketch appears after this list). Together with metrics in the logs via Yammer Metrics, it's a potent combination for devising call graphs when diagnosing service issues and latencies.
  13. Allowing teams to build their services independently needs a standard way of deployment to ease fast production rollouts. Therefore, one of the recommendations is to evaluate toolsets that abstract underlying deployment differences, like Packer from HashiCorp. Container based deployment is also commonplace, using technologies like Docker.
  14. Independent teams building microservices frequently run into problems wherein they need to test a service without breaking other dependent service consumers. A change in a service should not break Upstream or Downstream systems. Hence, the concept of Consumer Driven Contracts is recommended, wherein service consumers specify their expectations as “tests”. These tests are run as “unit tests” when building the service. Tools like “Pact” could be used to test Consumer Driven Contracts.
  15. Reduce the tendency to have a large scope of release. This means eliminating the need to target the release of multiple services together, as it breeds coupling. Instead, we should try as much as possible to release services independently. The idea is to not let change build up. Release one at a time, as often as possible.
  16. Usually, in a multi-service environment, cascading failures are a big risk. These can lead to conditions where a particular service failure leads to downtime of the entire system. In most cases, the system should survive outages of a service by working in a degraded mode or with partial failure. There are multiple ways to reduce cascading failures, mostly originating from the book “Release It!” by Michael Nygard. This includes patterns like “Bulkheads”, “Circuit Breakers” and “Timeouts”. Hystrix from Netflix is an interesting implementation of the Circuit Breaker pattern and is widely used.
  17. In a microservices based system, the onus is on moving fast and on strong ownership. Allowing teams to independently own services and be accountable for their operation requires some sense of discipline and co-ordination. The essential idea is for these teams to be able to leverage polyglot technology and techniques to deliver services with standard interfaces, monitoring and fail-safe mechanisms. A common idea is to build a Service Template as a self-contained boilerplate that is a good starter project for any service team. The Service Template encapsulates the common essentials for any service: Monitoring, Metrics, API tools, fail-safes like Circuit Breakers, and deployment, among others. This gives independent teams the easiest way to do the Right thing – build polyglot services in a standardized, non-chaotic manner. Netflix Karyon and DropWizard are good examples of Service Templates.
  18. When starting with microservice orientation, it's important to focus on “How many services are too many?” instead of “How big or small is the microservice?”. It's preferable to start gradually with a small set of services and ramp up as confidence improves. A larger number of services at the initial stage would mean more moving parts to manage, which may hamper progress.
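
To make point 9 concrete, here is a minimal sketch of a “Consumer first” resource annotated for Swagger. The resource, its path and the Order type are illustrative assumptions; the point is that API documentation and a developer test console can be generated straight from the annotated interface.

```java
import io.swagger.annotations.Api;
import io.swagger.annotations.ApiOperation;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Api(value = "orders")                      // picked up by Swagger to generate docs and a test console
@Path("/orders")
@Produces(MediaType.APPLICATION_JSON)
public class OrderResource {

    /** Hypothetical response type; consumers see its fields in the generated documentation. */
    public static class Order {
        public String id;
        public String status;
    }

    @GET
    @Path("/{id}")
    @ApiOperation(value = "Fetch a single order by id", response = Order.class)
    public Order getOrder(@PathParam("id") String id) {
        // The real lookup is omitted; what matters is that the interface is documented at the source.
        Order order = new Order();
        order.id = id;
        order.status = "CREATED";
        return order;
    }
}
```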

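And here is a minimal sketch of the correlation id idea from point 12: a servlet filter that accepts an incoming correlation id (or generates one) and places it in the logging context, so that every log line for the request carries it. The header and property names are assumptions, not a standard; what matters is that all services agree on one convention.

```java
import java.io.IOException;
import java.util.UUID;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

public class CorrelationIdFilter implements Filter {

    // Illustrative header name; every service in the call chain should use the same one.
    private static final String HEADER = "X-Correlation-Id";

    @Override
    public void init(FilterConfig filterConfig) {
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        String correlationId = ((HttpServletRequest) request).getHeader(HEADER);
        if (correlationId == null || correlationId.isEmpty()) {
            correlationId = UUID.randomUUID().toString();
        }
        MDC.put("correlationId", correlationId);   // log patterns can then print %X{correlationId}
        try {
            // Any outgoing calls made while handling this request should forward the same header.
            chain.doFilter(request, response);
        } finally {
            MDC.remove("correlationId");
        }
    }

    @Override
    public void destroy() {
    }
}
```
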
I think most of the ideas are very practical and definitely a must in the arsenal for a Microservices practitioner.

The upcoming book by Sam Newman titled “Building Microservices” should also be a great read, based on the content that he already covered in the aforementioned talk.



EC2 Deep Dive Notes

I recently chanced upon an interesting high level coverage of the insights that go into making EC2 Instances performant. This coverage was in a talk titled “Amazon EC2 Instances Deep Dive” at AWS re:Invent 2014, and was well articulated by the speakers.

I would like to share a couple of points that stood out for me. However, going through the video is the recommended option, and a good use of 40 minutes of your week.


Extracting the maximum juice from EC2 instances :-


  1. It’s important to understand the difference between PV and HVM. In short, PV refers to a modified Operating system that is Hypervisor aware, thus reducing the cost of translating the system calls into hypervisor calls. HVM refers to Hardware assisted Virtualization through support from the Intel VT extension. In HVM, the Operating system need not be modified.
      1. AWS Engineers recommended PV-HVM AMIs over plain PV AMIs.
  2. It's important to understand the difference between the generations and classes of EC2 instances. The new generation instances include c3, i2 and t2. t2 instances are among the cheapest available in the EC2 family.
      1. t2 leverages CPU Credits. A CPU Credit provides full CPU core performance for one minute. It's important to understand how t2 uses CPU Credits, and to be able to monitor this parameter for greater performance.
      2. It's also interesting to understand Burstable performance, and the workloads that can thrive on it. t2 instances provide Burstable performance.
      3. i2 instances provide High I/O, especially for NoSQL Databases.
          1. AWS Engineers recommend using kernel version 3.8.0+ to leverage the high I/O rate.
          2. The new Amazon Linux AMIs are already running 3.8.0+.
          3. Issue TRIM using fstrim -a for “SSD” backed i2 instances if using the newer kernels.
          4. For older kernels or Windows, reserve 10% of the disk using a partition and never touch it. This is referred to as “Over Provisioning”.
          5. TRIM and Over Provisioning avoid garbage collection overhead and improve I/O performance.
      4. c3 instances provide High Compute with Enhanced Networking (SR-IOV)
  3. PV-HVM is better than PV based Xen modes, so using PV-HVM AMIs is recommended.
  4. Use TSC as the clocksource instead of the Xen pvclock by modifying kernel parameters.


Here is the video that covers it all, and is my pick for the week :-

Till then, happy Optimizing!



Go beyond just being Cloud Ready: Be Cloud Native

 I recently got a chance to author a write-up for Newstack.IO about building Cloud Native applications. There is just so much happening around Cloud services and building Cloud native applications, that not a single day goes by without a mention on the internet. 

In the article, I have talked about some patterns and technologies that would be useful to consider as individuals and organizations embark on building applications on cloud platforms.

Here is the link to the article :-

I am writing more on this soon, with more practical tips around technology stacks and open source projects which could be helpful.


Learning from Netflix – Part 2


In the last blog post, I listed the tools and practices introduced by Netflix in its presentation at AWS re:Invent 2013. In this second part of the blog series, I will attempt to uncover the real learning pointers that can be derived from these techniques, and their effectiveness for any Cloud application developer.

1. Using a Cloud Provider is not the same as using a hosting provider. It requires delicate planning, process and engineering effort over a period of time. Without all this, an organisation cannot leverage all the benefits of a Cloud service.

2. Having an Agile infrastructure alone cannot solve problems if your developers have to perform too many rudimentary operations to use it. That also leads to another problem – giving direct access to your developers and not being able to manage it effectively. The AWS Admin Console is good from an Operations point of view (read System Admin, DevOps). One of the choices could be creating Development User Roles in IAM, but the developers still have to live with the Ops view of the entire infrastructure. At a certain usage level of cloud services, organisations may want to build higher abstractions (read AWS Beanstalk style) over the existing functionality, which reduce the effort required of Developers. Netflix built Asgard as a useful abstraction over the AWS infrastructure to provide powerful and easily consumable capabilities to its developers, thereby empowering them. Workflows and Approvals could also be built as a minimal mechanism for moderating access to the AWS infrastructure.

3. Avoiding infrastructure sprawl is also one of the important needs for an organisation dealing with Cloud services. The elastic nature of the infrastructure, over a period of time, tends to create resources which are no longer required or used. Having to deal with this manually means either having approval and expiration windows, or creating your own “Garbage Collectors”. Netflix created its own version of a Cloud Garbage Collector with Janitor Monkey. But the secret sauce is the service called Edda. Edda records historical data about each AWS resource via the Describe API calls, and makes it available for retrieval via Search and as an input to Janitor Monkey for disposing of old resources which no one uses. Building an engine that records historical data about AWS resources and provides an easily consumable interface to identify old resources is the first part of the puzzle. The second part is to use this information automatically to delete these resources when they are not needed.

4. Dealing with multiple environments and an ever-growing infrastructure requires deep discipline in how teams use the resources and perform day-to-day operations. Also, identifying who performed what operation on which resource is essential for overall system monitoring, to identify fault situations and perform effective resolutions in time. The Netflix way of handling this is by introducing a Bastion machine as an intermediary / jump-off point to access the EC2 instances. This allows Netflix to moderate access to the EC2 instances, implement an audit trail for operations performed, and also implement security policies.

5. “As a Service” business models tend to provide organisations with a flexible Opex model, but if not managed well, they can soon create more problems than solutions. One such issue is the unwarranted increase in utilisation, and thereby in the total accumulated cost, of cloud resources. Also, it is paramount for an organisation to be able to slice and dice the costs incurred across multiple divisions and projects if a common AWS account is used. A visualisation of how the organisation uses AWS resources, and of the costs incurred during the course of operation, can be very helpful for effective charge-back, and even for reducing wastage. Netflix open sourced a tool by the name of ICE to provide visibility into the utilisation of AWS resources across the organisation, especially through the lens of operational expense.

6. One of the general rules of cloud native development and deployment is not to spend time on recovery: essentially, replacing supersedes recovering. Recovery can only win if the time and money needed to replace is dramatically more than the time and money needed to recover. EC2, like every other cloud provider's environment, is volatile. Things can go wrong, and EC2 instances may vanish into thin air. Hence, the rule of the game requires the organisation to spend time on Automation. Automation means fast replacement, with minimal time spent on each replacement. One of the ways this is possible is by ensuring a minimal number of steps are performed after the instantiation of EC2 instances. This is possible by creating packaged AMIs that are baked with the installation and configuration of the required application stack. Alternatively, using services like AWS CloudFormation and integrated tools like Chef, the entire process of bootstrapping the application could be scripted. If the time and cost of performing bootstrapping via scripts is more than the threshold, then baked AMIs are a good choice. Baked AMIs, however, have drawbacks if frequent changes need to be made to the application installation and configuration. In my experience, a balance of baked AMIs and bootstrapping scripts provides a good alternative. Netflix, through the OSS tool Aminator, allows baking of AMIs. The one-time effort of creating these AMIs leads to faster instantiation of EC2 resources. This can also be used with CloudFormation to fully automate infrastructure & application provisioning.

7. Netflix has provided a good insight into how it leveraged SOA to accelerate its Cloud strategy. Eureka from Netflix is a good fit in the overall SOA infrastructure: it provides a Service Registry for loosely coupled services and dynamic discovery of dependencies. Different Services can look up other remote services via Eureka and in return get useful meta-data about those services. Eureka also helps short-circuit the connection between co-located (same zone/region) services: services in the same zone can talk to each other rather than talking to their distant counterparts located in other zones / regions. Eureka also helps in recording the overall health of individual services, thereby allowing dynamic discovery of healthy service alternatives and increasing the fault tolerance of the system as a whole.

8. One of my favourite applications in the Netflix OSS toolset is Edda. Although I briefly touched upon the service in previous points, I would still like to elaborate on the learning from this tool. Edda, as described in the last blog post, is a service that records historical data about each and every AWS resource used by the overall system. Through continuous tracking of state information about each AWS resource, it creates an index of state history, and thereby allows one to identify the changes that have gone into a resource over a period of time. The possibilities for this kind of tool are limitless. Not only does it create a version history for all the cloud assets / resources an organisation uses, it also allows search over them, enabling queries like “what changed in this resource over the last few days?” or “when was this property set to this value?”. All this helps in resolving complicated configuration problems, and can be used to perform analytics on how a Cloud resource changes over time. The output of that analytics can then be used for better system design and more effective use of AWS resources. Look here for more :-

9. I was introduced to the “Circuit Breaker pattern” by the book Release It! by Michael Nygard. It provides a useful abstraction to contain further degradation of a system by failing fast. For instance, a service consumer calling a remote API is prone to many exceptional conditions like Timeouts, Service unavailable, etc. Having to manage each and every such scenario across all layers of your code is the first hurdle that a developer has to go through. The second hurdle is to ensure the system does not keep going through the repeated process of failure realisation on every separate invocation of the service consumer. A circuit breaker can be configured with a threshold for the number of such failures that are allowed to happen. If the threshold is crossed, the circuit breaker comes into action and returns a logical error without attempting the actual call to the remote API. This allows the system to be reactive and not waste critical resources on retrying failed scenarios. A circuit breaker dashboard can trigger alerts to the operations team, letting them be aware of such scenarios and plan for resolutions. The overall system, however, goes through degraded performance without any actual blackout. Netflix created its own version of circuit breakers via the Hystrix project. Together with the Hystrix dashboard, it's an effective tool in the arsenal of a Cloud Geek.
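
As a rough sketch of what this looks like in code: the command below wraps a call to a hypothetical remote recommendations API behind a Hystrix circuit breaker. The remote call and the fallback value are placeholders, not Netflix's actual implementation.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

/** Wraps a hypothetical remote recommendations call behind a circuit breaker. */
public class RecommendationsCommand extends HystrixCommand<String> {

    private final String userId;

    public RecommendationsCommand(String userId) {
        // Commands in the same group share a thread pool and circuit breaker state.
        super(HystrixCommandGroupKey.Factory.asKey("RecommendationsService"));
        this.userId = userId;
    }

    @Override
    protected String run() throws Exception {
        // The actual remote API call goes here; timeouts and errors count against the circuit.
        return callRemoteRecommendationsApi(userId);
    }

    @Override
    protected String getFallback() {
        // Returned when the call fails or the circuit is open: degraded behaviour, no blackout.
        return "[]"; // e.g. an empty list of recommendations
    }

    private String callRemoteRecommendationsApi(String userId) throws Exception {
        throw new UnsupportedOperationException("placeholder for the real HTTP call");
    }
}

// Usage: String recommendations = new RecommendationsCommand("user-42").execute();
```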

Learning from Netflix – Part 1

I finally found some time to go over this awesome introduction to Netflix OSS.

Netflix OSS has emerged as a comprehensive Platform for any organisation using AWS for its business. It has also inspired alternate cloud (read private cloud) enthusiasts. Netflix OSS is a bold vision of how applications should be architected and designed at cloud scale, and is nothing short of a standard in its field.

I present here a summary of the tools and techniques discussed in Netflix OSS and Netflix Cloud scale engineering :-
  1. Netflix uses 4 different AWS accounts – one each for the Dev/Test/Build, Production, Audit and Archive environments.
    1. The Engineering team uses the Dev, Test and Build environments hosted under AWS Account #1
    2. The builds are promoted to the Production environment, which is hosted under AWS Account #2
    3. Continuous backup of the Production environment is taken and hosted under AWS Account #3, which is used for Archiving. Usually the backup includes RDS and Cassandra backups
    4. A separate Account #4 is used for infrastructure that serves auditing purposes, for e.g. SOX compliance. The separate infrastructure and account isolate the developers, so they can keep releasing their functionality with minimal intervention from auditing requirements.
    5. Every weekend, Production data is moved over to the Test environment, so that QA can start using the data the very next week for performing test activities
  2. Two-factor authentication and IAM roles are standard for all access to AWS infrastructure (read: Good Practices for IAM)
  3. Security Groups are configured on each Resource ensuring the right ingress and egress filters.
  4. Netflix uses a Bastion machine that works as a Jump-off server for everybody in the engineering team to access EC2 instances. The Bastion machine is given exclusive access to the AWS infrastructure. This means that the Team cannot access the instances directly, and they have to SSH to Bastion machine to access EC2 instances.
    1. The Bastion machine is also used for sideways copying (copying from one EC2 instance to another EC2 instance). This means copying between EC2 instances is NOT allowed directly, without the Bastion.
    2. Audit Trail and Command history can be enabled for Bastion machine allowing moderation for each user
  5. AnsWerS founder Peter Sankauskas has made an awesome contribution to Netflix OSS by providing pre-baked AMIs for each of the Netflix components.
  6. Tool #1: Asgard
    1. Developer focussed AWS console that complements AWS Admin console (which is more Operations focussed)
    2. Used for Red/Black Deployment
  7. All application state (including application sessions) is available in Cassandra and Memcached to allow stateless application behaviour
  8. Tool #2: ICE
    1. Visualizes detailed costs and usage of AWS Resources
    2. Billing reports are sent to Managers for detailed analysis of utilisation per Project / Division
    3. Ensures effective use of AWS resources as per Budget allocated
  9. Build pipeline is managed via Jenkins
    1. Uses Cloudbees for Continuous Delivery
    2. Whenever a developer pulls a repo, CloudBees runs all the test cases associated with the code revision and verifies the sanity of the build
  10. Tool #3: Aminator
    1. Used for baking custom AMIs that can be used for spinning up Application instances
    2. Takes a Base AMI as an input to the tool
    3. It uses an EBS volume that is mounted on a test EC2 instance while baking. The mounted partition is then chroot-ed, and all the relevant packages (as per the requirement of the application) are installed on this volume.
    4. This Volume is then used for creating the Custom AMI
  11. Tool #4: Eureka
    1. Acts as a Service Registry
    2. Used to help discovery of Services and related meta-data
    3. All Services in a SOA environment register with Eureka to let it know about the following :-
      1. When the Service boots up
      2. When the Service is ready for accepting requests
      3. All the meta-data about the Service – like IP Address, Zones where the service is running
      4. Health information of each Service
    4. Allows for in-memory lookup of information about services
    5. Every 30 seconds, registered Services have to update Eureka with their meta-data
    6. Interesting use-case #1:- Service A needs to call Service B, and for that it needs to use Eureka to identify the endpoint of Service B. The meta-data for Service B can include the IP addresses of its endpoints. Service A can use this IP address to call Service B directly, without performing DNS resolution.
    7. Interesting use-case #2:- Service A can identify which of Service B's endpoints is closest to it. This allows for intra-Zone calls instead of Cross-Region calls between Service A and Service B.
  12. Tool #5: Edda
    1. Records the historical data about state of each AWS resource
    2. Data from 30 days / 15 days / 7 days / 3 days / yesterday / 12 hours ago, etc.
    3. Edda continuously uses the Describe API calls for each AWS resource
    4. Timestamps the Information retrieved from Describe API calls
    5. Netflix Tool – Janitor Monkey can use this information to identify stale resources that can be released
    6. The information is indexed and allows for Search
  13. Tool #6: Archaius
    1. Archaius is a universal Property Console to set and read Properties (a minimal read-side sketch appears after this list)
    2. Used to setup Properties across the entire Infrastructure
    3. Properties can also be bound to a particular AWS Region
      1. For example:- Suppose we want our application, hosted across multiple AWS regions, to use a particular value for a property based on the origin of the request. If the application's request comes from Europe, Archaius will give a different value of the Property compared to the other regions
    4. Performs global configuration management
    5. Uses SimpleDB / DynamoDB
    6. Uses Cassandra within Netflix – for multi-region support
  14. Tool #7: Priam
    1. Performs deployment automation for Cassandra
    2. Self organises Cassandra cluster
    3. Priam is deployed on a Tomcat server on each Cassandra instance
    4. It helps manage a Global Cassandra Cluster
  15. Tool #8: Astyanax
    1. Cassandra Client
    2. Has a range of Astyanax recipes
      1. For eg:- Replacing the Multi-part write functionality available in Amazon S3
  16. Tool #9: EVCache
    1. Extends the functionality of Memcache
    2. Allows low latency data access
    3. Automates replication of data in different AWS zones
    4. When reads are issued, it gives you the data from the same Zone as the requestor
    5. Used for storing session state
  17. Tool #10: Denominator Library
    1. Library that provides common Interface to AWS Route 53, UltraDNS, DynEct DNS etc.
    2. Programmatic access to various DNS functions
    3. Used for building Resilience in the DNS infrastructure
  18. Tool #11: Zuul
    1. Routing layer for any requests to the service
    2. Groovy scripts can be configured for Pre, During, Post filters for API Requests
    3. Allows quick changes to be configured on API to test new behaviours
  19. Tool #12: Ribbon
    1. Internal Request routing
    2. It allows for Common Client – Server communication
    3. Also encapsulates common expected behaviour for errors, timeouts
    4. Built-in Load balancer that can be configured to be Zone-aware
      1. Takes into account the Health of Service Dependencies across Zones
    5. It can also use the Eureka service
  20. Tool #13: Karyon
    1. Server Container that provides base implementation of common service abstractions of Netflix OSS :-
      1. Capability to connect to Eureka
      2. Capability to connect to Archaius
      3. Provides hooks for Monkey testing (Refer to Simian Army toolkit of Netflix OSS tools)
      4. Monitoring hooks for health and status information
      5. Exports JMX information
    2. Allows the Developer to be productive by removing the need to set up development infrastructure for common Netflix OSS components
  21. Tool #14: Simian Army
    1. Chaos Monkey – kills instances randomly and can be configured
      1. Ensures new instances can be spawned to allow for replacement
      2. Ensures architecture is resilient to failures
    2. Janitor Monkey – works with Edda to clean unused resources on AWS
      1. Reduces money spent on AWS by deleting resources which are not being used
    3. Conformity Monkey – detects discrepancy across configuration and infrastructure on AWS
      1. For e.g.:- Is the AMI used for each instance the same or different?
      2. By alerting on discrepancies, DevOps can enforce standardisation
  22. Tool #15: Hystrix Circuit Breaker
    1. Allows for Failing fast and recovering fast
    2. Encapsulates scenarios when failure happens while working with non-trusted dependencies
    3. Can be used to wrap a third party call with a Hystrix wrapper, so that all the exceptional scenarios are handled
    4. Also encapsulates Circuit Breakers, with ON/OFF toggles for each Circuit Breaker
  23. Tool #16: Hystrix Dashboard using Turbine
    1. Provides visual information about Circuit Breakers
    2. Identifies whether a service is healthy or not
  24. Tool #17: Blitz4J
    1. Non-blocking Logging
    2. Built on top of Log4J
    3. Allows for Isolation of application threads from Logging threads
  25. Tool #18: GCViz
    1. Visualizer of Garbage Collector
    2. Allows for identifying change in GC behaviour when a particular JVM configuration is used
  26. Tool #19: Pytheas
    1. Tooling framework
    2. Allows creation of Dashboards, Data Insights
    3. Archaius tool was built out of Pytheas framework
  27. Tool #20: RxJava
    1. Functional Reactive framework
    2. Allows for building Observables when dealing with multi-threaded systems (a tiny example follows this list)
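
To give Tool #20 a little shape, here is a minimal RxJava 1.x sketch. The service names and the mapping are placeholders; the point is simply building an Observable, moving the work off the calling thread, and reacting to results as they arrive.

```java
import rx.Observable;
import rx.schedulers.Schedulers;

public class ObservableExample {

    public static void main(String[] args) throws InterruptedException {
        Observable.just("service-a", "service-b", "service-c")
                .map(name -> name + " -> OK")        // pretend each item is the result of a call
                .subscribeOn(Schedulers.io())        // do the work on an I/O thread, not the caller's
                .subscribe(
                        System.out::println,         // onNext: one result per emitted item
                        Throwable::printStackTrace); // onError: failures surface here

        Thread.sleep(500); // only so that main() does not exit before the async work completes
    }
}
```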

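Tool #6 (Archaius) is also easy to picture with a small read-side sketch. The property name, default value and the class using it are assumptions; in a real setup Archaius would be wired to a dynamic configuration source, such as a per-region store, so the value can change at runtime without a redeploy.

```java
import com.netflix.config.DynamicPropertyFactory;
import com.netflix.config.DynamicStringProperty;

public class GreetingService {

    // The property can change at runtime; reads always reflect the latest known value.
    private static final DynamicStringProperty GREETING =
            DynamicPropertyFactory.getInstance()
                    .getStringProperty("greeting.message", "Hello"); // name and default are illustrative

    public String greet(String name) {
        return GREETING.get() + ", " + name;
    }
}
```
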
This is indeed great learning about tools and techniques, and I hope to use some of them in my project to get real hands-on experience with Netflix OSS. Next target – Architectural Learning from Netflix. This is coming soon.