Developing Apache Kafka Producers and Consumers

All Things Hadoop

I gave a presentation recently on Real-time streaming and data pipelines with Apache Kafka.

A correction to the talk (about 22 minutes in): I said that all of your topic data has to fit on one server. That is not true. You can’t span logs, so all of the data for a single partition has to fit on one server. Kafka will spread the partitions of each topic around for you.
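
To make the correction concrete, here is a toy model in Python. It is only a sketch: the round-robin placement and the names `assign_partitions` and `fits` are assumptions made for illustration, not Kafka's actual placement logic or API. It shows a topic that is larger than any single broker still fitting, because each partition lives whole on one broker.

```python
def assign_partitions(num_partitions, brokers):
    """Toy round-robin placement: each partition lands on exactly one broker.

    This is an illustrative assumption, not Kafka's real assignment algorithm.
    """
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


def fits(partition_sizes_gb, placement, broker_capacity_gb):
    """Return True if every broker can hold all partitions placed on it."""
    used = {}
    for partition, size in partition_sizes_gb.items():
        broker = placement[partition]
        used[broker] = used.get(broker, 0) + size
    return all(total <= broker_capacity_gb for total in used.values())


if __name__ == "__main__":
    brokers = ["broker-0", "broker-1", "broker-2"]   # hypothetical broker names
    placement = assign_partitions(6, brokers)
    sizes = {p: 40 for p in range(6)}                # 240 GB of topic data in total
    print(placement)
    print(fits(sizes, placement, broker_capacity_gb=100))
```

The topic as a whole (240 GB in this made-up example) would not fit on a 100 GB broker, but each 40 GB partition does, which is the constraint that actually matters.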

For that presentation I put together sample code for producing and consuming with an Apache Kafka broker using Scala.

To get up and running, use Vagrant.

1) Install Vagrant
2) Install VirtualBox

Your entry point is the test file

On the producer side I have started to look more into using Akka. The prototype for this implementation is in the test case above…

The last difference between OpenJDK and Oracle JDK

Technology and fun

Recently I’ve spent a lot of time investigating font rasterization (a great topic which deserves a separate post). Most applications use the font engine built into their graphics library or widget toolkit. Only a few cross-platform applications that badly need consistent text layout (Acrobat Reader, for example) use their own font engines (like Adobe CoolType). The Java platform is one such application, since it has its own graphics library. If you are curious, take a look at this article comparing font engines, including the one from the Java platform. From publicly available information I understood that OpenJDK uses the FreeType library. I thought: “That’s great, I have JDK 1.7 installed, so this library must be there; let’s take a look”. But I could not find any trace of freetype.dll in the JDK. I was puzzled and tried to find some answers in the OpenJDK sources. Imagine my surprise when I found…

My experience of learning R – from basic graphs to performance tuning

Mani's fun & useful blogs


R, as some of you may know, is a statistical and graphics programming language (see Wikipedia [1]) used in academia and, more recently, by IT professionals in our ever-growing software industry. There is a sudden demand for Data Scientists, Data Analysts and Statisticians with a background in R, among other data- and development-related subjects.

I have been fortunate to work with R, even though I had no prior experience with such a language or with Data Scientists. My interest in Mathematics and affinity for numbers drew me to learning it, and with the further help of Herve Schnegg, our in-house Senior Data Scientist, I was able to pick up a fair bit of the subject.


R is a mix of object-oriented programming, Clojure-like functional programming, a JavaScript-like style of writing code and a Smalltalk-like programming interface. And…

Go vs D vs Erlang vs C in real life: MQTT broker implementation shootout.

Átila on Code

At work we recently started using the MQTT protocol, which uses a publish / subscribe model. It’s simple in the good way and well thought out. We went with an open source implementation named Mosquitto. A few weeks ago, on the way back from lunch break, my colleague Jeff told me he was writing an MQTT broker in Go, his new favourite language. We’re using MQTT at work and I guess he was looking for a new project to write in Go, so voilà. It should be a good fit; after all, this is the type of application that Go was made for. But hubris caught up to him when he uttered “And, of course, it’ll be super fast. It won’t even be fair to other languages”. I’m paraphrasing, but that’s how I remember it. You can read Jeff’s account here.

I’m not a fan of Go at…

Large Java Heap with the G1 Collector – Part 1

Matt Pouttu-Clarke's Blog

Demonstration of efficient garbage collection on JVM heap sizes in excess of 128 gigabytes.  Garbage collection behavior analyzed via a mini-max optimization strategy.  Study measures maximum throughput versus minimum garbage collection pause to find optimization “sweet spot” across experimental phases.  Replicable results via well-defined open source experimental methods executable on Amazon EC2 hardware. 

Experimental Method


  • Demonstrate maximum feasible JVM size on current cloud hardware (specific to Amazon for now) using the G1 Garbage Collector (link).
  • Vary the JVM heap size (-Xmx) exponentially to find performance profile and breaking points.
  • Vary the ratio of new versus old generation objects exponentially.
  • Use an in-memory workload to stress the JVM (avoids network or disk waits).
  • Produce replicable results on commodity hardware, open source operating systems, and open source tools.
  • Provide gap-free data for analysis, in spite of garbage collection pauses.
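
The exponential sweep described in the list above could be laid out roughly as follows. This is a sketch under assumptions, not the study's actual harness: `-Xmx` and `-XX:+UseG1GC` are standard HotSpot options, but the jar name `workload.jar`, the `--old-to-new-ratio` argument, and the starting values are hypothetical placeholders.

```python
def powers_of_two(start, maximum):
    """Return start, 2*start, 4*start, ... up to and including maximum."""
    values = []
    value = start
    while value <= maximum:
        values.append(value)
        value *= 2
    return values


def experiment_grid(max_heap_gb=128, max_ratio=16):
    """Every (heap size in GB, old-to-new object ratio) pair to be tested."""
    return [
        (heap_gb, ratio)
        for heap_gb in powers_of_two(1, max_heap_gb)
        for ratio in powers_of_two(1, max_ratio)
    ]


def java_command(heap_gb, ratio, jar="workload.jar"):
    """Build a JVM command line for one experimental phase.

    The jar and the --old-to-new-ratio argument are hypothetical; only
    -Xmx and -XX:+UseG1GC are real JVM options.
    """
    return [
        "java",
        "-XX:+UseG1GC",
        f"-Xmx{heap_gb}g",
        "-jar", jar,
        "--old-to-new-ratio", str(ratio),
    ]


if __name__ == "__main__":
    for heap_gb, ratio in experiment_grid(max_heap_gb=4, max_ratio=2):
        print(" ".join(java_command(heap_gb, ratio)))
```

Doubling both parameters keeps the number of runs small while still covering several orders of magnitude, which is what makes breaking points visible.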

Not (Yet) in Scope

In followup to this study, subsequent efforts may…

Sorting Algorithms

Coping With Computers

Hope everyone had a wonderful Thanksgiving holiday! Right before the break, I had the opportunity to go up to MIT for a program called Splash. In this, students can spend their Saturday and Sunday taking classes taught by MIT students. I had many interesting classes up there, and topics may find their way into posts I write. The basic idea behind this one came from my Interactive Algorithms class, where we acted as elements in a list and moved around, instead of simply writing down pseudocode. Here, we’ll be taking a less interactive approach to sorting algorithms.


One of the first types of algorithms students are taught in a computer science class (after learning some basics and information about Big O notation) is the sorting algorithm. Sorting algorithms are methods used to organize a group of objects in a specific order based on some set of characteristics. When we first…
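
As one concrete instance of the definition above, here is a minimal sketch of insertion sort in Python. Insertion sort is chosen only as a commonly taught example; the original post may cover different algorithms. The `key` parameter is an assumption that stands in for the "set of characteristics" used to order the objects.

```python
def insertion_sort(items, key=lambda x: x):
    """Return a new list with items ordered by key, smallest first."""
    result = list(items)  # copy so the caller's list is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot right to open a gap for current.
        while j >= 0 and key(result[j]) > key(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


if __name__ == "__main__":
    print(insertion_sort([5, 2, 9, 1]))
    print(insertion_sort(["pear", "fig", "banana"], key=len))
```

Insertion sort takes quadratic time in the worst case, which is why it usually appears early in a course as a baseline before faster algorithms are introduced.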

Algorithm – Knapsack

Sada Kurapati

This is one of the optimization algorithms where we try a lot of options and optimize (maximize or minimize, based on the requirement) the output. For this algorithm, we are given a list of items along with their sizes/weights and their values. We have a knapsack (backpack) with a limited size/weight capacity, and we need to find the best choice of items to fit in the knapsack so that we maximize the total value.

One example where we can use this algorithm is preparing for travel (backpacking for a trek, etc.). We wish we could take everything, but we have a limited backpack and need to choose the best items. Another classic example is a thief trying to rob a house or a shop. He can carry only a limited number of items and needs to make the best choice so that he maximizes the value. (If he is…
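
The problem statement above can be sketched with a standard dynamic-programming solution. This assumes the 0/1 variant (each item is taken at most once) and integer weights; the original post may use a different formulation, and the item names and numbers below are made up for illustration.

```python
def knapsack(items, capacity):
    """Solve 0/1 knapsack.

    items: list of (name, weight, value) with non-negative integer weights.
    Returns (best_total_value, names_of_chosen_items).
    """
    n = len(items)
    # best[i][c] = best value using only the first i items with capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        _, weight, value = items[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]            # skip item i
            if weight <= c:                        # or take it, if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - weight] + value)

    # Walk the table backwards to recover which items were taken.
    chosen = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, weight, _ = items[i - 1]
            chosen.append(name)
            c -= weight
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    gear = [("tent", 3, 4), ("stove", 4, 5), ("camera", 2, 3)]  # hypothetical data
    print(knapsack(gear, capacity=5))
```

The table has one row per item and one column per capacity unit, so the running time grows with the number of items multiplied by the capacity rather than with the number of possible item subsets.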

Books that make you a better programmer

Cafe Affe

Computer Science is a fascinating area, and there are many enlightening books that make it more fun to study and explore. The net is filled with lists that tell you which books should be on your bookshelf if you are a programmer or trying to learn to program. This is a similar list from my shelf: books I have either read (some chapters of) or am reading at present. I will keep updating it as I come across new enlightening books.

1. Introduction to Algorithms, CLRS : Found in almost all must-read-computer-science-books lists. A misnamed book. It’s beyond introduction, and almost a bible of fundamental algorithms. Does contain the necessary maths and theorems. In certain chapters it might feel a bit too theoretical, but makes a great reference book. Common data structures and algorithms like sorting, searching, trees, graphs etc., are perfectly explained with diagrams. Surely a must have…

Android Systems Engineering: A Quick Look Under the Hood

Tom W Wolf


Most of us have become comfortable with the understanding that Android is a very specialized and customized version of Linux. The resulting level of customization does, however, render the OS sufficiently different that most general-purpose Linux Systems Engineering expertise is of limited value. We also need to acknowledge that this simply means we need to get back to the foundations and explore / relearn from the bottom up (or top down, depending on your perspective).

If we take a quick survey of the multitudes of Android devices available, it is clear that there are both similarities and significant differences. Some of these are cosmetic and some are not, and one way to understand which is which is by dissecting the construction of the OS and identifying where these modifications were introduced. From the bottom up we have:

  • The Android Kernel: The Android kernel is…
