Monitoring Kafka, Storm and Cassandra Together with SPM

Kafka, Storm and Cassandra: Big Data’s Three Amigos?  Not quite.  And while much less humorous than the movie, this often-used-together trio of tools works closely together to make in-stream processing as smooth, immediate and efficient as possible.  Needless to say, it makes a lot of sense to monitor them together.  And since you’re reading the Sematext blog you shouldn’t be surprised to hear that we offer an all-in-one solution for Kafka, Storm and Cassandra performance monitoring, alerts and anomaly detection: SPM Performance Monitoring.  Moreover, if you ship logs from any of these three systems to Logsene, you can correlate not only their metrics, but also their logs and really any other type of event!

Enough with all the words — here is an illustration of how these three tools typically work together:


Of course, you could…


Hadoop YARN RPC (part I)


I have spent some time digging into the YARN RPC source code. Personally, I like the use of the Factory pattern to inject different RPC proxy client protocol and server implementations into the framework. It looks much cleaner than in older versions of Hadoop.

For example, RpcServerFactoryPBImpl is the implementation of the RpcServerFactory interface that creates a Protobuf RPC server. Basically, it delegates the creation to the standard Hadoop RPC class.

Look at the following snippet in RpcServerFactoryPBImpl,

We could create a new type of RPC Server Factory called RpcServerFactoryMyOwnImpl that implements the above createServer method which would return our own RPC Server.

In the YARN framework, HadoopYarnProtoRPC is the class that uses these factories.
Basically, it calls RpcFactoryProvider.getServerFactory(conf) to get the right RpcServerFactory implementation.
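
The factory idea described above can be sketched in plain Java. This is only an illustration under stated assumptions: `Server`, `ServerFactory`, `FactoryProvider` and the `rpc.server.factory` key are hypothetical stand-ins invented for this sketch, not the real Hadoop classes or configuration keys, and the real createServer method takes different parameters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class RpcFactorySketch {

    // Hypothetical stand-in for an RPC server; not Hadoop's RPC.Server.
    interface Server {
        String describe();
    }

    // Hypothetical stand-in for the RpcServerFactory interface.
    interface ServerFactory {
        Server createServer(String protocolName, int port);
    }

    // Plays the role of RpcServerFactoryPBImpl: builds a Protobuf-flavoured server.
    static class ProtobufServerFactory implements ServerFactory {
        @Override
        public Server createServer(String protocolName, int port) {
            return () -> "protobuf server for " + protocolName + " on port " + port;
        }
    }

    // Plays the role of the RpcServerFactoryMyOwnImpl mentioned in the text.
    static class MyOwnServerFactory implements ServerFactory {
        @Override
        public Server createServer(String protocolName, int port) {
            return () -> "custom server for " + protocolName + " on port " + port;
        }
    }

    // Plays the role of RpcFactoryProvider: picks a factory based on configuration.
    static class FactoryProvider {
        static final String KEY = "rpc.server.factory"; // hypothetical config key

        private static final Map<String, Supplier<ServerFactory>> REGISTRY = new HashMap<>();
        static {
            REGISTRY.put("protobuf", ProtobufServerFactory::new);
            REGISTRY.put("my-own", MyOwnServerFactory::new);
        }

        static ServerFactory getServerFactory(Map<String, String> conf) {
            String name = conf.getOrDefault(KEY, "protobuf");
            Supplier<ServerFactory> supplier = REGISTRY.get(name);
            if (supplier == null) {
                throw new IllegalArgumentException("Unknown server factory: " + name);
            }
            return supplier.get();
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FactoryProvider.KEY, "my-own");
        // The caller never names a concrete server class; the configuration decides.
        Server server = FactoryProvider.getServerFactory(conf).createServer("ExampleProtocol", 8032);
        System.out.println(server.describe());
    }
}
```

The point of the design is that the calling code depends only on the factory interface, so swapping in a different server implementation is a configuration change rather than a code change.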

To be continued…




There are different RPC engines available for Hadoop. They are WritableRpcEngine, ProtobufRpcEngine, and AvroRpcEngine. Every RPC engine implements the RpcEngine interface. As you see in the interface, any new RpcEngine has to provide implementations of getProxy, getServer, getProtocolMetaInfoProxy, and call methods.

The main difference among the RpcEngines is the data exchange wire format they use. For example,
WritableRpcEngine uses Writable as the data exchange wire format whereas ProtobufRpcEngine uses Protobuf as the wire format.
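
To make the "same call, different wire format" idea concrete, here is a minimal sketch. It assumes a heavily reduced, hypothetical `Engine` interface with just `encode` and `decode`; the real RpcEngine interface is much larger, and neither layout below is the actual Writable or Protobuf encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class WireFormatSketch {

    // Hypothetical, much-reduced stand-in for an RPC engine: it only knows
    // how to put a call on the wire and how to read it back.
    interface Engine {
        byte[] encode(String method, int arg);
        String decode(byte[] wire);
    }

    // Binary layout: [name length][name bytes][argument as 4-byte int].
    static class BinaryEngine implements Engine {
        @Override
        public byte[] encode(String method, int arg) {
            byte[] name = method.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(4 + name.length + 4)
                    .putInt(name.length)
                    .put(name)
                    .putInt(arg)
                    .array();
        }

        @Override
        public String decode(byte[] wire) {
            ByteBuffer buf = ByteBuffer.wrap(wire);
            byte[] name = new byte[buf.getInt()];
            buf.get(name);
            return new String(name, StandardCharsets.UTF_8) + "(" + buf.getInt() + ")";
        }
    }

    // Text layout: "name:argument" encoded as UTF-8.
    static class TextEngine implements Engine {
        @Override
        public byte[] encode(String method, int arg) {
            return (method + ":" + arg).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String decode(byte[] wire) {
            String[] parts = new String(wire, StandardCharsets.UTF_8).split(":", 2);
            return parts[0] + "(" + Integer.parseInt(parts[1]) + ")";
        }
    }

    public static void main(String[] args) {
        Engine[] engines = { new BinaryEngine(), new TextEngine() };
        for (Engine engine : engines) {
            byte[] wire = engine.encode("ping", 7);
            // Same logical call, different number of bytes on the wire.
            System.out.println(engine.getClass().getSimpleName() + ": "
                    + wire.length + " bytes, decodes to " + engine.decode(wire));
        }
    }
}
```

Both engines round-trip the same logical call; only the bytes exchanged between client and server differ, which is the distinction the paragraph above draws between the engines.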

If you examine the code of these RpcEngines, you will notice that all of them have a static Server class which extends RPC.Server to inherit basic remote server networking services from the abstract base class. In my previous post, we looked at the underlying networking mechanism in RPC.Server (e.g. starting a Listener for incoming requests and starting multiple Readers to read incoming RPC data and queue RPC calls to be processed by multiple Handlers).

The static…


Deep learning on the Raspberry Pi!

Pete Warden's blog


Photo by Clive Darra

I’m very pleased to announce that I’ve managed to port the Deep Belief image recognition SDK to the Raspberry Pi! I’m excited about this because it shows that even tiny, cheap devices are capable of performing sophisticated computer vision tasks. I’ve talked a lot about how object detection is going to be commoditized and ubiquitous, but this is a tangible example of what I mean, and I’ve already had folks digging into some interesting applications: detecting endangered wildlife, traffic analysis, satellites, even intelligent toys.

I can process a frame in around three seconds, largely thanks to heavy use of the embedded GPU for heavy lifting on the math side. I had to spend quite a lot of time writing custom assembler programs for the Pi’s 12 parallel ‘QPU’ processors, but I’m grateful I could get access at that low a level. Broadcom only released the technical…


Check out what SDN can do! Google lets you load balance across regions


Google is adding two new storage and networking features to its Google Cloud Platform ahead of its user conference next week, both designed to make its cloud offerings faster and easier when compared to competing products from Amazon Web Services or Microsoft. Google is adding persistent flash storage, which my colleague Barb Darrow has already covered, and HTTP load balancing across regions.

The load balancing is a fulfillment of the hope for automatic shifting of compute resources from data center to data center without disrupting the workload. It offers developers the opportunity to scale up compute in certain regions closest to demand and could theoretically offer a developer a chance to follow the cheapest computing costs around the globe if Google offered something like spot pricing.

This is a pretty big deal, so I asked Tom Kershaw, product management lead at Google, how the company manages it. He credited…


GemFire In-Memory Map-Reduce with Java 8

Jonas Dias

When I started studying GemFire, I decided I needed some hands-on experience with it. I enjoy learning by experience, so I decided to share my own experience with you. GemFire is an in-memory distributed database that supports both a relational store (GemFire XD) and an object store (key-value hash maps, GemFire). For my first hands-on session I started with the object store approach. GemFire works great for data streams and has a good distributed cache mechanism, which is very easy to use. I used the VMware vFabric documentation and also this quickstart.

I started by creating my IDE project. Using Maven, I only needed to add the repository to my pom file:

Then, Maven takes care of downloading the necessary JAR libraries and so on. Creating a GemFire cache is quite simple. First you create a cache:

Then, you will be able to access the cache and its “regions” just like a typical Java HashMap. To store a…


GemFire functions with Java 8, Nashorn and Groovy

William Markito

GemFire functions offer a powerful and flexible way to send distributed work to multiple servers. This work can be data-dependent, with smart units of work that act in parallel on a given region, or it can run in parallel on all available members of the system. The work can even be filtered to run only on a set of keys or only on a subset of specified members, which can be really convenient depending on the use case being implemented. For example:

✓ If you have some kind of external resource provisioning or initialization of a third-party service (a Linux service, for example), you may wrap it in a GemFire function and distribute the command to a subset of members.
✓ If you have a partitioned data set on which you need to perform an aggregation or any other kind of data processing, you can implement a data-dependent GemFire function.
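
The data-dependent case can be illustrated without GemFire at all. The sketch below is an assumption-laden stand-in: a list of plain Java maps plays the role of a partitioned region (one map per member), and `partition`, `execute` and `sumAll` are hypothetical helper names, not part of the GemFire API. It only shows the shape of the idea: run a function on each member's local slice, optionally restricted to a set of keys, then combine the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DataDependentFunctionSketch {

    // Hypothetical stand-in for a partitioned region: each map is one member's slice.
    static List<Map<String, Integer>> partition(Map<String, Integer> data, int members) {
        List<Map<String, Integer>> parts = new ArrayList<>();
        for (int i = 0; i < members; i++) {
            parts.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> e : data.entrySet()) {
            int owner = Math.floorMod(e.getKey().hashCode(), members);
            parts.get(owner).put(e.getKey(), e.getValue());
        }
        return parts;
    }

    // Keep only the entries whose key is in the filter; a null filter keeps everything.
    static Map<String, Integer> restrict(Map<String, Integer> local, Set<String> filter) {
        Map<String, Integer> kept = new HashMap<>();
        for (Map.Entry<String, Integer> e : local.entrySet()) {
            if (filter == null || filter.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Run fn in parallel on every member's (filtered) local data and collect the partial results.
    static <R> List<R> execute(List<Map<String, Integer>> parts, Set<String> filter,
                               Function<Map<String, Integer>, R> fn) {
        return parts.parallelStream()
                .map(local -> restrict(local, filter))
                .map(fn)
                .collect(Collectors.toList());
    }

    // Example aggregation: sum locally on each member, then add up the partial sums.
    static int sumAll(List<Map<String, Integer>> parts, Set<String> filter) {
        List<Integer> partials = execute(parts, filter,
                local -> local.values().stream().mapToInt(Integer::intValue).sum());
        return partials.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 1);
        data.put("b", 2);
        data.put("c", 3);
        data.put("d", 4);
        List<Map<String, Integer>> parts = partition(data, 3);

        System.out.println("sum over all keys: " + sumAll(parts, null));
        Set<String> keys = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println("sum over filtered keys: " + sumAll(parts, keys));
    }
}
```

Each member only touches its own slice, so the work moves to the data instead of the data moving to the caller, which is the benefit the list above describes.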

These functions can leverage high…
