
Juggler Data Processing Platform

Juggler is a real-time stream processing platform for building both simple and complex event processing (CEP) pipelines. Juggler uses Apache Mesos, Apache Kafka and T-streams to build scalable and flexible processing algorithms. Juggler works on the same principle as Apache Samza, but additionally provides exactly-once processing and an integrated solution with a RESTful interface, a JavaScript UI and a dedicated repository for modules, services, streams and other components of a data processing pipeline.

Juggler's main difference from other real-time stream processing systems, such as Apache Spark Streaming and Apache Flink, is that it enables the creation of off-the-shelf generic processing modules that can be used by analysts, not just developers. The ultimate goal of the project is to build a fully declarative real-time stream processing environment in which programming is the exception rather than the rule. Using pre-built modules with flexible functionality, a Juggler user can create a processing sequence of arbitrary complexity, watch it in action and manage it via the UI or the REST interface.
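To make the idea of declaratively assembled generic modules concrete, here is a minimal Scala sketch. It is not Juggler's API: `Module`, `FilterEq`, `Project`, `StepSpec`, `build` and `run` are hypothetical names, and events are assumed to be flat string maps. The point it illustrates is that the pipeline is described as data (a module kind plus parameters), the kind of description a UI form or REST call could submit, while the actual code lives in a small set of reusable modules.

```scala
// Hypothetical sketch: these names are illustrative, not Juggler's API.
object DeclarativePipeline {
  // Assumption: an event is a flat map of field name -> value.
  type Event = Map[String, String]

  // A generic, reusable processing step.
  trait Module {
    def process(in: Seq[Event]): Seq[Event]
  }

  // Keeps only events whose `field` equals `value`.
  final case class FilterEq(field: String, value: String) extends Module {
    def process(in: Seq[Event]): Seq[Event] = in.filter(_.get(field).contains(value))
  }

  // Keeps only the listed fields of each event.
  final case class Project(fields: Set[String]) extends Module {
    def process(in: Seq[Event]): Seq[Event] =
      in.map(event => event.filter { case (k, _) => fields(k) })
  }

  // Declarative description of one step: a module kind plus string parameters,
  // such as a UI form or REST request might supply.
  final case class StepSpec(kind: String, params: Map[String, String])

  // Instantiates a module from its description; unknown kinds are rejected.
  def build(spec: StepSpec): Module = spec.kind match {
    case "filter"  => FilterEq(spec.params("field"), spec.params("value"))
    case "project" => Project(spec.params("fields").split(",").map(_.trim).toSet)
    case other     => throw new IllegalArgumentException(s"unknown module kind: $other")
  }

  // Chains the modules in order: each module's output feeds the next one.
  def run(specs: Seq[StepSpec], events: Seq[Event]): Seq[Event] =
    specs.map(build).foldLeft(events)((acc, m) => m.process(acc))
}
```

For example, running the specs `StepSpec("filter", Map("field" -> "type", "value" -> "click"))` and `StepSpec("project", Map("fields" -> "user, ts"))` keeps only click events and strips every field except `user` and `ts`. Adding a step means adding a spec, not writing code.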

Juggler's architecture is designed so that exactly-once processing is guaranteed not only within a single processing block but throughout the entire sequence, from the moment stream events are fed into the system up to the moment the output data is stored in conventional data storage.

This approach, based on loosely coupled blocks with exactly-once processing across the entire sequence, makes it possible to decompose data processing, which gives better modularity, easier performance management and simpler development.
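One common way to get exactly-once results across a chain of blocks is to commit each block's output atomically together with the input offset it was produced from, so that a restart resumes from the last commit and never re-emits committed results. The Scala sketch below illustrates that general technique with an in-memory store. It is an assumption-laden illustration, not a description of how Juggler or T-streams implement it, and `Snapshot`, `TransactionalStore` and `runOnce` are hypothetical names.

```scala
// Hypothetical sketch of one exactly-once technique; not Juggler's implementation.
object ExactlyOnceSketch {
  // What is committed in one atomic step: the output produced so far and the
  // offset of the next input record to read.
  final case class Snapshot(nextOffset: Long, output: Vector[Int])

  // Assumption: stands in for a durable, transactional store (kept in memory here).
  final class TransactionalStore {
    private var committed = Snapshot(0L, Vector.empty)
    def load(): Snapshot = synchronized { committed }
    // Output and offset are replaced together, never one without the other.
    def commit(s: Snapshot): Unit = synchronized { committed = s }
  }

  // Reads `log` from the last committed offset in batches of `batchSize`,
  // "processes" each record (multiplies it by 10) and commits after every batch.
  // `crashAfterBatches = Some(n)` simulates a failure after the n-th batch has
  // been processed but before it is committed.
  def runOnce(log: Vector[Int], store: TransactionalStore, batchSize: Int,
              crashAfterBatches: Option[Int] = None): Unit = {
    require(batchSize > 0, "batchSize must be positive")
    var state = store.load()
    var batches = 0
    while (state.nextOffset < log.length) {
      val from = state.nextOffset.toInt
      val until = math.min(from + batchSize, log.length)
      val results = log.slice(from, until).map(_ * 10)
      batches += 1
      if (crashAfterBatches.contains(batches))
        throw new RuntimeException("simulated crash before commit")
      state = Snapshot(until.toLong, state.output ++ results)
      store.commit(state)
    }
  }
}
```

If `runOnce` crashes after processing its second batch, only the first batch survives in the store. Running it again resumes at offset 2, and the final output contains each result exactly once. Because the offset and the output are only ever written together, a crash can lose uncommitted work but never duplicate committed work.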

Juggler supports four models for feeding events into processing:

  • no delay: events are processed immediately after they become available in the streams;
  • time slot: events are grouped into blocks, each a certain number of seconds long; processing is scheduled over overlapping or non-overlapping sliding time slots;
  • length-based slot: events are grouped into blocks, each a certain number of transactions long; processing is scheduled over overlapping or non-overlapping sliding slots;
  • real-time slot: events are grouped into blocks, each containing the transactions from a certain time interval; processing is scheduled over overlapping or non-overlapping sliding time slots.
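The time-slot and length-based models can be sketched as generic sliding windows defined by a size and a slide: equal size and slide give non-overlapping windows, and a slide smaller than the size gives overlapping ones. This Scala sketch is a hypothetical illustration, not Juggler's API (`WindowingSketch`, `countWindows` and `timeWindows` are made-up names). It assumes non-negative timestamps in seconds and time windows aligned to multiples of the slide, and it does not model the no-delay or real-time-slot variants.

```scala
// Hypothetical sketch of sliding windows; not Juggler's API.
object WindowingSketch {
  // Assumption: each event carries a non-negative timestamp in seconds.
  final case class Event(timestampSec: Long, payload: String)

  // Length-based windows: `size` events per window, a new window every `slide`
  // events. slide == size -> non-overlapping, slide < size -> overlapping.
  // A trailing window with fewer than `size` events is dropped.
  def countWindows[A](events: Seq[A], size: Int, slide: Int): Seq[Seq[A]] = {
    require(size > 0 && slide > 0, "size and slide must be positive")
    events.sliding(size, slide).filter(_.size == size).toList
  }

  // Time-based windows: window k covers [k * slideSec, k * slideSec + sizeSec).
  // Returns (windowStart, events) for every non-empty window, ordered by start;
  // events inside a window keep their input order.
  def timeWindows(events: Seq[Event], sizeSec: Long, slideSec: Long): Seq[(Long, Seq[Event])] = {
    require(sizeSec > 0 && slideSec > 0, "size and slide must be positive")
    require(events.forall(_.timestampSec >= 0), "timestamps must be non-negative")
    val assigned = for {
      e <- events
      ts = e.timestampSec
      // smallest k with k * slideSec > ts - sizeSec (floorDiv handles negatives)
      firstK = math.max(0L, Math.floorDiv(ts - sizeSec, slideSec) + 1)
      // largest k with k * slideSec <= ts
      lastK = ts / slideSec
      k <- firstK to lastK
    } yield (k * slideSec, e)
    assigned
      .groupBy(_._1)
      .toList
      .sortBy(_._1)
      .map { case (start, pairs) => (start, pairs.map(_._2)) }
  }
}
```

For instance, with events at 1, 4, 6 and 11 seconds, `timeWindows(events, 10, 5)` yields overlapping windows starting at 0, 5 and 10 seconds, and the event at 6 seconds appears in the first two. With a slide larger than the size, some events fall between windows and are not assigned to any.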

Juggler integrates easily with in-memory data grids such as Hazelcast and Apache Ignite.

The system is available under the Apache License v2. All documentation for the system is available on the product website.

The following technologies are used in the project:

  • Frameworks
    • Akka
    • Apache Mesos
    • Angular 2
    • Twitter Bootstrap
  • Data management
    • Apache Kafka
    • Apache Cassandra
    • Apache ZooKeeper
    • MongoDB
  • Languages
    • JavaScript
    • Scala
  • Integration
    • SBT
    • Mantl
  • General
    • Git-flow
    • Scrum