Apache Spark is a popular framework for massively scalable data processing. It is at the heart of the big-data processing systems of many companies. It is equally convenient for small computations on a single workstation, on a corporate server, or on a high-performance cluster with thousands of nodes. Apache Spark has a very sophisticated design and, at the same time, a simple development model for software developers, which is especially important in the early stages of product adoption. The most attractive feature of Spark is that, when computations are designed well, Spark utilizes all the available compute capacity. Engineers don't need to care about parallelization, multithreading, multiprocessing, and similar concerns: all the magic happens inside Spark.
Recent Posts
Improving InfluxDB with Apache Kafka
Why do we love InfluxDB? Because it is an outstanding product that makes working with time series easy. It provides high performance for both data insertion and retrieval. It offers a SQL-like query language with convenient functions for processing time-series data (for example, the derivative of values). It is supported by convenient visualization tools, such as Grafana. It provides continuous queries that aggregate data on the fly. And one can get started with InfluxDB within a couple of hours.
Why do we avoid using InfluxDB in projects? Because its cluster solution is not open source: one has to pay for scaling and fault tolerance by purchasing a license. There is nothing wrong with this. However, when software is built on the principle that all infrastructure components must be published under open licenses, there is no place for commercial products. As a result, InfluxDB cannot be introduced in the critical parts of such information systems.
Interestingly, Apache Cassandra, Kafka, HDFS, Elasticsearch, and many others provide clustered solutions for free, which leads to their wider adoption in projects.
In this article, we will illustrate how to use a supplementary Apache Kafka cluster to provide scalability and fault tolerance for InfluxDB in popular use cases, without purchasing a commercial license for the InfluxDB cluster.
Wrapping Python applications into self-extracting executable files using PyInstaller
In this short article, we will explore a simple way to distribute Python applications as thick, self-extracting archives that look like ordinary executable files and contain the entire environment and all the dependencies needed to run the application.
CloudStack-UI 1.411.29 is Out
The overview describes the main enhancements in Release 1.411.29. The key feature of this iteration is a new plugin that allows users and administrators to manage resource limits and quotas for accounts. In addition, we enhanced the Log View plugin and interface components such as snapshot management, UI settings, API key management, error message display, Security Group management, and the Service Offering chooser, and we fixed a range of bugs in the Pulse plugin and across the whole system. Below you will find details on all the improvements and fixes in this release.