Articles

High-Performance Noise-tolerant Motion Detector in Python, OpenCV, and Numba

Motion detection is a common task in video analytics projects. It can be solved by comparing the changing part of the image with the static part, which makes it possible to distinguish moving objects from the background. A simple motion detector can easily be found on the Internet, for example, at Pyimagesearch.com. Such a basic detector does not handle:

  • environmental changes;
  • video stream noise that occurs due to various factors.

In this article, we will walk through the implementation of an advanced Python-based motion detector suited for processing noisy streams with high FPS requirements. The implementation relies on the OpenCV library for working with images and videos, the NumPy library for matrix operations, and Numba, which is used to speed up the operations that are performed in Python.

Noise is a natural or technical phenomenon observed in a video stream that should be ignored; otherwise, it causes false-positive detections. Examples of noise:

  • glare of sunlight;
  • reflection of objects from transparent glass surfaces;
  • vibrations of small objects in the frame – foliage, grass, dust;
  • camera tremor due to random vibrations;
  • flickering of fluorescent lamps;
  • image defects due to a small aperture or low camera sensor quality;
  • picture breakup due to network delays, or interference when using analog cameras.

There are many kinds of noise, but the result is the same: small changes in the image that occur even in the absence of actual motion in the frame. Basic algorithms do not handle these effects, while the algorithm presented in this article copes with the noise. An example of noise can be seen in the following video fragment with a frame rate of 60 FPS:
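The core idea (comparing each frame against a background and discarding changes too small to be real motion) can be sketched as follows. This is a minimal illustration using NumPy only; the `detect_motion` helper, its thresholds, and the synthetic frames are assumptions made for this example, not the article's actual implementation:

```python
import numpy as np

def detect_motion(background, frame, diff_thresh=25, min_area=50):
    """Naive frame differencing with a simple noise filter.

    A pixel counts as 'changed' when it differs from the background
    by more than diff_thresh; motion is reported only when the number
    of changed pixels exceeds min_area, so isolated noisy pixels are
    ignored.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    changed = diff > diff_thresh
    return bool(changed.sum() >= min_area)

# Synthetic 100x100 grayscale frames.
rng = np.random.default_rng(0)
background = np.full((100, 100), 128, dtype=np.uint8)

# Sensor noise alone: a few pixels flip, but not enough to trigger.
noisy = background.copy()
noisy[rng.integers(0, 100, 10), rng.integers(0, 100, 10)] = 255
print(detect_motion(background, noisy))   # False

# A 20x20 object enters the frame: enough changed pixels to trigger.
moving = noisy.copy()
moving[40:60, 40:60] = 255
print(detect_motion(background, moving))  # True
```

A real detector replaces the static background with a continuously updated model and the pixel-count filter with morphological operations and contour-area checks, which is where OpenCV and Numba come in.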

Read more ...

How to Run Low-Latency Jobs With Apache Spark

Apache Spark is a popular framework for massive scalable data processing. It is at the heart of big-data processing systems in many companies, and it is equally convenient for small computations on a single workstation, a corporate server, or a high-performance cluster with thousands of nodes. Apache Spark has a very sophisticated design and, at the same time, an easy development model for software developers, which is especially important in the early stages of product adoption. The most attractive feature of Spark is that, when the computations are designed well, Spark utilizes all the available compute capacity. Engineers don't have to care about parallelization, multithreading, multiprocessing, and other such matters: all the magic happens inside Spark.

Read more ...

Improving InfluxDB with Apache Kafka

Why do we love InfluxDB? Because it is an outstanding product that makes working with time series easy. It provides high performance for both data insertion and retrieval. It offers a SQL-like query language with convenient functions for processing time-series data (for example, a derivative of values). It is supported by convenient visualization tools, such as Grafana. It provides continuous queries that handle data aggregation on the fly. And, finally, one can get started with InfluxDB within a couple of hours.

Why do we avoid using InfluxDB in projects? Because the cluster solution is not open-source: one has to pay for scaling and fault tolerance by purchasing a license. There is nothing wrong with this; however, when software is built entirely on free software and all infrastructure components must be published under open licenses, there is no place for commercial products. As a result, introducing InfluxDB in the critical parts of information systems is impossible.

It is telling that Apache Cassandra, Kafka, HDFS, Elasticsearch, and many others provide clustered solutions for free, which leads to their greater adoption in projects.

In this article, we will illustrate how to use a supplementary Apache Kafka cluster to add scalability and fault tolerance to InfluxDB for popular use cases, without purchasing a commercial license for the InfluxDB cluster.
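A common way to decouple producers from InfluxDB is to serialize points into InfluxDB's line protocol and push them through a Kafka topic, from which consumers write them to InfluxDB in batches. The helper below is a simplified, hypothetical sketch of that serialization step (it handles only float fields and does no character escaping); the function name and the sample point are assumptions made for this example:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize a point into InfluxDB line protocol:
    measurement,tag=value field=value timestamp

    Messages in this form can be produced to a Kafka topic and later
    consumed and written to InfluxDB in batches.
    """
    # Tags are sorted for deterministic output.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # Numeric fields are written bare, everything else as a quoted string.
    field_part = ",".join(
        f"{k}={v}" if isinstance(v, (int, float)) else f'{k}="{v}"'
        for k, v in sorted(fields.items())
    )
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "cpu", {"host": "server01"}, {"usage": 0.64}, 1609459200000000000
)
print(line)  # cpu,host=server01 usage=0.64 1609459200000000000
```

With this in place, fault tolerance comes from Kafka retaining the messages until a consumer acknowledges the write, and scalability comes from partitioning the topic across consumers.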

Read more ...

Long-lasting MySQL connections in Python and Tornado

From a DBMS point of view, a client connection is a valuable resource that must be carefully managed to control the use of system resources. In practice, every DBMS sets a connection inactivity timeout, after which the connection is closed unilaterally. Usually, applications find out about this only after the connection has been broken.

In the case of MySQL, the client receives the message "The MySQL server has gone away" (error 2006). In this article, we will look at approaches that allow applications with long-lived connections to keep them alive as long as necessary. Examples will be provided for the standard MySQL connection interface, mysql.connector, using Python 3; as an application example, we will use a microservice implemented with the Tornado framework.
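The general pattern can be sketched as a retry-on-failure wrapper: catch the broken-connection error, reconnect, and repeat the query once. The classes below are stand-ins invented for this example (no MySQL server is involved); with a real mysql.connector connection, the reconnect step would instead call the connection's `reconnect()` or `ping(reconnect=True)`:

```python
import functools

class ConnectionLost(Exception):
    """Stand-in for a mysql.connector error with errno 2006."""

class FakeConnection:
    """Simulates a connection that the server closes after a timeout."""
    def __init__(self):
        self.alive = True

    def query(self, sql):
        if not self.alive:
            raise ConnectionLost("The MySQL server has gone away")
        return f"result of {sql!r}"

    def reconnect(self):
        self.alive = True

def with_reconnect(method):
    """Retry the wrapped call once after transparently reconnecting."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        except ConnectionLost:
            self.conn.reconnect()
            return method(self, *args, **kwargs)
    return wrapper

class Repository:
    def __init__(self, conn):
        self.conn = conn

    @with_reconnect
    def fetch(self, sql):
        return self.conn.query(sql)

repo = Repository(FakeConnection())
repo.conn.alive = False          # simulate a server-side inactivity timeout
print(repo.fetch("SELECT 1"))    # result of 'SELECT 1'
```

In an asynchronous Tornado service the same idea applies, but the reconnect must not block the event loop, which is one of the points the article addresses.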

Read more ...