Log aggregation using Kafka + ELK: a perfect match.

Lyheng Tep
3 min read · Jul 15, 2023


Logging is the heart of an application: it is how developers understand what is happening in the system, from errors to unexpected behaviors. At the same time, more and more systems are becoming distributed or hybrid, which makes logs hard to manage. This post is for those looking to modernize their logging setup.

What is log aggregation?

Log aggregation is the process of collecting and consolidating log data from various parts of your IT infrastructure or microservices into one central place.

What are the benefits of having log aggregation?

Continuous monitoring: Search, view, filter, group, and monitor log data in real time.

Gain insights: Analyze user behaviors and transaction volumes, and inform timely marketing or product innovation based on current user trends.

Audit and compliance: Regulatory frameworks like PCI DSS and HIPAA require log retention for at least a year. With log aggregation in place, you can meet this requirement quickly, and the same system can double as a central store for audit trails.

How I set up log aggregation

There are five vital components in the log aggregation setup I have been using: Filebeat, Kafka, Logstash, Elasticsearch, and Kibana.

Filebeat

Filebeat is a lightweight log shipper responsible for forwarding and centralizing log data. You install it as an agent alongside your application server and point it at a log file or log path to monitor; it then forwards new entries to an output of your choice, for instance Kafka, Elasticsearch, or Logstash.

In my case, I use Filebeat to read log data and send it as a message to Kafka.
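A minimal `filebeat.yml` sketch for this setup might look like the following; the log path, broker addresses, and topic name are placeholders you would adapt to your environment:

```yaml
# filebeat.yml — minimal sketch; paths, hosts, and topic are placeholders
filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/myapp/*.log

output.kafka:
  # Kafka brokers to publish log messages to
  hosts: ["kafka1:9092", "kafka2:9092"]
  topic: "app-logs"
  required_acks: 1
  compression: gzip
```

With this, every new line appended to the monitored files is shipped as a message to the `app-logs` Kafka topic.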

Apache Kafka

Apache Kafka is an open-source distributed event streaming platform used for stream processing, real-time data pipelines, and data integration at scale. It uses a publish/subscribe model to help companies deliver large volumes of data with latencies as low as 2 ms.

I use Apache Kafka to buffer the log messages read by Filebeat, grouped into specific topics. This guarantees that messages read by Filebeat are quickly handed off to Kafka and can be consumed later by any consumer, for example Logstash. Moreover, Kafka in this design acts as temporary storage while the primary data store is momentarily unavailable.
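Creating the topic up front with several partitions allows multiple consumers to read the log stream in parallel. A sketch using Kafka's bundled CLI (broker address, topic name, and counts are placeholders):

```shell
# Create a log topic with multiple partitions for parallel consumption
kafka-topics.sh --bootstrap-server kafka1:9092 \
  --create --topic app-logs \
  --partitions 3 --replication-factor 2
```

A replication factor above 1 keeps the buffered log messages available even if one broker goes down.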

Logstash

Logstash is a data processing pipeline that ingests data from any source, transforms it, and ships it to your destination of choice.

Here, Logstash is used as a Kafka consumer: it reads data from a Kafka topic, transforms and indexes it, and sends it on to Elasticsearch as the destination.
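A minimal pipeline sketch for this consumer role might look like the following; the topic, consumer group, grok pattern, and Elasticsearch address are placeholders, and the filter assumes a simple timestamped log line format:

```conf
# logstash.conf — minimal sketch; names and addresses are placeholders
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["app-logs"]
    group_id => "logstash-consumers"
    codec => "json"
  }
}

filter {
  # Filebeat wraps the original log line in a "message" field; parse it here
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    # Daily indices make retention and cleanup easier
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```

Writing to date-stamped indices keeps each day's logs separate, which simplifies retention policies like the compliance requirements mentioned earlier.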

Elasticsearch

Elasticsearch is a search and analytics engine for all types of data, known for its speed and scalability. It serves a variety of use cases such as application search, website search, logging, log analytics, application performance monitoring, and so on.

I use Elasticsearch to store the logs indexed and filtered by Logstash, so they can be displayed in Kibana, which is the interface of our log aggregation system.
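Once logs are indexed, you can query them directly over Elasticsearch's REST search API. A sketch, assuming daily indices named `app-logs-*` containing a `level` field (both names are placeholders):

```shell
# Fetch up to 10 recent ERROR-level log entries
curl -s "http://elasticsearch:9200/app-logs-*/_search" \
  -H "Content-Type: application/json" \
  -d '{"query": {"match": {"level": "ERROR"}}, "size": 10}'
```

Kibana issues essentially the same kind of queries under the hood when you search and filter in its UI.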

Kibana

Kibana is a charting tool and interface for monitoring, managing, and securing an Elastic Stack cluster. It offers search and visualization capabilities for data indexed in Elasticsearch.

As the definition suggests, I use it for viewing and searching logs from Elasticsearch.

Conclusion

Finally, log aggregation helps our infrastructure scale without logging becoming a bottleneck, and gives us the ability to monitor, search, and analyze the root cause of critical production issues faster than ever before. It also removes the worry around the audits and compliance requirements your organization typically faces.

If you find this article helpful, please follow for more topics in the future. Thanks.




Written by Lyheng Tep

Hi, I'm Lyheng, a software engineer from Cambodia. Coding is my passion. Find me at: https://lyhengtep.com
