Combine Multiline Logs into a Single Event with SOCK - a Guide for Advanced Users

This article is a continuation of the "Combine multiline logs into a single event with SOCK - a step-by-step guide for newbies" blog post, where we went through multiline processing for the default Kubernetes logs pipeline.

Let's take a closer look at how the multilineConfigs option works under the hood, so that you can customize the standard OTel configuration to fit your specific needs. To fully understand it, we'll go through the operators and break SOCK's default filelog configuration down into its basic parts.

Operators overview

The filelog receiver is the critical part of the Splunk OTel Collector for Kubernetes (SOCK) log collection mechanism. The receiver is already heavily configured in the Helm chart - if you're interested in what this section looks like for your version, run:

kubectl describe cm/<helm-app-name>-otel-agent

And look for the filelog section.

The config itself is very long, so today we'll focus only on the operators section. Operators let us create mini-pipelines within the receiver itself, so that logs can be processed correctly based on certain criteria.

It might seem like a bunch of complicated, incomprehensible statements at first. Don't worry, we'll break it down one by one, starting with…

Routers!

Take a look at this snippet showing the first of SOCK's filelog receiver operators:

[Image: the routes section of SOCK's filelog receiver configuration]
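Since the original snippet is an image, here is a minimal sketch of that routes section, modeled on the splunk-otel-collector-chart defaults. Treat it as an illustration rather than the exact config shipped in your version:

- type: router
  id: get-format
  routes:
    # Docker logs are JSON objects, so the body starts with "{"
    - expr: body matches "^\\{"
      output: parser-docker
    # CRI-O: timestamp token without a trailing Z
    - expr: body matches "^[^ Z]+"
      output: parser-crio
    # containerd: any other timestamp-prefixed line
    - expr: body matches "^[^ ]+"
      output: parser-containerd
# note: the real ConfigMap may anchor these regexes more tightly (e.g. with a
# trailing space or Z) so that the crio and containerd routes don't overlap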

Depending on the type of log, we want to parse it accordingly. In the Kubernetes world, the log format depends on the container runtime; SOCK supports Docker, CRI-O, and containerd. The routes section is a simple substitute for the switch statement, well known in the programming world.

In the above example, if the log body matches the regex ^\\{, the log is passed to the operator with the id parser-docker; for ^[^ Z]+ it is parser-crio, and ^[^ ]+ goes to parser-containerd.

Let’s see one of them, parser-containerd:

[Image: the parser-containerd operator configuration]
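As a sketch (the regex is reconstructed from the chart's defaults, so verify it against your own ConfigMap):

- type: regex_parser
  id: parser-containerd
  # splits a CRI line such as:
  #   2024-01-15T10:05:33.123456789Z stdout F some log message
  # into time, stream, logtag, and log attributes
  regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$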

We can see it is a regex_parser that parses logs to extract fields like time and logtag.

And that's all - the filelog receiver then executes operators one by one, in the order they are defined. But if you'd like to pass the log to an operator of your choice instead, you only need to specify the output field. The config would look like this:

[Image: an operator configuration with an explicit output field]
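Something along these lines, where all the ids are hypothetical and purely for illustration:

- type: regex_parser
  id: my-parser
  regex: ^(?P<severity>[A-Z]+) (?P<message>.*)$
  # jump straight to my-target instead of the next operator in the sequence
  output: my-target
- type: noop
  id: skipped-operator   # never receives logs from my-parser
- type: noop
  id: my-target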

Parsers

As we've shown in the previous example, regex_parser parses the string-typed field configured in parse_from using a user-defined regex. Thanks to that, we can extract multiple attributes from one string in a single operation.

It's one of many parsers available as filelog operators; another example is json_parser, which SOCK uses to parse the Docker log timestamp:

[Image: the parser-docker json_parser configuration]
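A sketch of that parser, modeled on the chart's defaults:

- type: json_parser
  id: parser-docker
  # Docker writes JSON lines like {"log":"...","stream":"stdout","time":"..."};
  # the timestamp block parses the extracted time attribute
  timestamp:
    parse_from: attributes.time
    layout: '%Y-%m-%dT%H:%M:%S.%LZ'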

Every parser has two options, parse_from and parse_to. The default value of parse_from is body - the raw log line - and the default for parse_to is the attributes map.

In case your logs follow some other popular format, check out the other parsing operators, like the syslog parser or the CSV parser. All of them are described in the OpenTelemetry filelog operators documentation.

Recombine

Finally, we've reached the main point of our operator journey. The recombine operator is very powerful whenever you want to combine consecutive logs into a single log based on simple expression rules. Let's take one of the examples from SOCK's config:

[Image: the containerd recombine operator configuration]
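A sketch of that operator (again, compare it with your own ConfigMap):

- type: recombine
  id: containerd-recombine
  combine_field: attributes.log
  # in the CRI format, logtag F marks a full/final line, P a partial one
  is_last_entry: attributes.logtag == 'F'
  source_identifier: attributes["log.file.path"]
  combine_with: ""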

This means we combine consecutive logs until we encounter attributes.logtag set to F (in the CRI log format, F marks a full or final line, while P marks a partial one). The logtag attribute was extracted from the log body earlier, in parser-containerd - you can scroll back to that example above.

As an alternative to is_last_entry, you can configure is_first_entry - the choice depends on whether it's easier to define the beginning or the end of the multiline block. source_identifier tells the operator which field should be used to separate one source of logs from another for combining purposes. In this example, we do it based on the log's file path.

Multiline config for advanced users

The multilineConfigs setting is fairly easy to use and doesn't require any knowledge of operators, but the drawback is that it applies only to Kubernetes logs from the default logs pipeline. If you want to set up multiline processing in a pipeline you've written yourself using the extraFileLogs option, you need to configure the operators on your own.

Let's take a look at what the final filelog config looks like after applying a multilineConfigs rule, as we'll need to do something similar manually. We'll use a Java example here:

[Image: values.yaml with a multilineConfigs entry for Java logs]
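A values.yaml sketch of such a rule - the namespace, pod, and container values here are hypothetical:

logsCollection:
  containers:
    multilineConfigs:
      - namespaceName:
          value: default
        podName:
          value: java-app.*
          useRegexp: true
        containerName:
          value: java-app
        # a new log record starts with a non-whitespace character
        firstEntryRegex: ^[^\s].*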

This values.yaml config produces the following operators snippet:

[Image: the operators snippet generated from the multilineConfigs entry]
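Reconstructed as a sketch - the chart's template generates its own ids and expressions, so the attribute names below are assumptions; check the generated ConfigMap for the real thing:

- type: router
  routes:
    - expr: (attributes.namespace) == "default" && (attributes.container_name) == "java-app"
      output: multiline-java
  default: clean-up-log-record
- type: recombine
  id: multiline-java
  combine_field: attributes.log
  is_first_entry: (attributes.log) matches "^[^\\s].*"
  source_identifier: attributes["log.file.path"]
  output: clean-up-log-record
- type: move
  id: clean-up-log-record
  from: attributes.log
  to: body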

Which can be presented as a diagram:

[Diagram: flow of the generated multiline operators]

In both branches, you can see the clean-up-log-record operation, which moves attributes.log back to body. This is necessary in the case of SOCK because of the processing it does at the beginning with parser-containerd, parser-docker, or parser-crio. Your config will be even simpler if you don't use such a mechanism.

extraFileLogs configuration

Let’s start with the setup of a bare logs pipeline:

[Image: extraFileLogs configuration for a bare logs pipeline]
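For example, a bare pipeline tailing a hypothetical /var/log/java-app/*.log path could look like this (a sketch based on the chart's extraFileLogs documentation):

logsCollection:
  extraFileLogs:
    filelog/java-app:
      include:
        - /var/log/java-app/*.log
      start_at: beginning
      include_file_path: true
      # keep leading whitespace intact (see the note below)
      preserve_leading_whitespaces: true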

NOTE: The preserve_leading_whitespaces option is necessary whenever your processing rule relies on leading whitespace; if it's not set, OTel will automatically trim the whitespace.

Again, we use the same Java log file, which produces the following result in Splunk:

[Image: unprocessed Java logs in Splunk, one event per line]

This time we need to apply the multiline config as part of the filelog operators.

Scenario 1 - only recombine

If we're 100% sure our multiline config applies to all the logs collected by the pipeline, we can use a single recombine operator without any complicated logic. For a directory containing only Java log files, we can apply a config like this:

[Image: extraFileLogs configuration with a single recombine operator]
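A sketch of such a config, with a hypothetical path - the key part is the single recombine operator:

logsCollection:
  extraFileLogs:
    filelog/java-app:
      include:
        - /var/log/java-app/*.log
      include_file_path: true
      preserve_leading_whitespaces: true
      operators:
        - type: recombine
          combine_field: body
          # a new log starts with a non-whitespace character;
          # indented lines (e.g. stack frames) are continuations
          is_first_entry: body matches "^[^\\s]"
          source_identifier: attributes["log.file.path"]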

source_identifier is attributes["log.file.path"], as this is the only differentiator we have at this point. Applying this config results in correct log processing:

[Image: correctly combined Java multiline logs in Splunk]

Scenario 2 - recombine and a router

This time, let's combine two cases - Java logs and logs prefixed with a timestamp. Such a config requires defining a router so that we don't unnecessarily run all the logs through the same operators. Additionally, in some cases running a regex against a log that doesn't match it can cause runtime errors, which results in nothing being sent to Splunk.

After applying this config:

[Image: extraFileLogs configuration with a router and two recombine operators]
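A sketch of that config. The newline-processor and timestamp-processor ids match the ones referenced in the general notes below, while the paths and the timestamp regex are illustrative:

logsCollection:
  extraFileLogs:
    filelog/mixed-logs:
      include:
        - /var/log/apps/*.log
      include_file_path: true
      preserve_leading_whitespaces: true
      operators:
        - type: router
          routes:
            - expr: attributes["log.file.path"] matches ".*java.*"
              output: newline-processor
            - expr: attributes["log.file.path"] matches ".*timestamp.*"
              output: timestamp-processor
          default: noop
        - type: recombine
          id: newline-processor
          combine_field: body
          is_first_entry: body matches "^[^\\s]"
          source_identifier: attributes["log.file.path"]
          output: noop
        - type: recombine
          id: timestamp-processor
          combine_field: body
          # a new log starts with an ISO-like date, e.g. 2024-01-15
          is_first_entry: body matches "^\\d{4}-\\d{2}-\\d{2}"
          source_identifier: attributes["log.file.path"]
          output: noop
        - type: noop
          id: noop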

Both files were processed correctly:

[Image: combined Java multiline logs in Splunk]

[Image: combined timestamp-prefixed logs in Splunk]

An important thing to remember here is that operators are executed one by one, so we have to define noop (an operator that does nothing) as the exit point of each branch. The diagram looks like this:

[Diagram: router with newline-processor, timestamp-processor, and noop branches]

General notes

There are a few things to remember when creating your filelog pipeline:

  1. Every backslash in a regex pattern must be doubled (escaped). When the multilineConfigs option is used, the Helm templating does this automatically; in your own operators, remember to do it manually, or you might end up with issues like: failed to compile is_first_entry: invalid char escape (1:32)
  2. It is important to set the output and default fields in routers. For example, if we hadn't done it in the previous example and a log matched the first expression, it would be passed to the newline-processor - but then the timestamp-processor would be triggered as well, because it is next in the sequence.
  3. There are better ways to create expression patterns than attributes["log.file.path"] matches ".*java.*" - we should generally avoid greedy regexes whenever possible. You can learn more about the expression language to find an approach that suits your needs; a couple of anchored alternatives are sketched below.
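For instance, assuming the expression language's anchored string operators, the greedy match could be replaced like this (the paths are illustrative):

# greedy - scans the whole path:
expr: attributes["log.file.path"] matches ".*java.*"
# anchored alternatives:
expr: attributes["log.file.path"] startsWith "/var/log/java-app/"
expr: attributes["log.file.path"] endsWith "java.log"

Anchored checks are both cheaper and less likely to match paths by accident.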