Log Collection with Fluent Bit
This post is adapted from a meetup talk given on 13 Mar 2025 (slides: https://docs.google.com/presentation/d/1-BgxJXsCRrv8jV0UqEyucalbBCLfyH2RfTPsMJ_uJ2k/edit?usp=sharing)
Introducing Fluent Bit
Fluentd is a CNCF Graduated project that aims to centralize log collection. It is written in Ruby and C and has a large plugin ecosystem.
Fluent Bit is a sub-project of Fluentd. It is written entirely in C and has a much lower memory footprint than Fluentd.
| Attribute | Fluentd | Fluent Bit |
|---|---|---|
| Scope | Containers / Servers | Embedded Linux / Containers / Servers |
| Language | C & Ruby | C |
| Memory | Greater than 60 MB | Approximately 1 MB |
| Performance | Medium Performance | High Performance |
| Dependencies | Built as a Ruby Gem, depends on other gems | Zero dependencies, unless required by a plugin |
| Plugins | Over 1,000 external plugins available. | Over 100 built-in plugins available. |
| License | Apache License v2.0 | Apache License v2.0 |
Source: https://docs.fluentbit.io/manual/about/fluentd-and-fluent-bit
I would suggest Fluentd for enriching or modifying log records, and Fluent Bit for deployment on each node to collect logs at the edge.
Benefits of using Fluent Bit
Fluent Bit can collect logs from many locations and send them to a single centralized destination, which makes them easier to query later. It can also parse log content to extract metadata, and that metadata can then be used to modify or filter out log records. Log ingestion currently costs around 0.5 per GiB, so dropping records you do not need before they are shipped also reduces cost.
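To make the cost argument concrete, here is a rough worked example in Python. Only the 0.5 per GiB rate comes from the text above; the daily volume of 200 GiB, the 40% drop ratio, and the 30-day month are made-up figures for illustration, and the currency is left unspecified as in the text.

```python
# Only the rate comes from the article; every other number is hypothetical.
COST_PER_GIB = 0.5


def monthly_ingest_cost(daily_gib, drop_ratio=0.0, days=30):
    """Cost of ingesting `days` days of logs after dropping `drop_ratio` of the volume."""
    if not 0.0 <= drop_ratio <= 1.0:
        raise ValueError("drop_ratio must be between 0 and 1")
    shipped_gib = daily_gib * (1.0 - drop_ratio) * days
    return shipped_gib * COST_PER_GIB


unfiltered = monthly_ingest_cost(daily_gib=200)
filtered = monthly_ingest_cost(daily_gib=200, drop_ratio=0.4)
print(f"unfiltered: {unfiltered:.2f}")
print(f"filtered:   {filtered:.2f}")
print(f"saved:      {unfiltered - filtered:.2f}")
```

The saving scales linearly with the share of records dropped at the edge, so even a modest filter matters at high volume.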
Reasons for pre-processing
Filtering and pre-processing also reduce the time it takes to query for a specific log. For on-premises deployments, filtering out unneeded logs also keeps storage from filling up quickly.
Common use cases
Direct to Destination

If the log messages are simple and the log volume is steady, sending logs directly to the sink is the simplest option.
Combined with Fluentd

Instead of sending logs to the destination directly, Fluent Bit can send them to a centralized Fluentd using the forward output type.
Logs sent to Fluentd can then be processed further (e.g. removing all sensitive information).
This setup is mainly useful when different teams are responsible for different parts of the pipeline.
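As an illustration of the kind of post-processing a central stage might do, the Python sketch below drops sensitive fields and masks email addresses in a record. It is not Fluentd configuration: the field names (`password`, `token`), the email pattern, and the `[REDACTED]` marker are all hypothetical, and real rules have to be matched to your own data.

```python
import re

# Hypothetical rules for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"password", "token"}


def scrub(record):
    """Return a copy of `record` without sensitive keys and with emails masked."""
    clean = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            continue  # drop the whole field
        if isinstance(value, str):
            value = EMAIL.sub("[REDACTED]", value)
        clean[key] = value
    return clean


raw = {
    "message": "login failed for alice@example.com",
    "password": "hunter2",
    "status": 401,
}
scrubbed = scrub(raw)
print(scrubbed)
```

Keeping this logic in one central place means the team that owns the redaction rules can change them without touching the collectors on every node.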
Combined with ELK

If an ELK stack is already in use, you do not have to replace everything: it is possible to swap out only Filebeat for Fluent Bit. Fluent Bit can send its output to Logstash directly, which lowers the migration effort.
Buffer with Kafka and Post-Process with Fluentd

Logs may be generated in spikes. In that case, Kafka can be used as a buffer, with Fluentd or Fluent Bit reading from it and writing to the final destination sink.
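The effect of a buffer can be shown with a small simulation. The Python sketch below uses an in-memory queue as a stand-in for Kafka; the arrival numbers and the sink capacity of 30 records per tick are invented for the example, and nothing here talks to a real broker.

```python
from collections import deque


def drain_through_buffer(arrivals, sink_capacity):
    """Pass per-tick arrivals through a queue to a sink that accepts at most
    `sink_capacity` records per tick. Returns (written_per_tick, backlog_per_tick)."""
    buffer = deque()
    written, backlog = [], []
    for tick, count in enumerate(arrivals):
        # Producers append to the buffer as fast as they like.
        buffer.extend((tick, i) for i in range(count))
        # The consumer writes to the sink at a bounded rate.
        batch = min(sink_capacity, len(buffer))
        for _ in range(batch):
            buffer.popleft()
        written.append(batch)
        backlog.append(len(buffer))
    return written, backlog


arrivals = [10, 10, 80, 10, 10, 0, 0]  # hypothetical spike at tick 2
written, backlog = drain_through_buffer(arrivals, sink_capacity=30)
print("written:", written)
print("backlog:", backlog)
```

The sink never sees more than its capacity per tick, and the backlog built up during the spike drains once the arrival rate falls again, so no records are lost as long as the buffer can hold the peak.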
How does Fluent Bit work?

Input
An input defines where the logs are read from.
Common plugins:
- tail
- systemd
- forward
Ref: https://docs.fluentbit.io/manual/pipeline/inputs
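To give a feel for what a tail-style input does, the Python sketch below remembers a byte offset and returns only the complete lines appended since the last poll. The `TailReader` class is a hypothetical name for this illustration, not Fluent Bit code, and it deliberately ignores file rotation and truncation.

```python
import os
import tempfile


class TailReader:
    """Return only the complete lines appended to a file since the last poll."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position just after the last complete line read

    def poll(self):
        lines = []
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # EOF or a partial line: wait for the rest
                lines.append(line.rstrip(b"\n").decode("utf-8", errors="replace"))
                self.offset = f.tell()
        return lines


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "wb") as f:
        f.write(b"first\nsecond\n")
    reader = TailReader(log_path)
    first_poll = reader.poll()
    with open(log_path, "ab") as f:
        f.write(b"third\n")
    second_poll = reader.poll()

print(first_poll)
print(second_poll)
```

Tracking the offset is what lets a collector resume without re-sending lines it has already shipped.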
Parser
A parser turns a raw log line into a structured record and extracts metadata from it.
Common plugins:
- JSON
- Logfmt
- Regexp
- Multiline
Ref: https://docs.fluentbit.io/manual/pipeline/parsers
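Conceptually, a regex parser uses named capture groups to turn a raw line into key/value pairs, while a JSON parser simply decodes the line. The Python sketch below shows both ideas; the log layout (`<timestamp> <LEVEL> <message>`) is an assumed example, and the named-group syntax shown is Python's, which differs from what a real Fluent Bit parser definition uses.

```python
import json
import re

# Assumed log layout for this example: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^(?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def parse_regex(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


def parse_json(line):
    """Return the decoded JSON object, or None if the line is not a JSON object."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


regex_record = parse_regex("2025-03-13T10:00:00Z ERROR disk full")
json_record = parse_json('{"level": "info", "msg": "ok"}')
print(regex_record)
print(json_record)
```

Once a line has become a record with fields such as `level`, later stages can act on those fields instead of on raw text.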
Filter
A filter is where records are modified or removed.
Common plugins:
- grep
- Cloud metadata (aws/kubernetes)
- parser
- record modifier
Ref: https://docs.fluentbit.io/manual/pipeline/filters
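The two most common filter jobs, dropping records and adding fields, can be sketched in a few lines of Python. The functions `grep` and `add_fields` below are hypothetical stand-ins named after the ideas behind the grep and record modifier plugins; they do not reproduce the plugins' exact behaviour, and the sample records and `hostname` value are invented.

```python
import re


def grep(records, key, pattern, exclude=False):
    """Keep records whose `key` matches `pattern`; with exclude=True, drop them instead."""
    regex = re.compile(pattern)
    for record in records:
        matched = regex.search(str(record.get(key, ""))) is not None
        if matched != exclude:
            yield record


def add_fields(records, extra):
    """Yield a copy of each record with the `extra` fields added."""
    for record in records:
        yield {**record, **extra}


records = [
    {"level": "DEBUG", "message": "cache hit"},
    {"level": "ERROR", "message": "disk full"},
    {"level": "INFO", "message": "started"},
]

# Drop DEBUG records, then tag what is left with a hypothetical hostname.
kept = list(add_fields(grep(records, "level", r"^DEBUG$", exclude=True), {"hostname": "node-1"}))
print(kept)
```

Because each step takes records in and yields records out, filters can be chained in any order, which mirrors how several filters are stacked in a pipeline.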
Output
An output writes the logs to the sink.
Common plugins:
- Object Storage (AWS S3 / GCP GCS / Azure Blob Storage)
- ElasticSearch / OpenSearch
- Graylog
- Loki
- OpenTelemetry
- Syslog server
- Kafka
- Forward (for sending to Fluent Bit / Fluentd)
References
- Fluentd Website: https://www.fluentd.org/
- Fluent Bit Manual: https://docs.fluentbit.io/manual
- Fluent Bit GitHub: https://github.com/fluent/fluent-bit
- Built-in config examples: https://github.com/fluent/fluent-bit/tree/master/conf