Steve Sun, developer and writer

How to write a special format configuration file parser

2022.05.08

Last week I read the article How to Write a Lexer in Go, and I found that, with the same mindset, designing a configuration file parser is not that difficult. I then tried to write a Fluent-bit configuration parser, and the result is this Fluent-Bit configuration parser for Golang.

In this article, I want to show how to parse a Fluent-bit .conf configuration file. The thinking behind it carries over to many other file formats.

Fluent-bit configuration format and schema

[FIRST_SECTION]
    Key1  some value
    Key2  another value

[SECOND_SECTION]
    KeyN  3.14

Here is a Fluent-bit configuration in classic mode. It has two key parts:

  • Section
  • Key/value pair

First of all, we need to define a struct which represents the Fluent-bit configuration file.

type FluentBitConf struct {
	Sections []Section
}

type Section struct {
	Name    string
	Entries []Entry
}

type Entry struct {
	Key   string
	Value interface{}
}
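
To make the target concrete, here is a small sketch of the value we would expect a parser to produce for the sample configuration above. It repeats the three types so that it compiles on its own. The helper name sampleConf is only for illustration, and storing 3.14 as a string is an assumption: the parser in this article saves every value as the text it read.

```go
package main

import "fmt"

type FluentBitConf struct {
	Sections []Section
}

type Section struct {
	Name    string
	Entries []Entry
}

type Entry struct {
	Key   string
	Value interface{}
}

// sampleConf builds by hand the struct that corresponds to the
// sample configuration (hypothetical helper, for illustration only).
func sampleConf() FluentBitConf {
	return FluentBitConf{
		Sections: []Section{
			{Name: "FIRST_SECTION", Entries: []Entry{
				{Key: "Key1", Value: "some value"},
				{Key: "Key2", Value: "another value"},
			}},
			{Name: "SECOND_SECTION", Entries: []Entry{
				// Assumption: values are kept as raw strings.
				{Key: "KeyN", Value: "3.14"},
			}},
		},
	}
}

func main() {
	// Print the struct back in a layout close to the original file.
	for _, s := range sampleConf().Sections {
		fmt.Printf("[%s]\n", s.Name)
		for _, e := range s.Entries {
			fmt.Printf("    %s  %v\n", e.Key, e.Value)
		}
	}
}
```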

Once we have the structs, the next step is to parse tokens from the file and save their values into the Go structs. We can borrow the logic of a lexer to develop our own Fluent-bit parser.

In a lexer, the target characters we want to pick out are called tokens. A token is the marker our parser is searching for. The parser reads the characters in a file one by one, and whenever it finds a token, it saves the value between tokens into the final structure and moves on.
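
As a rough illustration of "reading characters one by one and looking for tokens", the sketch below classifies each rune of an input and counts how often each kind appears. The names tokenKind, classify and countTokens are hypothetical and are not part of the parser described in this article; the sketch only shows the scanning loop, not the saving of values.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tokenKind names the marker characters we care about (hypothetical type).
type tokenKind int

const (
	sectionOpen  tokenKind = iota // '['
	sectionClose                  // ']'
	lineEnd                       // '\n'
	other                         // any character that is not a marker
)

// classify maps a single rune to its token kind.
func classify(r rune) tokenKind {
	switch r {
	case '[':
		return sectionOpen
	case ']':
		return sectionClose
	case '\n':
		return lineEnd
	default:
		return other
	}
}

// countTokens reads the input rune by rune and counts each token kind.
func countTokens(input string) map[tokenKind]int {
	reader := bufio.NewReader(strings.NewReader(input))
	counts := map[tokenKind]int{}
	for {
		r, _, err := reader.ReadRune()
		if err != nil { // io.EOF (or a read error) ends the scan
			return counts
		}
		counts[classify(r)]++
	}
}

func main() {
	counts := countTokens("[FIRST_SECTION]\n    Key1  some value\n")
	fmt.Println("section openers:", counts[sectionOpen])
	fmt.Println("section closers:", counts[sectionClose])
	fmt.Println("line ends:", counts[lineEnd])
}
```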

Parse a single token

To parse a Section, the parser reads characters one by one and stops at the [ character, which marks the beginning of a Section. The parser must record its current state as t_section and keep reading until the ] character. The text between [ and ] is the Section name we need to store in the Go struct.


// tags that tell the parser which state it is in
const (
	t_section = iota
	t_entry_key // set after a section name has been read
)

func (parser *FluentBitConfParser) Parse() *FluentBitConf {
	var currSection *Section = nil

	for {
		// read characters one by one
		r, _, err := parser.reader.ReadRune()
		if err != nil {
            // stop at the end of file
			if err == io.EOF {
				if currSection != nil {
					parser.Conf.Sections = append(parser.Conf.Sections, *currSection)
				}
				return parser.Conf
			}
			return parser.Conf
		}
		switch r {
		case '\n':
			continue
		case '[':
			// save last config item
			if currSection != nil {
				parser.Conf.Sections = append(parser.Conf.Sections, *currSection)
			}
			// create new config item
			currSection = &Section{
				Name:    "",
				Entries: []Entry{},
			}
			parser.token = t_section
		default:
			if unicode.IsSpace(r) {
				continue
			}

			// the important step: read the characters that follow the token character and save them into the struct
			strValue, _ := parser.parseString()
			switch parser.token {
			case t_section:
				// guard against text that appears before the first '['
				if currSection != nil {
					currSection.Name = strValue
				}
				parser.token = t_entry_key
			}
		}
	}
}

In the function parser.parseString(), we read until the end of a value (for a section, that is ]) and then return the value.

func (parser *FluentBitConfParser) parseString() (string, error) {
	var val string = ""

	if err := parser.reader.UnreadRune(); err != nil {
		return "", err
	}
	for {
		r, _, err := parser.reader.ReadRune()
		if err != nil {
			if err == io.EOF {
				return val, nil
			}
			return "", err
		}

		if parser.token == t_section && r == ']' {
			return val, nil
		}

		val = val + string(r)
	}
}

That is all the logic for parsing a section. Parsing a key/value pair follows the same process: make sure the parser knows which state it is in, and save the values delimited by whitespace or \n. You can see the code in the GitHub repo.
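
For readers who want to see the key/value step without opening the repo, here is a minimal, self-contained sketch of the same idea. It is not the code from the repo: the state names, parseConf and parseConfString are hypothetical, and the sketch assumes a key ends at the first whitespace and a value runs to the end of the line. Comments, includes and entries that appear before the first section are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode"
)

type Entry struct {
	Key   string
	Value interface{}
}

type Section struct {
	Name    string
	Entries []Entry
}

// Parser states (hypothetical names, modelled on t_section).
const (
	stateKey     = iota // expecting a key, or '[' opening a section
	stateSection        // inside [ ... ]
	stateValue          // after a key, reading up to the end of the line
)

// parseConf reads rune by rune and switches state on each token.
func parseConf(r io.Reader) ([]Section, error) {
	reader := bufio.NewReader(r)
	var sections []Section
	var buf strings.Builder
	var key string
	state := stateKey

	// flush stores the pending key/value pair in the latest section.
	// Entries that appear before any section are dropped.
	flush := func() {
		if state == stateValue && len(sections) > 0 {
			last := &sections[len(sections)-1]
			value := strings.TrimSpace(buf.String())
			last.Entries = append(last.Entries, Entry{Key: key, Value: value})
		}
		buf.Reset()
		state = stateKey
	}

	for {
		ch, _, err := reader.ReadRune()
		if err == io.EOF {
			if state == stateKey && buf.Len() > 0 { // key with no value at EOF
				key, state = buf.String(), stateValue
				buf.Reset()
			}
			flush()
			return sections, nil
		}
		if err != nil {
			return nil, err
		}
		switch state {
		case stateSection:
			if ch == ']' { // closing token: the buffer holds the name
				name := strings.TrimSpace(buf.String())
				sections = append(sections, Section{Name: name})
				buf.Reset()
				state = stateKey
			} else {
				buf.WriteRune(ch)
			}
		case stateKey:
			switch {
			case ch == '[' && buf.Len() == 0:
				state = stateSection
			case unicode.IsSpace(ch):
				if buf.Len() > 0 { // whitespace ends the key
					key, state = buf.String(), stateValue
					buf.Reset()
					if ch == '\n' { // key without a value
						flush()
					}
				}
			default:
				buf.WriteRune(ch)
			}
		case stateValue:
			if ch == '\n' { // end of line ends the value
				flush()
			} else {
				buf.WriteRune(ch)
			}
		}
	}
}

// parseConfString is a convenience wrapper for in-memory input.
func parseConfString(s string) ([]Section, error) {
	return parseConf(strings.NewReader(s))
}

func main() {
	sections, err := parseConfString("[FIRST_SECTION]\n    Key1  some value\n")
	fmt.Println(sections, err)
}
```

Keeping the state in one variable and switching on it for every rune mirrors the parser.token field used above. The difference is only that this sketch keeps the buffer and the state inside a single function.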

Conclusion

To parse a configuration file, we have to:

  • Define the tokens (key characters)
  • Read characters and look for tokens
  • Save the current state so the parser knows which struct the following characters belong to