Steve Sun

Building a File Parser

Last week, after reading this article - How to Write a Lexer in Go, I found that it is not so difficult to design a configuration file parser by this article’s mindset. Then I tried to write a fluent-bit configuration parser and finally got this Fluent-Bit configuration parser for Golang.

In this article, I want to introduce how to parse Fluent-bit configuration .conf file, and the thinking behind it.

Fluent-bit configuration format and schema

[FIRST_SECTION]
    Key1  some value
    Key2  another value

[SECOND_SECTION]
    KeyN  3.14

Here is a classic mode configuration of Fluent-bit, it includes two parts:

  • Section
  • Key/value pair

First of all, we need to define a struct that represents the Fluent-bit configuration file.

type FluentBitConf struct {
	Sections []Section
}

type Section struct {
	Name    string
	Entries []Entry
}

type Entry struct {
	Key   string
	Value interface{}
}

Once we have a struct, the next step is to parse tokens from the file and save their values into golang struct. We can copy the logic of the lexer to develop our fluent bit parser.

In a lexer program, the target characters which we want to parse out are called “Token”, Token is also the keyword that our parser program is searching for. A parser program will read characters in a file one by one, whenever it found a token, the parser saves the value between tokens into the final structure and go ahead.

Parse a single token

If we want to parse a Section, we have to make the parser read characters one by one and stop at [ character, which means the beginning of a Section. The parser must save the current state as t_section and keep the parser reading until ] character, the word between [ and ] is the Section value we need to persist into go struct.


// define some tag to tell parser state
const (
	t_section = iota
)

func (parser *FluentBitConfParser) Parse() *FluentBitConf {
	var currSection *Section = nil

	for {
        // read charector one by one
		r, _, err := parser.reader.ReadRune()
		if err != nil {
            // stop at the end of file
			if err == io.EOF {
				if currSection != nil {
					parser.Conf.Sections = append(parser.Conf.Sections, *currSection)
				}
				return parser.Conf
			}
			return parser.Conf
		}
		switch r {
		case '\n':
			continue
		case '[':
			// save last config item
			if currSection != nil {
				parser.Conf.Sections = append(parser.Conf.Sections, *currSection)
			}
			// create new config item
			currSection = &Section{
				Name:    "",
				Entries: []Entry{},
			}
			parser.token = t_section
		default:
			if unicode.IsSpace(r) {
				continue
			}

            // here is important function, read the charectors after token-chareactor and save them into struct
			strValue, _ := parser.parseString()
			switch parser.token {
			case t_section:
				currSection.Name = strValue
				parser.token = t_entry_key
		}

	}
}

In function parser.parseString(), we have to read until the end of a value (for section, it’s ]), then return the value.

func (parser *FluentBitConfParser) parseString() (string, error) {
	var val string = ""

	if err := parser.reader.UnreadRune(); err != nil {
		return "", err
	}
	for {
		r, _, err := parser.reader.ReadRune()
		if err != nil {
			if err == io.EOF {
				return val, nil
			}
			return "", err
		}

		if parser.token == t_section && r == ']' {
			return val, nil
		}

		val = val + string(r)
	}
}

That’s all logic for parsing a section. Parse key/value pair is the same process, just note to make the parser know which state it is and save values between whitespace or \n, you can see the code in the Github repo.

Conclusion

To parse a configuration file, we have to

  • Defining token (key characters)
  • Reading characters and looking for a token
  • Saving current state to tell parser which struct the following characters belong