TIL: Creating a custom bufio.Scanner in Go

2021-12-04 00:00:00 +0000 UTC

The Go package bufio provides the Scanner type, whose methods allow for easy iteration over input from an io.Reader.

The package's built-in split functions, of type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error), tokenize on words, lines, runes, and bytes; lines are the default. You can also write your own SplitFunc and pass it to a Scanner instance with scanner.Split(split SplitFunc) to tokenize on another condition. Split has to be called before scanning starts.
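As a quick illustration of swapping the split function, here is a minimal sketch that uses the built-in bufio.ScanWords instead of the default line splitting. The helper name splitWords and the sample input are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitWords is a hypothetical helper (not part of bufio) that collects
// every token bufio.ScanWords produces for the given input.
func splitWords(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Split must be called before the first call to Scan.
	scanner.Split(bufio.ScanWords)

	var tokens []string
	for scanner.Scan() {
		tokens = append(tokens, scanner.Text())
	}
	// Err reports the first non-EOF error hit while scanning, if any.
	return tokens, scanner.Err()
}

func main() {
	tokens, err := splitWords("tokenize on  whitespace\n")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, tok := range tokens {
		fmt.Printf("%q\n", tok) // "tokenize", "on", "whitespace"
	}
}
```

Any function with the SplitFunc signature can be passed to Split in the same way.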

For example, in a current project to implement a STOMP messaging protocol server, I need to tokenize input on null bytes, which terminate STOMP frames, so I wrote this SplitFunc:

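import "bytes"

// ScanNullTerm is a bufio.SplitFunc that splits input on null bytes.
func ScanNullTerm(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// At EOF with nothing left to process: there are no more tokens.
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}

	if i := bytes.IndexByte(data, '\000'); i >= 0 {
		// There is a null-terminated frame: consume it along with its
		// terminator, and return the frame without the terminator.
		return i + 1, data[:i], nil
	}

	// At EOF with unterminated data left: return it as the final token.
	if atEOF {
		return len(data), data, nil
	}

	// No terminator yet: ask the Scanner to read more data.
	return 0, nil, nil
}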

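This turns the input Alpha^@Beta^@Gamma\nDelta\nTheta (where ^@ is a null byte) into the tokens Alpha, Beta, and Gamma\nDelta\nTheta. Newlines are left untouched, and trailing data without a terminator is returned as the final token.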
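To try this out, here is a small self-contained sketch that plugs the split function from above into a Scanner reading from a string. ScanNullTerm is repeated so the program compiles on its own; the helper name splitNullTerminated and the use of strings.NewReader in place of a real network connection are assumptions for the example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// ScanNullTerm is the null-byte split function described in this post.
func ScanNullTerm(data []byte, atEOF bool) (int, []byte, error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, '\000'); i >= 0 {
		return i + 1, data[:i], nil
	}
	if atEOF {
		return len(data), data, nil
	}
	return 0, nil, nil
}

// splitNullTerminated is a hypothetical helper that collects every token
// a Scanner yields when it is configured with ScanNullTerm.
func splitNullTerminated(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(ScanNullTerm)

	var tokens []string
	for scanner.Scan() {
		tokens = append(tokens, scanner.Text())
	}
	return tokens, scanner.Err()
}

func main() {
	tokens, err := splitNullTerminated("Alpha\x00Beta\x00Gamma\nDelta\nTheta")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, tok := range tokens {
		fmt.Printf("%q\n", tok) // "Alpha", "Beta", "Gamma\nDelta\nTheta"
	}
}
```

One thing to keep in mind: a Scanner gives up with bufio.ErrTooLong if a single token does not fit in its buffer (64 KiB by default), so for large frames it may be worth raising the limit with scanner.Buffer.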

Tags: til golang