Patching ripgrep to enable per-directory configuration

2023-03-10 00:00:00 +0000 UTC

ripgrep (rg) is a fast grep-like tool written in Rust. A friend mentioned that on one project they kept running rg with the same command-line arguments over and over. rg can read default arguments from a configuration file, but only when the environment variable RIPGREP_CONFIG_PATH is set to that file's path. This is the only way to specify a ripgreprc.

As a first step in learning a bit about Rust, I wrote a small patch to add additional configuration sourcing options to rg.

ripgrep’s current config functionality

Here’s the public API for the config module in ripgrep’s core crate:

pub fn args() -> Vec<OsString> {
    let config_path = match env::var_os("RIPGREP_CONFIG_PATH") {
        None => return vec![],
        Some(config_path) => {
            if config_path.is_empty() {
                return vec![];
            }
            PathBuf::from(config_path)
        }
    };
    let (args, errs) = match parse(&config_path) {
        Ok((args, errs)) => (args, errs),
        Err(err) => {
            message!(
                "failed to read the file specified in RIPGREP_CONFIG_PATH: {}",
                err
            );
            return vec![];
        }
    };
    if !errs.is_empty() {
        for err in errs {
            message!("{}:{}", config_path.display(), err);
        }
    }
    log::debug!(
        "{}: arguments loaded from config file: {:?}",
        config_path.display(),
        args
    );
    args
}

Here are a few key Rust patterns in the existing implementation. One is pattern matching with match:

let config_path = match env::var_os("RIPGREP_CONFIG_PATH") {
    None => return vec![],
    Some(config_path) => {
        if config_path.is_empty() {
            return vec![];
        }
        PathBuf::from(config_path)
    }
};
let (args, errs) = match parse(&config_path) {
    Ok((args, errs)) => (args, errs),
    Err(err) => {
        message!(
            "failed to read the file specified in RIPGREP_CONFIG_PATH: {}",
            err
        );
        return vec![];
    }
};

Rust’s match expressions must be exhaustive. Here we are destructuring the Option and Result enums, so every variant must be covered. For Option<T>, these are Some(T) and None. For Result<T, E>, these are Ok(T) and Err(E).

In these expressions, an arm containing return exits the enclosing function immediately with the given value. Otherwise, the arm’s final expression (one not terminated by a semicolon) becomes the value of the entire match expression. Here, those values are bound to the immutable variables config_path, args, and errs. The binding of args and errs shows another use of destructuring in Rust, this time of a tuple.
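To make the two behaviors concrete, here is a small sketch that has the same shape as args() but works on integers. The names parse_number and number_or_zero are made up for this illustration and have nothing to do with ripgrep.

```rust
use std::num::ParseIntError;

// Hypothetical stand-in for `parse`: returns a value plus a list of warnings.
fn parse_number(text: &str) -> Result<(i32, Vec<String>), ParseIntError> {
    let trimmed = text.trim();
    let mut warnings = Vec::new();
    if trimmed.len() != text.len() {
        warnings.push(String::from("ignored surrounding whitespace"));
    }
    trimmed.parse::<i32>().map(|value| (value, warnings))
}

// Same shape as `args()`: bail out early with a default, otherwise bind a value.
fn number_or_zero(input: Option<&str>) -> i32 {
    // Option has two variants, so the match needs an arm for each.
    let text = match input {
        None => return 0,   // exits number_or_zero entirely
        Some(text) => text, // becomes the value of the match expression
    };
    // Result also has two variants; the Ok arm destructures a tuple as well.
    let (value, warnings) = match parse_number(text) {
        Ok((value, warnings)) => (value, warnings),
        Err(_) => return 0,
    };
    for warning in &warnings {
        eprintln!("warning: {}", warning);
    }
    // No semicolon: this expression is the function's return value.
    value
}
```

Removing either arm from one of the match expressions makes the compiler reject the code, which is the exhaustiveness rule at work.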

Another syntax note: the ! at the end of message! indicates this refers to a macro. Same with vec!.

Finally we get our first and only hint in this project of Rust’s unique memory model. The & leading the parameter &config_path passed to parse(&config_path) indicates we are passing a shared reference to config_path to the function parse. This means something, but I am not 100% sure what yet :)
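As far as I can tell, the practical effect is that the callee may read the value but does not take ownership of it, so the caller can keep using it afterwards. A minimal sketch, where describe and borrow_then_reuse are names invented for this example and the path is arbitrary:

```rust
use std::path::{Path, PathBuf};

// Hypothetical function that only needs to read the path.
fn describe(path: &Path) -> String {
    format!("config at {}", path.display())
}

fn borrow_then_reuse() -> (String, PathBuf) {
    let config_path = PathBuf::from("/tmp/.ripgreprc");
    // `&config_path` lends the value; ownership stays in this function.
    let description = describe(&config_path);
    // Because the path was only borrowed, it can still be used (and moved) here.
    (description, config_path)
}
```

This matches what args() does: config_path is passed to parse by reference and then used again later in config_path.display().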

The rest of the args function is relatively straightforward. The function ends by returning the value args in a Rust-idiomatic way. The last line of the function is the expression args. Since there is no ; at the end of the line, this is not a statement, so it is treated as the return value for the function.

My changes

My patched ripgrep discovers and parses at most one config file with this precedence [1]:

  1. A file named .ripgreprc in the current working directory.
  2. A file found using a path stored in the env at RIPGREP_CONFIG_PATH.
  3. A file named .ripgreprc stored in an ancestor directory of the current working directory.

Items 2 and 3 could easily be swapped. I chose this order so that the environment-specified configuration is preferred over a .ripgreprc that happens to live in a distant ancestor such as /home/ or /, which guards against surprises from parent directories.

To achieve this, I wrote a function config_path() -> Option<PathBuf> that returns Some(path) for the first item in the list above that produces a path, or None if none does. It delegates to three helpers: cwd_ripgreprc() -> Option<PathBuf>, env_ripgreprc() -> Option<PathBuf>, and find_ripgreprc() -> Option<PathBuf>. The change shows up at the start of the args() function:

pub fn args() -> Vec<OsString> {
    let config_path = match config_path() {
        None => return vec![],
        Some(config_path) => config_path,
    };
    // ...
    args
}

The config_path() function is implemented with straightforward fall-through logic:

fn config_path() -> Option<PathBuf> {
    let cwd_opt = cwd_ripgreprc();
    if cwd_opt.is_some() {
        return cwd_opt;
    }

    let env_opt = env_ripgreprc();
    if env_opt.is_some() {
        return env_opt;
    }

    find_ripgreprc()
}
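The same fall-through could probably be written more compactly with Option::or_else, which only calls the next function when the previous result is None. This is a sketch, not what the patch does: the three lookup functions below are placeholders with hard-coded results so the precedence is visible.

```rust
use std::path::PathBuf;

// Placeholder lookups; the real ones inspect the filesystem and environment.
fn cwd_ripgreprc() -> Option<PathBuf> {
    None
}

fn env_ripgreprc() -> Option<PathBuf> {
    Some(PathBuf::from("/etc/ripgreprc"))
}

fn find_ripgreprc() -> Option<PathBuf> {
    Some(PathBuf::from("/home/.ripgreprc"))
}

fn config_path() -> Option<PathBuf> {
    // Each `or_else` runs its function only if everything before it was None.
    cwd_ripgreprc()
        .or_else(env_ripgreprc)
        .or_else(find_ripgreprc)
}
```

With these placeholders, config_path() yields the environment path, because the working-directory lookup returns None and the ancestor search is never reached.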

The cwd_ripgreprc() and find_ripgreprc() functions are simple to implement using Rust’s std::env and PathBuf APIs:

/// if there is a ripgreprc in the cwd, get it
fn cwd_ripgreprc() -> Option<PathBuf> {
    let mut cwd = env::current_dir().unwrap();
    let file = Path::new(".ripgreprc");

    cwd.push(file);
    if cwd.is_file() {
        return Some(cwd);
    }
    None
}

// ...

/// Find a .ripgreprc file in the tree
fn find_ripgreprc() -> Option<PathBuf> {
    let mut search_path = env::current_dir().unwrap();
    let file = Path::new(".ripgreprc");

    // go up one, since we know it's not in the current folder already
    if !search_path.pop() {
        return None;
    }

    loop {
        search_path.push(file);

        if search_path.is_file() {
            break Some(search_path);
        }

        if !(search_path.pop() && search_path.pop()) {
            break None;
        }
    }
}

The implementation of find_ripgreprc() was inspired by (read: taken directly from) this Stack Overflow answer. One Rustism in this implementation is the loop used as an expression: a loop evaluates to the value given to break. Because the loop is the last expression in the function, that value becomes the function’s return value.
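Here is a variant of the same loop that is easier to experiment with. It is a sketch under a few assumptions: find_upwards is a hypothetical name, the starting directory and the existence check are passed in as parameters instead of coming from the real filesystem, and unlike find_ripgreprc() it also checks the starting directory itself.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: walk from `start` towards the root looking for `name`.
// `exists` replaces `is_file()` so the walk can be tried without real files.
fn find_upwards(
    start: &Path,
    name: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let mut dir = start.to_path_buf();
    // The loop is the function's final expression, so the value handed
    // to `break` is what the function returns.
    loop {
        let candidate = dir.join(name);
        if exists(&candidate) {
            break Some(candidate);
        }
        // `pop` returns false once there is no parent directory left.
        if !dir.pop() {
            break None;
        }
    }
}
```

For example, calling find_upwards(Path::new("/a/b/c"), ".ripgreprc", check) tries /a/b/c, then /a/b, then /a, then /, and stops at the first directory where check reports a match.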

Finally, env_ripgreprc() is just a slight modification of the original config file lookup logic:

/// if we have a ripgreprc specified in env, get it
fn env_ripgreprc() -> Option<PathBuf> {
    match env::var_os("RIPGREP_CONFIG_PATH") {
        None => None,
        Some(config_path) => {
            if config_path.is_empty() {
                None
            } else {
                Some(PathBuf::from(config_path))
            }
        }
    }
}
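The nested match and if could likely be collapsed with Option's combinators. In this sketch, path_from_env_value is a hypothetical helper that takes the raw variable value as a parameter, so it can be tried without touching the process environment:

```rust
use std::ffi::OsString;
use std::path::PathBuf;

// Hypothetical helper: turn an optional raw value into an optional path.
fn path_from_env_value(value: Option<OsString>) -> Option<PathBuf> {
    value
        .filter(|raw| !raw.is_empty()) // an empty value is treated like an unset one
        .map(PathBuf::from)
}
```

The equivalent of env_ripgreprc() would then be path_from_env_value(env::var_os("RIPGREP_CONFIG_PATH")).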

And that’s it! After invoking cargo build --release, I have a working ripgrep that will look for configuration files in the current working directory and its parents prior to running.

Notes about Rust

In this small project, I didn’t have to get much into the infamous features that make Rust hard for so many to learn. Three notable features made the project as a whole pleasant:

  1. Excellent error messages from the compiler.
  2. Last-expression return values.
  3. Pattern matching on enum types with destructuring.

I am looking forward to taking on something larger in Rust someday when I have the time!

Footnotes


  1. There are reasons not to have this logic in ripgrep. I was only curious how difficult it would be to implement the behavior – and found it surprisingly straightforward.
Tags: rust ripgrep opensource