Tokenizer

A guide to Fig's tokenizer.

User Input | Tokenized Output
Standard

git push origin main


["git", "push", "origin", "main"]
Quotes
Double Quotes
git commit -m "brendan's commit message"

["git", "commit", "-m", "brendan's commit message"]
Single Quotes
git commit -m 'my "commit" msg'
['git', 'commit', '-m', 'my "commit" msg']
Escape Characters

echo hello\ world 1234


["echo", "hello world", "1234"]
Options with Arguments
Long options
--message="hello" and --message "hello"


["--message", "hello"]
Short options
-m "hello" and -m="hello"

["-m", "hello"]
Short options with arguments not separated by = or space
grep -C50

See exceptions to this below

["grep", "-C", "50"]
Chained short options
Normal chained options
ps -aux

["ps", "-a", "-u", "-x"]
Chained options with argument at end

grep -abC555
See exceptions to this below

["grep", "-a", "-b", "-C", "555"]
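
The quoting, escaping, and = rows above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it leans on the standard library's shlex.split, which follows POSIX shell quoting rules, and the tokenize helper is a hypothetical name, not Fig's actual implementation. It does not expand chained short options.

```python
import shlex


def tokenize(command):
    """Split a command line into tokens (hypothetical helper, not Fig's code)."""
    tokens = []
    # shlex.split handles double quotes, single quotes, and backslash escapes.
    for word in shlex.split(command):
        if word.startswith("-") and "=" in word:
            # Treat --message=hello and -m=hello like their space-separated forms.
            name, value = word.split("=", 1)
            tokens.extend([name, value])
        else:
            tokens.append(word)
    return tokens


if __name__ == "__main__":
    print(tokenize(r"echo hello\ world 1234"))  # ['echo', 'hello world', '1234']
    print(tokenize('--message="hello"'))        # ['--message', 'hello']
```

One limitation of this sketch: because shlex.split strips the quotes before the = check runs, a quoted argument that begins with - and contains = would also be split in two.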
Settings
You can add settings to Fig's completion spec to change how the tokenizer behaves.
Non-POSIX-compliant options (posixNoncompliantFlags)
ps -aux

["ps", "-aux"]
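
As a rough illustration of what this setting changes, the sketch below toggles chained-option expansion with a boolean. The expand_token function and its posix_noncompliant_flags parameter are hypothetical stand-ins for the spec setting, not Fig's API, and the sketch deliberately ignores option arguments.

```python
def expand_token(token, posix_noncompliant_flags=False):
    """Expand a chained short-option token unless the setting disables it.

    Hypothetical helper: the parameter mirrors the setting named above.
    """
    is_short_cluster = (
        token.startswith("-") and not token.startswith("--") and len(token) > 2
    )
    if posix_noncompliant_flags or not is_short_cluster:
        # Leave the token untouched, e.g. "-aux" stays a single option.
        return [token]
    # Default behaviour: every character becomes its own short option.
    return ["-" + ch for ch in token[1:]]


if __name__ == "__main__":
    print(expand_token("-aux"))                                 # ['-a', '-u', '-x']
    print(expand_token("-aux", posix_noncompliant_flags=True))  # ['-aux']
```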

Where Fig's Tokenizer breaks:

  • Chained short options where the second or any later option is not a lowercase letter, an uppercase letter, or the number 1 (1 is a surprisingly common flag, as in ls -1)
    • e.g. ls -ga@ will output ["ls", "-g", "-a", "@"] instead of ["ls", "-g", "-a", "-@"]
    • e.g. ls -g%aG will output ["ls", "-g", "%aG"] instead of ["ls", "-g", "-%", "-a", "-G"]
    • BUT
      • ls -ga1 will work as expected because Fig considers 1 to be an option
      • ls -% -@ will work as the @ and % symbols are the first option, not the second
      • git push -4 -6 will work as 4 and 6 are the first option in each token, not the second
  • Short options that take an argument starting with a lowercase letter, an uppercase letter, or 1
    • e.g. grep -C1 will output ["grep", "-C", "-1"] instead of ["grep", "-C", "1"]
    • e.g. grep -C100 will output ["grep", "-C", "-1", "00"] instead of ["grep", "-C", "100"]
    • e.g. git commit -mhello will output ["git", "commit", "-m", "-h", "-e", "-l", "-l", "-o"] instead of ["git", "commit", "-m", "hello"]
    • BUT
      • grep -C51 will work as the 5 will trigger the rest of the token to be considered an argument
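
The cases above are consistent with a simple per-character heuristic, sketched here in Python. The rule is inferred from the examples in this guide rather than taken from Fig's source, and the names is_option_char and split_short_cluster are hypothetical. The first character after the dash is always treated as an option; each following character is an option only if it is an ASCII letter or 1, and the first character that is not starts the argument.

```python
def is_option_char(ch):
    """Assumed rule: ASCII letters and the digit 1 count as short options."""
    return (ch.isascii() and ch.isalpha()) or ch == "1"


def split_short_cluster(token):
    """Split a token such as '-abC555' into options plus a trailing argument."""
    if not token.startswith("-") or token.startswith("--") or len(token) <= 2:
        return [token]
    parts = ["-" + token[1]]  # the first option is accepted whatever it is
    for i in range(2, len(token)):
        ch = token[i]
        if is_option_char(ch):
            parts.append("-" + ch)
        else:
            # Anything else ends the chain; the rest is one argument.
            parts.append(token[i:])
            break
    return parts


if __name__ == "__main__":
    for tok in ["-abC555", "-ga@", "-C100", "-C51"]:
        print(tok, split_short_cluster(tok))
```

Under these assumptions, -C51 splits cleanly because 5 ends the option chain, while -C100 does not because the leading 1 is read as another option.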