01/28/2022 Engineering

How does Fig know what to suggest?

Sean Sullivan

Fig provides drop-in autocomplete for hundreds of the most common CLI tools in developer workflows. But how does Fig know what to suggest? At first glance, this might seem like a trivial task. After all, plenty of other autocomplete solutions are integrated with the shell: fish comes with built-in autocomplete, and zsh has plugins like zsh-autosuggestions. But Fig is at a major disadvantage here. These shells and plugins have direct access to what a user has typed into their shell, also called the edit buffer. Fig, which is not quite a shell or a terminal emulator, does not. Our approach has changed over time, and we’ve settled on what we think is a pretty clever solution that maintains users’ privacy while creating a powerful API on the terminal.

The Scrappy Way

Without knowing what the user has typed, it would be impossible for Fig to offer suggestions. At the same time, keystrokes are very sensitive information, and to respect that Fig should operate much in the same way that your shell or terminal does. While Fig’s current solution achieves this goal, the first MVP of Fig’s autocomplete used what was essentially a keylogger(!!) to understand what the user had typed.

[... Fill in some details about keylogger and how it worked ...]

But this approach had many drawbacks. Besides the obvious security concerns (e.g. when a user is typing a password in the terminal), it was not even accurate in many cases. When a user pressed ctrl-v to paste text, the keylogger saw a single keystroke, so Fig lost all sense of what was in the edit buffer and couldn’t offer suggestions. Similarly, pressing the up arrow to scroll through history showed up as just an up arrow to the keylogger, while the edit buffer had actually changed to an entirely new command from history. Other common operations that could throw Fig off included line editing functions like ctrl-A and ctrl-E, which move to the start or end of the line. Worse, these keybindings are customizable. So despite our workarounds for these cases, some users would still face issues if they’d changed the keybindings for moving around in the line editor, or used vim-like bindings to navigate history.
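To make the drift concrete, here is a minimal sketch in Python of guessing the edit buffer from key events alone. The `NaiveKeylogBuffer` class, the key names, and the history entry are all hypothetical and are not Fig's original keylogger code; the sketch only models what any keystroke-replay approach can and cannot see.

```python
class NaiveKeylogBuffer:
    """Guess the edit buffer by replaying observed key events (hypothetical model)."""

    def __init__(self):
        self.text = ""

    def on_key(self, key: str) -> None:
        if key == "backspace":
            self.text = self.text[:-1]
        elif len(key) == 1:
            # A single printable character: assume it was appended at the end.
            self.text += key
        # Anything else (arrows, ctrl chords, paste) changes the real buffer
        # in ways a keylogger cannot see, so the guess silently goes stale.


guess = NaiveKeylogBuffer()
for key in ["g", "i", "t", "up"]:
    guess.on_key(key)

# After the up arrow, the shell has swapped in a history entry (made up here),
# but the replayed guess still holds what was typed before.
real_buffer_after_up = "npm run build"
print(guess.text)                          # git
print(guess.text == real_buffer_after_up)  # False
```

Handling "up" correctly would require knowing the shell's history and its keybindings, which is exactly the information a keylogger does not have.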

Parsing config files to handle these edge cases was messy and eventually we decided there must be a better, more secure way.

Using Pseudoterminals

To understand the current approach that Fig uses to get the edit buffer, you need a bit of knowledge about how your terminal emulator and shell work together. [You can check out our in-depth article on terminals, shells, and more here] but as a quick refresher, your terminal emulator is a program like iTerm, Alacritty, or Kitty that handles keyboard input from the user and displays text in a GUI window. The terminal emulator works with the shell, a program like bash, fish, or zsh that presents you with a prompt, parses the command you’ve typed, substitutes variables and aliases, and then actually executes the command. The shell sends the terminal information about how to display the prompt(s), how to color text, and the output of the programs it executes, using ANSI escape codes to do things like move the cursor around the screen or change the window title.

Let’s say you’re using iTerm as your terminal emulator. Typically, when you launch iTerm, it will start your shell of choice. Fig adds a hook to the start of your shell startup file (e.g. your .bashrc) that executes figterm — which is similar to iTerm in many ways. This program also launches a new shell “within” figterm and this child shell communicates with figterm in the same way your shell typically communicates with iTerm. Also like iTerm, figterm maintains a representation of your terminal screen. Rather than displaying this to a GUI, however, figterm just passes all of the communication from the shell up to iTerm and all of the communication from iTerm down to the shell — you can think of it like a passthrough layer that tries to reconstruct what is on the user’s screen.
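The passthrough idea can be sketched with Python's standard `pty` module. This is an illustration under assumptions, not figterm's implementation: the `run_through_pty` helper is hypothetical, it runs `echo` instead of an interactive shell, and it only handles the shell-to-terminal direction (a real layer would also forward keyboard input down to the child).

```python
import os
import pty


def run_through_pty(argv):
    """Run argv on a new pseudoterminal and return everything it wrote."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: stdin, stdout, and stderr are now attached to the new PTY.
        os.execvp(argv[0], argv)

    seen = bytearray()
    while True:
        try:
            chunk = os.read(master_fd, 1024)
        except OSError:
            # On Linux, reading the master side after the child exits raises EIO.
            break
        if not chunk:
            break
        # A real passthrough layer would update its screen model here and
        # write `chunk` unchanged to the outer terminal emulator.
        seen.extend(chunk)
    os.close(master_fd)
    os.waitpid(pid, 0)
    return bytes(seen)


output = run_through_pty(["echo", "hello"])
print(output)
```

Because the child believes it is talking to a terminal, the layer in the middle sees the same bytes a terminal emulator would see, including any escape codes.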

This works super well for understanding what is on the user’s screen. It also avoids a lot of security issues: figterm only sees what your shell or terminal emulator would see. If you trust iTerm, Alacritty, or any other terminal emulator, there’s really not much of a difference between that and figterm. When you type a password into your terminal, e.g. for sudo, figterm's screen representation won’t see it at all, because this text doesn’t show up directly on your screen.

This works well for a lot of cases where a keylogger didn’t work as well. When a user presses the up arrow in the shell to scroll through history, the shell will send some ANSI escape codes [link] back to the terminal emulator saying “Hey, the user just changed the edit buffer — delete the last N characters and type this instead” so that the terminal emulator can display the new text to the screen. figterm helps to pass this message, and also uses it to reconstruct this new screen representation internally.
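As a rough illustration of reconstructing the screen from that message, the sketch below applies a tiny subset of terminal output to one line: printable characters, backspace, carriage return, and the standard "erase to end of line" sequence (`ESC [ K`). The `apply_output` helper is hypothetical, and the exact bytes a shell emits on an up arrow vary by shell and configuration; the sequence used here is just one plausible example.

```python
def apply_output(line, cursor, data):
    """Apply a small subset of terminal output to a single screen line."""
    i = 0
    while i < len(data):
        if data.startswith("\x1b[K", i):
            del line[cursor:]            # CSI K: erase from cursor to end of line
            i += 3
            continue
        ch = data[i]
        if ch == "\b":
            cursor = max(0, cursor - 1)  # backspace moves the cursor left
        elif ch == "\r":
            cursor = 0                   # carriage return moves to column 0
        else:
            if cursor < len(line):
                line[cursor] = ch        # overwrite an existing cell
            else:
                line.append(ch)
            cursor += 1
        i += 1
    return line, cursor


line, cursor = apply_output([], 0, "$ git status")
# Up arrow: the shell backs up over "git status", erases it, and prints
# the recalled history entry in its place.
line, cursor = apply_output(line, cursor, "\b" * 10 + "\x1b[K" + "ls -la")
print("".join(line))  # $ ls -la
```

The layer never needs to know that an up arrow was pressed; it only needs to apply what the shell asked the terminal to draw.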

Working with Prompts

There is one problem, however. Fig needs to know the current edit buffer, not the contents of the whole screen. A lot of the text on the terminal emulator’s screen is not the edit buffer: prompts, output from commands like ls, or even autosuggestions from a shell like fish. To extract the current edit buffer, we need to know where it starts and ends, without any of the extraneous text.

[... Maybe talk about cells and the screen representation that Figterm and terminal emulators maintain (colored cells, semantic annotation of prompts) in a bit more depth ...]

To do this, figterm annotates each character or “cell” of the screen as “prompt”, “suggestion”, “output”, or “edit buffer” and pulls out only the “edit buffer” cells after the most recent prompt. To make these annotations, we need to know where a “prompt” starts and ends. Since the shell configures the prompt, this is a pretty natural place to start looking. When you install Fig, we add some invisible characters to your prompt. These are more ANSI escape codes, which allow the shell to communicate with the terminal emulator, or in our case figterm. When figterm sees the ANSI escape code for “start prompt”, it starts annotating the text that follows as prompt, and it stops when it sees “end prompt”. This allows us to filter out this text when extracting your edit buffer. We use a similar process for autosuggestions and command output, though prompts are the most important because they tell us where a new command starts.
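Here is a minimal sketch of that annotation pass. The marker byte sequences (`\x1b]9999;...`), the kind names, and the `annotate` and `edit_buffer` helpers are invented for illustration and are not the escape codes Fig actually installs; the sketch also treats the stream as a flat string rather than a 2D grid of cells.

```python
# Hypothetical invisible markers that a shell integration might emit.
START_PROMPT = "\x1b]9999;StartPrompt\x07"
END_PROMPT = "\x1b]9999;EndPrompt\x07"
START_OUTPUT = "\x1b]9999;StartOutput\x07"

MARKERS = {
    START_PROMPT: "prompt",
    END_PROMPT: "edit_buffer",  # text after the prompt is what the user types
    START_OUTPUT: "output",
}


def annotate(stream):
    """Return (char, kind) cells, switching kind whenever a marker is seen."""
    cells, kind, i = [], "output", 0
    while i < len(stream):
        for marker, new_kind in MARKERS.items():
            if stream.startswith(marker, i):
                kind = new_kind
                i += len(marker)
                break
        else:
            cells.append((stream[i], kind))
            i += 1
    return cells


def edit_buffer(cells):
    """Collect the edit-buffer cells that follow the most recent prompt."""
    last_prompt = max(
        (i for i, (_, kind) in enumerate(cells) if kind == "prompt"), default=-1
    )
    return "".join(ch for ch, kind in cells[last_prompt + 1:] if kind == "edit_buffer")


stream = (
    START_PROMPT + "$ " + END_PROMPT + "ls"
    + START_OUTPUT + "\nfile.txt\n"
    + START_PROMPT + "$ " + END_PROMPT + "git sta"
)
print(edit_buffer(annotate(stream)))  # git sta
```

Only the text typed after the latest prompt is returned; the earlier command and its output are ignored even though they are still on the screen.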

Performance

Despite its security and accuracy, adding figterm in between your shell and terminal emulator could be alarming for another reason: performance. It introduces an additional step between typing something and seeing it show up on your screen. Instead of passing directly from terminal to shell and from shell to terminal, this information goes through an additional read and write call in figterm. To mitigate performance issues as much as possible, we implemented figterm in C so we could keep these system calls sequential and nearly instantaneous, so that the overhead is literally just an additional read and write call.

The story is a little bit more complicated than that, though. figterm must also extract the current edit buffer using the annotations we mentioned before and then send this buffer to the Fig desktop app. Initially, in our C code, this was implemented synchronously with the read and write calls, which introduced latency beyond that ideal. Rather than wrestling with memory management and multi-threading in C (and tired of the several memory bugs we’d already run into), we decided to port figterm to Rust.
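The general idea of taking the edit buffer update off the passthrough path can be sketched with a queue between two tasks. This uses Python's `asyncio` purely for illustration; it is not figterm's Rust code, and the `passthrough` and `notify_app` coroutines, the chunk contents, and the simulated delay are all assumptions.

```python
import asyncio


async def passthrough(chunks, forwarded, updates):
    for chunk in chunks:
        forwarded.append(chunk)    # stands in for the write() to the terminal
        updates.put_nowait(chunk)  # hand off the update without waiting on the app
        await asyncio.sleep(0)     # yield so other tasks get a chance to run
    updates.put_nowait(None)       # sentinel: no more updates


async def notify_app(updates, sent):
    while (buf := await updates.get()) is not None:
        await asyncio.sleep(0.01)  # pretend that sending to the desktop app is slow
        sent.append(buf)


async def main():
    forwarded, sent = [], []
    updates = asyncio.Queue()
    await asyncio.gather(
        passthrough(["g", "gi", "git"], forwarded, updates),
        notify_app(updates, sent),
    )
    return forwarded, sent


forwarded, sent = asyncio.run(main())
print(forwarded, sent)
```

The forwarding loop never awaits the slow send, so keystroke echo is not held up by the notification; the app still receives every update, just slightly later.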

[... blurb about rust, async advantage ...]

  • Tokio

How does Fig know what you’ve typed (Sean)

  • Why do we
  • Keylogger
  • PTY approach
  • Shell integrations
  • Wrapping prompt
  • Why is it difficult to know what someone’s typed
    • What are all the edge cases we have to capture e.g. zsh suggestion, prompts get in the way
  • Rust re-write (Grant)
  • Privacy and security
    • No one has concerns about iTerm having keystrokes
  • Speed / performance
Tags
engineering, pseudoterminal, performance