08/10/2022 · Engineering

How does Fig know what you've typed in the terminal?

Sean Sullivan

Fig provides drop-in autocomplete for 500+ of the most common CLI tools. But how does Fig know what you've typed into your terminal?


At first glance, this might seem like a trivial task. After all, there are plenty of other autocomplete solutions integrated with the shell: zsh has plugins like zsh-autosuggestions, and fish ships with built-in autocomplete. But Fig is at a major disadvantage here. These shells and plugins have direct access to what a user has typed into their shell, also called the edit buffer.

Fig, which is not quite a shell or a terminal emulator, does not have direct access to this. We have tried several approaches to getting a user's edit buffer over time. We think the approach we've settled on is not only pretty clever, but also maintains a user's privacy and allows us to create a powerful API that we will soon expose to third-party developers.

Method 1: The Scrappy Way

The first MVP of Fig's autocomplete used what was essentially a keylogger (!!!). Yes, we know: keylogger + terminal = recipe for disaster, right? We knew there were plenty of issues, but we also knew we would rebuild Fig with one of the methods below. We wanted to optimize for getting an MVP out quickly.

How did the keylogger work?

We set up a CGEventTap which routed all keyboard events through Fig. Whenever a terminal emulator was the focused application, we would process keystrokes one at a time in an effort to reconstitute what had been typed in the terminal.
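To make the idea concrete, here is a minimal sketch in Python of rebuilding a line from key events. It is an illustration under assumptions, not Fig's code: the `GuessedBuffer` class and the key names ("backspace", "left", "up", and so on) are hypothetical, and a real event tap delivers key codes rather than strings. The `in_sync` flag marks the point where the guess stops being trustworthy, because the key's effect depends on shell state that a key listener cannot see.

```python
from dataclasses import dataclass


@dataclass
class GuessedBuffer:
    """Best-effort guess at the shell's edit buffer, rebuilt from key events."""

    text: str = ""
    cursor: int = 0
    in_sync: bool = True  # False once a key arrives whose effect we cannot know

    def feed(self, key: str) -> None:
        if key == "enter":
            # Command submitted: the shell starts a fresh, empty line.
            self.text, self.cursor, self.in_sync = "", 0, True
        elif not self.in_sync:
            # Already desynchronised: later edits apply to text we never saw.
            return
        elif key == "backspace":
            if self.cursor > 0:
                self.text = self.text[: self.cursor - 1] + self.text[self.cursor:]
                self.cursor -= 1
        elif key == "left":
            self.cursor = max(0, self.cursor - 1)
        elif key == "right":
            self.cursor = min(len(self.text), self.cursor + 1)
        elif len(key) == 1:
            # Ordinary printable character: insert it at the cursor.
            self.text = self.text[: self.cursor] + key + self.text[self.cursor:]
            self.cursor += 1
        else:
            # "up", "tab", or a custom keybinding: the shell rewrites the line
            # using history or completions that are invisible from out here.
            self.in_sync = False
```

Feeding "g", "i", "t" leaves `text == "git"`, but a single "up" flips `in_sync` to False, and nothing typed afterwards can be trusted until the next "enter".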

This hack allowed us to push a prototype of autocomplete and get feedback from real users, but it was extremely brittle! We'd lose track of what had been typed if the user pressed the up arrow and started navigating into their history, or if they used any keybindings that we didn't process correctly. But now we were hooked: we'd gotten a glimpse of how powerful terminal autocomplete could be. We started investigating other, more robust approaches.

Quick aside: none of these keystrokes were ever stored or transmitted. All of the autocomplete logic runs locally on device.

Drawbacks

Method 2: The ZSH Line Editor

Very quickly, we started looking for ways to remove the need for a keylogger to get the user's edit buffer. Fortunately, one of the most common shells has a built-in API for this.

How did the ZLE integration work?

With ZSH, the shell itself exposes an API to get the contents of the edit buffer on any keystroke. This works nearly perfectly: it exposes far less information about global keypresses than the keylogger approach, and it handles history and pasting as well.

This is the same API used by tools like zsh-autosuggestions and zsh-syntax-highlighting, so it is battle-tested and very accurate once set up properly.

Drawbacks

Method 3: Pseudoterminals

Quick Aside: How do terminal emulators and shells work?

To understand the current approach that Fig uses to get the edit buffer, you need a bit of knowledge about how your terminal emulator and shell work together. We plan to discuss terminals, shells, and more in a coming blog post, but here is a quick refresher. Your terminal emulator is a program like iTerm, Alacritty, or Kitty that handles keyboard input from the user and displays text in a GUI window. It works with the shell, a program like bash, fish, or zsh that presents you with a prompt, parses the command you've typed, substitutes variables and aliases, and then actually executes the command. The shell sends the terminal information about how to display the prompt(s), how to color text, and the output of the programs it executes, using ANSI escape codes to do things like move the cursor around the screen or change the window title.

Fig's Pseudoterminal Layer

Let's say you're using iTerm as your terminal emulator. Typically, when you launch iTerm, it will start your shell of choice. Fig adds a hook to the start of your shell startup file (e.g. your .bashrc) that executes figterm — which is similar to iTerm in many ways. This program also launches a new shell “within” figterm and this child shell communicates with figterm in the same way your shell typically communicates with iTerm. Also like iTerm, figterm maintains a representation of your terminal screen. Rather than displaying this to a GUI, however, figterm just passes all of the communication from the shell up to iTerm and all of the communication from iTerm down to the shell — you can think of it like a passthrough layer that tries to reconstruct what is on the user's screen.
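As a rough illustration of the passthrough idea, the Python sketch below runs a child program inside a new pseudoterminal and relays everything the child writes. It is a simplified stand-in, not figterm: the function name `relay_through_pty` and the `forward` callback are hypothetical, and only the shell-to-terminal direction is shown (a real layer would also forward keyboard input down to the child and handle window resizes). It assumes a Unix-like system.

```python
import os
import pty
import sys


def relay_through_pty(argv, forward=None):
    """Run argv inside a new pseudoterminal and relay everything it writes.

    `forward` stands in for "pass it up to the real terminal emulator"; the
    returned bytes stand in for what a middle layer could inspect.
    """
    seen = bytearray()
    pid, master_fd = pty.fork()  # the child's stdin/stdout/stderr are the pty
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # child: become the "shell"
        finally:
            os._exit(127)  # only reached if exec failed
    try:
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once the child side is closed
            if not chunk:
                break  # other platforms report EOF as an empty read
            seen += chunk  # keep a copy for our own screen model
            if forward is not None:
                forward(chunk)  # and pass the bytes along unchanged
    finally:
        os.close(master_fd)
        os.waitpid(pid, 0)
    return bytes(seen)
```

For example, `relay_through_pty([sys.executable, "-c", "print('hi')"], forward=lambda b: os.write(1, b))` prints the child's output as usual while also returning a copy of the same bytes to the caller.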

How does figterm work?

This works very well for understanding what is on the user's screen. It also avoids a lot of security issues: figterm only sees what your shell or terminal emulator would see. If you trust iTerm, Alacritty, or any other terminal emulator, there's really not much of a difference between that and figterm. When you type a password into your terminal for sudo, for example, figterm's screen representation won't see it at all, because this text never shows up on your screen.

This works well for a lot of cases where a keylogger didn't work as well. When a user presses the up arrow in the shell to scroll through history, the shell will send some ANSI escape codes back to the terminal emulator saying “Hey, the user just changed the edit buffer — delete the last N characters and type this instead” so that the terminal emulator can display the new text to the screen. figterm helps to pass this message, and also uses it to reconstruct this new screen representation internally.
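The history case can be sketched with a tiny one-line screen model in Python. This is an assumption-laden illustration: `apply_to_line` is a hypothetical helper that understands only backspace, carriage return, "cursor back" (CSI n D), and the default "erase to end of line" (CSI K), and the exact bytes a given shell sends when you press the up arrow will differ. Any other escape sequence is not handled here.

```python
import re

# One token per control sequence this sketch understands, else one character.
_TOKEN = re.compile(r"\x1b\[(\d*)([DK])|\x08|\r|.", re.DOTALL)


def apply_to_line(line: str, cursor: int, data: str):
    """Apply a small subset of terminal output to one line of screen text."""
    cells = list(line)
    for m in _TOKEN.finditer(data):
        tok, count, final = m.group(0), m.group(1), m.group(2)
        if final == "D":
            # CSI n D: move the cursor back n columns (default 1).
            cursor = max(0, cursor - int(count or 1))
        elif final == "K":
            # CSI K / CSI 0 K: erase from the cursor to the end of the line.
            # The 1 and 2 variants are ignored in this sketch.
            if count in ("", "0"):
                del cells[cursor:]
        elif tok == "\x08":
            # Backspace only moves the cursor left; it erases nothing.
            cursor = max(0, cursor - 1)
        elif tok == "\r":
            cursor = 0
        else:
            # Printable character: overwrite at the cursor, or append.
            if cursor < len(cells):
                cells[cursor] = tok
            else:
                cells.append(tok)
            cursor += 1
    return "".join(cells), cursor
```

Starting from the line `"$ ls -lah"` with the cursor at column 9, applying seven backspaces, an erase-to-end-of-line, and the text `git status` yields `"$ git status"`. The middle layer never saw the up-arrow's meaning, only the resulting screen update, and that is enough.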

There is one problem, however. Fig needs to know the current edit buffer, not the contents of the screen. There can be a lot of text in the terminal emulator's screen that is not the edit buffer like prompts, output from commands like ls, or even autosuggestions from a shell like fish. To extract the current edit buffer we need to know where this buffer starts and ends, without any of the extraneous text.

Terminal screens are represented by terminal emulators as a grid of cells that each store information like the bytes of a character to be displayed in the cell, the foreground and background color of the cell, and other font attributes.

To get the edit buffer, figterm adds semantic annotations to cells as well, marking each cell of the screen as “prompt”, “suggestion”, “output”, or “edit buffer”. To make these annotations, we need to know where a “prompt” starts and ends. When you install Fig, we add some invisible characters to your prompt. These are more ANSI escape codes, which allow the shell to communicate with the terminal emulator, or in our case figterm. When figterm sees the ANSI escape code for “start prompt”, it starts annotating the cells of text that follow as prompt, and it stops when it sees “end prompt”. This allows us to filter out this text when extracting your edit buffer. We use a similar process for autosuggestions and command output, though prompts are the most important because they tell us where a new command starts.
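The annotation step can be sketched in Python as follows. Everything named here is an assumption for illustration: the marker format (an OSC escape sequence with the made-up number 9999), the marker names, and the `Cell`, `annotate`, and `edit_buffer` helpers are hypothetical, and the real sequences Fig emits are not given in this post. The sketch also treats the screen as a single row with no cursor movement.

```python
import re
from dataclasses import dataclass

# Hypothetical markers: an OSC escape sequence with a made-up number (9999).
_MARKER = re.compile(r"\x1b\]9999;([a-z-]+)\x07")

# Which annotation applies to the text that follows each marker.
_KIND_AFTER = {
    "start-prompt": "prompt",
    "end-prompt": "edit_buffer",
    "start-suggestion": "suggestion",
    "end-suggestion": "edit_buffer",
    "start-output": "output",
}


@dataclass
class Cell:
    char: str
    kind: str  # "prompt", "edit_buffer", "suggestion", or "output"


def annotate(stream: str) -> list[Cell]:
    """Turn shell output into annotated cells; markers take up no cells."""
    cells: list[Cell] = []
    kind = "output"
    pos = 0
    for m in _MARKER.finditer(stream):
        cells += [Cell(ch, kind) for ch in stream[pos:m.start()]]
        kind = _KIND_AFTER.get(m.group(1), kind)
        pos = m.end()
    cells += [Cell(ch, kind) for ch in stream[pos:]]
    return cells


def edit_buffer(cells: list[Cell]) -> str:
    """Keep only the cells annotated as edit buffer."""
    return "".join(c.char for c in cells if c.kind == "edit_buffer")
```

With a stream of start-prompt, `~/code $ `, end-prompt, `l`, start-suggestion, `s -lah`, end-suggestion, `edit_buffer(annotate(stream))` returns `"l"`. If the suggestion markers are missing, the same call returns `"ls -lah"`, which is why prompt markers alone are not sufficient.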

Figterm's internal representation of the terminal screen looks something like the videos below. On the left is the terminal I'm typing into and on the right is a visually annotated version, where prompt cells have a blue background.

Notice that accounting for just prompts isn't enough. In this first clip, we haven't added compatibility for other suggestion plugins to figterm, so suggestions from zsh-autosuggestions are included as part of the edit buffer, which gives inaccurate suggestions -- Fig would think that the edit buffer is ls -lah even if I've just typed l and there is a grayed out suggestion for s -lah. After adding support for suggestions, Fig's screen representation looks something like this:

With proper annotations, figterm is able to extract an accurate current edit buffer and use it to provide suggestions, even when you paste text into the terminal, scroll through history, etc.

Drawbacks

Method 4: Rust-based Pseudoterminal

After dealing with enough undefined behavior and strange edge cases in C, we decided to rewrite figterm in Rust. While Rust can carry marginal performance hits compared to C, after factoring in the difficulty of writing performant and safe C, we've actually seen net performance improvements with Rust. We've been able to introduce more concurrency and parallelize IPC to speed up the main read and write loop of figterm using tokio. We've also been able to extract a lot of common IPC logic between figterm and the Fig desktop app (which is being rewritten in Rust for cross-OS portability).

The dependency management system in Rust is also much better than in C, allowing us to take advantage of modern terminal emulator libraries like Alacritty and Wezterm.

Next Steps

With a cross-terminal, cross-shell, and cross-platform solution for getting shell information like the current edit buffer, you can build a lot of really powerful terminal tooling. At Fig we are building a whole suite of tooling on top of this infrastructure, but we also want to expose this powerful API to other developers. We're currently dogfooding our Fig.js API that includes edit buffer hooks and much more and allows developers to make modern web-based developer tools that dock to the terminal window.

If you're interested in Fig's vision or what we're building, you can

-Sean Sullivan (sean [at] fig.io)

Tags
engineering, pseudoterminal, performance