At work I contribute to a moderately sized monorepo: 70 thousand files, an 8-digit number of lines of code, and hundreds of PRs merged every day. One day I opened a remote buffer in that repository and ran M-x find-file.

Emacs froze for 5 seconds before showing me the find-file prompt. Which isn’t great, because when writing software, opening files is actually something one needs to do all the time.

Luckily, Emacs is “the extensible, customizable, self-documenting real-time display editor”, and comes with profiling capabilities: M-x profiler-start starts a profile and M-x profiler-report displays a call tree showing how many CPU samples were recorded in each function call after starting the profile. Starting a profile and running M-x find-file showed that all time was being spent in a function called ffap-guess-file-name-at-point, which was being run via file-name-at-point-functions, an abnormal hook run when find-file is called.

I checked the documentation for ffap-guess-file-name-at-point with M-x describe-function ffap-guess-file-name-at-point and it didn’t seem to be something essential, so I removed the hook by running M-x eval-expression, writing the form below, and pressing RET.

(remove-hook 'file-name-at-point-functions 'ffap-guess-file-name-at-point)

This solved the immediate problem of Emacs blocking for 5 seconds every time I ran find-file, with no noticeable drawbacks.

I could now navigate around and open files. The next thing I tried in this remote git repository was searching through project files. The great projectile package provides the projectile-find-file function for that, but I had previously given up making projectile perform well with remote buffers; given how things are currently implemented it seems to be impractical. So I installed the find-file-in-project package for use on remote projects exclusively: M-x package-install find-file-in-project.

Both projectile-find-file and find-file-in-project (aliased as ffip):

  • show a narrowed list of all project files in the minibuffer
  • prompt the user to filter and scroll through candidates
  • open a file when RET is pressed on a candidate.

To disable projectile on remote buffers I had the following form in my configuration.

(defadvice projectile-project-root (around ignore-remote first activate)
  (unless (file-remote-p default-directory 'no-identification) ad-do-it))

Which causes the projectile-project-root function to not run its usual implementation on remote buffers, but instead return nil unconditionally. projectile-project-root is used as a way to either get the project root for a given buffer (remote or not), or as a boolean predicate to test if the buffer is in a project (e.g., a git repository directory). Having it return nil on remote buffers effectively disables projectile on remote buffers.
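defadvice is the older advice API; since Emacs 24.4 the same idea can also be written with advice-add. The sketch below is an illustration and not the form from my configuration: my-demo-project-root is a hypothetical stand-in for projectile-project-root, so that it evaluates without projectile installed, and my-demo-ignore-remote is a made-up name.

```elisp
;; Hypothetical stand-in for `projectile-project-root', so that this
;; sketch evaluates without projectile installed.
(defun my-demo-project-root ()
  "Pretend that the project root is always `default-directory'."
  default-directory)

(defun my-demo-ignore-remote (orig-fun &rest args)
  "Call ORIG-FUN with ARGS, unless `default-directory' is remote.
On remote directories return nil without calling ORIG-FUN."
  (unless (file-remote-p default-directory)
    (apply orig-fun args)))

;; :around advice receives the original function as its first argument.
(advice-add 'my-demo-project-root :around #'my-demo-ignore-remote)
```

With the advice in place, the function returns nil whenever default-directory is a remote path and behaves as usual everywhere else. M-x eval-expression with (advice-remove 'my-demo-project-root #'my-demo-ignore-remote) undoes it.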

I then wrote a function that falls back to ffip when projectile is disabled and bound it to the keybinding I had for projectile-find-file, so that I could press the same keybinding whenever I wanted to search for project files, and not have to think about whether I’m on a remote buffer or not:

(defun maybe-projectile-find-file ()
  "Run `projectile-find-file' if in a project buffer, `ffip' otherwise."
  (interactive)
  (if (projectile-project-p)
      (projectile-find-file)
    (ffip)))

And called it:

M-x maybe-projectile-find-file

Emacs froze for 30 seconds. After that, it showed the prompt with the narrowed list of files in the project. 30 seconds! What was it doing during the whole time? Let’s try out the profiler again.

  1. Start a new profile:

    M-x profiler-start

  2. Call the function to be profiled:

    M-x maybe-projectile-find-file (it freezes Emacs again for 30 seconds)

  3. And display the report:

    M-x profiler-report

Which showed:

Function CPU samples %
+ ... 21027 98%
+ command-execute 361 1%

This tells us that 98% of the CPU time was spent in whatever ... is. Pressing TAB on a line will expand it by showing its child function calls.

Function CPU samples %
- ... 21027 98%
  + ivy--insert-minibuffer 13689 64%
  + #<compiled 0x131f715d2b6fa0a8> 3819 17%
    Automatic GC 2017 9%
  + shell-command 1424 6%
  + ffip-get-project-root-directory 77 0%
  + run-mode-hooks 1 0%
+ command-execute 361 1%

Expanding ... shows that Emacs spent 64% of CPU time in ivy--insert-minibuffer and 9% of the time—roughly 3 whole seconds!— garbage collecting. I had garbage-collection-messages set to t so I could already tell that Emacs was GCing a lot; enabling this setting makes a message be displayed in the echo area whenever Emacs garbage collects. I could also see the Emacs process consuming 100% of one CPU core while it was frozen and unresponsive to input.

Drilling down on #<compiled 0x131f715d2b6fa0a8> shows that cycles there (17% of CPU time) were spent on Emacs waiting for user input, so we can ignore it for now.

As I drill deeper into ivy--insert-minibuffer, names in the “Function” column start getting truncated because the column is too narrow. A quick Google search (via M-x google-this emacs profiler report width) shows me how to make it wider:

(setf (caar profiler-report-cpu-line-format) 80
      (caar profiler-report-memory-line-format) 80)

Describing those variables with M-x describe-variable shows that the default values are 50.

From the profiler report buffer I run M-x eval-expression, paste the form above with C-y and press RET. I also persist this form to my configuration. Pressing c in the profiler report buffer (bound to profiler-report-render-calltree) redraws it, now with a wider column, allowing me to see the function names.

Here is the abbreviated expanded relevant portion of the call stack.

 1  Function CPU samples %
 2  - ffip 13586 63%
 3  - ffip-find-files 13586 63%
 4  - let* 13586 63%
 5  - setq 13585 63%
 6  - ffip-project-search 13585 63%
 7  - let* 13585 63%
 8  - mapcar 13531 63%
 9  - #<lambda 0xb210342292> 13528 63%
10  - cons 13521 63%
11  - expand-file-name 12936 60%
12  - tramp-file-name-handler 12918 60%
13  - apply 9217 43%
14  - tramp-sh-file-name-handler 9158 42%
15  - apply 9124 42%
16  - tramp-sh-handle-expand-file-name 8952 41%
17  - file-name-as-directory 5812 27%
18  - tramp-file-name-handler 5793 27%
19  + tramp-find-foreign-file-name-handler 3166 14%
20  + apply 1237 5%
21  + tramp-dissect-file-name 527 2%
22  + #<compiled -0x1589d0aab96d9542> 337 1%
23    tramp-file-name-equal-p 312 1%
24    tramp-tramp-file-p 33 0%
25  + tramp-replace-environment-variables 6 0%
26    #<compiled 0x1e202496df87> 1 0%
27  + tramp-connectable-p 1006 4%
28  + tramp-dissect-file-name 628 2%
29  + eval 517 2%
30  + tramp-run-real-handler 339 1%
31  + tramp-drop-volume-letter 60 0%
32    tramp-make-tramp-file-name 30 0%
33  + tramp-file-name-for-operation 40 0%
34  + tramp-find-foreign-file-name-handler 2981 13%
35  + tramp-dissect-file-name 518 2%
36    tramp-tramp-file-p 34 0%
37    #<compiled 0x1e202496df87> 1 0%
38  + tramp-replace-environment-variables 1 0%
39  + replace-regexp-in-string 153 0%
40  + split-string 15 0%
41  + ffip-create-shell-command 4 0%
42    cond 1 0%

A couple of things to unpack here. From lines 8-11 we can deduce that ffip maps a lambda that calls expand-file-name over all completion candidates, which in this case are around 70 thousand file names. Running M-x find-function ffip-project-search and narrowing to the relevant region in the function shows exactly that:

(mapcar (lambda (file)
          (cons (replace-regexp-in-string "^\./" "" file)
                (expand-file-name file)))
        collection)

On line 11 of the profiler report we can see that 60% of 30 seconds (18 seconds) was spent on expand-file-name calls. By dividing 18 seconds by 70000 we get that expand-file-name calls took 250µs on average. 250µs is how long a modern computer takes to read 1MB sequentially from RAM! Why would my computer need to do that amount of work 70000 times just to display a narrowed list of files?
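The arithmetic above can be replayed as a form. This is only a back-of-the-envelope sketch: my-average-call-microseconds is a hypothetical helper name, and the inputs are the rounded figures from the profiler report.

```elisp
(defun my-average-call-microseconds (total-seconds share calls)
  "Return the average number of microseconds per call.
TOTAL-SECONDS is the whole run time, SHARE is the fraction of it spent
in the function of interest, and CALLS is the number of calls."
  (/ (* total-seconds share 1e6) calls))

(my-average-call-microseconds 30.0 0.60 70000)
;; => roughly 257, which the text rounds to 250µs
```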

Let’s see if the function documentation for expand-file-name provides any clarity.

expand-file-name is a function defined in C source code.

Signature
(expand-file-name NAME &optional DEFAULT-DIRECTORY)

Documentation
Convert filename NAME to absolute, and canonicalize it.

Second arg DEFAULT-DIRECTORY is directory to start with if NAME is relative
(does not start with slash or tilde); both the directory name and
a directory's file name are accepted. If DEFAULT-DIRECTORY is nil or
missing, the current buffer's value of default-directory is used.
NAME should be a string that is a valid file name for the underlying
filesystem.

Ok, so it sounds like expand-file-name essentially transforms a file path into an absolute path, based on either the current buffer’s directory or optionally, a directory passed in as an additional argument. Let’s try evaluating some forms with M-x eval-expression both on a local and a remote buffer to get a sense of what it does.

In a local dired buffer at my local home directory:

(expand-file-name "foo.txt")
;; => "/Users/mpereira/foo.txt"

In a remote dired buffer at my remote home directory:

(expand-file-name "foo.txt")
;; => "/ssh:mpereira@remote-host:/home/mpereira/foo.txt"

The expand-file-name call in ffip-project-search doesn’t specify a DEFAULT-DIRECTORY (the optional second parameter to expand-file-name) so like in the examples above it defaults to the current buffer’s directory, which in the profiled case is a remote path like in the second example above.
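Here is a purely local sketch of how the optional second argument interacts with default-directory. It assumes a Unix-like system, and the /tmp/project/ path is made up for illustration; it doesn't need to exist, since for local names expand-file-name only manipulates strings.

```elisp
;; An explicit DEFAULT-DIRECTORY wins over the buffer's `default-directory'.
(expand-file-name "foo.txt" "/tmp/project/")
;; => "/tmp/project/foo.txt"

;; Without it, the dynamically bound `default-directory' is used.
(let ((default-directory "/tmp/project/"))
  (expand-file-name "foo.txt"))
;; => "/tmp/project/foo.txt"

;; "Canonicalize" includes resolving ".." components textually.
(expand-file-name "../bar/foo.txt" "/tmp/project/")
;; => "/tmp/bar/foo.txt"
```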

With a better understanding of what expand-file-name does, let’s now try to understand how it performs. We can benchmark it with benchmark-run in local and remote buffers, and compare their runtimes.

benchmark-run is an autoloaded macro defined in benchmark.el.gz.

Signature
(benchmark-run &optional REPETITIONS &rest FORMS)

Documentation
Time execution of FORMS.

If REPETITIONS is supplied as a number, run forms that many times,
accounting for the overhead of the resulting loop. Otherwise run
FORMS once.
Return a list of the total elapsed time for execution, the number of
garbage collections that ran, and the time taken by garbage collection.

Benchmarking it in a local dired buffer at my local home directory

(benchmark-run 70000 (expand-file-name "foo.txt"))
;; => (0.308712 0 0.0)

and in a remote dired buffer at my remote home directory

(benchmark-run 70000 (expand-file-name "foo.txt"))
;; => (31.547211 0 0.0)

showed that it took 0.3 seconds to run expand-file-name 70 thousand times on a local buffer, and 30 seconds to do so on a remote buffer: two orders of magnitude slower. 30 seconds is more than what we observed in the profiler report (18 seconds), and I’ll attribute this discrepancy to unknowns; maybe the ffip execution took advantage of byte-compiled code evaluation, or there’s some overhead associated with benchmark-run, or something else entirely. Nevertheless, this experiment clearly corroborates the profiler report results.
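For a feel of how much of the local cost belongs to expand-file-name itself, the same kind of measurement can be pointed at a cheaper operation. The sketch below is a local-only comparison; my-compare-expand-vs-concat is a hypothetical helper, and concat is an arbitrary baseline I picked for illustration. Timings vary by machine, so no results are shown.

```elisp
(require 'benchmark)

(defun my-compare-expand-vs-concat ()
  "Time `expand-file-name' against plain `concat' in a local directory.
Return a list of two `benchmark-run' results, each of the shape
\(ELAPSED-SECONDS GC-COUNT GC-SECONDS)."
  ;; A local directory, so TRAMP is never involved.
  (let ((default-directory "/tmp/"))
    (list
     ;; Full expansion: makes the name absolute and canonicalizes it.
     (benchmark-run 10000 (expand-file-name "foo.txt"))
     ;; Arbitrary baseline: string concatenation with the directory.
     (benchmark-run 10000 (concat default-directory "foo.txt")))))

(my-compare-expand-vs-concat)
```

The repetition count is written as a literal number on purpose, matching how the documentation above describes REPETITIONS.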

So! Back to ffip. Looking again at the previous screenshot, it seems that the list of displayed files doesn’t even show absolute file paths. Why is expand-file-name being called at all? Maybe calling it isn’t too important…

Let’s remove the expand-file-name call by

  1. visiting the ffip-project-search function in the library file with M-x find-function ffip-project-search
  2. “raising” file in the lambda
  3. re-evaluating ffip-project-search with M-x eval-defun

and see what happens.

 (mapcar (lambda (file)
           (cons (replace-regexp-in-string "^\./" "" file)
-                (expand-file-name file)))
+                file))
         collection)

I run my function again:

M-x maybe-projectile-find-file

It’s faster. This change alone reduces the time for ffip to show the candidate list from 30 seconds to 8 seconds with no noticeable drawbacks. Which is better, but still not even close to acceptable.
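To make the effect of that edit concrete, here is a standalone sketch of what the modified lambda computes. my-demo-candidates and the sample file names are made up for illustration; the real function works on the output of the find or fd command.

```elisp
(defun my-demo-candidates (files)
  "Return an alist of (DISPLAY . FILE) pairs for FILES.
DISPLAY is FILE without a leading \"./\"; FILE is kept as-is."
  (mapcar (lambda (file)
            ;; The doubled backslash makes the dot literal in the regexp.
            (cons (replace-regexp-in-string "^\\./" "" file)
                  file))
          files))

(my-demo-candidates '("./kernel/time/jiffies.c" "README"))
;; => (("kernel/time/jiffies.c" . "./kernel/time/jiffies.c")
;;     ("README" . "README"))
```

Each candidate is now a pair of plain strings, so building the list no longer goes through any file name handler.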

Profiling the changed function shows that now most of the time is spent in sorting candidates with ivy-prescient-sort-function, and garbage collection. Automatic sorting of candidates based on selection recency comes from the excellent ivy and ivy-prescient packages, which I had installed and configured. Disabling ivy-prescient with M-x ivy-prescient-mode and re-running my function reduces the time further from 8 seconds to 4 seconds.

Another thing I notice is that ffip allows fd to be used as a backend instead of GNU find. fd claims to have better performance, so I install it on the remote host and configure ffip to use it. I evaluate the form below like before, but I could also have used the very handy M-x counsel-set-variable, which shows a narrowed list of candidates of all variables in Emacs (in my setup there are around 20 thousand) along with a snippet of their docstrings, and on selection allows the variable value to be set. Convenient!

(setq ffip-use-rust-fd t)

Which brings my function’s runtime to a little over 2 seconds—a 15x performance improvement overall—achieved via:

  1. Manually evaluating a modified function from an installed library file
  2. Disabling useful functionality (prescient sorting)
  3. Installing a program on the remote host and configuring ffip to use it

The last point is not really an issue, but the whole situation is not ideal. Even putting aside all of the above points, I don’t want to wait for over 2 seconds every time I search for files in this project.

Let’s see if we can do better than that.

So far we’ve been mostly configuring and introspecting Emacs. Let’s now extend it with new functionality that satisfies our needs.

We want a function that:

  1. Based on a remote buffer’s directory, figures out its remote project root directory
  2. Runs fd on the remote project root directory
  3. Presents the output from fd as a narrowed list of candidate files, with it being possible to filter, scroll, and select a candidate from the list
  4. Has good performance and is responsive even on large, remote projects

Let’s see if there’s anything in find-file-in-project that we could reuse. I know that ffip is figuring out project roots and running shell commands somehow. By checking out its library file with M-x find-library find-file-in-project (which opens a buffer with the installed find-file-in-project.el package file) I can see that the shell-command-to-string function (included with Emacs) is being used for running shell commands, and that there’s a function named ffip-project-root that sounds a lot like what we need.

I have a keybinding that shows the documentation for the thing under the cursor. I use it to inspect the two functions:

ffip-project-root is an autoloaded function defined in
find-file-in-project.el.

Signature
(ffip-project-root)

Documentation
Return project root or default-directory.

shell-command-to-string is a compiled function defined in
simple.el.gz.

Signature
(shell-command-to-string COMMAND)

Documentation
Execute shell command COMMAND and return its output as a string.

Perfect. We should be able to reuse them.

I also know that the ivy-read function provided by ivy should take care of displaying the narrowed list of files. Looks like we won’t need to write a lot of code.
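As a quick local sanity check of the shell-command-to-string piece, here is a minimal sketch of turning command output into a list of lines. my-command-lines is a hypothetical helper, and the example assumes a POSIX shell with printf, which stands in for fd.

```elisp
(defun my-command-lines (command)
  "Run shell COMMAND and return its output as a list of non-empty lines."
  ;; The third argument to `split-string' drops empty strings, such as
  ;; the one produced by the trailing newline.
  (split-string (shell-command-to-string command) "\n" t))

;; `printf' stands in for fd here: two names and a trailing blank line.
(my-command-lines "printf 'a.c\\nb.c\\n\\n'")
;; => ("a.c" "b.c")
```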

To verify that our code will work on remote buffers we’ll need to evaluate forms in the context of one. The with-current-buffer macro can be used for that.

with-current-buffer is a macro defined in subr.el.gz.

Signature
(with-current-buffer BUFFER-OR-NAME &rest BODY)

Documentation
Execute the forms in BODY with BUFFER-OR-NAME temporarily current.

BUFFER-OR-NAME must be a buffer or the name of an existing buffer.
The value returned is the value of the last form in BODY. See
also with-temp-buffer.

For writing our function, instead of evaluating forms ad-hoc with M-x eval-expression, we’ll open a scratch buffer and write and evaluate forms directly from there, which should be more convenient.

I have a clone of the Linux git repository on my remote host. Let’s assign a remote buffer for the officially funniest file in the Linux kernel, jiffies.c

/ssh:mpereira@remote-host:/home/mpereira/linux/kernel/time/jiffies.c

—to a variable named remote-file-buffer by evaluating the following form with eval-defun.

(setq remote-file-buffer
      (find-file-noselect
       (concat "/ssh:mpereira@remote-host:"
               "/home/mpereira/linux/kernel/time/jiffies.c")))
;; => #<buffer jiffies.c>

Notice that the buffer is just a value, and can be passed around to functions. We’ll use it further ahead to emulate evaluating forms as if we had that buffer opened, with the with-current-buffer macro.
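The same idea can be tried without a remote host. In this sketch a throwaway local buffer plays the role of the remote one; my-directory-of, the buffer name, and the /tmp/demo/ path are all made up for illustration.

```elisp
(defun my-directory-of (buffer)
  "Return BUFFER's `default-directory' without switching to it."
  (with-current-buffer buffer
    default-directory))

(let ((buffer (generate-new-buffer " *my-demo*")))
  (unwind-protect
      (progn
        ;; `default-directory' is buffer-local, so this only affects BUFFER.
        (with-current-buffer buffer
          (setq default-directory "/tmp/demo/"))
        (my-directory-of buffer))
    ;; Clean up the throwaway buffer even if something above fails.
    (kill-buffer buffer)))
;; => "/tmp/demo/"
```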

Let’s start exploring by writing to the *scratch* buffer and continuing to evaluate forms one by one with eval-defun.

(shell-command-to-string "hostname")
;; => "macbook"

default-directory
;; => "/Users/mpereira/.emacs.d/"

(ffip-project-root)
;; => "/Users/mpereira/.emacs.d/"

And now let’s evaluate some forms in the context of a remote buffer. Notice that running hostname in a shell returns something different.

(with-current-buffer remote-file-buffer
  (shell-command-to-string "hostname"))
;; => "remote-host"

(with-current-buffer remote-file-buffer
  default-directory)
;; => "/ssh:mpereira@remote-host:/home/mpereira/linux/kernel/time/"

(with-current-buffer remote-file-buffer
  (ffip-project-root))
;; => "/ssh:mpereira@remote-host:/home/mpereira/linux/"

(with-current-buffer remote-file-buffer
  (shell-command-to-string "fd --version"))
;; => "fd 8.1.1"

(with-current-buffer remote-file-buffer
  (executable-find "fd" t))
;; => "/usr/bin/fd"

Emacs is not only running shell commands, but also evaluating forms as if it were running on the remote host. That’s pretty sweet!

Now that we made sure that the executable for fd is available on the remote host, let’s try running some fd commands.

(with-current-buffer remote-file-buffer
  (shell-command-to-string "pwd"))
;; => "/home/mpereira/linux/kernel/time"

(with-current-buffer remote-file-buffer
  (shell-command-to-string "fd --extension c | wc -l"))
;; => 28

(with-current-buffer remote-file-buffer
  (shell-command-to-string "fd . | head"))
;; => Kconfig
;; Makefile
;; alarmtimer.c
;; clockevents.c
;; clocksource.c
;; hrtimer.c
;; itimer.c
;; jiffies.c
;; namespace.c
;; ntp.c

fd tells us that there are 28 C files in /home/mpereira/linux/kernel/time. Let’s see if we can get the project root, which would be /home/mpereira/linux.

(with-current-buffer remote-file-buffer
  (ffip-project-root))
;; => "/ssh:mpereira@remote-host:/home/mpereira/linux/"

That seems to work.

Let’s now play with default-directory. This is a buffer-local variable that holds a buffer’s working directory. By evaluating forms with a redefined default-directory it’s possible to emulate being in another directory, which could even be on a remote host. The code block below is an example of that—the second form redefines default-directory to be the project root.

(with-current-buffer remote-file-buffer
  (shell-command-to-string "pwd"))
;; => "/home/mpereira/linux/kernel/time"

(with-current-buffer remote-file-buffer
  (let ((default-directory (ffip-project-root)))
    (shell-command-to-string "pwd")))
;; => /home/mpereira/linux

Nice!
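The same trick works locally, which makes it easy to try without a remote host. A minimal sketch, assuming a Unix-like system; my-pwd-in is a hypothetical helper name.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my-pwd-in (directory)
  "Return the output of the shell's pwd when run from DIRECTORY."
  ;; `file-name-as-directory' adds the trailing slash that
  ;; `default-directory' is expected to have.
  (let ((default-directory (file-name-as-directory directory)))
    (string-trim (shell-command-to-string "pwd"))))

(my-pwd-in "/")
;; => "/"
```

Because the binding is made with let, default-directory goes back to its previous value as soon as the form returns.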

I wonder how much Assembly and C are currently in the project.

(with-current-buffer remote-file-buffer
  (let ((default-directory (ffip-project-root)))
    (shell-command-to-string
     "fd --extension asm --extension s --exec-batch cat '{}' | wc -l")))
;; => 373663

(with-current-buffer remote-file-buffer
  (let ((default-directory (ffip-project-root)))
    (shell-command-to-string
     "fd --extension c --extension h | xargs cat | wc -l")))
;; => 27088162

Twenty-seven million, eighty-eight thousand, one hundred and sixty-two lines of C, and about 374 thousand lines of Assembly. It’s fine.

Alright, at this point it feels like we have all the pieces: let’s put them together.

(defun my-project-find-file (&optional pattern)
  "Prompt the user to filter, scroll and select a file from a list of all
project files matching PATTERN."
  (interactive)
  (let* ((default-directory (ffip-project-root))
         (fd (executable-find "fd" t))
         (fd-options "--color never")
         (command (concat fd " " fd-options " " pattern))
         (candidates (split-string (shell-command-to-string command) "\n" t)))
    (ivy-read "File: "
              candidates
              :action (lambda (candidate)
                        (find-file candidate)))))

This is a bit longer than what we’ve been playing with, but even folks new to Emacs Lisp should be able to follow it:

  1. Redefine default-directory to be the project root directory (line 5)
  2. Build, execute, and parse the output of the fd command into a list of file names (lines 6-9)
  3. Display a file prompt showing a narrowed list of all files in the project (lines 10-13)

Let’s see if it works.

(with-current-buffer remote-file-buffer
  (my-project-find-file "jif"))

It does!
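One thing the function above doesn't do is quote PATTERN before handing it to the shell. A possible refinement, sketched under the assumption that PATTERN should always reach fd as a single literal argument, is to build the command with shell-quote-argument. my-fd-command is a hypothetical helper and not part of the function above.

```elisp
(defun my-fd-command (fd pattern)
  "Return a shell command string that runs FD, optionally with PATTERN.
PATTERN is quoted so that spaces and shell metacharacters reach FD as
part of a single argument."
  (mapconcat #'identity
             ;; `delq' drops the nil left behind when PATTERN is omitted.
             (delq nil
                   (list fd
                         "--color" "never"
                         (and pattern (shell-quote-argument pattern))))
             " "))

(my-fd-command "fd" "jif")
;; => "fd --color never jif"

(my-fd-command "fd" nil)
;; => "fd --color never"
```

The resulting string can be passed to shell-command-to-string exactly like the concat version.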

Since it was declared (interactive) we can also call it via M-x my-project-find-file.

Going back to the large remote project and running my-project-find-file a few times shows that it now runs in a little over a second—a 30x improvement compared with what we started with.

This is still not good enough, so I went ahead and evolved the function so that, most of the time, it shows something on screen immediately and redraws asynchronously. You can check out the code at fast-project-find-file.el.

* * *

Did you notice how the function implementation came almost naturally from exploration? The immediate feedback from evaluating forms and modifying a live system—even though old news to Lisp programmers—is incredibly powerful. Combine it with an “extensible, customizable, self-documenting” environment and you have a very satisfying and productive means of creation.