Run commands easily with kwt

Part 3: implementation and motivation of the web interface.

bettercallshao

In part 3, I will explain the implementation and motivation of kwtd, the web interface for kwt. Please install kwt by following the guide and read part-1 and part-2 if interested.

What is kwtd?

The kwt package ships both kwt and kwtd. One can think of kwtd as a coordinator for user interaction via the browser, and of kwt as the executor that actually runs the commands. A demo is available in the guide.

  • First kwtd needs to be run. Convenience scripts are available to run it at startup.
  • Once kwtd is running, it is accessible on http://127.0.0.1:7171. On its own it can show and ingest menus, but it cannot run commands.
  • A kwt executor must then connect to kwtd, declaring a channel and a menu, e.g. kwt start --channel 0 --menu python-demo.
  • Finally a user can navigate to said channel on kwtd and run commands defined in said menu.

Web: simplicity

The web stack of Gorilla + VueJS + Bootstrap was chosen for simplicity and convenience. I deliberately avoided npm, since it is foreign to the Go ecosystem: it complicates continuous deployment (standard Go container images don't have npm available) and worsens the developer experience. Building involves these steps.

  • Download 3rd party assets by building and running cmd/prebuild/main.go.
  • Use go-assets-builder to build 3rd party and project assets into assets.go.
  • Build the web server with assets.go.

Points to note.

  • No internet is required for kwtd to work.
  • The guideline for layout and styling is simplicity (laziness).
  • Asset files are obviously not bundled, which is fine since we only serve localhost.
  • To help with frontend development, the environment variable ASSETS_ROOT can be set so that kwtd reads assets from the file system instead of from assets.go.

Communication: WebSocket

The kwtd server is the central exchange in the system; its job is to pass WebSocket messages between the kwt executor and the browser. Each channel contains an exchange loop, defined in exch.go (github), built on Go channels (not to be confused with kwt channels).

  • The kwt executor initiates the exchange by opening a WebSocket connection to the channel's back endpoint, /ws/back/0, with the menu name as a parameter.
  • If the channel is available, the kwtd server initializes the exchange loop, loads the menu (the server-side copy of the menu is used), and creates a random validation token, thereby occupying the channel.
  • The executor GETs the channel endpoint /channel/0 to receive the menu and the validation token, and initializes its own exchange loop.
  • The executor waits for a Command, renders and executes it, and sends the Output back line by line until the loop is broken by the server or by Ctrl+C.
  • Once the exchange loop is established, the browser can connect to /ws/front/0 and /channel/0 to send Commands and receive Outputs.
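
To make the exchange loop more concrete, here is a minimal sketch of a relay built on Go channels. It is not the actual exch.go: the Exchange, Command, and Output types and their fields are simplified assumptions, and the WebSocket reads and writes are replaced by plain channel sends so the example runs on its own.

```go
package main

import "fmt"

// Command and Output are simplified stand-ins for the WebSocket messages.
type Command struct {
	Token string
	Text  string
}

type Output struct {
	Line string
}

// Exchange relays messages between one browser (front) and one executor (back).
type Exchange struct {
	FromFront chan Command
	ToBack    chan Command
	FromBack  chan Output
	ToFront   chan Output
	Quit      chan struct{}
}

func NewExchange() *Exchange {
	// Small buffers keep this demo from blocking; real code would size them deliberately.
	return &Exchange{
		FromFront: make(chan Command, 8),
		ToBack:    make(chan Command, 8),
		FromBack:  make(chan Output, 8),
		ToFront:   make(chan Output, 8),
		Quit:      make(chan struct{}),
	}
}

// Run forwards Commands to the executor and Outputs to the browser
// until Quit is closed.
func (e *Exchange) Run() {
	for {
		select {
		case c := <-e.FromFront:
			e.ToBack <- c
		case o := <-e.FromBack:
			e.ToFront <- o
		case <-e.Quit:
			return
		}
	}
}

func main() {
	e := NewExchange()
	go e.Run()

	// Fake executor: answers every Command with two Output lines.
	go func() {
		for c := range e.ToBack {
			e.FromBack <- Output{Line: "$ " + c.Text}
			e.FromBack <- Output{Line: "done"}
		}
	}()

	// Fake browser: sends one Command and prints the Outputs as they arrive.
	e.FromFront <- Command{Token: "demo-token", Text: "echo hi"}
	for i := 0; i < 2; i++ {
		fmt.Println((<-e.ToFront).Line)
	}
	close(e.Quit)
}
```

In the real system each side of the relay would be fed by a goroutine reading from its WebSocket connection; the select loop in the middle is the part that this sketch tries to illustrate.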

Some particularities make this different from a traditional web server.

  • Each channel accommodates at most one executor and one browser session. We are essentially running commands in a terminal, which is a strictly one-to-one interaction model. If a second agent tries to connect to a channel from either side, it will be refused.
  • Each time an exchange loop is established between an executor and a server, a random validation token is generated. The browser keeps this token and attaches it to Commands. If the executor sees a token it didn't expect, it won't execute the Command.
  • The number of channels is hardcoded to three, which is not ideal.
  • The HTTP port is configurable on both kwt (--master) and kwtd (env PORT).
  • The server listens on 127.0.0.1 only, so just local traffic is allowed, for security.
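
The one-occupant rule and the validation token can be sketched as follows. This illustrates the idea only and is not kwt's implementation: Channel, Occupy, Release, and Accept are hypothetical names, and the 16-byte hex token format is an assumption.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// ErrOccupied is returned when a second agent tries to claim a channel.
var ErrOccupied = errors.New("channel already occupied")

// Channel tracks whether an executor currently holds the channel
// and which token was issued for this exchange.
type Channel struct {
	mu    sync.Mutex
	busy  bool
	token string
}

// Occupy claims the channel and returns a fresh random token,
// or ErrOccupied if someone else already holds it.
func (c *Channel) Occupy() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.busy {
		return "", ErrOccupied
	}
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	c.busy = true
	c.token = hex.EncodeToString(buf)
	return c.token, nil
}

// Release frees the channel so that a new exchange can be set up.
func (c *Channel) Release() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.busy = false
	c.token = ""
}

// Accept is the executor-side check: run a Command only if it carries
// the token issued when this exchange was established.
func Accept(want, got string) bool {
	return want != "" && subtle.ConstantTimeCompare([]byte(want), []byte(got)) == 1
}

func main() {
	ch := &Channel{}
	tok, err := ch.Occupy()
	if err != nil {
		fmt.Println("occupy failed:", err)
		return
	}
	_, err = ch.Occupy()
	fmt.Println("second connection refused:", errors.Is(err, ErrOccupied))
	fmt.Println("matching token accepted:", Accept(tok, tok))
	fmt.Println("stale token accepted:", Accept(tok, "stale"))
}
```

Because a new token is generated every time the channel is occupied, a browser tab left over from an earlier exchange cannot trigger commands on a new executor.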

Display: markdown

The browser receives one WebSocket message per line of command output. All outputs are kept for the channel and can optionally be displayed by toggling the Logs switch. The logs can be cleared by going back to Home. The output of the latest command is kept above the log display and can be rendered as markdown by toggling the Markdown switch. The markdown feature was implemented in the hope that its formatting capabilities will enable more useful commands to be built, e.g. with hyperlinks, tables, or images.

Motivation: wheels

The web interface was part of the first vision of the system, because I wanted a tool that lets non-developers run commands easily. The core of the architecture is a surrogate terminal program that is loaded with declarative configurations and controlled by a UI. A contrasting but more popular approach to desktop productivity is to run all the actual commands (or call the APIs) from the UI application itself; successful examples include Raycast, Command E, and Alfred. Here is why I did it differently.

The wheels have been invented. The terminal is very good at lots of things, and I already know the command to do XYZ, so I shouldn't have to invest extra time configuring a productivity tool that works in an unfamiliar way. For example, I already know the mysql and aws commands very well; I'd like to configure the tool with this knowledge, instead of reading the docs on how the tool handles MySQL databases and configures credentials.

The execution is controlled. Running a terminal program makes me feel safe. I can press Ctrl+C or close the terminal to kill it, read its logs to know it's alive, or configure its environment to run the correct version of Python. In contrast, a process running behind a UI application is much less controllable, especially when faced with problems.

The tool is easy to trust. Since the surrogate terminal program runs inside an environment defined by the user, the tool does not have to store passwords or credentials. The system is also auditable, because all of the business logic is defined declaratively in config files and the tool itself is dumb. The tool can be trusted because it knows very little.

On the other hand, I do use Alfred on a daily basis. Its aesthetics are definitely better than a browser UI, and it is probably easier to ship lots of business logic as an application bundle than as a small application with a bunch of config files. Maybe there is room to converge.