console-terminal-emulator — emulate a real terminal using a pseudo-terminal
console-terminal-emulator [--linux] [--sco] [--freebsd] [--netbsd] [--decvt] [--vcsa] {directory}
console-terminal-emulator is a utility that expects file descriptor 4 to be the master side of a pseudo-terminal and the TTY
environment variable to be the full device filename of the slave side.
(pty-get-tty(1) can be used to set this process state up.)
First it sets up the various files in directory, opening the input FIFO for reading and the display buffers for reading and writing.
The FIFOs are created unconditionally, and opened in non-blocking mode.
The display buffers are created if they do not exist.
All created display buffer files have permissions rw-r-----.
All created input FIFO files have permissions rw--w----.
It then enters a loop where it simultaneously:
processes all data received from the pseudo-terminal master side as terminal output, handling printing characters, control characters, escape sequences, and control sequences.
processes all input events from the input FIFO, sending terminal character and escape sequences to the master side.
It finishes when the master side signals hangup (i.e. when the line discipline would set modem control lines on a real serial device to signal hangup because the last slave file descriptor is closed) and there are no more received data to output. At termination, it unlinks the file for the slave side, and erases the display as if by a Clear Display control sequence.
The files used are as follows:
directory/tty
A stable and predictable name for the slave side of the pseudo-terminal device.
This is a link to the device, if possible, or a symbolic link if not.
(It is usually not.
On Linux, the pseudo-terminal devices are on a filesystem of their own, which would make a link a cross-filesystem link; and on BSDs the devfs driver disallows the creation of directories under /dev, forcing directory to be on another filesystem.)
If directory is /run/dev/vc1, for example, then the stable name for the terminal, for use in login services, will be /run/dev/vc1/tty.
directory/input
The input FIFO, through which realizer processes send keyboard and mouse events. Events are in a uniform packet format, which the terminal emulator converts into appropriate escape sequences.
directory/display
The UTF-32, 24-bit colour, display buffer.
directory/vcsa
An 8-bit character set, 3-bit colour, display buffer that is compatible with the Linux vcsa devices for kernel virtual terminals. This is provided for compatibility with screen reader software.
console-terminal-emulator emulates the character sequence processing logic of a hardware terminal, taking the data received from the master side of the pseudo-terminal and translating them into modifications to the screen buffer data files.
All character data are first decoded from UTF-8 to a stream of Unicode code points. The decoding is directly from UTF-8 to the code points; in particular, UTF-16 is not employed as an intermediary step. If a UTF-8 sequence decodes to one of the reserved code points used for UTF-16, no special treatment is given, and the code point is treated as any other.
The decoder has to do something with invalid UTF-8 encodings, including overlong and incomplete character sequences. Whatever code point they decode to, such characters are not processed as control characters or as parts of control/escape sequences. They abort any control/escape sequence that they interrupt, and if not incomplete encodings they are printed as ordinary printing characters even if they are the code points for control characters. This behaviour should not be relied upon, and programs should not send such UTF-8 sequences to a terminal.
The --vcsa command line option enables output to the directory/vcsa file.
This file is structured as per the Linux character device file of that name, as a 4-byte header giving size and cursor position information followed by a series of 2-byte character and attribute pairs in IBM PC CGA format. It is implemented mainly as a compatibility mechanism for the benefit of terminal realizing software that expects Linux vcsa devices, such as screen readers for the blind or partially sighted.
The terminal emulator neither recognizes nor handles CSI sequences that deal with 8-bit character sets. The vcsa screen buffer is always ISO 8859-1, with Unicode code points outwith that range converted to code point 255. To handle a wider range of characters, terminal realizing software should use the Unicode screen buffer. Code points in the ranges 0x00 to 0x1F and 0x80 to 0x9F may appear, and terminal realizing software is expected to render these with some form of ordinary printing graphic.
The 24-bit RGB foreground and background colour values for character cells are mapped to 3-bit CGA by comparing the relative intensities of red, green, and blue. The "bright foreground" and "bright background" CGA attribute bits are taken from the boldface and blink terminal attributes, respectively. No attempt is made to provide MDA attributes, or VGA monochrome mode attributes, such as underline. To handle the full range of attributes, terminal realizing software should use the Unicode screen buffer.
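As an illustration only, the character and attribute handling described above might be sketched as follows. The function names are hypothetical, the attribute byte layout is the usual IBM PC CGA one (bits 0 to 2 foreground, bit 3 bright foreground, bits 4 to 6 background, bit 7 bright background), and the reduction from 24-bit RGB to a 3-bit colour is deliberately left out because its exact rule is not spelled out here.

```python
def vcsa_char(code_point: int) -> int:
    """Return the byte stored in the vcsa buffer for a Unicode code point."""
    # ISO 8859-1 coincides with the first 256 Unicode code points;
    # everything outwith that range becomes code point 255.
    return code_point if code_point <= 0xFF else 0xFF


def vcsa_attribute(fg3: int, bg3: int, bold: bool, blink: bool) -> int:
    """Compose a CGA attribute byte from colours already reduced to 3 bits."""
    attribute = (fg3 & 0x07) | ((bg3 & 0x07) << 4)
    if bold:
        attribute |= 0x08   # "bright foreground" comes from boldface
    if blink:
        attribute |= 0x80   # "bright background" comes from blink
    return attribute
```

A realizer reading the vcsa file would see each cell as one such character byte paired with one such attribute byte.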
The directory/display file begins with a 16-byte header:
4-byte UCS-4 Byte Order Mark in host byte order.
2-byte width in host byte order.
2-byte height in host byte order.
2-byte cursor X position in host byte order.
2-byte cursor Y position in host byte order.
Cursor glyph type byte.
Cursor attributes byte.
Pointer attributes byte.
Reserved byte.
That is followed by a series of 16-byte records, one per character cell, containing:
Foreground alpha value byte.
Foreground red value byte.
Foreground green value byte.
Foreground blue value byte.
Background alpha value byte.
Background red value byte.
Background green value byte.
Background blue value byte.
4-byte UCS-4 value in host byte order.
Attributes.
Reserved bytes.
Unassigned code points, reserved code points, control code points, combining code points, and zero-width code points may appear, and terminal realizing software is expected to render these with some form of ordinary printing graphic.
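A realizer could read this layout along the following lines. This is a minimal sketch, not the emulator's own code: the function and dictionary key names are hypothetical, and because the split between the attributes and the reserved bytes in each cell is not given above, the final 4 bytes of a cell are returned undecoded.

```python
import struct

# "=" selects host byte order with no padding, matching the layout above.
HEADER = struct.Struct("=IHHHHBBBB")   # BOM, width, height, x, y, 4 single bytes
CELL = struct.Struct("=4B4BI4s")       # fg ARGB, bg ARGB, UCS-4, trailing bytes


def parse_header(data: bytes) -> dict:
    bom, width, height, x, y, glyph, cursor_attr, pointer_attr, _ = \
        HEADER.unpack_from(data, 0)
    if bom != 0xFEFF:
        # A byte-swapped BOM would indicate a file from a foreign-endian host.
        raise ValueError("unexpected byte order mark")
    return {"width": width, "height": height, "cursor_x": x, "cursor_y": y,
            "cursor_glyph": glyph, "cursor_attributes": cursor_attr,
            "pointer_attributes": pointer_attr}


def parse_cell(data: bytes, index: int) -> dict:
    offset = HEADER.size + index * CELL.size
    fa, fr, fg, fb, ba, br, bg, bb, code_point, tail = \
        CELL.unpack_from(data, offset)
    return {"foreground": (fa, fr, fg, fb), "background": (ba, br, bg, bb),
            "code_point": code_point,
            "attributes_and_reserved": tail}   # left undecoded (assumption)
```

Both structures are 16 bytes, so the cell at row r and column c would presumably be found at index r * width + c, assuming row-major order.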
Characters in the "Cc" ("Other, Control") Unicode code point category (a.k.a. "C0" and "C1" characters) are control characters. They are always processed, even in the middle of escape or control sequences. All control characters are no-ops except for the following:
CR
Carriage return. Move to column 0.
NEL
Newline. Move to column 0 and one row down.
LF, VT, FF, and IND
Linefeed/Vertical tab/Index. Move one row down, remaining in the current column.
RI
Reverse index. Move one row up, remaining in the current column.
TAB
Horizontal tab. Move to the next tabstop, or the last column.
BS
Backspace. Nondestructively move to the previous column, stopping at the first column.
DEL
Delete. Delete the character at the cursor position, moving the remainder of the row to the left and padding the final column with a space.
HTS
Horizontal tab set. Set a tabstop at the current column.
CAN
Cancel. Cancel any control/escape sequence currently in progress.
ESC
Escape. Cancel any control/escape sequence currently in progress and begin an escape sequence.
CSI
Control sequence introducer. Cancel any control/escape sequence currently in progress and begin a control sequence.
Escape sequences are multiple-character sequences comprising:
the ESC character, an optional intermediate character in the range U+0020 to U+002F, and a single final character in the range U+0040 to U+007E.
Most such escape sequences for real terminals are ISO 2022 character set switching sequences, which the terminal emulator has no need of since it uses UTF-8 natively, and so does not support. The only supported escape sequences are:
DECALN
the DEC VT extension that fills the display with the letter "E" as a screen alignment test
S7C1T
set the terminal emulator to send C1 characters in control sequences encoded with the ECMA-48 7-bit extensions (as below)
S8C1T
set the terminal emulator to send C1 characters in control sequences as plain single C1 code points (but UTF-8 encoded)
the ESC character and a single final character in the range U+0040 to U+005F.
This is the two-character 7-bit mechanism of ECMA-48 section 5.
It is strictly speaking unneeded given that the terminal emulator is 8-bit clean and employs UTF-8 as standard.
The entire U+0080 to U+009F control character range is accessible via this mechanism.
So NEL can be emitted as either the single character U+0085 or as the two-character sequence U+001B U+0045.
(Of course, U+0085 itself encodes to two bytes in UTF-8.)
Similarly, CSI can be emitted as either the single character U+009B or as the two-character sequence U+001B U+005B.
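The correspondence between the two forms is a fixed offset of 0x40, which can be sketched as below. The function names are hypothetical and chosen purely for illustration.

```python
ESC = 0x1B


def c1_from_escape_final(final: int):
    """Return the C1 code point that ESC followed by `final` stands for, or None."""
    if 0x40 <= final <= 0x5F:
        return final + 0x40      # U+0040..U+005F maps onto U+0080..U+009F
    return None


def escape_form_of_c1(c1: int) -> str:
    """Return the two-character 7-bit form of a C1 control character."""
    if not 0x80 <= c1 <= 0x9F:
        raise ValueError("not a C1 control character")
    return chr(ESC) + chr(c1 - 0x40)
```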
Control sequences are multiple-character sequences comprising the CSI character followed by parameter characters in the range U+0030 to U+003F followed optionally by an intermediate character in the range U+0020 to U+002F and terminated by a single final character in the range U+0040 to U+007E.
The parameter characters are normally the digits and semi-colon (U+003B), comprising up to 16 semi-colon separated digit sequences, with an initial character in the range U+003C to U+003F denoting a DEC vendor-private control sequence that is an extension to ECMA-48. ECMA-48 defines colon (U+003A) as a sub-parameter separator, but no control sequence recognized by the terminal emulator makes use of this and all leading sequences of sub-parameters in any parameter are simply ignored. More than 16 parameters simply causes leading parameters to be discarded. As an extension to DEC, the terminal allows vendor-private characters anywhere in the parameter character sequence, rather than only in the initial position, ignoring all but the last one. However, this is largely to simplify implementation and should not be relied upon.
The digit sequences are parameters to the action, and zero-length digit sequences are a parameter with the value zero. In most cases, a control sequence with N parameters is equivalent to N such control sequences in order each with one of the parameters.
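The parameter rules above can be sketched roughly as follows. This is an illustrative reading rather than the emulator's actual parser: the function name is hypothetical, and colon sub-parameters are not modelled at all (the sketch simply rejects them).

```python
MAX_PARAMETERS = 16


def parse_parameters(text: str):
    """Split CSI parameter characters into (private marker, list of values)."""
    private = None
    kept = []
    for ch in text:
        if "<" <= ch <= "?":
            private = ch                 # anywhere in the string; the last one wins
        elif "0" <= ch <= "9" or ch == ";":
            kept.append(ch)
        else:
            raise ValueError("character not handled by this sketch: %r" % ch)
    # A zero-length digit sequence is a parameter with the value zero.
    values = [int(p) if p else 0 for p in "".join(kept).split(";")]
    # With more than 16 parameters, the leading ones are discarded.
    return private, values[-MAX_PARAMETERS:]
```

For example, the parameter string "1;;3" yields the three values 1, 0, and 3, and "?25" yields the single value 25 with "?" as the private marker.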
The terminal emulator maintains a current set of character attributes, a 24-bit RGB foreground colour, and a 24-bit RGB background colour. These are combined with character code points when writing to the character cells of each screen buffer. Each character cell thus has its own, independent, attributes and 24-bit foreground and background colours.
The ANSI colour attributes set by the SGR control sequence map from the 3-bit RGB ANSI colours to 24-bit RGB using 255 for a 1 bit and 0 for a 0 bit. The terminal emulator also supports the ISO 8613-6 SGR colour extensions, which are often misattributed to xterm:
The indexed colour extension (SGR 38:5 and SGR 48:5) has the conventional mapping of 16 "standard" colours, a 24 value grayscale, and a 6 by 6 by 6 colour cube.
The RGB direct colour extension (SGR 38:2 and SGR 48:2) simply sets the 24-bit RGB values directly.
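The colour mappings might look like the sketch below. Several things here are assumptions rather than statements about the emulator: the function names are hypothetical, the ANSI bit order (1 for red, 2 for green, 4 for blue) is the usual ECMA-48 numbering, and the colour cube and grayscale levels are the values conventionally used by xterm. The first 16 indexed colours are left out because their exact RGB values are not given above.

```python
def ansi_to_rgb(index: int):
    """Map a 3-bit ANSI colour (0..7) to 24-bit RGB: 255 for a 1 bit, 0 for a 0 bit."""
    return (255 if index & 1 else 0,
            255 if index & 2 else 0,
            255 if index & 4 else 0)


CUBE_LEVELS = (0, 95, 135, 175, 215, 255)   # assumed xterm-conventional levels


def indexed_to_rgb(index: int):
    """Map indexed colours 16..255 to RGB (cube and grayscale only)."""
    if 16 <= index <= 231:
        index -= 16
        return (CUBE_LEVELS[index // 36],
                CUBE_LEVELS[(index // 6) % 6],
                CUBE_LEVELS[index % 6])
    if 232 <= index <= 255:
        level = 8 + 10 * (index - 232)      # assumed 24-step gray ramp
        return (level, level, level)
    raise ValueError("index not covered by this sketch")
```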
When characters or lines are deleted, or characters or lines or the screen are erased, the background colour used for the "filler" that is placed in the newly blanked cells is the current background colour. This is "background colour erase" and is the default for DEC VT520 terminals as well as the only option on Linux and BSD kernel virtual terminals. By changing the "DECECM" private mode setting, it can be switched to "screen colour erase", which uses the default terminal background colour (as set by SGR 0 and SGR 49).
All other characters are "printing" characters.
Characters in the "Cf" ("Other, Format") and "Mn" ("Mark, Non-spacing") Unicode code point categories are simply discarded. Note that this includes U+00AD (soft hyphen), which is in the "Cf" category. The terminal emulator does not have enough information about word breaks or context to handle soft hyphenation. This behaviour differs from some other terminal emulation programs, which are based upon code written by Markus Kuhn that treats U+00AD unconditionally as a spacing character and explicitly overrides its "Cf" categorization.
Characters in the "Me" ("Mark, Enclosing") Unicode code point category overstrike the character at the current cursor position without advancing.
All other printing characters are printed as-is, using the currently set attributes, foreground colour, and background colour. After each character is printed, the cursor position is advanced.
By default, the emulator has automatic right margins turned on. Conceptually, automatic right margins means that writing a character in the last column automatically returns to the first column and moves down a row, a line wrap. If automatic right margins are turned off, writing a character in the last column does not move down a row or return to the first column. In practice, things are slightly more complex if automatic margins are turned on. Rather than a wrap happening there and then when the character is written to the last column, instead a pending wrap is flagged. If the cursor is then explicitly moved, the pending wrap is cancelled. Otherwise, when the next graphic is to be printed the pending wrap is enacted immediately beforehand.
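The pending wrap mechanism can be pictured with a small model such as the following. It is only a sketch under simplifying assumptions: the class and method names are hypothetical, and scrolling at the last row is not modelled (the row is simply clamped).

```python
class Cursor:
    def __init__(self, columns: int, rows: int, auto_margin: bool = True):
        self.columns, self.rows = columns, rows
        self.auto_margin = auto_margin
        self.x = self.y = 0
        self.pending_wrap = False

    def print_character(self):
        """Print one graphic; return the (x, y) cell that it was written to."""
        if self.pending_wrap:
            # Enact the wrap immediately before printing the next graphic.
            self.pending_wrap = False
            self.x = 0
            self.y = min(self.y + 1, self.rows - 1)   # scrolling not modelled
        cell = (self.x, self.y)
        if self.x < self.columns - 1:
            self.x += 1
        elif self.auto_margin:
            self.pending_wrap = True                  # flag it; do not wrap yet
        return cell

    def move_to(self, x: int, y: int):
        """Explicit cursor motion cancels any pending wrap."""
        self.pending_wrap = False
        self.x = max(0, min(x, self.columns - 1))
        self.y = max(0, min(y, self.rows - 1))
```

With three columns, printing three characters leaves the cursor in the last column with a wrap pending; only a fourth printed character moves to the next row.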
By default, the emulator also has scrolling turned on, so that moving down from the last row scrolls the buffer up and moving up from the first row scrolls the buffer down. If scrolling is turned off, moving down from the last row or moving up from the first row has no effect. Scrolling only applies to cursor advancement by printing characters or the Newline, Index, or Reverse Index control characters. It does not apply to cursor motion control sequences.
Terminal input operates in terms of a stream of input events, comprising Unicode characters or special keys.
The directory/input FIFO receives a sequence of 4-byte messages.
To avoid message tearing, realizers must ensure that they do not write messages using multiple system calls.
A message is a 32-bit word in host byte order.
The most significant byte denotes the message type and the interpretation of the remainder of the message.
0x00nnnnnn
A null message. This is ignored.
0x01cccccc
Unicode character.
The UCS-4 code point is cccccc.
This is UTF-8 encoded and sent through to the terminal line discipline as terminal input.
If bracketed paste has been switched on, the control sequence denoting end of paste is sent beforehand.
0x02kkkkmm
A system keypad key.
The key number is kkkk and mm is a set of bitflags indicating the current state of modifier keys.
System keys are ignored by the terminal emulator.
0x04kkkkmm
The absolute horizontal position of the mouse.
The column number is kkkk and mm is a set of bitflags indicating the current state of modifier keys.
If bracketed paste has been switched on, the control sequence denoting end of paste is sent before any generated mouse report.
0x05kkkkmm
The absolute vertical position of the mouse.
The row number is kkkk and mm is a set of bitflags indicating the current state of modifier keys.
If bracketed paste has been switched on, the control sequence denoting end of paste is sent before any generated mouse report.
0x06kknnmm
A mouse wheel motion.
The wheel number is kk, the (signed) amount by which the wheel is scrolled is nn, and mm is a set of bitflags indicating the current state of modifier keys.
If bracketed paste has been switched on, the control sequence denoting end of paste is sent before any generated mouse report.
0x07kkbbmm
A mouse button.
The button number is kk, its state is bb, and mm is a set of bitflags indicating the current state of modifier keys.
The well-known buttons are numbered in DEC VT order: left, middle, right, side.
If bracketed paste has been switched on, the control sequence denoting end of paste is sent before any generated mouse report.
0x09cccccc
Pasted Unicode character.
The UCS-4 code point is cccccc.
This is UTF-8 encoded and sent through to the terminal line discipline as terminal input.
If bracketed paste has been switched on, the control sequence denoting start of paste is sent beforehand.
If the character is ESC or CSI, the control sequence denoting end of paste is sent afterward.
0x0Ckkkkmm
A consumer device key.
The key number is kkkk and mm is a set of bitflags indicating the current state of modifier keys.
Consumer device keys are ignored by the terminal emulator, because of a lack of appropriate escape sequences and control sequences for representing them.
0x0Ekkkkmm
An extended key.
The key number is kkkk and mm is a set of bitflags indicating the current state of modifier keys.
The terminal emulator sends the equivalent control sequence through to the line discipline as terminal input.
If bracketed paste has been switched on, the control sequence denoting end of paste is sent beforehand.
0x0Fkkkkmm
A function key.
The function key number is kkkk and mm is a set of bitflags indicating the current state of modifier keys.
The terminal emulator sends the equivalent control sequence through to the line discipline as terminal input.
If bracketed paste has been switched on, the control sequence denoting end of paste is sent beforehand.
The modifier flags represent abstract level2, level3, control, group2, and super modifiers. (See ISO 9995 for the concepts of levels and groups; super is an abstract "Command" or "GUI" modifier.) The input protocol does not comprise standalone events for modifier keys or associated modifier lock keys. The terminal emulator has no dealing in keyboard modifier state tracking; nor, similarly, does it deal in using modifiers to renumber function keys or to change dual-mode keypads. Those are both the province of realizers.
The numbering of system, extended, and consumer keys largely follows the ID numbers used for keys in the USB HID protocol, with some exceptions, and a number of additions for keys that are not present in USB. Importantly, keys that generate ordinary Unicode characters are sent as a Unicode character message and not as an extended key with the USB keycode. The terminal emulator has no dealing in keyboard maps and exactly how keystrokes translate to Unicode characters; which are the province of a realizer.
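A realizer might build input messages along these lines. This is a sketch with hypothetical function names; the individual modifier bit values are not given here, so the modifier byte is treated as opaque.

```python
import struct

MSG_CHARACTER = 0x01
MSG_EXTENDED_KEY = 0x0E
MSG_FUNCTION_KEY = 0x0F


def character_message(code_point: int) -> bytes:
    """Build a 0x01cccccc message carrying a UCS-4 code point."""
    return struct.pack("=I", (MSG_CHARACTER << 24) | (code_point & 0xFFFFFF))


def key_message(message_type: int, key: int, modifiers: int) -> bytes:
    """Build a 0xTTkkkkmm message: 16-bit key number, 8-bit modifier flags."""
    word = (message_type << 24) | ((key & 0xFFFF) << 8) | (modifiers & 0xFF)
    return struct.pack("=I", word)       # "=" gives host byte order


def decode_message(data: bytes):
    """Split a 4-byte message into (type byte, 24-bit remainder)."""
    (word,) = struct.unpack("=I", data)
    return word >> 24, word & 0xFFFFFF
```

Each 4-byte message should be handed to the FIFO in a single write call, for example one `os.write(fd, message)` per message or per whole batch of messages, so that no message is torn across system calls.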
The DEC status reports report that a locator is always present and available, and that it is a mouse device.
Two mouse protocols are supported, the xterm Private Mode #1006 protocol and the DEC VT Locator protocol.
These correspond to the ttymouse=sgr and ttymouse=dec settings in vim, respectively.
These are superior to the unsupported xterm Private Mode #9 ("xterm.X10"), #1005 ("xterm.UTF-8"), and #1015 ("urxvt") protocols, for several reasons including avoiding encoding ambiguities and display dimension limits, and supersede them.
In the DECPM #1006 protocol, the click-only (DECPM #1000), button motion (DECPM #1002), and all events (DECPM #1003) modes are all available. The mouse grabber mode (DECPM #1001) is not.
In the DEC VT Locator protocol, only character coordinates are available. The terminal emulator has no dealings in displays, which are the province of a realizer and might not even be pixel-addressable.
The DEC status reports report that a keyboard is always present, with an unknown country layout.
Extended keys and function keys cause the terminal emulator to send control sequences through the line discipline to terminal input. What control sequences are sent depends on what console type the terminal emulator is set to: SCO console, Linux console, FreeBSD console, NetBSD console, or DEC VT.
No console terminal type defines control sequences for all keys, or distinct control sequences where it does define them. For example, the Linux console only defines control sequences for function keys from 1 to 20 and has no defined control sequences for function keys outwith that range. (The SCO console defines control sequences for function keys from 1 to 48.) Similarly, the FreeBSD console, the Linux console, the SCO console, and the NetBSD console all make no distinction between the calculator keypad and the equivalent main keypad or cursor/editing keypad keys. (The DEC VT520 does.)
The ECMA-48 FNK sequences are not generated by the terminal emulator; no kernel virtual console sends them, in any mode.
Most consoles use the non-standard DECFNK sequences instead, for function keys and for some (to varying extents) extended keys.
However, there is a wide degree of variance amongst consoles that can cause problems if the TERM environment variable (and hence termcap/terminfo terminal type) used by applications connected to the slave side of the terminal does not match the console emulation being employed by the terminal emulator.
The Linux console employs the DECFNK sequences for FIND and SELECT for the HOME and END keys.
The other consoles employ the DECFNK FIND and SELECT sequences for the FIND and SELECT keys themselves, with different control sequences for HOME and END (as per a DEC VT520 in VT mode).
This has two major consequences:
If the terminal emulator is in Linux console mode, HOME and END will not be correctly recognized if the TERM environment variable does not also specify "linux"; and if the terminal emulator is in another mode but the TERM environment variable is set to "linux", FIND and SELECT will be incorrectly recognized and HOME and END will not be recognized at all.
The terminal emulator generates the DECFNK sequences for all function keys, and the different DEC control sequences for the PAD_F1 to PAD_F4 extended keys. The Linux, FreeBSD, NetBSD, and SCO consoles all substitute PAD_F1 to PAD_F4 for the F1 to F4 keys, and have no equivalents for the true F1 to F5 keys on an actual DEC terminal which are what the DECFNK sequences (from 11 to 15) designate. None of this remapping, however, is the domain of the terminal emulator. The terminal emulator input protocol deals in a set of "true DEC VT" function keys and separate extended PAD_F1 to PAD_F4 keys, as if there were a real DEC VT and keys F1 to F5 were all (even the always-local F5) set to non-local mode. It is entirely up to each realizer to determine what keys on the realizing keyboard generate function key input messages and what real keys generate extended key input messages. Local mode function keys are an internal matter for realizers.
Similarly, the terminal emulator has no dealings in numlock, nor in local editing cursor key modes. The calculator keypad keys always generate their "application mode", a.k.a. "numlock off", control sequences; and the cursor/editing keypad keys always generate their "application mode" sequences. Such mode switching is an internal matter for realizers.
The Private Mode #2004 controls whether bracketed paste is switched on. When it is, sequences of successive pasted characters are bracketed by DECFNK sequences denoting paste on and paste off.
To prevent actual pasted character sequences from resembling DECFNK, paste is switched off after every pasted ESC or CSI character. If the next character is a pasted one, paste will then be switched back on again.
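The bracketing behaviour can be sketched as a small state machine. The class and method names are hypothetical. The sequences CSI 200~ and CSI 201~ are the ones conventionally used for bracketed paste by xterm; that this emulator uses the same numbers, and sends them in 7-bit form, is an assumption made for this sketch.

```python
PASTE_ON = "\x1b[200~"    # assumed: xterm-conventional start of paste
PASTE_OFF = "\x1b[201~"   # assumed: xterm-conventional end of paste


class PasteBracketer:
    def __init__(self, enabled: bool = True):
        self.enabled = enabled      # state of Private Mode #2004
        self.in_paste = False

    def pasted(self, ch: str) -> str:
        """Return what is sent as terminal input for one pasted character."""
        out = ""
        if self.enabled and not self.in_paste:
            out += PASTE_ON
            self.in_paste = True
        out += ch
        if self.in_paste and ch in ("\x1b", "\x9b"):
            # Switch paste off after a pasted ESC or CSI, so that pasted
            # text cannot be mistaken for the end-of-paste sequence itself.
            out += PASTE_OFF
            self.in_paste = False
        return out

    def typed(self, text: str) -> str:
        """Return what is sent for any non-paste input (keys, mouse reports)."""
        out = ""
        if self.in_paste:
            out += PASTE_OFF
            self.in_paste = False
        return out + text
```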
The terminal emulator does not replicate all features of a real hardware terminal. Its goal is to provide a workalike for (the TUI parts of) the virtual consoles that are/were built in to the Linux and BSD operating system kernels. There is no support for the historical features of real terminal hardware such as attached printers, page switching, status lines, XON/XOFF modem flow control, programmable function keys, alternative (WYSE/TVI) control/escape sequences, direct auxiliary serial device I/O, and graphics modes.
Rather, the terminal emulator is aimed at handling the outputs of TUI programs that use the "linux" terminal type (for the Linux kernel virtual terminals), the "pcvtXX" terminal type (for NetBSD kernel virtual terminals), the "pccon" terminal type (for OpenBSD kernel virtual terminals), or the "xterm" terminal type (for FreeBSD kernel virtual terminals).
(These are the terminal types set by vc-get-tty(1).)
The terminal emulator also has no dealing in the things that are the domain of separate realizing tools. The modular nature of user-space virtual terminals means that the terminal emulator has no knowledge of the actual devices used to realize the terminal, and that there can be zero or many realizers for any given virtual terminal. There is no support for font definitions, window titles, 256-colour VGA palettes, VGA-specific overscan and underline, VESA power, screen-savers, or multi-mode keypads. In response to device status report requests, it always responds with fixed information about a set of pseudo-devices.
The terminal emulator is 8-bit clean and employs UTF-8 as standard. Therefore, the terminal emulator has no need for mechanisms to switch 8-bit code pages amongst multiple character sets. There is no ISO 2022 support.
The two-character 7-bit mechanism of ECMA-48 (section 5) is not only present, but is more completely implemented than in several kernel console terminal emulators, which usually only implement the 7-bit aliases for CSI and OSC.
Both ECMA-48 and the DEC VT520 Video Terminal Programmer Information [EK-VT520-RM] reference are straightforward and clear about numeric parameters to cursor motion and screen editing control sequences: a value of 0 is given no special meaning and just means zero rows/columns/whatever. However, that is not the behaviour of the Linux or FreeBSD kernel virtual terminal emulators, both of which replace any zero parameters with the value 1. Therefore, because of the aforementioned goal, this terminal emulator does the same and does not follow the ECMA-48 standard or the DEC VT520 documentation.
Neither ECMA-48 nor the DEC VT520 Video Terminal Programmer Information [EK-VT520-RM] reference documents a quirk of most DEC VT-family emulators: holding line wrap pending. Actual implementations, including the FreeBSD kernel virtual terminal emulator, have this mechanism. This is largely undocumented behaviour, but it is behaviour that many programs (including the prompt display of the Z Shell, for example) rely upon. So it is implemented here.
ECMA-48 permits parameters to CSI control sequences to comprise sequences of sub-parameters separated by colons, meaning that a full parameter list is a semi-colon separated sequence of colon-separated subsequences.
ISO 8613-6 SGR colour extensions were intended to make use of this mechanism, with colour values being encoded as sub-parameters behind a leading 38 or 48 sub-parameter.
That standard explicitly states in section 13.1.8 that "Pe" values are separated by character "3/10" (i.e. U+003A).
This would have resulted in SGR sequences that were unambiguous, since it would not be possible to mistake a sub-parameter specifying a colour value for an SGR attribute code: CSI 1;4;38:5:14;48:2:3:7:224m.
A long-standing misunderstanding/misreading on the parts of many people has led, however, to most programs (from terminfo to the Linux kernel terminal emulator) sending and expecting semi-colons instead of colons, leading to ambiguous SGR sequences: CSI 1;4;38;5;14;48;2;3;7;224m.
For compatibility, the terminal emulator has to do the same; even though it would be good to fix this 20-year-old bug.
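To show why the semi-colon form needs special handling, here is a rough sketch of how a parser has to consume it: the meaning of a parameter depends on whether a 38 or 48 came before it. The function name and the result labels are hypothetical, and the treatment of a truncated colour specification (falling back to ordinary attributes) is an assumption of this sketch.

```python
def parse_sgr(params):
    """Group a flat, semi-colon separated SGR parameter list into operations."""
    operations = []
    i = 0
    while i < len(params):
        p = params[i]
        if p in (38, 48) and i + 1 < len(params):
            target = "foreground" if p == 38 else "background"
            kind = params[i + 1]
            if kind == 5 and i + 2 < len(params):
                # Indexed colour: 38;5;n or 48;5;n
                operations.append((target + "-indexed", params[i + 2]))
                i += 3
                continue
            if kind == 2 and i + 4 < len(params):
                # Direct colour: 38;2;r;g;b or 48;2;r;g;b
                operations.append((target + "-rgb", tuple(params[i + 2:i + 5])))
                i += 5
                continue
        operations.append(("attribute", p))
        i += 1
    return operations
```

Applied to the semi-colon example above, the values 5, 14, 2, 3, 7, and 224 are consumed as colour data only because of the 38 and 48 that precede them; taken in isolation they would be indistinguishable from SGR attribute codes.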