Attacks that allow for unintended control of Unicode and homoglyphic characters, described by the researchers in this report leverage text encoding that may cause source code to be interpreted differently by a compiler than it appears visually to a human reviewer. Source code compilers, interpreters, and other development tools may permit Unicode control and homoglyph characters, changing the visually apparent meaning of source code.
Internationalized text encodings require support for both left-to-right languages and also right-to-left languages. Unicode has built-in functions to allow for encoding of characters to account for bi-directional, or Bidi ordering. Included in these functions are characters that represent non-visual functions. These characters, as well as characters from other human language sets (i.e., English vs. Cyrillic) can also introduce ambiguities into the code base if improperly used.
This type of attack could potentially be used to compromise a code base by capitalizing on a gap in visually rendered source code as a human reviewer would see and the raw code that the compiler would evaluate.
The use of attacks that incorporate maliciously encoded source code may go undetected by human developers and by many automated coding tools. These attacks also work against many of the compilers currently in use. An attacker with the ability to influence source code could introduce undetected ambiguity into source code using this type of attack.
The simplest defense is to ban the use of text directionality control characters both in language specifications and in compilers implementing these languages.
Two CVEs were assigned to address the two types of attacks described in this report.
CVE-2021-42574 was created for tracking the Bidi attack. CVE-2021-42694 was created for tracking the homoglyph attack.
Thanks to the reporters, Nicholas Boucher and Ross Anderson of The University of Cambridge (UK).
This document was written by Chuck Yarbrough.