What is Punycode?

Punycode is a way to represent Unicode characters using a limited set of ASCII characters, designed for handling domain names with non-ASCII characters on the internet. Its main purpose is to convert non-ASCII characters into pure ASCII characters, enabling them to be resolved and processed normally within the Domain Name System (DNS).


Because the original domain name system was designed around the ASCII character set, it could not directly handle non-ASCII characters such as Chinese characters or Cyrillic and Greek letters. Punycode was introduced so that domain names containing such characters could be used on the internet.


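Punycode works label by label (a label is one dot-separated part of a domain name). Each label that contains non-ASCII characters is converted into a string made only of ASCII letters, digits, and hyphens, and the IDNA standard then adds the prefix "xn--" in front of it to mark it as encoded. Labels that are already ASCII, such as "com", are left unchanged. This allows these domain names to be handled within the existing domain name system.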


For example, suppose we have a domain name with non-ASCII characters, such as "范例.com". Punycode encoding converts it to "xn--fsqr86j.com": the label "范例" becomes "xn--fsqr86j", while "com" stays the same. The DNS can then resolve this name like any other ASCII domain name.
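The per-label conversion can be sketched in a few lines of Python using only the standard library's `punycode` codec, which implements the RFC 3492 algorithm but does not add any prefix. The helper name `to_ace` is hypothetical, and the sketch skips the normalization and validity checks that a full IDNA implementation performs.

```python
def to_ace(domain: str) -> str:
    """Hypothetical helper: prefix each non-ASCII label's Punycode form with 'xn--'."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # Pure-ASCII labels such as "com" are left unchanged.
            labels.append(label)
        else:
            # The punycode codec returns bytes; "xn--" is the IDNA ACE prefix.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(to_ace("范例.com"))  # xn--fsqr86j.com
```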


Punycode encoding and decoding follow a fixed algorithm, specified in RFC 3492, and can be performed with existing tools or libraries.
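As one illustration, Python's standard library ships an `idna` codec that applies the older IDNA 2003 rules (normalization, then Punycode plus the "xn--" prefix) in both directions. This is only an example of library support; the Python documentation points to the third-party `idna` package for the newer IDNA 2008 rules.

```python
# Encode a Unicode domain name to its ASCII form and decode it back
# with the standard-library "idna" codec (IDNA 2003 semantics).
unicode_name = "范例.com"

ascii_name = unicode_name.encode("idna")  # bytes: b"xn--fsqr86j.com"
decoded = ascii_name.decode("idna")       # str: "范例.com"

print(ascii_name)
print(decoded == unicode_name)  # True
```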


Background and principles of Punycode:

Punycode is an ASCII-compatible encoding: it maps a Unicode string to a string made only of ASCII letters, digits, and hyphens, so that it can be processed within the domain name system. Any ASCII characters in the input are kept as they are, and the non-ASCII characters are encoded after them; in domain names, the IDNA standard marks each encoded label with the prefix "xn--". Because the encoding is fully reversible, the original domain name can always be recovered, while the encoded form remains compatible with the existing DNS.
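The "keep the ASCII part" behaviour is easy to observe with Python's standard `punycode` codec, which produces the raw RFC 3492 output without the "xn--" prefix. The example labels are arbitrary; the point is that the ASCII letters appear first, followed by a hyphen and the encoded non-ASCII characters, and that decoding restores the original.

```python
# Raw Punycode (RFC 3492) output, without the "xn--" prefix that IDNA adds.
examples = {
    "bücher": "bcher-kva",    # ASCII letters copied, then "-", then the encoded "ü"
    "münchen": "mnchen-3ya",
    "范例": "fsqr86j",         # no ASCII letters, so no delimiter
}

for label, expected in examples.items():
    encoded = label.encode("punycode").decode("ascii")
    print(label, "->", encoded)
    assert encoded == expected
    # The encoding is reversible.
    assert encoded.encode("ascii").decode("punycode") == label
```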


Applications of Punycode:

Punycode is widely used in internationalized domain names (IDNs), allowing domain names with non-ASCII characters to function normally on the internet. It is used wherever such names appear in a multilingual environment: in DNS lookups, in the domain part of email addresses, in URLs, and in other network identifiers. With Punycode, users can register and use domain names in their own languages and scripts while the underlying infrastructure continues to work with ASCII.
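In practice, software converts the host part of a URL to its ASCII form before looking it up. Below is a minimal Python sketch using the standard library's `urllib.parse` and `idna` codec; the helper `to_ascii_url` is hypothetical, drops any `user:password@` part for brevity, and does not handle IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Hypothetical helper: return url with its host converted to ASCII (Punycode) form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    # Re-attach the port if one was given; userinfo is intentionally dropped.
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_ascii_url("https://范例.com/path?q=1"))  # https://xn--fsqr86j.com/path?q=1
```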


The encoding and decoding process of Punycode:

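The encoder first copies all basic (ASCII) code points of the label to the output, followed by a hyphen delimiter if there were any. It then handles the non-ASCII code points in increasing order of their Unicode values: for each one it computes a "delta" that combines the code point value with its insertion position, and writes that number as a variable-length base-36 integer (digits a to z and 0 to 9) whose digit thresholds are tuned by an adaptive bias. The decoding process is the inverse: it reads the deltas back and reinserts each non-ASCII character at its original position.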
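These steps can be made concrete with a compact from-scratch sketch of the RFC 3492 algorithm in Python. It is written for readability rather than production use: it omits the overflow checks, mixed-case annotations, and most input validation the RFC describes, and the function names are illustrative. The constants are the ones RFC 3492 specifies for Punycode.

```python
# Constants from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

def _adapt(delta: int, numpoints: int, first_time: bool) -> int:
    # Bias adaptation function (RFC 3492, section 6.1).
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)

def _threshold(k: int, bias: int) -> int:
    # Digit threshold for position k, clamped to [TMIN, TMAX].
    return TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias

def _digit(d: int) -> str:
    # 0..25 -> 'a'..'z', 26..35 -> '0'..'9'
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)

def _value(ch: str) -> int:
    # Inverse of _digit, case-insensitive.
    ch = ch.lower()
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit: {ch!r}")

def encode(label: str) -> str:
    cps = [ord(c) for c in label]
    out = [c for c in label if ord(c) < 0x80]  # copy basic code points
    b = h = len(out)
    if b:
        out.append("-")                        # delimiter
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(cps):
        m = min(c for c in cps if c >= n)      # next code point to insert
        delta += (m - n) * (h + 1)
        n = m
        for c in cps:
            if c < n:
                delta += 1
            elif c == n:
                # Write delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = _threshold(k, bias)
                    if q < t:
                        break
                    out.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)

def decode(encoded: str) -> str:
    pos = encoded.rfind("-")
    out = list(encoded[:pos]) if pos > 0 else []  # basic code points
    j = pos + 1 if pos > 0 else 0
    i, n, bias = 0, INITIAL_N, INITIAL_BIAS
    while j < len(encoded):
        old_i, w, k = i, 1, BASE
        while True:  # read one variable-length delta
            if j >= len(encoded):
                raise ValueError("truncated Punycode input")
            d = _value(encoded[j])
            j += 1
            i += d * w
            t = _threshold(k, bias)
            if d < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - old_i, len(out) + 1, old_i == 0)
        n += i // (len(out) + 1)  # code point value
        i %= len(out) + 1         # insertion position
        out.insert(i, chr(n))
        i += 1
    return "".join(out)

print(encode("bücher"))   # bcher-kva
print(decode("fsqr86j"))  # 范例
```

Checking the output against an existing implementation (for example Python's built-in `punycode` codec) is a simple way to validate a sketch like this one.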


Security issues with Punycode:

Because domain names can now contain characters from many of the world's scripts, they also open the door to new problems, most notably phishing attacks.


Many Unicode characters used in internationalized domain names, such as Greek, Cyrillic, and Armenian letters, look like Latin letters but are different characters, so a name that uses them is an entirely different web address to a computer. For example, the Cyrillic letter “а” (U+0430) and the Latin letter “a” (U+0061) are different characters to a browser, yet both are displayed as “a” in the address bar. Because some earlier browsers showed the decoded Unicode form instead of the Punycode-encoded form in the address bar, attackers could register look-alike domain names that were very hard to tell apart from legitimate ones (a so-called homograph attack).
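The difference that is invisible on screen is obvious to software. This short Python illustration uses only the standard library; the two names differ only in their first character, and only the look-alike one needs an "xn--" label. The domain `apple.com` is used purely as an example of a familiar Latin-script name.

```python
import unicodedata

latin = "apple.com"
spoof = "\u0430pple.com"  # first letter is U+0430 CYRILLIC SMALL LETTER A

for name in (latin, spoof):
    first = name[0]
    print(f"U+{ord(first):04X} {unicodedata.name(first)}")
    print("  ASCII form:", name.encode("idna").decode("ascii"))

print(latin == spoof)  # False
```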


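This vulnerability did not last long. Shortly after it was disclosed, affected browser vendors patched the issue in subsequent updates, for example by showing the Punycode ("xn--") form instead of the Unicode form when a domain name looks suspicious, such as a label that mixes letters from different scripts.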
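A mitigation of this kind can be sketched as a display rule: show the Unicode form only when each label uses a single script, and fall back to the ASCII form otherwise. The Python sketch below is a deliberately simplified, hypothetical policy (the helpers `scripts_in_label` and `display_form` are illustrative, and guessing the script from the first word of a character's Unicode name is a crude heuristic); real browsers rely on Unicode script data and more detailed rules.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Hypothetical heuristic: guess each letter's script from its Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER A" -> "LATIN", "CJK UNIFIED IDEOGRAPH-8303" -> "CJK"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def display_form(domain: str) -> str:
    """Hypothetical policy: show the ASCII form if any label mixes scripts."""
    if any(len(scripts_in_label(label)) > 1 for label in domain.split(".")):
        return domain.encode("idna").decode("ascii")
    return domain


print(display_form("范例.com"))        # 范例.com
print(display_form("\u0430pple.com"))  # xn--pple-43d.com
```

Note that this simple check would not catch a label written entirely in look-alike Cyrillic letters, which is why real display policies go further than a mixed-script test.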