Notes from 27 January 2022 APAC Space Web Conference

Overview

This APAC Space session focused on understanding and participating in two Public Comment proceedings related to Internationalized Domain Names (IDNs). Satish Babu (APRALO) facilitated the open community discussion.

For AOB, Pam Little (NomCom) called on the regional community to apply for available leadership positions in the ICANN ecosystem. Applications must reach NomCom by 11 Mar 2022, and queries could be directed to nomcom2022@icann.org.


Details of Session

1) Public Comment: Additional Unicode Scripts for Support in IDNs

Background
Sarmad Hussain (ICANN IDN and UA Programs Senior Director) gave an overview of IDNs and Unicode scripts. IDNs are identified through the IDNA2008 standard, which is based on the Unicode standard – i.e. IDNA2008 uses code point properties of scripts as defined by Unicode, and calculates the code points to be used in IDNs. IDNA2008 also recommends how registries using these code points should include them.

In addition, Unicode states that not all the 159 scripts it contains might be suitable for domain names. The scripts are grouped into 3 categories in Unicode Standard Annex #31 (UAX#31):

  • Recommended Scripts – 29 modern scripts (and other common/inherited scripts) with widespread customary use amongst large communities. The 28 scripts ICANN has been working on for the Maximal Starting Repertoire (MSR) come from here. ICANN has been working with various script communities on Root Zone Label Generation Rules (RZ-LGR; more on this later) for these scripts.
  • Excluded Scripts – 94 historic/obsolete scripts that are not in customary modern use and should be excluded from use in identifiers. Some of them also have unresolved architectural issues that make them unsuitable as identifiers. Scripts here include Coptic, Egyptian Hieroglyphs etc.
  • Limited Use Scripts – 34 modern scripts in relatively limited use compared to Recommended Scripts, e.g. Balinese, Canadian Aboriginal Syllabics, Cherokee, Javanese etc. To avoid security issues, Unicode recommends that some implementations may wish to disallow Limited Use scripts in identifiers.

The Public Comment Proceeding
The main question of the Public Comment is which other scripts out of the 159 might be suitable for ICANN to support apart from the initial 28 for the MSR. Here, ICANN had engaged with Unicode and IDN experts to review UAX#31 in the context of IDNs. The resulting analysis and recommendations were published in the report, “Evaluating Unicode Scripts for Use in IDNs”, which proposes that:

  • Recommended Scripts are good for use in the root zone and at the second level.
  • Excluded Scripts should not be used in identifiers at all.
  • Limited Use Scripts should only be used at the second level and on a case-by-case basis.

On the rationale behind the case-by-case basis for Limited Use Scripts, Sarmad explained that there is a need to analyse whether these scripts present any significant security and stability issues. For the analysis, ICANN needs to work with relevant script communities on how to use such scripts. However, these scripts may not have significant communities available to consult, or much published information, which makes them difficult to study and makes rules for their use difficult to determine. Without access to an active community, it is very hard to develop usage solutions for such scripts.

Care also has to be taken not just over possible confusion with ASCII labels but also over variant mappings, cross-script variants, and other security issues. As the examples shown of Limited Use Scripts and Recommended Scripts illustrated, confusion can arise from similar-looking scripts across both categories, e.g. New Tai Lue (Limited Use) and Myanmar (Recommended) scripts. Limited Use Scripts may also have no open and readily available fonts, so they may not render properly on-screen.

Feedback from the Public Comment proceeding, which closes on 15 Feb 2022, will be used to determine how ICANN can continue work on implementing IDNs. Particular questions being asked include:

  1. Should ICANN support Limited Use Scripts at the second level in IDNs? If yes, which scripts and why? If no, why?
  2. Are changes needed to the criteria for case-by-case support of Limited Use Scripts at the second level? If yes, what changes?
  3. Should ICANN support Excluded Scripts at the second level in IDNs? If yes, which scripts should be included and how should any security issue with them be mitigated? If no, why?
  4. Is there any feedback on the shortlisted scripts for the root zone?

Open Community Discussion
Satish asked what the considerations are for using Limited Use scripts only at the second level and not the top level/root zone at all. Sarmad explained that the top level is a broader zone which is used globally while the second level is relatively more focused in use, such as by a certain community or for a particular context. In this sense, the second level does not have a global audience. Similar reasoning applies to scripts, where the second level is more suited for scripts not as widely used as those at the top level.

Satish also asked if everyone’s inputs to the Public Comment would be weighed equally or whether feedback from linguistic communities would be given particular consideration. Sarmad explained that in the case of Limited Use scripts, input from relevant linguistic communities is essential in developing a solution to support such scripts. Support solutions are not developed by ICANN but by the respective linguistic communities. The absence of such communities or community input will mean that there will be no recommendations on how these scripts can be used securely. Satish suggested that at At-Large, end-users of a language could group themselves together to participate in providing feedback.

2) Public Comment: Proposal for Myanmar Script RZ-LGR

Background
Pitinan Kooarmornpatana (ICANN IDN Programs Senior Manager) provided a brief history of RZ-LGR, which were developed by the community as a mechanism to create the rules that define variants and how to use them. Work consists of two stages, beginning with a script community forming a Generation Panel (GP). The rules the GP develops are sent to the Integration Panel (IP), which ensures that everything can be synchronised. Thereafter, the developed rules are entered into the root zone.

The RZ-LGR work by loading the rules for a script into a tool. The tool works out if a TLD label is valid and calculates its variants. The tool also checks if a label and its variants may clash with existing TLDs and their variants. A label’s variants fall into two groups: i) allocatable for use, and ii) blocked, i.e. not for use due to, for example, security issues.
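
The three checks described above (validity, variant calculation, and collision with existing TLDs) can be illustrated with a small Python sketch. Everything in it is hypothetical: the repertoire, the variant table, and the function names are toy values invented for illustration, and it does not reflect the actual RZ-LGR data or ICANN's tool. It also assumes, as a simplification, that a variant label is blocked as soon as any one of its code point mappings is blocked.

```python
from itertools import product

# Hypothetical toy data, NOT taken from any real RZ-LGR.
REPERTOIRE = set("abcdo")
# code point -> list of (variant code point, disposition of that mapping)
VARIANTS = {
    "a": [("o", "blocked")],
    "o": [("a", "blocked")],
    "b": [("d", "allocatable")],
    "d": [("b", "allocatable")],
}


def is_valid(label):
    """A label is valid here if it is non-empty and stays inside the repertoire."""
    return len(label) > 0 and all(ch in REPERTOIRE for ch in label)


def variant_labels(label):
    """Return {variant_label: disposition} for every variant of the label."""
    options = []
    for ch in label:
        # Each position can keep the original code point or take a variant.
        options.append([(ch, None)] + VARIANTS.get(ch, []))
    result = {}
    for combo in product(*options):
        text = "".join(ch for ch, _ in combo)
        if text == label:
            continue  # the original label is not its own variant
        used = {disp for _, disp in combo if disp is not None}
        # Simplifying assumption: one blocked mapping blocks the whole label.
        result[text] = "blocked" if "blocked" in used else "allocatable"
    return result


def collides(label, existing_tlds):
    """True if the label or any variant matches an existing TLD or its variants."""
    candidates = {label} | set(variant_labels(label))
    for tld in existing_tlds:
        if candidates & ({tld} | set(variant_labels(tld))):
            return True
    return False


if __name__ == "__main__":
    print(is_valid("bad"))
    print(variant_labels("bad"))
    print(collides("dab", ["bad"]))
```

In this toy setup, a new application for "dab" would be flagged because it is itself a variant of the existing label "bad", which is the kind of clash the tool is described as catching.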

In 2019 and 2020, the Board resolved for the GNSO and ccNSO to take into account certain recommendations which included using RZ-LGR in their policy development processes. In 2021, the GNSO’s Final Report on New gTLD Subsequent Procedures (SubPro) incorporated RZ-LGR for the next round of new gTLDs.

Currently, RZ-LGR development has reached Version 4, in which 18 scripts have been integrated into the root zone. 7 more scripts are expected to be integrated this year. Pitinan noted that some GPs had been meeting for several years, showing the great effort on the part of the community. She also hoped that GPs could be formed for Thaana and Tibetan scripts.

The Public Comment Proceeding
Yin May Oo (Myanmar Script Generation Panel Co-Chair) then gave an overview of the RZ-LGR developed for Myanmar script, which is open for Public Comment. Myanmar script covers 6 languages – Burmese, Shan, Rakhine, Sgaw Karen, Mon, and Pa’O Karen. All of them use the same basic components of Myanmar script but have many ways of composing linguistic structures, which makes the script complex. The user base of each of the 6 languages ranges from 50,000 to 15 million people.

As an example of complexity, Yin May explained the composition of the Myanmar word for “human ability”: လူစွမ်း. Here, a component of a word can be a simple structure of a vowel and a consonant, / လ / ူ / (U+101C / U+1030), but can also include standard vowels needing many diacritics (i.e. tone marks) and an additional consonant that modifies the pronunciation of the vowel: / စ / ွ / မ / ် / း / (U+1005 / U+103D / U+1019 / U+103A / U+1038).
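
The code point breakdown above can be reproduced with Python's standard `unicodedata` module. The sketch below is only an illustration of how the word decomposes; the helper name `describe` is made up for this example. The word is spelled with explicit escapes so that the code point order matches the sequence listed in the notes. In the output, general categories beginning with "L" are letters and those beginning with "M" are combining marks.

```python
import unicodedata

# The example word from the notes, written with explicit escapes so the
# code point order is unambiguous.
word = "\u101c\u1030\u1005\u103d\u1019\u103a\u1038"


def describe(text):
    """Return (U+XXXX, general category, Unicode name) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


if __name__ == "__main__":
    for code, category, name in describe(word):
        print(code, category, name)
```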

The repertoire for Myanmar script includes more than 90 code points that the GP wanted to examine. Looking into these, the GP also evaluated cross-script variant code points, such as the full circle character “ဝ”, which has lookalikes in other scripts (e.g. Latin, Greek, and Cyrillic). To ensure it did not create any new in-script variants for Myanmar script, the GP divided the script repertoire into different categories and set certain rules, such as for how diacritics should be used.
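
One reason cross-script variants have to be defined explicitly is that Unicode normalization does not merge lookalike characters from different scripts. The Python sketch below shows this for a circle-shaped letter. It assumes the character in question is U+101D (MYANMAR LETTER WA), and the Latin, Greek, and Cyrillic counterparts are chosen here for illustration only; they are not taken from the GP's proposal.

```python
import unicodedata

# Assumed lookalikes, chosen for illustration only (not from the GP proposal).
lookalikes = {
    "Myanmar": "\u101d",   # assumed to be the "full circle" character
    "Latin": "\u006f",
    "Greek": "\u03bf",
    "Cyrillic": "\u043e",
}


def distinct_after_nfkc(chars):
    """True if NFKC normalization keeps every character distinct."""
    normalized = {unicodedata.normalize("NFKC", ch) for ch in chars}
    return len(normalized) == len(chars)


if __name__ == "__main__":
    for script, ch in lookalikes.items():
        print(script, f"U+{ord(ch):04X}", unicodedata.name(ch))
    # Normalization alone does not unify the lookalikes, so any blocking of
    # such labels has to come from explicit variant rules.
    print(distinct_after_nfkc(list(lookalikes.values())))
```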

Overall, the GP wanted to avoid making the script more complex. It developed the rules in a way that allows domain names to be easily composed, and consulted local language users, who were happy with the rules. The Public Comment proceeding will close on 3 Mar 2022, and the GP will then proceed to submit its work to the IP for evaluation.

Open Community Discussion
Satish congratulated the Myanmar GP on its complex work. He asked if Myanmar script faces any issues with allocatable label variants, explaining that some languages naturally do not have this issue, but others, like Chinese, do.

Pitinan clarified that Myanmar script does have in-script variants, some of which are allocatable while others are blocked. In general, rules have been set such that only characters from the same Myanmar language, e.g. Burmese, can be used within a label. Because characters from different Myanmar languages cannot be mixed in a label, this in a way limits the number of allocatable labels. Yin May added that Myanmar has many language variations where the same consonant might be used with different signs to spell the same vowel, hence the GP ensured only one diacritic is used for the whole label to avoid confusing users.