...
Answer (Roger Carney, GoDaddy and former T/T PDP WG member): I do not believe Recommendation 2 states, suggests or implies “…that all registration data must be tagged with a language/script tag following the adoption of the policy…”. During discussions in the PDP I purposefully avoided (as much as I could) using the term “tagging (tag/tagged)”. This term seems to have originated from the IRD WG and I did not think it was useful or appropriate to bring up implementation proposals in the PDP. I think Question 1 as written above interprets the text and meaning of Recommendation 2 somewhat differently than I did when agreeing with the Recommendation in the PDP. The text in the final report is “Whilst noting that a Whois replacement system should be capable of receiving input in the form of non-ASCII script contact information, the Working Group recommends its data fields be stored and displayed in a way that allows for easy identification of what the different data entries represent and what language(s)/script(s) have been used by the registered name holder.” This recommendation does not require, or even suggest or recommend, that any additional data fields be created. We also need to remember that RDAP neither stores nor displays data; as such, I don’t believe this recommendation intended RDAP to be considered a replacement WHOIS system in this context.
Answer (James Galvin, Afilias, T/T PDP WG member, IRD Expert WG chair): Use of the word "tag" and the action "tagging" generally implies a technical interpretation of "easy identification" that includes the use of identifiers specified by RFC 5646 "Tags for Identifying Languages". That is one method of "easy identification" but is not the choice I would support in this IRT. I believe the question of "easy identification" of the script used in the content of data fields is easily answered by recognizing that the Unicode Standard explicitly specifies in which scripts a code point is valid. Given a set of code points, the intersection of the scripts valid for each code point quickly identifies the script in use. It is language identification that is problematic. Unfortunately, what was not sufficiently considered by either the T/T PDP WG or the IRD Expert WG is that identifying a language requires context and, further, that that context is not generally available. Consider too that for some data elements, notably a Contact Name, there may be more than one language present in the field content. This latter issue is not addressed at all in the EPP protocol and, since that protocol is in the path from a registrant to a directory service display of data (WHOIS or RDAP), even if the language were known by the registrar there is no way to have this information available for display without technical standards work to "update" EPP. Finally, in consideration of the fact that the language identifier is only needed when a transformation is to take place or is indicated to have taken place, its presence should only be required in those circumstances. This nuanced interpretation places the burden of "easy identification" of the language identifier on the transformation action.
Since transformation is explicitly not mandatory, the solution for "easy identification" of a language identifier does not need to be a standard, and the transformer can perform this task in any way that meets their needs. This nuanced interpretation of "easy identification" may or may not require review outside of the IRT group.
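Illustration of the script-identification point in the answer above: the idea of intersecting, across all code points in a field, the sets of scripts in which each code point is valid can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a proposed implementation. The table `SCRIPT_RANGES` and the helpers `scripts_of` and `candidate_scripts` are hypothetical names introduced here; the table is a tiny hand-picked excerpt, whereas a real implementation would presumably load the complete Script and Script_Extensions data published with the Unicode Standard. Code points missing from the excerpt (spaces, digits, punctuation, and every script not listed) are simply treated as placing no constraint, which is a simplification.

```python
# Hypothetical, deliberately tiny excerpt of Unicode script data.
# Each entry: (first code point, last code point, scripts in which it is valid).
SCRIPT_RANGES = [
    (0x0041, 0x005A, {"Latin"}),                 # A-Z
    (0x0061, 0x007A, {"Latin"}),                 # a-z
    (0x0391, 0x03A1, {"Greek"}),                 # Greek capitals Alpha-Rho
    (0x03A3, 0x03A9, {"Greek"}),                 # Greek capitals Sigma-Omega
    (0x03B1, 0x03C9, {"Greek"}),                 # Greek small letters
    (0x0410, 0x044F, {"Cyrillic"}),              # basic Cyrillic letters
    (0x3041, 0x3096, {"Hiragana"}),
    (0x30A1, 0x30FA, {"Katakana"}),
    (0x30FC, 0x30FC, {"Hiragana", "Katakana"}),  # prolonged sound mark, valid in both
    (0x4E00, 0x9FFF, {"Han"}),                   # CJK Unified Ideographs
]


def scripts_of(ch):
    """Return the scripts in which `ch` is valid, or None if unconstrained.

    Assumption: anything absent from the excerpt above is treated as
    unconstrained, which only makes sense for this illustration.
    """
    cp = ord(ch)
    for first, last, scripts in SCRIPT_RANGES:
        if first <= cp <= last:
            return frozenset(scripts)
    return None


def candidate_scripts(text):
    """Intersect the per-code-point script sets for every character in `text`.

    Returns None if no character constrains the script, an empty set if the
    characters have no script in common (mixed-script content), and otherwise
    the set of scripts consistent with the whole field.
    """
    result = None
    for ch in text:
        scripts = scripts_of(ch)
        if scripts is None:
            continue  # shared character such as a space or digit
        result = scripts if result is None else result & scripts
    return result


if __name__ == "__main__":
    for value in ["Roger Carney", "\u041c\u043e\u0441\u043a\u0432\u0430", "\u30e9\u30fc\u30e1\u30f3"]:
        print(repr(value), sorted(candidate_scripts(value)))
```

A field whose result contains exactly one script is identified unambiguously; an empty result signals mixed-script content. Nothing in such a check says anything about the language of the field, which is the part the answer above describes as requiring context.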
Answer (w/ name and affiliation):
...