2022-06-01 IDNs EPDP String Similarity Review
The call for the IDNs EPDP String Similarity Review team will take place on Wednesday, 01 June 2022 at 12:00 UTC for 60 minutes.
For other places see: https://tinyurl.com/2p8bcfrf
PROPOSED AGENDA
BACKGROUND DOCUMENTS
Example slide deck: https://docs.google.com/presentation/d/1Yr-YJBf9QitohFUpkSkkcB8ibevBxfSRap5QegQAWlE/edit?usp=sharing
PARTICIPATION
Notes and Action Items
IDNs EPDP String Similarity Review small team
1 June 2022
- Small group members agreed to meet next week as scheduled, noting that one small group member (Edmon) will already be in transit for ICANN74.
Slide 18: Example 6: Arabic Script
- Question: The example shows that there could be a situation where a blocked variant is visually similar to another string that does not have a variant relationship with it. In those situations, the question is whether the strings should be examined by the string sim review to determine whether the two applications might need to be in a contention set, correct?
- Response: If you start with Label B, there is the possibility of confusion between Label B and Label A: even though they are visually different, they could be confusable. The challenge is that although the difference in shape is due to variants, the people doing the string sim analysis are not familiar with that detail and consider the labels distinct. But when an Arabic language speaker sees them, there is a connection that may create confusion.
- Comment: From the perspective of an Arabic speaker, this is an uncommon word. Label A could be confused with the highlighted blocked variant or with Label B. However, an Arabic keyboard would not have the blocked variant under Label B, so it does not seem that there is much chance of an Arabic speaker typing it.
- Comment: In some cases, it might not be the end user typing the domain. Sometimes they receive something and copy and paste the text.
- Question: If we use Level 2 vs. Level 3, what happens in this example?
- Response: For Level 2, both A and B could co-exist in the root zone. For Level 3, because the blocked variant of Label B looks similar to Primary Label A, it is likely that just one would be allowed (either A or B) due to string sim. If applied for in the same round, there would be a contention set. If one is already delegated, the other would not be allowed in subsequent rounds.
- Comment: If A and B can exist at the same time, is it a concern if a domain under Label A looks very much like a domain under Label B? This is a question we want to explore. If yes, we should consider Level 3. If no, this might move us towards Level 2.
- Staff comment: From Label B’s perspective, if Label B is delegated, users of the language using Label B may understand that the text could also take the form of the blocked variant of Label B. If that is the perception, when they look at Label A they may think it is a form of Label B, and may click on it with that intent.
- Comment: Maybe we could have a Level 3a and Level 3b. Level 3a is just all of A against B and its allocatable variants. Level 3b would be all of A against all of B.
- Staff Comment: The staff report suggests that if someone applies for Label A and someone else applies for Label B, Label A (but not its allocatable and blocked variants) will be compared to Label B and all of Label B’s allocatable and blocked variants.
- Some had interpreted Level 3 as being a comparison of all variants of A against all variants of B, but members suggested that this may not be necessary and the staff paper approach may make more sense.
- Comment: Visual similarity may be one factor for consideration, but the applicant should also state their motivation for applying for the string and its intended use, which may help avoid the delegation of problematic strings.
- Comment: Label A can be represented at 3 levels: Label, Label + allocatable variants, Label + allocatable and blocked variants. Label B can also be represented at the same 3 levels. That gives 9 possible pairings of levels at which the two labels could be compared.
- Suggestion: Compare Level 2 for Label A with Level 3 of Label B. Level 2 of Label B will be compared to Level 3 of Label A. Blocked variants of A and blocked variants of B are not compared with one another.
- Comment: The benefit of this solution is that we avoid the computational complexity of comparing all blocked variants of one string with all blocked variants of another. We find all of the critical cases, but still have a manageable string sim review set.
- Comment: If both Labels are applied for in the same round, we will need to do a two sided comparison.
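The suggested comparison model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels and variants are placeholder strings (not the Arabic strings on the slide), the function names are hypothetical, and the levels follow the description in the notes above (Level 1 = label, Level 2 = label + allocatable variants, Level 3 = label + allocatable and blocked variants).

```python
from itertools import product

def label_set(primary, allocatable, blocked, level):
    """Strings that represent a label at the given level (1, 2, or 3)."""
    strings = [primary]
    if level >= 2:
        strings += allocatable
    if level >= 3:
        strings += blocked
    return strings

def hybrid_pairs(a, b):
    """Pairs compared under the suggested model.

    Level 2 of A against Level 3 of B, plus Level 2 of B against Level 3 of A.
    Every pair is stored as (string_from_A, string_from_B).
    """
    pairs = set(product(label_set(*a, level=2), label_set(*b, level=3)))
    pairs |= {(x, y) for y, x in product(label_set(*b, level=2), label_set(*a, level=3))}
    return pairs

def full_pairs(a, b):
    """Pairs compared if all of A is compared against all of B."""
    return set(product(label_set(*a, level=3), label_set(*b, level=3)))

# Placeholder data: (primary, allocatable variants, blocked variants).
label_a = ("a0", ["a1"], ["a2", "a3", "a4"])
label_b = ("b0", ["b1", "b2"], ["b3", "b4"])

# The 9 possible pairings of levels mentioned in the discussion.
level_combinations = list(product((1, 2, 3), repeat=2))

hybrid = hybrid_pairs(label_a, label_b)
full = full_pairs(label_a, label_b)
skipped = full - hybrid  # only blocked-vs-blocked pairs are left out

print(len(level_combinations), len(hybrid), len(full), len(skipped))
```

With these placeholder sizes, the only pairs dropped relative to an all-against-all comparison are blocked variants of A against blocked variants of B. The saving grows with the number of blocked variants on each side.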
Action Item 1: Staff to update example 6 to show Label A (black and green) compared with Label B (black, green, and red), and Label B (black and green) compared with Label A (black, green, and red).
- Comment: If both Labels are applied for in the same round, Level 1 will not result in a contention set and Level 2 will not result in a contention set, but the possible options for Level 3 will result in a contention set.
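As a rough illustration of the outcome described for Example 6, the sketch below checks each level for a contention set. Everything in it is an assumption for illustration: the strings are placeholders, the names are hypothetical, and the only similarity assumed is between Label A's primary label and the blocked variant of Label B, as discussed above. In practice the similarity judgment would come from the string sim review panel, not from a lookup table.

```python
from itertools import product

# Placeholder strings standing in for the labels on the slide.
LABEL_A = {"primary": "A", "allocatable": ["A-alloc"], "blocked": ["A-blocked"]}
LABEL_B = {"primary": "B", "allocatable": ["B-alloc"], "blocked": ["B-blocked"]}

# Assumed review result: only A's primary label and B's blocked variant look alike.
SIMILAR = {frozenset({"A", "B-blocked"})}

def strings_at(label, level):
    """Strings that represent a label at Level 1, 2, or 3."""
    out = [label["primary"]]
    if level >= 2:
        out += label["allocatable"]
    if level >= 3:
        out += label["blocked"]
    return out

def contention(level_pairs):
    """True if any comparison in level_pairs finds a similar pair.

    level_pairs is a list of (level used for A, level used for B).
    """
    for lvl_a, lvl_b in level_pairs:
        for x, y in product(strings_at(LABEL_A, lvl_a), strings_at(LABEL_B, lvl_b)):
            if frozenset({x, y}) in SIMILAR:
                return True
    return False

results = {
    "Level 1": contention([(1, 1)]),
    "Level 2": contention([(2, 2)]),
    "Level 3 (A L2 vs B L3, and B L2 vs A L3)": contention([(2, 3), (3, 2)]),
    "Level 3 (all of A vs all of B)": contention([(3, 3)]),
}

for model, hit in results.items():
    print(f"{model}: {'contention set' if hit else 'no contention set'}")
```

Under these assumptions, both Level 3 options flag the pair while Levels 1 and 2 do not, which matches the comment above.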
Slide 19 – Example 7: Chinese
- Comment: To a native or fluent speaker, the labels look similar but there is a low chance of confusion. There may be a possibility of confusion for a non-native speaker who is not fluent.
- Question: Is the Traditional Chinese version in the example used in daily life?
- Response: For confusability, it really depends. If unicorn and unicom are confusable, then some of these might also be confusable. On a personal level, it is not confusable, but it might be possible to fake something and cause confusion. The examples would be worth flagging in the evaluation process for review by the panel. The decisions of the Panel could be appealed.
Action Item 2: For example 7, staff to compare A vs. B and A vs. C using the comparison model suggested for example 6.
Slide 20 – Example 8: Kanji – Han
Action Item 3: Time permitting, staff to extend example 8 in line with the action items above.