Jump to the Background for the history of this issue.
Below are some of the bug reports/observations noted during testing.
No. | Description of Issue | Noted by | Date Added | Status | Additional Notes on testing, fixes |
---|---|---|---|---|---|
1 | Better identification of in its error response emails. When the translation tool has issues with the email, an email is sent from transbot-no-reply@icann.org with Dear <sender> Other questions : | | | updated | Tool now identifies the subject line of the problem email in the body of the email |
2 | Missing emails from one list to another. In 2018, it was noticed that many emails were being silently dropped from one list, mainly emails from lacralo-es not being sent to lacralo-en. Comparing : Why is this happening? | Dev Anand Teelucksingh | | | |
3 | <DNT> tag isn't case sensitive. | | 07/03/2017 | | |
3 | The <DNT> tag will be seen in the original email but not in the translated one. See EN and ES messages. | | 07/03/2017 | | To achieve this, it would be necessary to re-architect the transbot code. Upon investigating the issue it was discovered this is a known limitation, rather than a bug in the existing code. The request has been recorded as something to consider in a future iteration of the translation tool. |
4 | ES translated emails seem to remove several line breaks from the EN. See EN and ES messages | | | | |
5 | Re: attachments, new-transbot lists have a message and attachment limit of 200K. Given many PDFs will be larger, it will be hard to test unless the message size limit is raised. Tested with a smaller attachment, the attachment does go through. See EN and ES | | 08/03/2017 | | The new-transbot list email size limit has been increased to 400K. This is enforced for the entire email, including text and attachment. |
6 | Handling an email sent to both new-transbot-en and new-transbot-es lists at the same time. When such an email to both lists happens, some emails don't get translated. | | 09/03/2017 | | |
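The 400K limit noted under item 5 applies to the whole message, text plus attachment. The sketch below shows what such a check might look like. It assumes that "400K" means 400 × 1024 bytes and that size is measured on the serialized message; both are assumptions, and `within_limit`, `build_message`, and the example addresses are hypothetical, not part of the transbot code.

```python
from email.message import EmailMessage

# Assumption: "400K" is read as 400 * 1024 bytes.
LIMIT_BYTES = 400 * 1024


def within_limit(msg: EmailMessage, limit: int = LIMIT_BYTES) -> bool:
    """Return True if the serialized message (headers, body, attachments) fits."""
    return len(msg.as_bytes()) <= limit


def build_message(body: str, attachment: bytes) -> EmailMessage:
    """Build a test message with one PDF-typed attachment (placeholder addresses)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "new-transbot-en@example.org"
    msg["Subject"] = "Test with attachment"
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="pdf", filename="report.pdf")
    return msg


small = build_message("See attached.", b"\x00" * 50_000)
large = build_message("See attached.", b"\x00" * 350_000)
print(within_limit(small), within_limit(large))
```

Attachments are base64-encoded when the message is serialized, which inflates them by about a third, so a file of roughly 300 KB on disk can already exceed a 400K message limit.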
Description of Issue | Noted by | Date Added | Status | Additional Notes on testing, fixes |
---|---|---|---|---|
A decimal number in an email, e.g. 3.9MB, will trigger the error message "Sentence punctuation must be followed by a space" | | | | Tool now handles numbers with decimal spaces. |
Re: attachments, new-transbot lists have a message and attachment limit of 200K. Given many PDFs will be larger, it will be hard to test unless the message size limit is raised. Tested with a smaller attachment, the attachment does go through. See EN and ES | | 08/03/2017 | | The new-transbot list email size limit has been increased to 400K. This is enforced for the entire email, including text and attachment. |
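The tool's actual validation code is not shown on this page. As a purely hypothetical illustration of the decimal-number fix, a "sentence punctuation must be followed by a space" check can skip any period that sits between two digits. The function name and the rule itself are assumptions.

```python
import re

# Hypothetical rule: '.', '!' or '?' must be followed by whitespace or end of text.
PUNCT_NO_SPACE = re.compile(r"[.!?](?=\S)")
# A period between two digits is treated as a decimal point, not sentence punctuation.
DECIMAL_POINT = re.compile(r"(?<=\d)\.(?=\d)")


def punctuation_errors(text: str) -> list[int]:
    """Return the positions of punctuation marks not followed by a space."""
    decimal_positions = {m.start() for m in DECIMAL_POINT.finditer(text)}
    return [m.start() for m in PUNCT_NO_SPACE.finditer(text)
            if m.start() not in decimal_positions]


print(punctuation_errors("The file is 3.9MB in size."))
print(punctuation_errors("Hello.World"))
```

This sketch would still flag URLs and abbreviations such as "e.g.", so a real check would need further exceptions.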
A custom machine translation tool for the Latin American and Caribbean Regional At-Large Organisation (LACRALO) mailing lists was implemented around early 2011. However, several issues continue to impair the working of the machine translation tool and, in turn, have led to great difficulty in communication and collaboration between the English and Spanish speaking communities in the LAC region.
See the presentation prepared for the ICANN53 meeting by the At-Large Technology Taskforce Working Group, which has worked to identify the issues and continues to follow them up with ICANN Staff.
To recap, LACRALO has two mailing lists:
Emails in English sent to lac-discuss-en@atlarge-lists.icann.org are machine translated by the custom tool, which uses Google Translate, and posted to lac-discuss-es@atlarge-lists.icann.org.
Similarly, emails in Spanish sent to lac-discuss-es@atlarge-lists.icann.org are translated and posted to lac-discuss-en@atlarge-lists.icann.org.
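The routing just described can be pictured as a lookup from the source list to the target list and language pair. The `Route` type, the `ROUTES` table, and the `route` function are illustrative assumptions rather than the tool's real structure, and the translation call itself is left out.

```python
from typing import NamedTuple


class Route(NamedTuple):
    target_list: str
    source_lang: str
    target_lang: str


# Each list forwards a translated copy to its counterpart.
ROUTES = {
    "lac-discuss-en@atlarge-lists.icann.org":
        Route("lac-discuss-es@atlarge-lists.icann.org", "en", "es"),
    "lac-discuss-es@atlarge-lists.icann.org":
        Route("lac-discuss-en@atlarge-lists.icann.org", "es", "en"),
}


def route(list_address: str) -> Route:
    """Look up where a message posted to list_address should be sent after translation."""
    return ROUTES[list_address.strip().lower()]


print(route("lac-discuss-en@atlarge-lists.icann.org"))
```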
(this section is from a June 18 2012 email)
When an email with attachments such as PDFs is sent to one list, the subject line and body of the email are translated and sent to the other list, but without the attachment.
The subject line of emails translated (seemingly) from the lac-discuss-ES list to the lac-discuss-EN list is often turned into garbled text.
Examples abound from a review of the archives.
(a) First email posted to the lac-discuss-en list:
(b) which is translated and posted to the lac-discuss-es list as:
(c) Someone on the lac-discuss-es list responds, and the reply is posted to the lac-discuss-es list as:
(d) which is translated and posted to the en list as:
Another example: Email on the lac-discuss-es list:
http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004518.html
gets translated and posted as an email on the lac-discuss-en list:
http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005897.html
Note the difference with "Utf-8? Q?" in this example as compared to "Iso-8859-1? Q?" in the previous example.
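Fragments such as "Utf-8? Q?" and "Iso-8859-1? Q?" resemble RFC 2047 encoded-words (`=?charset?Q?...?=`), the standard way to carry non-ASCII text in mail headers. If that is their origin (an assumption; the cause is not stated here), the garbling would be consistent with the encoded form of the subject being passed to the translator as plain text instead of being decoded first. A sketch of decoding a subject with Python's standard library, using made-up subject lines:

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Turn an RFC 2047 encoded Subject header into readable text."""
    return str(make_header(decode_header(raw_subject)))


# Made-up examples of the same subject in two charsets.
latin1_subject = "=?iso-8859-1?Q?Reuni=F3n_mensual?="
utf8_subject = "=?utf-8?Q?Reuni=C3=B3n_mensual?="

print(decode_subject(latin1_subject))
print(decode_subject(utf8_subject))
```

Decoding first, translating the readable text, and re-encoding on the way out would keep the encoded-word markers away from the translator.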
Such gibberish in the subject lines can get even worse if someone responds on the EN list and the translation further scrambles the subject line on the other list.
Again, examples abound from a review of the archives. As one example, consider the subject line for an email on the lac-discuss-es list
which gets translated and posted to the EN list as
Consider example #1 again -
First email posted to lac-discuss-en list :
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005932.html
Subject line: [lac-discuss-en] ICANN full list of applied for gTLD strings
which is translated and posted to lac-discuss-es list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004552.html
Subject line: Lista completa de la ICANN solicitó cadenas de gTLD
The email at http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004552.html shows that the subject line is missing the [lac-discuss-es] tag. This hampers filtering by ES users and makes it difficult to track threaded conversations.
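One way to guard against a dropped list tag is to re-attach it after translation. In the sketch below, `ensure_list_tag` is a hypothetical helper and not part of the existing tool.

```python
def ensure_list_tag(subject: str, list_name: str) -> str:
    """Prefix the subject with [list_name] unless the tag is already present."""
    tag = f"[{list_name}]"
    if tag.lower() in subject.lower():
        return subject
    return f"{tag} {subject.strip()}"


translated = "Lista completa de la ICANN solicitó cadenas de gTLD"
print(ensure_list_tag(translated, "lac-discuss-es"))
```

The check is case-insensitive and idempotent, so running it on a subject that already carries the tag leaves it unchanged. Handling of reply prefixes such as "Re:" is deliberately left out.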
There have been numerous complaints about the quality of the translation of email bodies, with strange characters (some of them superscript characters) appearing in the translated version.
Examples abound (a repeating phrase, I'm afraid) in the LACRALO list archives; here is one example:
Email to lac-discuss-en : http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005858.html
got translated to this on the lac-discuss-es list: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004483.html
As you can see,
* a character like a double quote " is translated to "
* a word like "organisation" is translated to organización
* a sentence like "The highest decision making body in any organisation is also subject to rules." is translated to
"El más alto órgano de decisión en cualquier la organización también está sujeto a reglas."
ICANN Staff have created two mailing lists (new-transbot-en and new-transbot-es), with a select group of people subscribed, for testing purposes.
Some of the key changes implemented in the new translation tool.
Name | Affiliation |
---|---|
LACRALO | |
LACRALO | |
LACRALO | |
LACRALO | |
LACRALO | |
AFRALO | |
APRALO | |
NARALO | |
ICANN At-Large Staff |
Status | Date Added | Description | Additional Notes |
---|---|---|---|
(Noted by Dev Anand Teelucksingh and satish.babu) | 28 Feb 2016 | Subject line in body of translated email has the sender and first line of the email on the same line, when the sender and first line should be on separate lines. Two examples: | The converted body text was included immediately after the subject and from line in translated emails. A newline character was inserted between the sender name and the first body text line to separate them. Initial testing complete; additional testing in progress. Example: https://community.icann.org/x/AofDAw |
(Noted by satish.babu) | 28 Feb 2016 | There is an empty space at the beginning of most (but not all) lines in the translated email | After investigation it appears empty spaces may have been added by the email client or by the Google Translate API. The space was removed before all translated lines in mailing list emails to resolve the issue. Tested with mail IDs Outlook, Yahoo, Gmail. Verified that translated emails are not indented; empty spaces are not appearing at the beginning of lines. Example: https://community.icann.org/x/vYPDAw |
(Noted by satish.babu) | 28 February 2016 | At the end of the message, the sender's name starts with a lower case letter ('dev Anand', although it is 'Dev Anand' in the original message) | Google Translate understands 'Dev' as an abbreviation and as a rule converts it to all lowercase. Name was hardcoded to fix 'Dev' to begin with a capital letter. Tested with mail IDs Outlook, Yahoo, Gmail. Verified that name is appearing correctly with first letter capitalized. Example: https://community.icann.org/x/ooPDAw |
(Noted by satish.babu) | 28 February 2016 | 'Transbot' is misspelled as 'tansbot' (third line from the bottom) | Misspelling was hardcoded. Applied hardcode fix to correct spelling of 'tansbot' to 'transbot' in translated emails. Tested with mail IDs Outlook, Yahoo, Gmail. Verified that spelling is now appearing correctly as 'transbot.' Example: https://community.icann.org/x/oILDAw |
(Noted by Dev Anand Teelucksingh) | April 17 2016 | The transbot can't handle the cedilla. Because the At-Large Staff signature lists a staff member with a cedilla in her name, any message from At-Large Staff is not translated. The April thread with the subject line "CALL FOR MEMBERS: At-Large Public Interest Working Group" on EN : http://mm.icann.org/pipermail/new-transbot-en/2016-April/thread.html and ES : http://mm.icann.org/pipermail/new-transbot-es/2016-April/thread.html showed how the issue was isolated after several variations of the original email were tried. This is a critical bug, as any cedilla in any word in an email would result in the email not being translated. | The issue has been resolved specific to the reported cases of broken emails caused by the cedilla character in the AL Staff signature. As a larger issue, it is still in progress. Efforts around the reported case led to wider investigation into how the transbot and email applications handle Unicode characters. This is important UTF-8 compliance work and requires extensive testing. Recent tests with a wider set of characters using Outlook have been successful. Tests with those same characters have been inconsistent with Gmail and Yahoo. The team is continuing to research, test, and make progress. Examples: https://community.icann.org/x/pYrDAw |
(Noted by Dev Anand Teelucksingh) | April 17 2016 | The phrase "This Working Group is open to interested members of the At-Large community." gets translated to "Este grupo de trabajo está abierto a los miembros interesados \u200b\u200bde la comunidad de alcance." | The issue was related to the zero-width space, which was being injected by the Google Translate API. A zero-width space is used after characters that aren't followed by a visible space, but after which there may be a line break. It was encoded in the Unicode output and was appearing as \u200b. To fix this, it was removed from the translated text. Tested with mail IDs Outlook, Yahoo, Gmail. Verified that phrase is translated correctly without additional characters. Example: https://community.icann.org/x/t4PDAw |
| | | Email subject lines can get jumbled and distorted along threads of translated emails | This issue is related to extra spaces in subject lines, and research shows it is a known issue with Microsoft Office/Mail-man server. There is no known resolution at this time. As a workaround, the new test lists were designed so that subject lines aren't translated and the original subject is retained. |
| | | Attachments are not retained on translated emails | Changes were made to support attachments on translated emails. The file formats that will be retained between lists are TXT, PDF, WORD, JPEG, PPT, PNG, GIF |
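Two of the fixes in the table, removing zero-width spaces and stripping the leading space from translated lines, amount to a small post-processing step on the translator's output. The sketch below shows what such a step might look like. `clean_translation` is a hypothetical name, and the real transbot code is not shown on this page.

```python
ZERO_WIDTH_SPACE = "\u200b"


def clean_translation(text: str) -> str:
    """Drop zero-width spaces and remove leading spaces from every line."""
    without_zwsp = text.replace(ZERO_WIDTH_SPACE, "")
    return "\n".join(line.lstrip(" ") for line in without_zwsp.split("\n"))


raw = (" Este grupo de trabajo está abierto a los miembros interesados "
       "\u200b\u200bde la comunidad.\n Saludos")
print(clean_translation(raw))
```

Stripping every leading space also flattens intentional indentation, such as quoted replies or nested lists, so a production version might limit the stripping to a single leading space.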