EDRM Internationalization Data Set

The EDRM Internationalization Data Set (18.4 MB) is a snapshot of selected Ubuntu localization mailing list archives covering 23 languages in 724 MB of email.

The languages are:

Arabic
Catalan
Chinese
Danish
Dutch
English
Finnish
French
German
Greek
Hebrew
Hungarian
Italian
Japanese
Korean
Norwegian
Polish
Portuguese
Romanian
Russian
Spanish
Swedish
Tamil
Turkish