Conteranto - Methodology

The problem

A translation can get every word right and still get the tone wrong. A polite email in English can come across as merely neutral in Japanese. A casual request in Spanish can become strangely formal in French.

This is not just a feeling. Havaldar et al. (ACL 2025) tested six translation systems and measured exactly how much style gets lost. They found that every system -- from Google Translate to GPT-4 -- pushes translations toward the middle. Very polite text becomes only somewhat polite. Very casual text becomes more neutral.

The problem is worse for non-Western languages. Translations into Japanese and Chinese lose more tone than translations into Spanish or French.

What we measure

Conteranto adjusts three style dimensions. Each one is backed by a multilingual dataset with native-speaker annotations:

Politeness -- How much respect the text shows. From casual ("Hey, send that over") to highly respectful ("I would be most grateful if you could share the document").

Data: 20,500 annotated examples in English, Spanish, Japanese, and Chinese

Formality -- The language register. From everyday speech ("Can you send it?") to formal register ("We kindly request submission of the document").

Data: 8,000 annotated examples in English, French, Italian, and Portuguese

Intimacy -- How personal the tone feels. From distant and impersonal ("The department acknowledges receipt") to warm and close ("Thanks so much, really appreciate you!").

Data: 11,800 annotated examples in English, Spanish, Portuguese, Italian, French, and Chinese

What the research found

The paper measured style alignment: the correlation between the intended style of the original text and the style of the translation. Higher is better.

System	Style Alignment	Note
Google Translate	0.58	Best standard system
GPT-3.5	0.51
GPT-4	0.49
Llama-3.2-11B	0.48
GPT-4 + RASTA	0.70	+43% over GPT-4 alone, using style examples

Standard quality metrics like BLEU showed no correlation with style alignment. A translation can score high on quality while completely missing the tone.
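As a rough illustration of the metric, the sketch below computes a correlation between style scores of source texts and style scores of their translations, and then checks the +43% figure against the two GPT-4 rows in the table. It assumes Pearson correlation, which the text above does not specify. The function names and the sample scores are invented for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side never varies")
    return cov / sqrt(var_x * var_y)


def relative_gain(baseline, improved):
    """Improvement of `improved` over `baseline`, as a fraction of baseline."""
    return (improved - baseline) / baseline


# Made-up politeness scores, one pair per sentence (assumption, not real data).
source_scores = [-1.5, -0.5, 0.0, 1.0, 2.0]
translated_scores = [-0.4, -0.3, 0.1, 0.2, 0.5]
alignment = pearson(source_scores, translated_scores)
print(f"style alignment: {alignment:.2f}")

# GPT-4 (0.49) versus GPT-4 + RASTA (0.70), values from the table above.
print(f"relative gain: {relative_gain(0.49, 0.70):.0%}")
```

In practice the scores on each side would come from a style classifier or from human annotation, and that scoring step is outside the scope of this sketch.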

The non-Western gap

Native Japanese text varies in politeness about as much as native Spanish text (variance of 0.20 versus 0.23). But after translation, that variety collapses:

Target Language	Native Variance	Variance After Translation	Share Lost
Spanish	0.23	0.17	26%
Japanese	0.20	0.09	55%
Chinese	0.20	0.13	35%

Japanese translations lose more than half their tonal range. This is why cultural translation matters most for exactly the languages that current tools handle worst.
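The last column of the table follows from the two variance columns: the drop in variance divided by the native variance. A minimal sketch of that arithmetic, using the values from the table (the function name share_lost is ours):

```python
def share_lost(native_variance, translated_variance):
    """Fraction of the native politeness variance that translation removes."""
    return (native_variance - translated_variance) / native_variance


# (native variance, variance after translation), copied from the table above.
rows = {
    "Spanish": (0.23, 0.17),
    "Japanese": (0.20, 0.09),
    "Chinese": (0.20, 0.13),
}

for language, (native, translated) in rows.items():
    print(f"{language}: {share_lost(native, translated):.0%} lost")
```

This reproduces the 26%, 55%, and 35% figures after rounding to whole percentages.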

How Conteranto works

When you pick a target language, Conteranto sets the three sliders to match how people in that culture typically communicate. Japanese starts with high politeness and low intimacy. Dutch starts with low politeness and high intimacy. You can then adjust each slider to fit your specific context.
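One way to picture the defaults-then-adjust behavior is a small lookup table of per-language settings that the user can override. The sketch below is an assumption about structure, not Conteranto's actual code: the class name, the language codes, and the 0-100 range are illustrative, and only the Japanese and Dutch values come from the examples on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StyleSettings:
    """Three slider values; the 0-100 range is an assumption."""
    politeness: int
    formality: int
    intimacy: int

    def __post_init__(self):
        for name in ("politeness", "formality", "intimacy"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")


# Hypothetical defaults table; values taken from the examples on this page.
DEFAULTS = {
    "ja": StyleSettings(politeness=77, formality=78, intimacy=35),
    "nl": StyleSettings(politeness=35, formality=42, intimacy=72),
}


def settings_for(language, **overrides):
    """Start from the language default, then apply any user adjustments."""
    return replace(DEFAULTS[language], **overrides)


# A user translating into Dutch who wants a somewhat more formal register.
print(settings_for("nl", formality=60))
```

Keeping the settings immutable means an adjustment always produces a new value and never alters the shared per-language default.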

The translation prompt tells the model exactly how polite, formal, and personal the output should be -- and specifically warns against the neutrality bias that the paper documented. For non-Western languages, the prompt includes extra guidance to use native style mechanisms (like Japanese keigo or Persian ta'arof) rather than just translating English politeness markers.
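A minimal sketch of how such a prompt could be assembled from the slider values. The wording of the instructions, the function name build_prompt, and the native_mechanisms parameter are all hypothetical; the actual prompt text used by Conteranto is not shown on this page.

```python
def build_prompt(text, target_language, politeness, formality, intimacy,
                 native_mechanisms=()):
    """Assemble a style-aware translation prompt (illustrative wording only)."""
    lines = [
        f"Translate the text below into {target_language}.",
        "Target style on a 0-100 scale: "
        f"politeness {politeness}, formality {formality}, intimacy {intimacy}.",
        # Explicit warning against the drift toward neutral tone.
        "Do not soften the style toward a neutral tone; "
        "match the target levels as closely as the language allows.",
    ]
    if native_mechanisms:
        # Extra guidance for languages with their own style systems.
        lines.append(
            "Express the style through native mechanisms such as "
            + ", ".join(native_mechanisms)
            + " instead of translating English politeness markers literally."
        )
    lines.append("")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


prompt = build_prompt(
    "Could you send the report?",
    "Japanese",
    politeness=77,
    formality=78,
    intimacy=35,
    native_mechanisms=("keigo",),
)
print(prompt)
```

The returned string would then be sent to whichever translation model is in use; that call is left out because it depends on the provider.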

Example: "Could you send the report?"

Japanese (Politeness 77, Formality 78, Intimacy 35):
A standard translation gets the words right but sounds flat.
Conteranto uses keigo and honorific forms to match how a Japanese speaker would actually make this request in a professional setting.

Example: same sentence

Dutch (Politeness 35, Formality 42, Intimacy 72):
A standard translation sounds too formal for Dutch culture.
Conteranto makes it warmer and more direct, matching how Dutch speakers actually communicate with colleagues.

Conteranto supports 76 languages across 7 regions, including many that current AI tools handle poorly -- Yoruba, Amharic, Quechua, Khmer, and more. For these underrepresented languages, cultural translation matters most because the risk of getting the tone wrong is highest.

References

Havaldar, S., Stein, A., Wong, E., & Ungar, L. (2025). Towards Style Alignment in Cross-Cultural Translation. ACL 2025. arXiv:2507.00216
Havaldar, S. et al. (2023). Multilingual Politeness Dataset. EMNLP 2023. arXiv:2310.07135
Briakou, E. et al. (2021). XFORMAL: Multilingual Formality Style Transfer. NAACL 2021. arXiv:2104.04108
Pei, J. et al. (2023). SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. SemEval 2023. arXiv:2210.01108
Hofstede, G. et al. (2010). Cultures and Organizations: Software of the Mind. 3rd ed. McGraw-Hill.
Hall, E. T. (1976). Beyond Culture. Anchor Books.