The problem
A translation can get every word right and still get the tone wrong. A polite email in English can sound neutral in Japanese. A casual request in Spanish becomes strangely formal in French.
This is not just a feeling. Havaldar et al. (ACL 2025) tested six translation systems and measured exactly how much style gets lost. They found that every system -- from Google Translate to GPT-4 -- pushes translations toward the middle. Very polite text becomes only somewhat polite. Very casual text becomes more neutral.
The problem is worse for non-Western languages. Translations into Japanese and Chinese lose more tone than translations into Spanish or French.
What we measure
Conteranto adjusts three style dimensions. Each one is backed by a multilingual dataset with native-speaker annotations:
Politeness -- How much respect the text shows. From casual ("Hey, send that over") to highly respectful ("I would be most grateful if you could share the document").
Formality -- The language register. From everyday speech ("Can you send it?") to formal tone ("We kindly request submission of the document").
Intimacy -- How personal the tone feels. From warm and close ("Thanks so much, really appreciate you!") to distant and impersonal ("The department acknowledges receipt").
What the research found
The paper measured style alignment: the correlation between the intended style of the original text and the style of the translation. Higher is better.
| System | Style Alignment | Note |
|---|---|---|
| Google Translate | 0.58 | Best standard system |
| GPT-3.5 | 0.51 | |
| GPT-4 | 0.49 | |
| Llama-3.2-11B | 0.48 | |
| GPT-4 + RASTA | 0.70 | +43% over GPT-4 alone, with style examples |
Standard quality metrics like BLEU showed no correlation with style alignment. A translation can score high on quality while completely missing the tone.
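As a rough illustration of the metric, the sketch below computes style alignment as a Pearson correlation between style scores of the source texts and of their translations. The choice of Pearson, the function name `style_alignment`, the score scale, and all numbers are assumptions made for this example; they are not taken from the paper.

```python
from math import sqrt

def style_alignment(source_scores, translated_scores):
    """Pearson correlation between source and translation style scores.

    Assumption: one numeric style score per text, paired by position.
    """
    n = len(source_scores)
    if n != len(translated_scores) or n < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_s = sum(source_scores) / n
    mean_t = sum(translated_scores) / n
    cov = sum((s - mean_s) * (t - mean_t)
              for s, t in zip(source_scores, translated_scores))
    var_s = sum((s - mean_s) ** 2 for s in source_scores)
    var_t = sum((t - mean_t) ** 2 for t in translated_scores)
    if var_s == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_s * var_t)

# Hypothetical politeness scores on a -1 (rude) to +1 (very polite) scale.
source = [-0.9, -0.4, 0.0, 0.5, 0.9]
faithful = [-0.8, -0.5, 0.1, 0.4, 0.9]    # tone mostly carried over
scrambled = [0.2, -0.1, 0.3, -0.2, 0.1]   # tone unrelated to the source

print("faithful :", round(style_alignment(source, faithful), 2))
print("scrambled:", round(style_alignment(source, scrambled), 2))
```

Note that a Pearson correlation ignores scale: a translation that uniformly halves every score would still correlate perfectly with the source. In this sketch, a shrinking tonal range would therefore have to be checked separately, for example through the variance of the scores.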
The non-Western gap
Native Japanese text varies in politeness about as much as native Spanish text (variance 0.20 vs. 0.23). But after translation, that variety collapses:
| Target Language | Native Variance | Variance After Translation | Share Lost |
|---|---|---|---|
| Spanish | 0.23 | 0.17 | 26% |
| Japanese | 0.20 | 0.09 | 55% |
| Chinese | 0.20 | 0.13 | 35% |
Japanese translations lose more than half their tonal range. This is why cultural translation matters most for exactly the languages that current tools handle worst.
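The last column of the table follows directly from the two variance columns: the share lost is the drop in variance divided by the native variance. A minimal sketch of that arithmetic, using the figures from the table (the function name `share_lost` is chosen for this example):

```python
def share_lost(native_variance, translated_variance):
    """Fraction of the native politeness variance that is missing after translation."""
    return (native_variance - translated_variance) / native_variance

# (native variance, variance after translation), as reported in the table above
table = {
    "Spanish": (0.23, 0.17),
    "Japanese": (0.20, 0.09),
    "Chinese": (0.20, 0.13),
}

for language, (native, translated) in table.items():
    print(f"{language}: {share_lost(native, translated):.0%} lost")
```

Rounded to whole percentages, this reproduces the 26%, 55%, and 35% shown in the table.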
How Conteranto works
When you pick a target language, Conteranto sets the three sliders to match how people in that culture typically communicate. Japanese starts with high politeness and low intimacy. Dutch starts with low politeness and high intimacy. You can then adjust each slider to fit your specific context.
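One way to picture these defaults is as a per-language preset that the user can override. The sketch below is illustrative only: the class `StyleSettings`, the function `settings_for`, the 0 to 1 scale, and every numeric value are assumptions, and the formality defaults in particular are placeholders since only politeness and intimacy are described above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class StyleSettings:
    politeness: float  # 0.0 = casual, 1.0 = highly respectful
    formality: float   # 0.0 = everyday speech, 1.0 = formal register
    intimacy: float    # 0.0 = distant, 1.0 = warm and close

# Hypothetical presets; the real defaults are not published in this document.
DEFAULTS = {
    "ja": StyleSettings(politeness=0.9, formality=0.7, intimacy=0.2),
    "nl": StyleSettings(politeness=0.3, formality=0.4, intimacy=0.8),
}
NEUTRAL = StyleSettings(politeness=0.5, formality=0.5, intimacy=0.5)

def settings_for(language, **overrides):
    """Start from the language preset, then apply any user slider changes."""
    adjusted = replace(DEFAULTS.get(language, NEUTRAL), **overrides)
    for name in ("politeness", "formality", "intimacy"):
        if not 0.0 <= getattr(adjusted, name) <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return adjusted

# Japanese preset, with intimacy raised for a message to a close colleague.
print(settings_for("ja", intimacy=0.5))
```

Keeping the presets immutable and returning an adjusted copy means a user's change for one message never alters the default for the language.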
The translation prompt tells the model exactly how polite, formal, and personal the output should be -- and specifically warns against the neutrality bias that the paper documented. For non-Western languages, the prompt includes extra guidance to use native style mechanisms (like Japanese keigo or Persian ta'arof) rather than just translating English politeness markers.
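A prompt of this kind could be assembled roughly as follows. This is a sketch under stated assumptions, not Conteranto's actual prompt: the wording, the function names `level_word` and `build_prompt`, the thresholds, and the `NATIVE_MECHANISMS` table are all hypothetical.

```python
# Hypothetical lookup of native style mechanisms, using the two examples named above.
NATIVE_MECHANISMS = {
    "Japanese": "keigo",
    "Persian": "ta'arof",
}

def level_word(value):
    """Map a 0..1 slider value to a coarse label (thresholds are assumptions)."""
    if value < 0.34:
        return "low"
    if value < 0.67:
        return "moderate"
    return "high"

def build_prompt(text, target_language, politeness, formality, intimacy):
    """Assemble a translation prompt that states the target style explicitly."""
    lines = [
        f"Translate the text below into {target_language}.",
        f"Politeness: {level_word(politeness)} ({politeness:.1f} on a 0-1 scale).",
        f"Formality: {level_word(formality)} ({formality:.1f} on a 0-1 scale).",
        f"Intimacy: {level_word(intimacy)} ({intimacy:.1f} on a 0-1 scale).",
        # Explicit warning against the neutrality bias.
        "Do not drift toward a neutral tone; keep the style as strong as specified.",
    ]
    mechanism = NATIVE_MECHANISMS.get(target_language)
    if mechanism:
        lines.append(
            f"Express this style through native mechanisms such as {mechanism} "
            "rather than translating English politeness markers literally."
        )
    lines += ["", f"Text: {text}"]
    return "\n".join(lines)

print(build_prompt("Could you send the report?", "Japanese", 0.9, 0.7, 0.2))
```

Stating each dimension both as a label and as a number is a design choice in this sketch: the label gives the model a plain-language target, while the number preserves the exact slider position.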
Japanese: a standard translation gets the words right but sounds flat. Conteranto uses keigo and honorific forms to match how a Japanese speaker would actually make a request in a professional setting.
Dutch: a standard translation sounds too formal for Dutch culture. Conteranto makes it warmer and more direct, matching how Dutch speakers actually communicate with colleagues.
References
- Havaldar, S., Stein, A., Wong, E., & Ungar, L. (2025). Towards Style Alignment in Cross-Cultural Translation. ACL 2025. arXiv:2507.00216
- Havaldar, S. et al. (2023). Multilingual Politeness Dataset. EMNLP 2023. arXiv:2310.07135
- Briakou, E. et al. (2021). XFORMAL: Multilingual Formality Style Transfer. NAACL 2021. arXiv:2104.04108
- Pei, J. et al. (2023). SemEval-2023 Task 9: Multilingual Tweet Intimacy Analysis. SemEval 2023. arXiv:2210.01108
- Hofstede, G. et al. (2010). Cultures and Organizations: Software of the Mind. 3rd ed. McGraw-Hill.
- Hall, E. T. (1976). Beyond Culture. Anchor Books.