What happens when you pick a language?
When you choose Japanese as a target language, the sliders move to show how Japanese people typically communicate -- more polite, less direct. Pick Dutch, and they shift the opposite way -- very direct, informal. These starting positions are not random. They come from decades of cross-cultural research on how people in different countries actually communicate.
Where do the numbers come from?
Hofstede's Cultural Dimensions
In the late 1960s and 1970s, researcher Geert Hofstede analyzed surveys of over 100,000 IBM employees about their values and workplace behavior; the current edition of his framework scores 76 countries and regions. He discovered measurable patterns that differ consistently between cultures. His work, published in Cultures and Organizations: Software of the Mind (3rd ed., 2010), remains the most widely used framework in cross-cultural research. We use three of his dimensions:
Power Distance (PDI) -- How much do people accept hierarchy? In Malaysia (PDI=104), you would never call your boss by their first name. In Israel (PDI=13), everyone does -- even in the military. High PDI cultures use more polite, deferential language when speaking to superiors.
Individualism (IDV) -- Do people think of themselves as "I" or "we"? In the US (IDV=91), people say "I think we should..." In South Korea (IDV=18), people say "Our team feels that..." Collectivist cultures avoid singling out individuals and prefer indirect, face-saving communication.
Uncertainty Avoidance (UAI) -- How comfortable are people with ambiguity? In Greece (UAI=112), people follow strict protocols and formal procedures. In Singapore (UAI=8), people adapt on the fly and keep things informal. High UAI cultures prefer structured, formal language.
Hall's Context Theory
Some cultures say exactly what they mean. Others expect you to read between the lines. Anthropologist Edward T. Hall described this in Beyond Culture (1976):
- Low-context cultures (Dutch, German, American) -- Communication is explicit and direct. "The report is due Friday" means exactly that.
- High-context cultures (Japanese, Persian, Arab) -- Meaning is wrapped in context, tone, and relationship. "It might be nice to have the report soon" could mean the same thing.
Politeness Theory
Brown and Levinson's Politeness: Some Universals in Language Usage (1987) established that all cultures use strategies to protect "face" -- the social image people want to maintain. However, which strategies they use varies dramatically. Some cultures soften requests to protect the listener's autonomy (negative politeness), while others emphasize warmth and solidarity (positive politeness). Recent research confirms that politeness markers are among the most culturally variable aspects of language -- and the most frequently lost in translation (Masoud et al., 2024).
How we calculate the defaults
We combine these research scores into four output dimensions using weighted formulas. Each formula reflects the research consensus on which cultural factors most influence that communication style:
Politeness = 0.35 × PDI + 0.30 × (100 - IDV) + 0.35 × CTX
Directness = 0.40 × IDV + 0.35 × (100 - CTX) + 0.25 × (100 - PDI)
Formality = 0.35 × PDI + 0.35 × UAI + 0.30 × CTX
Attribution = 0.55 × IDV + 0.25 × (100 - PDI) + 0.20 × (100 - CTX)
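As a minimal sketch, the calculation can be expressed in a few lines of Python. The weights are the ones used in this section; the function name, the rounding, and the clamp to the 0-100 scale are our assumptions.

```python
def culture_defaults(pdi: float, idv: float, uai: float, ctx: float) -> dict:
    """Map Hofstede scores (PDI, IDV, UAI) and an estimated Hall context
    score (CTX), all on a 0-100 scale, to the four communication defaults."""
    raw = {
        "politeness":  0.35 * pdi + 0.30 * (100 - idv) + 0.35 * ctx,
        "directness":  0.40 * idv + 0.35 * (100 - ctx) + 0.25 * (100 - pdi),
        "formality":   0.35 * pdi + 0.35 * uai + 0.30 * ctx,
        "attribution": 0.55 * idv + 0.25 * (100 - pdi) + 0.20 * (100 - ctx),
    }
    # Round to whole points and keep every value inside 0-100.
    return {k: max(0, min(100, round(v))) for k, v in raw.items()}

# Japan: PDI=54, IDV=46, UAI=92 (published Hofstede scores) and CTX=90
# (a high-context estimate in line with the Japanese example below).
japan = culture_defaults(pdi=54, idv=46, uai=92, ctx=90)
# japan["politeness"] == 67 and japan["directness"] == 33, matching the
# pre-adjustment values in the Japanese worked example.
```

Running this for any language in the data table should reproduce the published output columns before the language-specific adjustments are applied.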
In plain English
Politeness -- Strong hierarchy + group-oriented + read-between-the-lines = more polite, deferential language. Think: Japanese, Korean, Persian.
Directness -- Individualist + say-what-you-mean + egalitarian = more direct, explicit speech. Think: Dutch, Israeli, American.
Formality -- Strong hierarchy + need for structure + high-context = more formal register and vocabulary. Think: Arabic, Japanese, Greek.
Attribution -- Individualist + egalitarian + explicit communication = "I did this" instead of "mistakes were made." Think: American, Australian, Dutch.
Example: Japanese
Politeness = 0.35 × 54 (PDI) + 0.30 × 54 (100 - IDV) + 0.35 × 90 (CTX) ≈ 67, plus keigo adjustment +10 = 77
Directness = 0.40 × 46 (IDV) + 0.35 × 10 (100 - CTX) + 0.25 × 46 (100 - PDI) ≈ 33, adjustment -5 = 28
This matches reality: Japanese communication is famously polite and indirect.
Language-specific features
Some languages have built-in politeness systems that go beyond what country-level culture scores capture. For example, Japanese has keigo -- honorific registers (respectful, humble, and polite forms) baked into the grammar itself. Persian has ta'arof -- an elaborate courtesy ritual where you might refuse a compliment three times before accepting. Korean has six speech levels that change verb endings based on who you're talking to.
Because these features are part of the language structure (not just cultural preference), we add small corrections on top of the formula -- a few points in either direction, ranging from -8 to +10:
| Language | What makes it special | Adjustment |
|---|---|---|
| Japanese | Keigo -- grammatical honorific registers (respectful, humble, polite) in verb conjugation | Politeness +10, Directness -5, Formality +5 |
| Persian | Ta'arof -- elaborate courtesy system with ritualized offers and refusals | Politeness +10, Directness -8 |
| Korean | Six speech levels that change verb endings based on social hierarchy | Politeness +8, Directness -3, Formality +5 |
| Thai | Pronoun and particle system that encodes social status in every sentence | Politeness +8, Directness -5 |
| Urdu | Adab -- etiquette system with formal/informal verb forms | Politeness +8, Directness -3 |
| Arabic | Elaborate honorific forms of address tied to religious and social norms | Politeness +5, Formality +3 |
| Hindi | Three-level honorific verb system (intimate, neutral, respectful) | Politeness +5 |
| Vietnamese | Dozens of pronouns that encode age, gender, and social relationship | Politeness +5, Directness -3 |
| Dutch | Cultural norm of directness that exceeds what Hofstede scores predict | Directness +5, Politeness -3 |
| Hebrew | Dugri culture -- "straight talk" valued as honesty, not rudeness | Directness +5, Politeness -3 |
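As a minimal sketch, the table's corrections can be layered on top of the formula defaults like this. The adjustment values are copied from the table (only a few languages shown); the dict layout, the function name, and the 0-100 clamp are our assumptions.

```python
ADJUSTMENTS = {
    "japanese": {"politeness": +10, "directness": -5, "formality": +5},
    "persian":  {"politeness": +10, "directness": -8},
    "korean":   {"politeness": +8,  "directness": -3, "formality": +5},
    "dutch":    {"directness": +5,  "politeness": -3},
}

def apply_adjustments(defaults: dict, language: str) -> dict:
    """Apply per-language corrections, keeping each value in 0-100."""
    adjusted = dict(defaults)
    for dimension, delta in ADJUSTMENTS.get(language, {}).items():
        adjusted[dimension] = max(0, min(100, adjusted[dimension] + delta))
    return adjusted

# Japanese example: the formula gives politeness 67 and directness 33;
# the keigo adjustments move them to 77 and 28. (The formality 78 and
# attribution 39 here are our own computations from the same formulas.)
japanese = apply_adjustments(
    {"politeness": 67, "directness": 33, "formality": 78, "attribution": 39},
    "japanese",
)
```

A language with no entry in the table simply keeps its formula defaults unchanged.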
Why underrepresented languages matter
There are over 7,000 languages spoken in the world. Most AI models are trained on roughly 100 of them. That means billions of people are effectively left out of the AI revolution.
The problem comes down to data. AI language models learn from text on the internet -- and the internet is overwhelmingly in English. When a model has seen millions of English sentences but only a few thousand in Yoruba or Khmer, it simply cannot produce the same quality of output. This creates a cycle: less data leads to worse tools, which leads to less digital content, which leads to even less data.
What "low-resource" means -- A language is considered low-resource when there is not enough digitized text data to train AI models effectively. This can happen because: (1) the language has fewer speakers, (2) speakers have limited internet access, or (3) the language uses a script or structure that existing tools handle poorly.
Why this matters for cultural translation
Cultural translation is especially important for underrepresented languages -- and especially hard. Here is why:
- Cultural norms are less documented. For English or French, there are thousands of studies on politeness, formality, and communication styles. For Amharic, Yoruba, or Khmer, this research is sparse. Our Hofstede-based defaults help fill this gap with a principled starting point.
- Translation quality is lower. When AI models have less training data for a language, they are more likely to produce translations that are grammatically correct but culturally wrong -- exactly the problem Conteranto addresses.
- The stakes are higher. A mistranslation between English and Dutch is usually caught quickly. A mistranslation into a language with fewer bilingual speakers may go unnoticed and cause real misunderstanding.
- Every translation generates data. When users translate into underrepresented languages with Conteranto, they create examples of culturally adapted text that did not exist before -- contributing to the research on how these cultures actually communicate.
What Conteranto does differently
In the language selector, we mark each language with its NLP representation level -- high, medium, low, or very low. Languages are sorted within each region so that the most underrepresented ones appear first. This is not just labeling; it is a deliberate design choice to draw attention to the languages that need cultural translation the most and benefit from it the most.
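The selector ordering described above can be sketched as a simple two-key sort: group by region, then list the least-represented languages first. The sample entries and their resource labels below are illustrative assumptions, not our actual data.

```python
# Lower rank = less NLP representation = listed earlier within a region.
RESOURCE_RANK = {"very low": 0, "low": 1, "medium": 2, "high": 3}

languages = [
    {"name": "German",  "region": "Europe", "resource": "high"},
    {"name": "Frisian", "region": "Europe", "resource": "low"},
    {"name": "Maltese", "region": "Europe", "resource": "very low"},
]

# Sort by region, then from least to most represented, then by name.
ordered = sorted(
    languages,
    key=lambda lang: (lang["region"], RESOURCE_RANK[lang["resource"]], lang["name"]),
)
# ordered: Maltese ("very low"), then Frisian ("low"), then German ("high")
```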
Full data table
Below are all 76 languages with their research inputs (PDI, IDV, UAI from Hofstede; CTX estimated by us) and the computed output values. Every number is transparent -- you can verify the formula yourself.
| Language | Region | PDI | IDV | UAI | CTX* | Adj. | Pol. | Dir. | For. | Att. |
|---|---|---|---|---|---|---|---|---|---|---|
| Loading data from API... | ||||||||||
* CTX (Context) values are our estimates based on Hall's qualitative framework, not published scores.
References and further reading
Foundational works
- Hofstede, G., Hofstede, G. J., & Minkov, M. (2010). Cultures and Organizations: Software of the Mind. 3rd ed. New York: McGraw-Hill.
- Hall, E. T. (1976). Beyond Culture. New York: Anchor Books.
- Brown, P. & Levinson, S. C. (1987). Politeness: Some Universals in Language Usage. Cambridge: Cambridge University Press.
- Meyer, E. (2014). The Culture Map: Breaking Through the Invisible Boundaries of Global Business. New York: PublicAffairs.
Recent research on cultural alignment in AI (2024-2025)
- Masoud, R. I., Liu, Z., Ferianc, M., Treleaven, P., & Rodrigues, M. (2024). Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions. COLING 2025. arXiv:2309.12342
- Li, C., Chen, M., Wang, J., Sitaram, S., & Xie, X. (2024). CultureLLM: Incorporating Cultural Differences into Large Language Models. NeurIPS 2024. proceedings.neurips.cc
- Kharchenko, J., Roosta, T., Chadha, A., & Shah, C. (2024). How Well Do LLMs Represent Values Across Cultures? Empirical Analysis Based on Hofstede Cultural Dimensions. arXiv:2406.14805
Underrepresented languages and AI (2024-2025)
- Stanford HAI (2025). Mind the (Language) Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts. hai.stanford.edu
- Liu, Y. (2025). Improving Machine Translation Accuracy for Underrepresented Languages Using Transformer Models. International Journal of Bilingualism. doi:10.1177/14727978251337995
- Cambridge NLP (2024). Natural Language Processing Applications for Low-Resource Languages. Natural Language Processing, 31, 183-197. doi:10.1017/nlp.2024.33
- Cohere for AI (2024). Aya: A Massively Multilingual Language Model Covering 101 Languages. cohere.com
Data sources
- Clearly Cultural. Geert Hofstede Cultural Dimensions. clearlycultural.com
- Hofstede Insights. Country Comparison Tool. hofstede-insights.com