Conteranto

METHODOLOGY

What happens when you pick a language?

When you choose Japanese as a target language, the sliders move to show how Japanese people typically communicate -- more polite, less direct. Pick Dutch, and they shift the opposite way -- very direct, informal. These starting positions are not random. They come from decades of cross-cultural research on how people in different countries actually communicate.

Where do the numbers come from?

Hofstede's Cultural Dimensions

Starting in the late 1960s, researcher Geert Hofstede analyzed more than 100,000 survey responses from IBM employees around the world about their values and workplace behavior. He found measurable patterns that differ consistently between cultures. The latest edition of his work, Cultures and Organizations: Software of the Mind (3rd ed., 2010), reports scores for 76 countries and regions, and it remains one of the most widely used frameworks in cross-cultural research. We use three of his dimensions:

Power Distance (PDI) -- How much do people accept hierarchy? In Malaysia (PDI=104), you would never call your boss by their first name. In Israel (PDI=13), everyone does -- even in the military. High PDI cultures use more polite, deferential language when speaking to superiors.

Individualism (IDV) -- Do people think of themselves as "I" or "we"? In the US (IDV=91), people say "I think we should..." In South Korea (IDV=18), people say "Our team feels that..." Collectivist cultures avoid singling out individuals and prefer indirect, face-saving communication.

Uncertainty Avoidance (UAI) -- How comfortable are people with ambiguity? In Greece (UAI=112), people follow strict protocols and formal procedures. In Singapore (UAI=8), people adapt on the fly and keep things informal. High UAI cultures prefer structured, formal language.

Hall's Context Theory

Some cultures say exactly what they mean. Others expect you to read between the lines. Anthropologist Edward T. Hall described this contrast between low-context and high-context communication in Beyond Culture (1976).

Important: Hall described cultures qualitatively -- he did not assign numerical scores. Our context values (CTX) are estimates based on Hall's framework, Erin Meyer's country rankings in The Culture Map (2014), and subsequent cross-cultural communication research. Every CTX value is shown transparently in the data table below so you can see exactly what we used.

Politeness Theory

Brown and Levinson's Politeness: Some Universals in Language Usage (1987) established that all cultures use strategies to protect "face" -- the social image people want to maintain. However, which strategies they use varies dramatically. Some cultures soften requests to protect the listener's autonomy (negative politeness), while others emphasize warmth and solidarity (positive politeness). Recent research confirms that politeness markers are among the most culturally variable aspects of language -- and the most frequently lost in translation (Masoud et al., 2024).

How we calculate the defaults

We combine these research scores into four dimensions, each with its own weighted formula. The weights in each formula sum to 1 and reflect the research consensus on which cultural factors most influence that communication style:

Politeness = 0.35 × PDI + 0.30 × (100 - IDV) + 0.35 × CTX
Directness = 0.40 × IDV + 0.35 × (100 - CTX) + 0.25 × (100 - PDI)
Formality = 0.35 × PDI + 0.35 × UAI + 0.30 × CTX
Attribution = 0.55 × IDV + 0.25 × (100 - PDI) + 0.20 × (100 - CTX)
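The four formulas can be sketched in a few lines of Python. This is a minimal illustration, not Conteranto's actual implementation: the names `CultureScores` and `baseline_dimensions` are hypothetical, and the function returns the raw weighted sums before any rounding or language-specific adjustment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CultureScores:
    """Research inputs for one language (class name is illustrative)."""
    pdi: float  # Power Distance (Hofstede)
    idv: float  # Individualism (Hofstede)
    uai: float  # Uncertainty Avoidance (Hofstede)
    ctx: float  # Context estimate (based on Hall's framework)


def baseline_dimensions(s: CultureScores) -> dict[str, float]:
    """Apply the four weighted formulas; no rounding, no adjustments."""
    return {
        "politeness": 0.35 * s.pdi + 0.30 * (100 - s.idv) + 0.35 * s.ctx,
        "directness": 0.40 * s.idv + 0.35 * (100 - s.ctx) + 0.25 * (100 - s.pdi),
        "formality": 0.35 * s.pdi + 0.35 * s.uai + 0.30 * s.ctx,
        "attribution": 0.55 * s.idv + 0.25 * (100 - s.pdi) + 0.20 * (100 - s.ctx),
    }


# Inputs for Japan as listed in this document: PDI=54, IDV=46, UAI=92, CTX=90.
japan = CultureScores(pdi=54, idv=46, uai=92, ctx=90)
for name, value in baseline_dimensions(japan).items():
    print(f"{name}: {value:.1f}")
```

Because Hofstede scores can exceed 100 (for example PDI=104), a real implementation would probably also need to decide how to cap the results; that choice is not specified here.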

In plain English

Politeness -- Strong hierarchy + group-oriented + read-between-the-lines = more polite, deferential language. Think: Japanese, Korean, Persian.

Directness -- Individualist + say-what-you-mean + egalitarian = more direct, explicit speech. Think: Dutch, Israeli, American.

Formality -- Strong hierarchy + need for structure + high-context = more formal register and vocabulary. Think: Arabic, Japanese, Greek.

Attribution -- Individualist + egalitarian + explicit communication = "I did this" instead of "mistakes were made." Think: American, Australian, Dutch.

Example: Japanese

Japan has PDI=54, IDV=46, UAI=92, CTX=90 (very high-context). Plugging into the formulas:
Politeness = 0.35(54) + 0.30(100 - 46) + 0.35(90) = 66.6, rounded to 67, plus keigo adjustment of +10 = 77
Directness = 0.40(46) + 0.35(100 - 90) + 0.25(100 - 54) = 33.4, rounded to 33, plus keigo adjustment of -5 = 28
This matches reality: Japanese communication is famously polite and indirect.
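For completeness, the same four inputs can be run through the Formality and Attribution formulas as well. The arithmetic below only applies the formulas as stated, before any language-specific adjustment; values in the data table may differ if rounding or adjustments are applied in a different order.

```latex
\begin{align*}
\text{Formality}   &= 0.35(54) + 0.35(92) + 0.30(90) \\
                   &= 18.9 + 32.2 + 27.0 = 78.1 \approx 78 \\
\text{Attribution} &= 0.55(46) + 0.25(100 - 54) + 0.20(100 - 90) \\
                   &= 25.3 + 11.5 + 2.0 = 38.8 \approx 39
\end{align*}
```

Both results point the same way as the Politeness and Directness values: a formal register and a preference for group rather than individual attribution.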

Language-specific features

Some languages have built-in politeness systems that go beyond what country-level culture scores capture. For example, Japanese has keigo -- six different politeness levels baked into the grammar itself. Persian has ta'arof -- an elaborate courtesy ritual where you might refuse an offer three times before accepting. Korean has six speech levels that change verb endings based on who you're talking to.

Because these features are part of the language structure (not just cultural preference), we add small corrections (+3 to +10 points) on top of the formula:

Language | What makes it special | Adjustment
Japanese | Keigo -- six grammatical politeness levels in verb conjugation | Politeness +10, Directness -5, Formality +5
Persian | Ta'arof -- elaborate courtesy system with ritualized offers and refusals | Politeness +10, Directness -8
Korean | Six speech levels that change verb endings based on social hierarchy | Politeness +8, Directness -3, Formality +5
Thai | Pronoun and particle system that encodes social status in every sentence | Politeness +8, Directness -5
Urdu | Adab -- etiquette system with formal/informal verb forms | Politeness +8, Directness -3
Arabic | Elaborate honorific forms of address tied to religious and social norms | Politeness +5, Formality +3
Hindi | Three-level honorific verb system (intimate, neutral, respectful) | Politeness +5
Vietnamese | Dozens of pronouns that encode age, gender, and social relationship | Politeness +5, Directness -3
Dutch | Cultural norm of directness that exceeds what Hofstede scores predict | Directness +5, Politeness -3
Hebrew | Dugri culture -- "straight talk" valued as honesty, not rudeness | Directness +5, Politeness -3
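One way to layer these corrections on top of the formula output is a simple lookup table. The sketch below is illustrative only: `ADJUSTMENTS` and `apply_adjustments` are hypothetical names, only three rows of the table are copied in, and both the rounding step and the clamp to the 0-100 range are assumptions rather than documented behavior.

```python
# Adjustment values copied from three rows of the table above.
ADJUSTMENTS = {
    "Japanese": {"politeness": 10, "directness": -5, "formality": 5},
    "Persian": {"politeness": 10, "directness": -8},
    "Dutch": {"directness": 5, "politeness": -3},
}


def apply_adjustments(baseline: dict[str, float], language: str) -> dict[str, int]:
    """Add per-language corrections, then round and clamp to 0..100.

    Rounding after adding the correction and clamping are assumptions
    of this sketch, not documented behavior.
    """
    corrections = ADJUSTMENTS.get(language, {})  # no entry means no correction
    adjusted = {}
    for dimension, value in baseline.items():
        total = value + corrections.get(dimension, 0)
        adjusted[dimension] = max(0, min(100, round(total)))
    return adjusted


# Formula output for Japanese, computed from PDI=54, IDV=46, UAI=92, CTX=90.
japanese_baseline = {
    "politeness": 66.6,
    "directness": 33.4,
    "formality": 78.1,
    "attribution": 38.8,
}
print(apply_adjustments(japanese_baseline, "Japanese"))
```

Keeping the corrections in a separate table, rather than folding them into the weights, keeps the formula identical for every language and makes each exception visible.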

Why underrepresented languages matter

There are over 7,000 languages spoken in the world. Most AI models are trained on roughly 100 of them. That means billions of people are effectively left out of the AI revolution.

The problem comes down to data. AI language models learn from text on the internet -- and the internet is overwhelmingly in English. When a model has seen millions of English sentences but only a few thousand in Yoruba or Khmer, it simply cannot produce the same quality of output. This creates a cycle: less data leads to worse tools, which leads to less digital content, which leads to even less data.

What "low-resource" means -- A language is considered low-resource when there is not enough digitized text data to train AI models effectively. This can happen because: (1) the language has fewer speakers, (2) speakers have limited internet access, or (3) the language uses a script or structure that existing tools handle poorly.

Why this matters for cultural translation

Cultural translation is especially important for underrepresented languages -- and especially hard.

What Conteranto does differently

In the language selector, we mark each language with its NLP representation level -- high, medium, low, or very low. Languages are sorted within each region so that the most underrepresented ones appear first. This is not just labeling; it is a deliberate design choice to draw attention to the languages that need cultural translation the most and benefit from it the most.
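The ordering described above can be sketched as a sort on region and then on representation level. Everything in this example is hypothetical: the field names, the `sort_for_selector` function, the placeholder languages and regions, and the alphabetical tie-break within a level are assumptions made for illustration.

```python
from itertools import groupby

# Lower rank = less represented = shown earlier in the selector.
REPRESENTATION_RANK = {"very low": 0, "low": 1, "medium": 2, "high": 3}


def sort_for_selector(languages: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group languages by region, least-represented first within each region."""
    ordered = sorted(
        languages,
        key=lambda lang: (
            lang["region"],
            REPRESENTATION_RANK[lang["level"]],
            lang["name"],  # assumed tie-break: alphabetical
        ),
    )
    # groupby needs its input sorted by the grouping key, which it is.
    return {
        region: [lang["name"] for lang in group]
        for region, group in groupby(ordered, key=lambda lang: lang["region"])
    }


# Placeholder data; names, regions, and levels are invented for the example.
sample = [
    {"name": "Language A", "region": "Region 1", "level": "high"},
    {"name": "Language B", "region": "Region 1", "level": "very low"},
    {"name": "Language C", "region": "Region 2", "level": "medium"},
    {"name": "Language D", "region": "Region 1", "level": "low"},
    {"name": "Language E", "region": "Region 2", "level": "very low"},
]
print(sort_for_selector(sample))
```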

Research context: Stanford's Human-Centered AI Institute documented in 2025 that most major LLMs underperform for non-English languages, are not attuned to relevant cultural contexts, and are not accessible in parts of the Global South (Stanford HAI, 2025). The Masakhane project has united researchers across Africa to build open-source NLP resources for dozens of African languages, and Cohere's Aya model (2024) covers 101 languages -- more than double the number covered by earlier open-source models. Conteranto contributes to this effort by focusing on the cultural layer that even multilingual models miss.

Full data table

Below are all 76 languages with their research inputs (PDI, IDV, UAI from Hofstede; CTX estimated by us) and the computed output values. Every number is transparent -- you can verify the formula yourself.

Language Region PDI IDV UAI CTX* Adj. Pol. Dir. For. Att.

* CTX (Context) values are our estimates based on Hall's qualitative framework, not published scores.

References and further reading

Foundational works

Recent research on cultural alignment in AI (2024-2025)

Underrepresented languages and AI (2024-2025)

Data sources