Module:Grc-utilities/doc
Tokenization
The function tokenize breaks text into meaningful units: a single consonant, a single monophthong letter, or a diphthong, each together with any diacritics attached to it, as shown below. It is used by Module:grc-translit and Module:grc-accent, and by the sandbox module Module:grc-pronunciation/sandbox.
The first argument is the word to be tokenized. The second is a boolean: if true, the function groups εω together as a diphthong, for instance in πόλεως (póleōs), the genitive of πόλῐς (pólis, “city-state”).
| word | tokens |
|---|---|
| ἡμεῖς | ἡ, μ, εῖ, ς |
| οἷαι | οἷ, αι |
| ἀναῡ̈τέω | ἀ, ν, α, ῡ̈, τ, έ, ω |
| δαΐφρων | δ, α, ΐ, φ, ρ, ω, ν |
| τούτῳ | τ, ού, τ, ῳ |
| ὑϊκός | ὑ, ϊ, κ, ό, ς |
| ἡ Ἑλήνη | ἡ, (space), Ἑ, λ, ή, ν, η |
| νηῦς | ν, ηῦ, ς |
| υἱός | υἱ, ό, ς |
| ὄργυιᾰ | ὄ, ρ, γ, υι, ᾰ |
| οὐ δοκεῖν ἀλλ᾽ εἶναι ἀγαθὸν | οὐ, (space), δ, ο, κ, εῖ, ν, (space), ἀ, λ, λ, ᾽, (space), εἶ, ν, αι, (space), ἀ, γ, α, θ, ὸ, ν |
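The grouping shown in the table can be approximated outside the wiki with a short Python sketch. This is not the module's Lua code: the function name `tokenize`, the parameter `group_eo`, the `DIPHTHONGS` set, and the merging rule (an unmarked first vowel followed by ι or υ without a diaeresis) are all assumptions made for illustration, and the real module may treat edge cases differently.

```python
import unicodedata

DIAERESIS = "\u0308"  # combining diaeresis, which blocks diphthong grouping
# Assumed list of diphthongs; the module's own list may differ.
DIPHTHONGS = {"αι", "ει", "οι", "υι", "αυ", "ευ", "ηυ", "ου", "ωυ"}


def tokenize(text, group_eo=False):
    """Split Greek text into letters (with diacritics) and diphthongs."""
    pairs = DIPHTHONGS | ({"εω"} if group_eo else set())

    # Step 1: decompose, then attach each combining mark to its base letter.
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)

    # Step 2: merge an unmarked vowel with a following vowel when the two
    # base letters form a known diphthong and the second has no diaeresis.
    tokens = []
    i = 0
    while i < len(clusters):
        cur = clusters[i]
        nxt = clusters[i + 1] if i + 1 < len(clusters) else ""
        if (
            len(cur) == 1
            and nxt
            and DIAERESIS not in nxt
            and (cur + nxt[0]).lower() in pairs
        ):
            tokens.append(cur + nxt)
            i += 2
        else:
            tokens.append(cur)
            i += 1

    # Step 3: recompose so tokens use precomposed characters where possible.
    return [unicodedata.normalize("NFC", t) for t in tokens]


print(tokenize("ἡμεῖς"))         # ['ἡ', 'μ', 'εῖ', 'ς']
print(tokenize("πόλεως", True))  # ['π', 'ό', 'λ', 'εω', 'ς']
```

In this sketch a space is returned as its own token, and the second argument only adds εω to the set of pairs that may be merged.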
Testcases
- (5th century BCE, Attic) IPA (key): /ɡǎːd/
- (1st century BCE, Egypt) IPA (key): /ɡad/
- (4th century CE, Koine Greek) IPA (key): /ɣað/
- (10th century CE, Byzantine) IPA (key): /ɣað/
- (15th century CE, Constantinople) IPA (key): /ɣað/
