Word and Character Count in WebTranslateIt
Word Count
Word count is the number of words in a document or passage of text. Word count is commonly used by translators to determine the price of a translation job. When counting words for a translation job, the word count is based on the source language.
It is therefore important to know what counts as a word and which words do not count toward the total.
With WebTranslateIt, HTML tags do not count against the word count, since you can click to paste them, but translatable attributes in HTML tags (alt, summary, placeholder, standby, abbr, content, title and label attributes) are extracted and included in the word count.
We use an XML parser to find the attributes, so if a tag in a string repeats the same attribute, as in <a href="https://webtranslateit.com" title="WebTranslateIt" title="A translation website">WebTranslateIt</a>, only the first occurrence of that attribute counts toward the word count.
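The extraction rule described above can be illustrated with a minimal sketch. It assumes Python's standard `html.parser` module as a stand-in for the XML parser WebTranslateIt actually uses, and the names `TRANSLATABLE_ATTRS`, `TranslatableText` and `extract_translatable` are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser

# Attribute names listed in the text above as translatable.
TRANSLATABLE_ATTRS = {"alt", "summary", "placeholder", "standby",
                      "abbr", "content", "title", "label"}


class TranslatableText(HTMLParser):
    """Collects visible text and translatable attribute values; tags are skipped."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_starttag(self, tag, attrs):
        seen = set()
        for name, value in attrs:
            # Ignore non-translatable attributes and repeated attributes:
            # only the first occurrence on a given tag is kept.
            if name not in TRANSLATABLE_ATTRS or name in seen:
                continue
            seen.add(name)
            if value:
                self.segments.append(value)

    def handle_data(self, data):
        # Visible text between tags.
        if data.strip():
            self.segments.append(data.strip())


def extract_translatable(html):
    """Return the text segments that would be counted (an approximation)."""
    parser = TranslatableText()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Applied to the link with a duplicated title attribute shown above, this sketch returns the first title value and the link text, and drops "A translation website".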
Variable placeholders count as one word.
WebTranslateIt's word count is language-aware, conforms to the latest Unicode Standard, and has built-in, dictionary-based support for text in languages such as Chinese, Japanese or Thai. We currently use ICU v.70.1.
Examples
Sentence | Language | Word Count |
---|---|---|
Hello, how are you? | English | 4 words |
こんにちは元気ですか | Japanese | 4 words |
Welcome to <a href="https://webtranslateit.com" title="Welcome back!">WebTranslateIt</a> | English | 5 words |
There are %{count} posts | English | 4 words |
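The rule that a variable placeholder counts as one word can be sketched as follows. This is a rough approximation, not WebTranslateIt's ICU-based counter: it assumes placeholders use the %{name} syntax seen in the examples, it splits words with a simple regular expression, and it does not handle HTML or languages such as Japanese that need dictionary-based segmentation. The names `PLACEHOLDER`, `WORD` and `rough_word_count` are hypothetical.

```python
import re

# Assumption: placeholders look like %{name}; other syntaxes are not covered.
PLACEHOLDER = re.compile(r"%\{[^}]*\}")
# A "word" here is a run of word characters, optionally joined by apostrophes.
WORD = re.compile(r"\w+(?:['’]\w+)*")


def rough_word_count(text):
    """Approximate word count where each placeholder counts as one word."""
    # Replace every placeholder with a single stand-in token so that
    # something like %{user.name} is counted once rather than twice.
    text = PLACEHOLDER.sub(" PLACEHOLDER ", text)
    return len(WORD.findall(text))


rough_word_count("Hello, how are you?")       # 4
rough_word_count("There are %{count} posts")  # 4
```

Both results match the plain-text English rows of the table.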
Character Count
For some language pairs, translators use the character count to determine the price of a translation job. Counting characters by bytes was widely used in the past, but it gives incorrect results for some languages and for emojis, for instance. We think it is more correct to count characters by graphemes.
A grapheme is a sequence of one or more code points that is displayed as a single graphical unit and that a reader recognizes as a single element of the writing system. For instance, a and ä are both graphemes, but they may consist of different numbers of code points: ä can be encoded as a single code point (U+00E4) or as a followed by a combining diaeresis (U+0308).
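To make the distinction between graphemes and code points concrete, here is a small sketch using Python's standard `unicodedata` module. It only illustrates how the same visible character can be stored in two ways; it is not how WebTranslateIt counts characters.

```python
import unicodedata

# The same visible character "ä" in two Unicode normalization forms.
composed = unicodedata.normalize("NFC", "\u00e4")    # one code point: U+00E4
decomposed = unicodedata.normalize("NFD", "\u00e4")  # two code points: U+0061 U+0308

len(composed)    # 1
len(decomposed)  # 2

# Both strings render as one grapheme, yet they are not equal as code point sequences.
composed == decomposed  # False
```

A grapheme-based count treats both forms as one character, whereas a code point count gives 1 for the first and 2 for the second.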
With WebTranslateIt, HTML tags do not count against the character count, but translatable attributes in HTML tags (alt, summary, placeholder, standby, abbr, content, title and label attributes) are extracted and included in the character count. Variable placeholders are included in the character count.
Strings containing several successive whitespace characters or similar (space, \n, \r, \t) are squished into one whitespace character for the character count.
For instance, this string:
Hello how
are you?
counts as 18 characters, because the line break is squished into a single space.
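The squishing step can be sketched in a few lines. This is a minimal illustration under assumptions: the function name `squished_length` is hypothetical, the sample input with extra spaces is made up for the example, and the sketch does not trim leading or trailing whitespace because the text above does not say whether that happens. It counts code points, which equals the grapheme count for plain ASCII text like this.

```python
import re


def squished_length(text):
    """Length of text after each run of whitespace is collapsed to one space."""
    return len(re.sub(r"\s+", " ", text))


# A run of spaces and a line break each collapse to a single space.
squished_length("Hello    how\nare you?")  # 18
```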
Examples
Character | Count using bytes (UTF-8) | Count using graphemes |
---|---|---|
A | 1 byte | 1 grapheme |
🤔 | 4 bytes | 1 grapheme |
की | 6 bytes | 1 grapheme |
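The difference between the counting methods can be checked with a short sketch. It assumes UTF-8 as the byte encoding and approximates graphemes by skipping combining marks, which is enough for the three characters in the table but is not a full grapheme segmentation (emoji sequences joined with zero-width joiners, for example, would be miscounted). The function names are hypothetical, and WebTranslateIt itself relies on ICU rather than this shortcut.

```python
import unicodedata


def utf8_bytes(text):
    """Number of bytes when the text is encoded as UTF-8."""
    return len(text.encode("utf-8"))


def approx_graphemes(text):
    """Rough grapheme count: code points that are not combining marks (Mn, Mc, Me)."""
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


def measure(text):
    """Return (UTF-8 bytes, code points, approximate graphemes)."""
    return utf8_bytes(text), len(text), approx_graphemes(text)


measure("A")             # (1, 1, 1)
measure("\U0001F914")    # (4, 1, 1)  thinking face emoji
measure("\u0915\u0940")  # (6, 2, 1)  Devanagari letter KA followed by vowel sign II
```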