String Length Calculator Online - Count Characters, Words & Bytes
Count characters, words, lines, bytes, and detailed text statistics with real-time analysis.
Key Features
Basic & Detailed Modes
Basic shows essential counts; Detailed adds advanced statistics like max/min line length and average word length.
Real-time Analysis
All statistics update instantly as you type. No Calculate button needed.
UTF-8 Byte Count
Accurate byte size using the TextEncoder API. ASCII = 1 byte, most CJK = 3 bytes, most emoji = 4 bytes.
Privacy First
All processing happens locally in your browser. No data ever leaves your device.
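The statistics listed under Basic & Detailed Modes could be computed roughly as follows. This is a minimal sketch and not the tool's actual code: the function name textStats, the rule that a word is any run of non-whitespace characters, and the choice to count characters as Unicode code points are all assumptions made for illustration.

```javascript
// Hypothetical helper, not the tool's real implementation.
function textStats(text) {
  // Assumption: a word is a run of non-whitespace characters,
  // so consecutive spaces never produce empty words.
  const trimmed = text.trim();
  const words = trimmed === "" ? [] : trimmed.split(/\s+/);

  // Assumption: lines are separated by \r\n, \r, or \n.
  const lines = text === "" ? [] : text.split(/\r\n|\r|\n/);

  // Spreading a string iterates by code point, not by UTF-16 code unit.
  const lineLengths = lines.map((line) => [...line].length);
  const wordLengths = words.map((word) => [...word].length);
  const sum = (xs) => xs.reduce((a, b) => a + b, 0);

  return {
    characters: [...text].length,
    words: words.length,
    lines: lines.length,
    bytes: new TextEncoder().encode(text).length, // UTF-8 size
    maxLineLength: lineLengths.length ? Math.max(...lineLengths) : 0,
    minLineLength: lineLengths.length ? Math.min(...lineLengths) : 0,
    averageWordLength: words.length ? sum(wordLengths) / words.length : 0,
  };
}

const stats = textStats("hello   world\nfoo");
console.log(stats);
```

Because the function is pure and cheap, it could be called from an input event handler to get the instant updates described above.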
Understanding String Encoding
ASCII
1 byte per character. Covers English letters, digits, basic symbols. Every ASCII string is valid UTF-8.
UTF-8
1-4 bytes per char. ASCII=1B, accented Latin=2B, most CJK=3B, most emoji=4B. The dominant encoding on the web. Backward compatible with ASCII.
UTF-16
2 or 4 bytes per char. Used internally by JavaScript and Java. BMP=2B, supplementary chars (emoji, rare CJK)=4B via surrogate pairs.
GBK / GB2312
1-2 bytes per char. Legacy Chinese encoding. ASCII=1B, Chinese=2B. Still used in legacy Chinese systems and databases.
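The per-character sizes above can be checked directly in JavaScript. The sketch below uses the standard TextEncoder API for UTF-8 and derives the UTF-16 size from .length (two bytes per code unit). The helper names utf8Bytes and utf16Bytes are made up for this example, and GBK is left out because TextEncoder only produces UTF-8.

```javascript
// UTF-8 byte length via the standard TextEncoder API.
function utf8Bytes(str) {
  return new TextEncoder().encode(str).length;
}

// UTF-16 byte length: .length reports code units, each 2 bytes wide.
function utf16Bytes(str) {
  return str.length * 2;
}

const samples = [
  ["A", "ASCII letter"],
  ["\u00E9", "accented Latin (é)"],
  ["\u4E2D", "CJK (中)"],
  ["\u{1F600}", "emoji (😀)"],
];

for (const [ch, label] of samples) {
  console.log(`${label}: UTF-8 ${utf8Bytes(ch)} B, UTF-16 ${utf16Bytes(ch)} B`);
}
```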
Frequently Asked Questions
How does the tool count words when there are consecutive spaces?
How are emoji and multi-byte Unicode characters counted?
JavaScript's .length counts UTF-16 code units, so characters in the Basic Multilingual Plane, including most CJK, count as 1, while emoji outside it, such as 😀, count as 2. Emoji composed of multiple code points (family emoji joined with zero-width joiners, skin-tone-modified emoji, flag sequences) count higher still. The byte count uses TextEncoder (UTF-8): ASCII = 1 byte, Latin-1 supplement = 2 bytes, CJK = 3 bytes, most emoji = 4 bytes per code point.
How are sentences detected in Detailed mode?
Why do character count and byte count differ?
Characters and bytes are different units: the byte count is the UTF-8 encoded size reported by TextEncoder. In UTF-8, ASCII letters are 1 byte, accented Latin characters are 2 bytes, Chinese/Japanese/Korean characters are 3 bytes, and emoji are 4 bytes. A 10-character Chinese string could be 30 bytes, while a 10-character ASCII string is only 10 bytes.
Can I use the byte count for API payload size checking?
Why does JavaScript string length differ from PHP mb_strlen?
JavaScript's .length counts UTF-16 code units. For characters in the Basic Multilingual Plane (BMP), this is the same as the number of characters. However, characters outside the BMP (most emoji and rare CJK characters) count as 2 because they are stored as surrogate pairs. PHP's mb_strlen() with UTF-8 counts Unicode code points, so it gives a different result. For example, the emoji 😀 is 2 in JavaScript's .length but 1 in mb_strlen. Always specify the encoding when comparing string lengths across languages.
What is the maximum string length in JavaScript?
How to count Unicode characters correctly across languages?
In JavaScript, use [...str].length or Array.from(str).length instead of str.length. In Python, len(str) returns the Unicode code point count. In PHP, use mb_strlen($str, 'UTF-8') instead of strlen(). In Go, utf8.RuneCountInString(str) gives the code point count. In Java, str.codePointCount(0, str.length()) handles surrogate pairs. The key insight: many language-native string length methods count code units or bytes, not visible characters. Use Unicode-aware APIs for accurate results.
How to get the length of a string in bash?
Use ${#var}. For example: str="hello"; echo ${#str} prints 5. This counts characters, not bytes. For byte length, use printf '%s' "$str" | wc -c (printf is more portable than echo -n). Note that bash's ${#var} counts characters according to the current locale: in a UTF-8 locale it counts each multi-byte character as a single character. For POSIX-compatible behavior across all shells, pipe to wc -m for character count or wc -c for byte count.
Difference between VARCHAR and TEXT in SQL for string length?
LENGTH(column) (MySQL) or LEN(column) (SQL Server).
String Length by Programming Language
Different programming languages count string length differently. The table below compares how each language handles character counting, byte counting, and Unicode.
| Language | Char Count | Byte Count | Unicode-Aware? | Notes |
|---|---|---|---|---|
| JavaScript | str.length | new TextEncoder().encode(str).length | ⚠ UTF-16 code units | Use [...str].length for correct char count |
| Python | len(str) | len(str.encode('utf-8')) | ✓ Correct | Counts Unicode correctly |
| PHP | mb_strlen($s, 'UTF-8') | strlen($s) | ⚠ mb_strlen only | strlen() returns bytes, not chars! |
| Go | utf8.RuneCountInString(s) | len(s) | ⚠ RuneCount only | len() on string returns bytes |
| Java | str.codePointCount(0, str.length()) | str.getBytes(StandardCharsets.UTF_8).length | ⚠ codePointCount | .length() counts UTF-16 code units |
| Ruby | str.length | str.bytesize | ✓ Correct | Returns Unicode characters |
| C# | str.Length | Encoding.UTF8.GetByteCount(str) | ⚠ UTF-16 code units | Use StringInfo for text elements |
| Rust | s.chars().count() | s.len() | ✓ Correct | .chars() iterates Unicode scalar values; .len() returns bytes |
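To make the JavaScript row concrete, the sketch below reports four different "lengths" for the same string: UTF-16 code units, code points, user-perceived characters (grapheme clusters), and UTF-8 bytes. The function name countAll is hypothetical, and the grapheme count assumes Intl.Segmenter is available, which holds in current browsers and recent Node.js releases but not in older runtimes.

```javascript
function countAll(str) {
  // Assumption: the runtime provides Intl.Segmenter.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: str.length,                             // UTF-16 code units
    codePoints: [...str].length,                       // Unicode code points
    graphemes: [...segmenter.segment(str)].length,     // user-perceived characters
    utf8Bytes: new TextEncoder().encode(str).length,   // UTF-8 encoded size
  };
}

// Family emoji: man + ZWJ + woman + ZWJ + girl (five code points).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(countAll("hello"));
console.log(countAll("\u{1F600}"));
console.log(countAll(family));
```

For plain ASCII all four numbers agree. They diverge once supplementary characters or joined sequences appear, which is why the table distinguishes code-unit counts from Unicode-aware ones.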