String Length Calculator Online - Count Characters, Words & Bytes

Count characters, words, lines, bytes, and detailed text statistics with real-time analysis.

Key Features

Basic & Detailed Modes

Basic shows essential counts; Detailed adds advanced statistics like max/min line length and average word length.

Real-time Analysis

All statistics update instantly as you type. No Calculate button needed.

UTF-8 Byte Count

Accurate UTF-8 byte size using the TextEncoder API. ASCII = 1 byte, CJK = 3 bytes, a single-code-point emoji = 4 bytes.

Privacy First

All processing happens locally in your browser. No data ever leaves your device.
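
The Detailed-mode statistics listed above (max/min line length, average word length) could be computed along the following lines. This is a minimal sketch, not the tool's actual source: the function name detailedStats and the choice to measure lengths in UTF-16 code units are assumptions made for illustration.

```javascript
// Hypothetical helper, not the tool's real implementation.
// Lengths are measured with .length, i.e. in UTF-16 code units.
function detailedStats(text) {
  const lines = text.split(/\r?\n/);
  const lineLengths = lines.map((line) => line.length);
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const totalWordLength = words.reduce((sum, word) => sum + word.length, 0);

  return {
    lines: lines.length,
    maxLineLength: lineLengths.reduce((a, b) => Math.max(a, b), 0),
    minLineLength: lineLengths.reduce((a, b) => Math.min(a, b), Infinity),
    averageWordLength: words.length === 0 ? 0 : totalWordLength / words.length,
  };
}

const stats = detailedStats("one two\nthree\n\nfour five six");
console.log(stats.lines);         // 4
console.log(stats.maxLineLength); // 13
console.log(stats.minLineLength); // 0 (the empty third line)
```

Because the function is pure and runs synchronously, it can be called on every input event, which is one way to get the real-time behavior described above.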

Understanding String Encoding

ASCII

1 byte per character. Covers English letters, digits, basic symbols. Every ASCII string is valid UTF-8.

UTF-8

1-4 bytes per char. ASCII=1B; accented Latin, Greek, Cyrillic=2B; most CJK=3B; emoji and other supplementary chars=4B. Dominant web encoding. Backward compatible with ASCII.

UTF-16

2 or 4 bytes per char. Used internally by JavaScript and Java. BMP=2B, supplementary chars (emoji, rare CJK)=4B via surrogate pairs.

GBK / GB2312

1-2 bytes per char. Legacy Chinese encoding. ASCII=1B, Chinese=2B. Still found in older Chinese systems and databases.
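
The UTF-8 and UTF-16 sizes quoted above can be checked directly in a browser console or Node.js. The sketch below is illustrative only: encodedSizes is a hypothetical helper name, and GBK is left out because TextEncoder only produces UTF-8.

```javascript
const encoder = new TextEncoder(); // always encodes to UTF-8

// Size of a string in UTF-8 and UTF-16, in bytes.
function encodedSizes(str) {
  return {
    utf8Bytes: encoder.encode(str).length,
    utf16Bytes: str.length * 2, // each UTF-16 code unit occupies 2 bytes
  };
}

const samples = {
  ascii: "A",          // U+0041
  latin: "\u00e9",     // é, U+00E9
  cjk: "\u4e2d",       // 中, U+4E2D
  emoji: "\u{1F600}",  // 😀, U+1F600 (outside the BMP)
};

for (const [label, ch] of Object.entries(samples)) {
  const { utf8Bytes, utf16Bytes } = encodedSizes(ch);
  console.log(`${label}: UTF-8 ${utf8Bytes} B, UTF-16 ${utf16Bytes} B`);
}
// ascii: UTF-8 1 B, UTF-16 2 B
// latin: UTF-8 2 B, UTF-16 2 B
// cjk: UTF-8 3 B, UTF-16 2 B
// emoji: UTF-8 4 B, UTF-16 4 B
```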

Frequently Asked Questions

Usage
How does the tool count words when there are consecutive spaces?
Multiple consecutive spaces, tabs, and newlines are treated as a single word boundary. "hello   world" (with three spaces) still counts as two words. The word count is computed by splitting on one or more whitespace characters and filtering out empty strings, which matches how most text editors and word processors count words.
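
A minimal sketch of that split-and-filter approach is shown below. The function name countWords is an assumption for illustration; the tool's own code may differ in detail.

```javascript
// Split on runs of whitespace, then drop the empty strings that appear
// when the text starts or ends with whitespace (or is empty).
function countWords(text) {
  return text.split(/\s+/).filter((part) => part.length > 0).length;
}

console.log(countWords("hello   world"));          // 2
console.log(countWords("  tabs\tand\nnewlines ")); // 3
console.log(countWords(""));                       // 0
```

The filter step matters: without it, an empty input would be reported as one word, because "".split(/\s+/) returns an array containing a single empty string.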
How are emoji and multi-byte Unicode characters counted?
JavaScript's .length counts UTF-16 code units. Characters in the Basic Multilingual Plane (BMP), including most CJK characters, count as 1. Most emoji lie outside the BMP and count as 2. Emoji built from several code points count higher still: a flag (two regional indicators) is 4, a skin-tone-modified emoji is typically 4, and family emoji joined with zero-width joiners can reach 8 or more. The byte count uses TextEncoder (UTF-8): ASCII = 1 byte, Latin-1 supplement = 2 bytes, CJK = 3 bytes, and a single-code-point emoji = 4 bytes.
How are sentences detected in Detailed mode?
Sentences are detected by splitting on sentence-ending punctuation: period (.), question mark (?), and exclamation mark (!). Each resulting segment that contains at least one non-whitespace character is counted as a sentence. This is a heuristic: it does not recognize abbreviations (e.g., "Dr.", "U.S.") or decimal numbers (e.g., "3.14"), so text containing them is overcounted.
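
The heuristic, including its overcounting, can be reproduced in a few lines. This is a sketch under the description above, with countSentences as a hypothetical name, not the tool's actual source.

```javascript
// Split on ., ? and !, then keep segments with at least one
// non-whitespace character.
function countSentences(text) {
  return text.split(/[.?!]/).filter((segment) => /\S/.test(segment)).length;
}

console.log(countSentences("Hi there. How are you? Fine!")); // 3
// One real sentence, but the periods in "Dr." and "3.14" also split it:
console.log(countSentences("Dr. Smith paid 3.14 dollars.")); // 3
```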
Why do character count and byte count differ?
The character count is what JavaScript's .length reports: UTF-16 code units, not Unicode code points. The byte count uses UTF-8 encoding via TextEncoder. In UTF-8, ASCII letters are 1 byte, accented Latin characters are 2 bytes, most Chinese/Japanese/Korean characters are 3 bytes, and emoji are 4 bytes per code point. A 10-character Chinese string is therefore typically 30 bytes, while a 10-character ASCII string is only 10 bytes.
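
The 10-versus-30 comparison can be verified with a short script. The helper name charAndByteCount is an assumption used only for this example.

```javascript
// chars: UTF-16 code units (.length); bytes: UTF-8 size via TextEncoder.
function charAndByteCount(text) {
  return {
    chars: text.length,
    bytes: new TextEncoder().encode(text).length,
  };
}

console.log(charAndByteCount("abcdefghij"));    // { chars: 10, bytes: 10 }
console.log(charAndByteCount("中".repeat(10))); // { chars: 10, bytes: 30 }
```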
Can I use the byte count for API payload size checking?
Yes, with one caveat. The byte count shown uses UTF-8 encoding, which matches how most web APIs and HTTP clients encode text. The tool measures the text exactly as entered, so for a JSON API this is the size of the raw string value, excluding JSON syntax overhead such as quotes, commas, and escape sequences. For the full payload size, serialize the entire object and measure the UTF-8 byte length of the result, not its .length.
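
To illustrate the difference between the raw value and the serialized payload, the sketch below measures both. The object shape ({ message }) and the helper name utf8ByteLength are made up for the example.

```javascript
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

const message = 'Say "hi" 你好';               // the raw string value
const payload = JSON.stringify({ message });   // {"message":"Say \"hi\" 你好"}

const valueBytes = utf8ByteLength(message);    // what the tool would report
const payloadBytes = utf8ByteLength(payload);  // what goes over the wire

console.log(valueBytes);   // 15
console.log(payloadBytes); // 31
```

The extra bytes come from the braces, the key, the surrounding quotes, and the backslashes that JSON.stringify adds to escape the inner double quotes.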
Why does JavaScript string length differ from PHP mb_strlen?
JavaScript's .length counts UTF-16 code units. For characters in the Basic Multilingual Plane (BMP), this is the same as the number of code points. However, characters outside the BMP (such as most emoji and rare CJK characters) count as 2 because they are stored as surrogate pairs. PHP's mb_strlen() with UTF-8 counts Unicode code points, giving a different result. For example, the emoji 😀 is 2 in JavaScript's .length but 1 in mb_strlen. Always specify the encoding when comparing string lengths across languages.
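
To get a count in JavaScript that lines up with mb_strlen($s, 'UTF-8'), iterate by code point instead of by code unit. A small sketch follows; the function names are illustrative.

```javascript
// .length counts UTF-16 code units.
function codeUnitLength(str) {
  return str.length;
}

// String iteration walks code points, so surrogate pairs count once.
function codePointLength(str) {
  return Array.from(str).length;
}

const grin = "\u{1F600}"; // 😀
console.log(codeUnitLength(grin));        // 2
console.log(codePointLength(grin));       // 1
console.log(codeUnitLength("caf\u00e9")); // 4 (all BMP, so both counts agree)
console.log(codePointLength("caf\u00e9")); // 4
```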
What is the maximum string length in JavaScript?
The ECMAScript specification limits string length to 2^53 - 1 elements (~9 quadrillion code units), but in practice the maximum depends on the JavaScript engine and available memory. Engine limits are far lower than the specification's, ranging from a few hundred million to about two billion code units depending on the engine and platform. Operations on very large strings become slow because JavaScript strings are immutable, so a modification can copy the whole string at O(n) cost in time and memory. If you need to process large text, consider streaming or chunking the input.
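
As one possible way to chunk a large string, the sketch below computes the UTF-8 byte length piece by piece, so that only a small buffer is allocated at a time. The function name, the default chunk size, and the approach itself are assumptions for illustration. The one subtlety is that a chunk must not end between the two halves of a surrogate pair, or the encoder would emit replacement characters and the total would be wrong.

```javascript
const chunkEncoder = new TextEncoder();

// Hypothetical helper: UTF-8 byte length computed in chunks.
// chunkSize is in UTF-16 code units and must be at least 1.
function utf8ByteLengthChunked(text, chunkSize = 65536) {
  let total = 0;
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    // If the chunk would end on a high surrogate, extend it by one
    // code unit so the pair stays together.
    if (end < text.length) {
      const last = text.charCodeAt(end - 1);
      if (last >= 0xd800 && last <= 0xdbff) end += 1;
    }
    total += chunkEncoder.encode(text.slice(start, end)).length;
    start = end;
  }
  return total;
}

const big = "a\u{1F600}".repeat(1000); // 1 + 4 bytes per repetition
console.log(utf8ByteLengthChunked(big, 7)); // 5000
```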
How do I count Unicode characters correctly across languages?
To count Unicode code points: In JavaScript, use [...str].length or Array.from(str).length instead of str.length. In Python, len(str) already counts code points. In PHP, use mb_strlen($str, 'UTF-8') instead of strlen(). In Go, use utf8.RuneCountInString(str). In Java, str.codePointCount(0, str.length()) handles surrogate pairs. The key insight: many language-native string length methods count code units or bytes, not characters. Note that even a code point count can exceed the number of visible characters, because a single user-perceived character (a grapheme cluster, such as a letter plus combining accent or a ZWJ emoji sequence) may span several code points. Use Unicode-aware APIs for accurate results.
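
In JavaScript, the three levels of counting (code units, code points, and user-perceived characters) can be compared side by side. The sketch below assumes a runtime that provides Intl.Segmenter, which recent browsers and Node.js versions do; the helper name lengths is illustrative.

```javascript
// Assumption: Intl.Segmenter is available in the runtime.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function lengths(str) {
  return {
    codeUnits: str.length,
    codePoints: Array.from(str).length,
    graphemes: Array.from(segmenter.segment(str)).length,
  };
}

// Family emoji: man + ZWJ + woman + ZWJ + girl
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(lengths(family));      // { codeUnits: 8, codePoints: 5, graphemes: 1 }

// "e" followed by a combining acute accent renders as one character
console.log(lengths("e\u0301"));   // { codeUnits: 2, codePoints: 2, graphemes: 1 }
```

Grapheme segmentation is noticeably slower than reading .length, so it is best reserved for cases where the count must match what a reader sees, such as enforcing a visible character limit.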
How do I get the length of a string in bash?
In bash, the length of a string variable is obtained with ${#var}. For example: str="hello"; echo ${#str} prints 5. This counts characters according to the current locale: in a UTF-8 locale a multi-byte character counts as one, while in the C locale every byte counts. For the byte length, use printf '%s' "$str" | wc -c (printf is more portable than echo -n, which some shells do not support). In scripts that must run under other shells, pipe to wc -m for the character count or wc -c for the byte count.
What is the difference between VARCHAR and TEXT in SQL for string length?
In MySQL, VARCHAR(n) stores up to n characters, subject to the 65,535-byte row size limit shared by all columns, and uses a 1-2 byte length prefix. TEXT stores up to 65,535 bytes (64 KB) with a 2-byte length prefix. In PostgreSQL, VARCHAR(n) and TEXT are stored the same way and perform identically; VARCHAR(n) only adds a length check of n characters, while TEXT has no declared limit. In SQL Server, VARCHAR(n) allows n up to 8,000 bytes, and VARCHAR(MAX) stores up to 2 GB. Use VARCHAR when you need a length constraint; use TEXT or VARCHAR(MAX) for arbitrarily long content. To check the length in characters, use CHAR_LENGTH(column) in MySQL (LENGTH(column) returns bytes there), length(column) in PostgreSQL, or LEN(column) in SQL Server, which ignores trailing spaces.

String Length by Programming Language

Different programming languages count string length differently. The table below compares how each language handles character counting, byte counting, and Unicode.

| Language | Char Count | Byte Count | Unicode-Aware? | Notes |
|---|---|---|---|---|
| JavaScript | str.length | new TextEncoder().encode(str).length | ⚠ UTF-16 code units | Use [...str].length to count code points |
| Python | len(str) | len(str.encode('utf-8')) | ✓ Correct | Counts Unicode code points |
| PHP | mb_strlen($s, 'UTF-8') | strlen($s) | ⚠ mb_strlen only | strlen() returns bytes, not chars! |
| Go | utf8.RuneCountInString(s) | len(s) | ⚠ RuneCount only | len() on a string returns bytes |
| Java | str.codePointCount(0, str.length()) | str.getBytes(StandardCharsets.UTF_8).length | ⚠ codePointCount only | .length() counts UTF-16 code units |
| Ruby | str.length | str.bytesize | ✓ Correct | Counts characters (code points) in the string's encoding |
| C# | str.Length | Encoding.UTF8.GetByteCount(str) | ⚠ UTF-16 code units | Use StringInfo for text elements |
| Rust | s.chars().count() | s.len() | ✓ Correct | .chars() iterates Unicode scalar values; s.len() returns bytes |