String Length Calculator Online - Count Characters, Words & Bytes

Count characters, words, lines, bytes, and detailed text statistics with real-time analysis.

Key Features

Basic & Detailed Modes

Basic shows essential counts; Detailed adds advanced statistics like max/min line length and average word length.
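
The Detailed statistics mentioned above could be computed along these lines. This is a minimal sketch, not the tool's actual code: the function name detailedStats and the shape of the returned object are assumptions made for illustration, and lengths are measured in UTF-16 code units.

```javascript
// Hypothetical helper: computes a few "Detailed mode" style statistics.
function detailedStats(text) {
  // Split on any common line ending (CRLF, CR, or LF).
  const lines = text.split(/\r\n|\r|\n/);
  const lineLengths = lines.map((line) => line.length); // UTF-16 code units

  // Words are runs of non-whitespace characters.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const totalWordLength = words.reduce((sum, w) => sum + w.length, 0);

  return {
    lineCount: lines.length,
    maxLineLength: lineLengths.reduce((a, b) => Math.max(a, b), 0),
    minLineLength: lineLengths.reduce((a, b) => Math.min(a, b), Infinity),
    averageWordLength: words.length === 0 ? 0 : totalWordLength / words.length,
  };
}

const stats = detailedStats("one two\nthree");
console.log(stats.lineCount);     // 2
console.log(stats.maxLineLength); // 7
console.log(stats.minLineLength); // 5
```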

Real-time Analysis

All statistics update instantly as you type. No Calculate button needed.

UTF-8 Byte Count

Accurate UTF-8 byte size via the TextEncoder API. ASCII = 1 byte, CJK = 3 bytes, most emoji = 4 bytes.
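
A byte count of this kind can be reproduced with the standard TextEncoder API, which always encodes to UTF-8. The wrapper name utf8ByteLength is an assumption for illustration; the sample characters are written as escapes so the code points are unambiguous.

```javascript
// Returns the UTF-8 encoded size of a string in bytes.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength("A"));         // 1 (ASCII)
console.log(utf8ByteLength("\u00E9"));    // 2 (e with acute accent)
console.log(utf8ByteLength("\u4E2D"));    // 3 (CJK ideograph)
console.log(utf8ByteLength("\u{1F600}")); // 4 (emoji outside the BMP)
```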

Privacy First

All processing happens locally in your browser. No data ever leaves your device.

Understanding String Encoding

ASCII

1 byte per character. Covers English letters, digits, basic symbols. Every ASCII string is valid UTF-8.

UTF-8

1-4 bytes per character. ASCII = 1 byte; accented Latin, Greek, and Cyrillic = 2 bytes; most CJK = 3 bytes; most emoji = 4 bytes. The dominant encoding on the web, and backward compatible with ASCII.
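
The 1-4 byte rule follows directly from code point ranges: up to U+007F takes 1 byte, up to U+07FF takes 2, up to U+FFFF takes 3, and anything above takes 4. The sketch below applies that rule by hand; the function names are hypothetical, and in practice TextEncoder gives the same answer with less code.

```javascript
// Number of UTF-8 bytes needed for a single code point.
function utf8BytesForCodePoint(cp) {
  if (cp <= 0x7f) return 1;   // ASCII
  if (cp <= 0x7ff) return 2;  // e.g. accented Latin, Greek, Cyrillic
  if (cp <= 0xffff) return 3; // rest of the BMP, including most CJK
  return 4;                   // supplementary planes, including most emoji
}

// Sums the per-code-point sizes. for...of iterates by code point,
// so surrogate pairs are handled as one character.
function utf8Length(str) {
  let total = 0;
  for (const ch of str) {
    total += utf8BytesForCodePoint(ch.codePointAt(0));
  }
  return total;
}

console.log(utf8Length("A\u00E9\u4E2D\u{1F600}")); // 10 (1 + 2 + 3 + 4)
```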

UTF-16

2 or 4 bytes per character. Used internally by JavaScript and Java. BMP characters take 2 bytes; supplementary characters (most emoji, rare CJK) take 4 bytes via surrogate pairs.
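
To make the surrogate-pair mechanism concrete, the sketch below splits a supplementary code point into its two UTF-16 code units and compares the result with what a JavaScript string actually stores. The helper name toSurrogatePair is an assumption for illustration.

```javascript
// Splits a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits of the offset
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits of the offset
  return [high, low];
}

const grin = "\u{1F600}"; // grinning face emoji, U+1F600

console.log(grin.length);                      // 2
console.log(grin.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(grin.charCodeAt(1).toString(16));  // "de00" (low surrogate)
console.log(grin.codePointAt(0).toString(16)); // "1f600"
```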

GBK / GB2312

1-2 bytes per char. Legacy Chinese encoding. ASCII=1B, Chinese=2B. Still used in legacy Chinese systems and databases.

Frequently Asked Questions

Usage

How does the tool count words when there are consecutive spaces?

Multiple consecutive spaces, tabs, and newlines are treated as a single word boundary. "hello   world" with three spaces still counts as two words. The word count is computed by splitting on one or more whitespace characters and filtering out empty strings, which matches how most text editors and word processors count words.
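
The split-and-filter approach described above can be sketched in a few lines. The function name countWords is hypothetical; the filter step matters because leading or trailing whitespace produces empty strings at the ends of the split result.

```javascript
// Counts words by splitting on runs of whitespace and dropping empty pieces.
function countWords(text) {
  return text.split(/\s+/).filter((word) => word !== "").length;
}

console.log(countWords("hello   world"));            // 2
console.log(countWords("  leading and trailing  ")); // 3
console.log(countWords(""));                         // 0
```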

How are emoji and multi-byte Unicode characters counted?

JavaScript's .length counts UTF-16 code units, so every character in the Basic Multilingual Plane (BMP), including CJK, counts as 1. Most emoji lie outside the BMP and count as 2 because they are stored as surrogate pairs. Emoji built from several code points count higher still: a skin-tone-modified emoji or a flag typically counts as 4, and a family emoji joined with zero-width joiners can count as 8 or more. The byte count uses TextEncoder (UTF-8): ASCII = 1 byte, Latin-1 supplement = 2 bytes, CJK = 3 bytes, most emoji = 4 bytes.
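
The three ways of measuring a string (code units, code points, and user-perceived characters) can be compared side by side. This is a sketch that assumes a runtime with Intl.Segmenter support (recent browsers and Node.js 16 or later); the function name lengths is hypothetical.

```javascript
// Reports a string's length in three different units.
function lengths(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: str.length,                         // UTF-16 code units
    codePoints: [...str].length,                   // Unicode code points
    graphemes: [...segmenter.segment(str)].length, // user-perceived characters
  };
}

// Grinning face: one code point stored as a surrogate pair.
console.log(lengths("\u{1F600}"));
// codeUnits 2, codePoints 1, graphemes 1

// Flag of Japan: two regional indicator symbols.
console.log(lengths("\u{1F1EF}\u{1F1F5}"));
// codeUnits 4, codePoints 2, graphemes 1

// Family: man, woman, girl joined by zero-width joiners (U+200D).
console.log(lengths("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"));
// codeUnits 8, codePoints 5, graphemes 1
```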

How are sentences detected in Detailed mode?

Sentences are detected by splitting on sentence-ending punctuation: period (.), question mark (?), and exclamation mark (!). Each resulting segment that contains at least one non-whitespace character is counted as a sentence. This is a heuristic: it miscounts abbreviations (e.g., "Dr.", "U.S.") and decimal numbers (e.g., "3.14"), because every period is treated as a sentence boundary.
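
A minimal sketch of this heuristic, including a case where it overcounts, might look as follows. The function name countSentences is an assumption; the tool's real implementation may differ in detail.

```javascript
// Splits on ., ?, and ! and counts segments that contain non-whitespace text.
function countSentences(text) {
  return text.split(/[.?!]/).filter((segment) => segment.trim() !== "").length;
}

console.log(countSentences("Hello world. How are you? Fine!")); // 3

// One real sentence, but the periods in "Dr." and "3.14" also split it.
console.log(countSentences("Dr. Smith paid 3.14 dollars."));    // 3
```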

Why do character count and byte count differ?

The character count is the string length as JavaScript sees it, measured in UTF-16 code units rather than Unicode code points. The byte count uses UTF-8 encoding via TextEncoder. In UTF-8, ASCII letters are 1 byte, accented Latin characters are 2 bytes, Chinese/Japanese/Korean characters are 3 bytes, and most emoji are 4 bytes. A 10-character Chinese string is typically 30 bytes, while a 10-character ASCII string is only 10 bytes.

Can I use the byte count for API payload size checking?

Yes. The byte count shown uses UTF-8 encoding, which is how JSON and most web APIs encode text. For a JSON API, the tool measures the bytes of the raw string value only, excluding JSON syntax overhead such as quotes, commas, and escape sequences (a quote or newline inside the value takes two bytes once escaped). For the full payload size including JSON structure, serialize the entire object and measure the UTF-8 byte length of the result, not its .length.
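
The difference between the size of a string value and the size of the full JSON payload can be checked like this. The function name payloadSizes and the sample object are made up for illustration.

```javascript
// Compares the UTF-8 size of one string field with the size of the whole JSON body.
function payloadSizes(obj, field) {
  const encoder = new TextEncoder();
  return {
    valueBytes: encoder.encode(obj[field]).length,
    payloadBytes: encoder.encode(JSON.stringify(obj)).length,
  };
}

// "h\u00E9llo" is 5 characters but 6 bytes, since the accented e takes 2 bytes.
const sizes = payloadSizes({ message: "h\u00E9llo" }, "message");
console.log(sizes.valueBytes);   // 6
console.log(sizes.payloadBytes); // 20, including braces, quotes, key, and colon
```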

Why does JavaScript string length differ from PHP mb_strlen?

JavaScript's .length counts UTF-16 code units. For characters in the Basic Multilingual Plane (BMP), this is the same as the number of characters. However, characters outside the BMP (such as most emoji and rare CJK characters) count as 2 in JavaScript because they are stored as surrogate pairs. PHP's mb_strlen() with UTF-8 counts Unicode code points, so it gives a different result. For example, the emoji 😀 is 2 in JavaScript's .length but 1 in mb_strlen. Always specify the encoding when comparing string lengths across languages.

What is the maximum string length in JavaScript?

The ECMAScript specification limits string length to 2⁵³ - 1 elements (~9 quadrillion code units), but in practice the maximum depends on the engine and available memory. Current engine limits are far lower: V8 (Chrome, Node.js) allows about 2²⁹ code units (roughly 536 million) on 64-bit systems, Firefox about 2³⁰, and Safari about 2³¹. Operations on very large strings become slow because JavaScript strings are immutable, so every modification allocates and copies a new string in O(n) time and memory. If you need to process large text, consider streaming or chunking the input.
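
One way to chunk a large string is to slice it by code units while taking care not to cut a surrogate pair in half, since a split pair would be encoded as two replacement characters. The sketch below counts UTF-8 bytes chunk by chunk; the function names and the default chunk size are assumptions chosen for illustration.

```javascript
// Yields slices of roughly `size` code units without splitting surrogate pairs.
function* chunks(str, size) {
  let start = 0;
  while (start < str.length) {
    let end = Math.min(start + size, str.length);
    if (end < str.length) {
      const last = str.charCodeAt(end - 1);
      // If the slice would end on a high surrogate, take its partner too.
      if (last >= 0xd800 && last <= 0xdbff) end += 1;
    }
    yield str.slice(start, end);
    start = end;
  }
}

// Counts UTF-8 bytes one chunk at a time instead of encoding the whole string at once.
function utf8ByteLengthChunked(str, size = 65536) {
  const encoder = new TextEncoder();
  let total = 0;
  for (const chunk of chunks(str, size)) {
    total += encoder.encode(chunk).length;
  }
  return total;
}

// A chunk size of 2 would otherwise cut the emoji's surrogate pair in half.
console.log([...chunks("a\u{1F600}b", 2)].length);     // 2
console.log(utf8ByteLengthChunked("a\u{1F600}b", 2));  // 6
```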

How do I count Unicode characters correctly across languages?

For accurate Unicode code point counts: In JavaScript, use [...str].length or Array.from(str).length instead of str.length. In Python 3, len(str) returns the number of code points. In PHP, use mb_strlen($str, 'UTF-8') instead of strlen(). In Go, use utf8.RuneCountInString(str). In Java, str.codePointCount(0, str.length()) handles surrogate pairs. The key insight: most language-native string length methods count code units or bytes, not characters. Even a code point count can exceed the number of visible characters, because combining marks and emoji sequences span several code points; counting those requires grapheme-cluster segmentation, such as Intl.Segmenter in JavaScript.

How do I get the length of a string in bash?

In bash, the length of a string variable is obtained with ${#var}. For example: str="hello"; echo ${#str} prints 5. This counts characters, not bytes, based on the current locale: in a UTF-8 locale each multi-byte character counts as one. For byte length, use printf '%s' "$str" | wc -c, since echo -n is not portable across shells. ${#var} is itself specified by POSIX, but shells without multi-byte support, such as dash, count bytes instead. For consistent results across shells, pipe to wc -m for the character count or wc -c for the byte count.

What is the difference between VARCHAR and TEXT in SQL for string length?

In MySQL, VARCHAR(n) stores up to n characters, subject to the 65,535-byte row size limit, with 1-2 bytes of length overhead. TEXT stores up to 65,535 bytes (about 64 KB) with 2 bytes of overhead. In PostgreSQL, VARCHAR(n) and TEXT share the same storage and performance; VARCHAR(n) only adds a length constraint of n characters, and TEXT has no declared limit (a single value can be up to about 1 GB). In SQL Server, VARCHAR(n) holds at most 8,000 bytes, and VARCHAR(MAX) stores up to 2 GB. Use VARCHAR when you need a length constraint; use TEXT or VARCHAR(MAX) for arbitrarily long content. To check the length in characters, use CHAR_LENGTH(column) in MySQL (LENGTH() returns bytes), length(column) in PostgreSQL, or LEN(column) in SQL Server, which ignores trailing spaces.

String Length by Programming Language

Different programming languages count string length differently. The table below compares how each language handles character counting, byte counting, and Unicode.

Language	Character Count	UTF-8 Byte Count	Unicode-Aware?	Notes
JavaScript	`str.length`	`new TextEncoder().encode(str).length`	⚠ UTF-16 code units	Use `[...str].length` to count code points
Python	`len(str)`	`len(str.encode('utf-8'))`	✓ Code points	`len()` counts code points in Python 3
PHP	`mb_strlen($s, 'UTF-8')`	`strlen($s)`	⚠ mb_strlen only	`strlen()` returns bytes, not characters
Go	`utf8.RuneCountInString(s)`	`len(s)`	⚠ RuneCount only	`len()` on a string returns bytes
Java	`str.codePointCount(0, str.length())`	`str.getBytes(StandardCharsets.UTF_8).length`	⚠ codePointCount only	`.length()` counts UTF-16 code units
Ruby	`str.length`	`str.bytesize`	✓ Code points	Counts code points for UTF-8 strings
C#	`str.Length`	`Encoding.UTF8.GetByteCount(str)`	⚠ UTF-16 code units	Use `StringInfo` for text elements
Rust	`s.chars().count()`	`s.len()`	✓ Code points	`.chars()` iterates Unicode scalar values; `.len()` returns bytes
