Count bytes and analyze text encoding in real-time.
Tip: UTF-8 is the most common encoding on the web. In UTF-8, ASCII characters use 1 byte each, while accented letters, CJK characters, and emoji use 2 to 4 bytes.
Paste or type any text. The counter shows the byte size in UTF-8 and UTF-16 simultaneously, plus counts of characters, words, lines, spaces, and non-ASCII characters.
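For readers who want to reproduce these numbers in their own code, here is a minimal Python sketch. The function name `text_stats` and the counting rules (whitespace-separated words, `splitlines()` for lines, code points above 127 as non-ASCII) are assumptions made for illustration; the tool itself may count some of these differently.

```python
def text_stats(text: str) -> dict:
    """Return byte sizes and simple counts for a string (illustrative only)."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" prepends
        "utf16_bytes": len(text.encode("utf-16-le")),
        "characters": len(text),              # Unicode code points
        "words": len(text.split()),           # whitespace-separated tokens
        "lines": len(text.splitlines()),      # an empty string counts as 0 lines
        "spaces": text.count(" "),
        "non_ascii": sum(1 for ch in text if ord(ch) > 127),
    }


stats = text_stats("hello \U0001F60A")
print(stats["utf8_bytes"], stats["utf16_bytes"], stats["characters"])
```

Note that `characters` here counts code points, so an emoji built from several code points (for example a flag or a family emoji) would count as more than one.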
Different encodings produce different byte counts for the same text. UTF-8 is the most common encoding on the web. JavaScript and Java strings are defined in terms of UTF-16 code units. ASCII covers only 128 characters: basic English letters, digits, punctuation, and control codes.
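Many systems enforce byte limits rather than character limits. Database columns (VARCHAR, TEXT), API payloads, and serialized messages such as protocol buffers are often capped by byte size, not character count. Some databases measure VARCHAR(n) in characters and others in bytes, so check the documentation for the one you use.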
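When a byte limit applies, cutting a string at an arbitrary byte offset can split a multi-byte character in half. One way to avoid that is sketched below, assuming the limit is expressed in UTF-8 bytes; `truncate_utf8` is a hypothetical helper name, not part of any library.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes, keeping characters whole."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The cut may land inside a multi-byte sequence; errors="ignore"
    # drops that incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("hello \U0001F60A", 8))  # the 4-byte emoji does not fit, so it is dropped
```

This keeps code points intact but can still separate a base character from a combining mark that follows it, so treat it as a starting point rather than a complete solution.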
| Encoding | Bytes per ASCII char | Bytes per emoji | Used In |
|---|---|---|---|
| ASCII | 1 byte | Not supported | Legacy systems, basic English text |
| UTF-8 | 1 byte | 4 bytes | Web (HTML, JSON, CSS), Linux, macOS |
| UTF-16 | 2 bytes | 4 bytes | Windows, Java, JavaScript internally |
| UTF-32 | 4 bytes | 4 bytes | Fixed-width — rarely used in production |
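The figures in the table can be checked with a few lines of Python. This is a rough sketch: `bytes_per_encoding` is a made-up helper name, and the little-endian codec variants are used so that no byte-order mark is added to the count.

```python
def bytes_per_encoding(ch: str) -> dict:
    """Map each encoding name to the encoded size of ch, or None if unsupported."""
    sizes = {}
    for name in ("ascii", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None  # character is not representable in this encoding
    return sizes


print(bytes_per_encoding("A"))
print(bytes_per_encoding("\U0001F60A"))  # the emoji cannot be encoded as ASCII
```

In UTF-16 the emoji takes 4 bytes because it is stored as a surrogate pair of two 2-byte code units.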
In ASCII text, 1 character = 1 byte. In UTF-8, characters outside the ASCII range use multiple bytes: an accented character like é uses 2 bytes, most Chinese characters use 3 bytes, and most emoji use 4 bytes. This means "hello 😊" is 7 characters but 10 bytes in UTF-8 (six 1-byte characters plus one 4-byte emoji). This distinction matters for database column sizes, HTTP Content-Length headers, and any system with byte-based limits.
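The arithmetic above can be reproduced with a per-character breakdown. The sketch below is illustrative only; `utf8_breakdown` is a hypothetical name, and the characters are written as escapes so the exact code points are unambiguous.

```python
def utf8_breakdown(text: str) -> list:
    """Return (character, UTF-8 byte count) pairs, one per code point."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# U+00E9 (é), U+4E2D (中), U+1F60A (😊)
for ch, size in utf8_breakdown("\u00e9\u4e2d\U0001F60A"):
    print(ch, size)

text = "hello \U0001F60A"
print(len(text), sum(size for _, size in utf8_breakdown(text)))
```

For a header such as Content-Length, the value to use is the byte total, not `len(text)`.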
Common questions about Byte Counter