Assuming every text file is UTF-8
Many legacy tools and Windows workflows still create UTF-16, Windows-1252, or BOM-marked files. Check encoding before blaming the importer.
Detect text file encoding and Byte Order Mark details before importing CSVs, logs, source files, translations, or customer exports. Checks UTF-8, UTF-16, ASCII, BOM presence, confidence, and privacy-safe browser processing.
Continue with a related workflow or open the next tool that usually follows this task.
Use this workflow when a spreadsheet export needs to become API-ready JSON, a fixture, or a structured dataset for debugging.
OpenRelated toolConvert CSV and JSON locally with headers, delimiters, and validation.
OpenUpload or drag a text, CSV, log, source, or export file into the detector.
Review the detected encoding, confidence score, and BOM result.
Compare the result with the expected import target, such as UTF-8 without BOM or UTF-16LE.
If text is garbled, reopen or convert the file using the detected encoding in your editor or data pipeline.
Use the result before bulk imports, migrations, or customer data cleanup.
Identify whether a broken file is likely UTF-8, UTF-16, ASCII, or a legacy encoding candidate before reopening it with the correct decoder.
Inspect exported CSV files before loading them into spreadsheets, databases, ETL jobs, or customer support systems that expect a specific encoding.
Find files with BOM markers or non-UTF-8 encodings before converting a repository or documentation set to a consistent encoding.
Review encoding clues when names, accents, Korean, Japanese, Chinese, or symbols appear corrupted after export or upload.
Many legacy tools and Windows workflows still create UTF-16, Windows-1252, or BOM-marked files. Check encoding before blaming the importer.
A UTF-8 BOM can be harmless in some tools but break headers, scripts, or CSV field names in others. Confirm whether the target accepts BOM markers.
Some legacy encodings share similar byte patterns, so detection is a best-effort signal. Verify suspicious files with a small preview before bulk conversion.
Detect BOM and encoding before importing customer data into a database or spreadsheet.
customers-export.csv
Bytes start with: EF BB BF 69 64 2C 6E 61 6D 65Encoding: UTF-8
BOM: Yes
Confidence: High
Next step: remove BOM if the importer treats the first header as id.Use encoding clues to reopen a file that shows mojibake after export.
support-log.txt
Preview: 안녕하세요Likely issue: UTF-8 bytes decoded as a legacy single-byte encoding
Next step: reopen the file as UTF-8.The detector reads raw bytes in the browser and checks for known Byte Order Mark sequences such as UTF-8, UTF-16LE, and UTF-16BE markers at the start of the file.
It validates byte patterns against common Unicode encodings and ASCII-compatible ranges, then reports a confidence score rather than claiming perfect certainty for every legacy encoding.
Encoding detection is most reliable for BOM-marked files and valid UTF-8 or UTF-16 data. Similar single-byte encodings can require human review or a known sample of expected text.
A. No. The file is read and analyzed in your browser. The raw bytes do not need to leave your device, which is useful for logs, exports, and customer files.
A. No detector can guarantee every legacy encoding from bytes alone. UTF-8, UTF-16, ASCII, and BOM markers are much more reliable than close single-byte encodings.
A. BOM means Byte Order Mark. It is a hidden byte sequence at the start of a file that can identify UTF-8 or UTF-16 variants and sometimes affects CSV headers or scripts.
A. The import tool may be reading the file with a different encoding than the one used to create it. Detect the source encoding, then import or convert with that encoding selected.
A. Remove it only if the target system mishandles it. Some Windows tools add BOM markers, while some scripts and CSV importers treat the marker as part of the first field name.
Use these focused guides when you need a practical workflow before opening the tool.
Use this workflow when a spreadsheet export needs to become API-ready JSON, a fixture, or a structured dataset for debugging.
Workflow guideUse this workflow when a spreadsheet export, API fixture, or CSV/JSON handoff fails because columns shift, headers disappear, characters look broken, or JSON will not parse.
Workflow guideUse this workflow when text looks garbled, CSV headers behave strangely, or an import job requires a known encoding such as UTF-8, UTF-16, ASCII, or a BOM-free file.
Workflow guideUse this workflow when copied text includes line-start numbers such as 1., (2), [3], 4:, or 5- and you need clean text for scripts, tickets, documentation, publishing, or further processing.
Explore more developer tools
Browse All ToolsAbout Character Set Detector Upload a file to detect its character encoding. This tool analyzes the raw bytes to identify UTF-8, UTF-16, or ASCII content.