Developer Workflow

Detect text file encoding before import

Check file encoding and BOM details before importing CSVs, logs, source files, or multilingual exports into spreadsheets, databases, and ETL jobs.

Problem

CSV files, logs, source files, and customer exports can be created by different operating systems and tools. If the importer assumes the wrong encoding, names, symbols, Korean, Japanese, Chinese, accents, or even the first CSV header can become corrupted.

When to use this

  • A CSV, log, source file, or customer export displays garbled text after upload or import.
  • You need to know whether a file is UTF-8, UTF-16, ASCII, or BOM-marked before processing it.
  • A database, spreadsheet, or ETL job rejects a file or treats the first header as a strange value.

Steps

  1. Step 1

    Upload the source file

    Drop the file into the character set detector before editing or converting it so the first analysis reflects the original bytes.

  2. Step 2

    Check the detected encoding and confidence

    Review whether the file appears to be UTF-8, UTF-16, ASCII, or another likely encoding. Treat low-confidence results as a signal to preview a few records manually.

  3. Step 3

    Inspect the BOM result

    If a Byte Order Mark is present, confirm whether the destination system accepts it. Some CSV importers treat BOM bytes as part of the first field name.

  4. Step 4

    Convert or clean before import

    Use the detected encoding in your editor, remove BOM when required, and then continue with CSV conversion or text cleanup.

Example

Check a CSV export with a hidden BOM

Input

customers.csv
First bytes: EF BB BF 69 64 2C 6E 61 6D 65
Header preview: \uFEFFid,name,email

Output

Encoding: UTF-8
BOM: Yes
Action: remove BOM or configure the importer before loading the CSV.

Common mistakes

Assuming UTF-8 without checking

Modern systems often use UTF-8, but legacy exports and Windows tools can still create UTF-16, BOM-marked UTF-8, or other encodings.

Treating detection as perfect certainty

BOM and valid Unicode patterns are strong signals, but close legacy encodings can be ambiguous. Always preview suspicious rows before a bulk import.

FAQ

What does BOM mean in a text file?

BOM means Byte Order Mark. It is a hidden byte sequence at the beginning of a file that can identify Unicode variants such as UTF-8 or UTF-16.

Why does a CSV header start with a strange character?

The file may contain a UTF-8 BOM and the importer may be treating it as part of the first header. Detect the BOM, then remove it or configure the importer.

Is the uploaded file sent to a server?

No. The detector reads the file locally in your browser, which is useful when checking logs, exports, or customer files.