why utf-8 seems broken sometimes#
Most of the time, the programming world does not really care about characters that are special to languages other than English. So in Germany we have our all-time favorites
ÄÖÜäüöß to take care of every time we work with text documents.
Normally you will have no trouble with text editors, but a glaring example is Microsoft Office. To correctly import a utf-8 encoded csv document (with special chars) you have to provide a BOM (byte order mark). If you don’t add a BOM, Microsoft Office will not treat the file as utf-8, and the special characters come out garbled.
// utf-8: prepend the BOM (U+FEFF) so Microsoft Office detects the encoding
const fs = require("fs");
fs.writeFileSync(filename, "\ufeff" + content);
This looks strange at first, because Wikipedia says the BOM for utf-8 should be
0xEFBBBF.
But Mozilla Developer states:
0xFEFF: Used at the start of the script to mark it as Unicode and the text’s byte order. Mozilla Developer
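But anyway, both are right. U+FEFF is the Unicode code point of the BOM, and 0xEFBBBF is that code point encoded as utf-8. Node handles the conversion for us: fs.writeFileSync encodes strings as utf-8 by default, so "\ufeff" ends up as the three bytes EF BB BF at the start of the file. Now you can import the csv file in Microsoft Office with the correct encoding.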
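To see what actually lands on disk, here is a minimal sketch that writes a small csv file with a BOM and reads the first three bytes back. The helper name `writeCsvWithBom`, the file name `umlauts-demo.csv` and the sample content are my own assumptions for illustration, not part of any library.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: prepend U+FEFF and write the string as utf-8
// (utf-8 is also Node's default encoding for strings).
function writeCsvWithBom(filename, content) {
  fs.writeFileSync(filename, "\ufeff" + content, "utf8");
}

// Hypothetical demo file in the system temp directory.
const demoFile = path.join(os.tmpdir(), "umlauts-demo.csv");
writeCsvWithBom(demoFile, "name;city\nJörg;Köln\n");

// Read the raw bytes back (no encoding argument returns a Buffer)
// and print the first three bytes as hex.
const bomHex = fs.readFileSync(demoFile).subarray(0, 3).toString("hex");
console.log(bomHex); // efbbbf
```

So the single character "\ufeff" in the JavaScript string becomes exactly the three bytes Wikipedia lists for utf-8.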
extra round regex#
While I am on this topic, I should briefly add how these things work with regex.
Instead of using
[A-Z] we have to use
[A-ZÄÜÖß]. Keep this in mind if you work a lot with text. You can try this out at https://regexr.com/
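A quick sketch of the difference between the two character classes. The sample word is just an assumption for illustration, and the `^...$` anchors and `+` are added so the whole word has to match.

```javascript
// ASCII-only class vs. the class extended with the German letters.
const asciiOnly = /^[A-Z]+$/;
const withUmlauts = /^[A-ZÄÜÖß]+$/;

// Hypothetical sample input containing umlauts.
const word = "ÜBERGRÖSSE";

const asciiMatch = asciiOnly.test(word); // false: Ü and Ö are outside A-Z
const umlautMatch = withUmlauts.test(word); // true: every letter is in the class
console.log(asciiMatch, umlautMatch);
```

The same applies to the lowercase class: [a-z] needs ä, ö, ü and ß added as well.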