[buug] recode vs. iconv

Ian Zimmerman itz at buug.org
Tue Dec 11 17:16:03 PST 2012


On Tue, 11 Dec 2012 16:48:09 -0800
Rick Moen <rick at linuxmafia.com> wrote:

Ian> Today I needed to convert some text files from Latin-1 to plain
Ian> ASCII, preferably with transcriptions (e.g. ß into ss, ä into ae,
Ian> and so on). They were large enough that doing it manually was out
Ian> of the question, even in an excellent editor like Emacs ;-) I
Ian> remembered that recode and iconv were the two programs potentially
Ian> suitable for the task.

Ian> I tried recode first, but it was a disaster; I couldn't make it
Ian> work despite reading the full fine manual (info version) in detail.
Ian> It would simply error out on any non-ASCII character unless I gave
Ian> the --force option (even when I ran it as a filter rather than
Ian> in place), and then it would succeed but silently drop some of
Ian> those characters without substituting anything.

Rick> Back when I was chief copyeditor for _Linux Gazette_ magazine, we
Rick> had one regular columnist who could not be cured of writing
Rick> submissions in Microsoft text editors and _not_ bothering to clean
Rick> up his (alleged) ASCII.  At the time, I did this to compensate for
Rick> the man's inability to write real plaintext, and it worked every
Rick> time:

Rick> $ recode windows-1257..ASCII lg_bytes.html
Rick> $ tidy -cim lg_bytes.html
Rick> $ aspell check lg_bytes.html

Mine also had CRLF line endings, but I suppose that was the case with
yours as well.
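
For anyone who wants to sidestep both tools, here is a minimal sketch in
Python of the kind of transcription I was after. The function name
latin1_to_ascii and the TRANSCRIPTIONS table are made up for
illustration, the table only covers the German letters, and any other
non-ASCII character raises an error instead of being silently dropped.
It also turns CRLF line endings into plain LF.

```python
# Hypothetical transcription table; extend it for other languages.
TRANSCRIPTIONS = {
    "ß": "ss",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def latin1_to_ascii(data: bytes) -> str:
    """Decode Latin-1 bytes and transcribe them to plain ASCII text."""
    # Every byte value is valid Latin-1, so decoding cannot fail.
    text = data.decode("latin-1")
    # Normalize DOS line endings to Unix ones.
    text = text.replace("\r\n", "\n")
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Already plain ASCII: keep as-is.
            out.append(ch)
        elif ch in TRANSCRIPTIONS:
            out.append(TRANSCRIPTIONS[ch])
        else:
            # Fail loudly rather than drop the character.
            raise ValueError(f"no transcription for {ch!r}")
    return "".join(out)
```

To use it on a file, read the file in binary mode, pass the bytes to
latin1_to_ascii, and write the returned string out with
encoding="ascii".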

-- 
Ian Zimmerman
gpg public key: 1024D/C6FF61AD
fingerprint: 66DC D68F 5C1B 4D71 2EE5  BD03 8A00 786C C6FF 61AD
http://www.gravatar.com/avatar/c66875cda51109f76c6312f4d4743d1e.png
Rule 420: All persons more than eight miles high to leave the court.


