Converting all files to UTF-8 ?

Klaus Alexander Seistrup kseistrup at gmail.com
Tue Dec 28 21:12:46 UTC 2004


On Tue, 28 Dec 2004 21:22:10 +0100, Vincent Trouilliez
<vincent.trouilliez at wanadoo.fr> wrote:
> Okay.
> 
> So, now, how can I use iconv to convert all my file names ?
> 
> 1) it doesn't look like iconv has a 'recursive' option to process
> automatically sub-folders.

Use find(1) to traverse current directory (.) recursively:

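find . -type f -print \
| while IFS= read -r oldName
  do
    # Note: the whole path is converted, so this only works as-is when
    # the directory names themselves are plain ASCII.
    newName="$(printf '%s\n' "${oldName}" | iconv -f iso-8859-1 -t utf-8)" || continue
    # Leave names alone that come out unchanged.
    [ "${oldName}" = "${newName}" ] || mv "${oldName}" "${newName}"
  done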

The expression above assumes that the input charset is iso-8859-1 and
that the output charset is utf-8.
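
A slightly more defensive variant of the loop above, as a sketch only: it
renames directories as well as files, works depth-first so that renaming a
directory does not invalidate paths still to be processed, and leaves a name
alone when the target already exists.  The function name rename_tree and its
two arguments are made up for illustration, and a single source charset
(iso-8859-1 by default) is still assumed.  Filenames containing newlines are
not handled.

```shell
# rename_tree ROOT [FROM_CHARSET]   (hypothetical helper, not a standard tool)
rename_tree() {
  root="$1"
  from="${2:-iso-8859-1}"   # assumed source charset
  # -depth lists a directory's contents before the directory itself.
  find "$root" -depth -print | while IFS= read -r old
  do
    dir=$(dirname "$old")
    base=$(basename "$old")
    # Convert only the last path component; skip the entry if iconv fails.
    new=$(printf '%s' "$base" | iconv -f "$from" -t utf-8) || continue
    # Rename only when the name changes and nothing would be overwritten.
    if [ "$base" != "$new" ] && [ ! -e "$dir/$new" ]; then
      mv "$old" "$dir/$new"
    fi
  done
}
```

Something like "rename_tree . iso-8859-1" would then process the current
directory.  Trying it on a copy of the data first seems prudent.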

However...,

> 2) it needs to know the original encoding format... how the hell do I
> know that ? Files are many thousands, over 10 year old and come from
> CDs, internet, MS-DOS, Win95,  Win XP etc, they probably don't use the
> same format ! :-/

... the input format seems to be a problem?!

I assume some of the filenames needn't be converted.  Names that contain
only ASCII characters are byte-for-byte identical in iso-8859-1 and utf-8,
so they can be left alone.  At least this saves some manual work.  You can use

echo "${oldName}" \
| iconv -f iso-8859-1 -t us-ascii >/dev/null 2>&1 \
|| echo "${oldName}"

to test whether a filename needs conversion, and

echo "${oldName}" \
| iconv -f iso-8859-1 -t us-ascii >/dev/null 2>&1 \
&& echo "${oldName}"

to test whether a filename needn't be converted.
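
The same idea can be pushed one step further to sort names into three
groups, which may help with the "unknown input charset" problem.  This is a
rough sketch: classify_name is a made-up helper name, and the utf-8 check is
only a heuristic (a name that happens to be valid utf-8 is very likely, but
not certainly, utf-8 already).

```shell
# classify_name NAME   (hypothetical helper)
# Prints "ascii", "utf-8" or "other" for the given filename.
classify_name() {
  if printf '%s' "$1" | iconv -f iso-8859-1 -t us-ascii >/dev/null 2>&1; then
    # Every byte is below 0x80: nothing to convert.
    echo ascii
  elif printf '%s' "$1" | iconv -f utf-8 -t utf-8 >/dev/null 2>&1; then
    # Non-ASCII bytes that decode cleanly as utf-8.
    echo utf-8
  else
    # Some legacy 8-bit charset; which one still has to be guessed.
    echo other
  fi
}
```

Only the names reported as "other" would then need a decision about the
source charset.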

Or perhaps I just make the whole thing much more complicated than it is... ;-)

> Sure must be a solution, if UTF-8 really becomes the norm, then millions
> of people are facing the same problem as I am now. So there has to be a
> solution...

Another thing is ... you wrote earlier that your filenames look like
"brÃ»lÃ©e" when they should look like "brûlée", right?  But filenames
like "brÃ»lÃ©e" are "utf-8 encoded and viewed in iso-8859-1".
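
To illustrate that last point, here is a small sketch that reproduces the
effect with iconv.  The bytes are written as octal escapes so the example
does not depend on how this message itself is encoded; the variable names
are arbitrary.

```shell
# "brulee" with its accents, as utf-8 bytes: u-circumflex is C3 BB,
# e-acute is C3 A9.
utf8=$(printf 'br\303\273l\303\251e')

# Misread those bytes as iso-8859-1 and re-encode them as utf-8, which is
# what a latin-1 viewer effectively shows.
garbled=$(printf '%s' "$utf8" | iconv -f iso-8859-1 -t utf-8)
printf '%s\n' "$garbled"

# Going the other way recovers the original utf-8 bytes.
restored=$(printf '%s' "$garbled" | iconv -f utf-8 -t iso-8859-1)
printf '%s\n' "$restored"
```

If the names on disk behave like $utf8 here, they may already be utf-8, and
it would be the display side rather than the names that needs attention.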

-- 
Klaus Alexander Seistrup
SubZeroNet · Copenhagen · Denmark




More information about the ubuntu-users mailing list