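I recently installed ansible on a machine through pip, and it took a lot of trouble. Here is a summary. During installation, pip reported the error
'ascii' codec can't decode byte 0xe2 in position 75: ordinal not in range(128)
I did not pay much attention to it. The installation did complete, but ansible did not run properly. When I imported the paramiko module at the Python prompt, it raised this error:
ImportError: No module named cryptography.hazmat.backends
This clearly means the cryptography module had not been installed successfully. I then tried to install it with pip install, and near the end it again reported
'ascii' codec can't decode byte 0xe2 in position 75: ordinal not in range(128)
The fix is as follows: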
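A minimal sketch of that fix, assuming it amounts to exporting LC_ALL=C in the current shell and then re-running the failed installs. The pip commands are left commented out so the snippet has no side effects; the package names are the ones mentioned above.

```shell
# Force the C locale for this shell session and everything started from it.
LC_ALL=C
export LC_ALL

# Then re-run the installs that failed (assumed commands, uncomment to use):
# pip install cryptography
# pip install ansible
```

The setting only lasts for the current shell session; a new terminal starts again with the system's default locale.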
What export LC_ALL=C means, in detail:
LC_ALL is the environment variable that overrides all the other localisation settings (except $LANGUAGE under some circumstances).
Different aspects of localisations (like the thousand separator or decimal point character, character set, sorting order, month, day names, language or application messages like error messages, currency symbol) can be set using a few environment variables.
You'll typically set $LANG to your preference with a value that identifies your region (like fr_CH.UTF-8 if you're in French speaking Switzerland, using UTF-8). The individual LC_xxx variables override a certain aspect. LC_ALL overrides them all. The locale command, when called without argument, gives a summary of the current settings.
For instance, on a GNU system, I get:
    $ locale
    LANG=en_GB.UTF-8
    LANGUAGE=
    LC_CTYPE="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_COLLATE="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_PAPER="en_GB.UTF-8"
    LC_NAME="en_GB.UTF-8"
    LC_ADDRESS="en_GB.UTF-8"
    LC_TELEPHONE="en_GB.UTF-8"
    LC_MEASUREMENT="en_GB.UTF-8"
    LC_IDENTIFICATION="en_GB.UTF-8"
    LC_ALL=
I can override an individual setting with for instance:
    $ LC_TIME=fr_FR.UTF-8 date
    jeudi 22 août 2013, 10:41:30 (UTC+0100)
Or:
    $ LC_MONETARY=fr_FR.UTF-8 locale currency_symbol
    €
Or override everything with LC_ALL.
    $ LC_ALL=C LANG=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 cat /
    cat: /: Is a directory
In a script, if you want to force a specific setting, as you don't know what settings the user has forced (possibly LC_ALL as well), your best, safest and generally only option is to force LC_ALL.
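A minimal sketch of forcing the locale at the top of a script, assuming a POSIX sh. printf is used here only as a convenient way to observe the effect, and the variable name half is illustrative.

```shell
#!/bin/sh
# Force one known locale for everything this script runs, whatever
# LANG, LC_* or LC_ALL values the caller had set.
LC_ALL=C
export LC_ALL

# Under LC_ALL=C the decimal separator is a dot, both when printf
# parses its argument and when it formats the result.
half=$(printf '%g' 0.5)
echo "$half"
```

Setting LC_ALL rather than LANG or a single LC_xxx variable matters here because a caller's own LC_ALL would otherwise take precedence over anything else the script sets.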
The C locale is a special locale that is meant to be the simplest locale. You could also say that while the other locales are for humans, the C locale is for computers. In the C locale, characters are single bytes, the charset is ASCII (it is not required to be, but in practice it will be on the systems most of us will ever get to use), the sorting order is based on the byte values, the language is usually US English (though for application messages, as opposed to things like month or day names or messages by system libraries, it's at the discretion of the application author) and things like currency symbols are not defined.
On some systems, there's a difference with the POSIX locale where for instance the sort order for non-ASCII characters is not defined.
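To illustrate the byte-value sorting order described above, here is a small sketch, assuming a POSIX sort. The sample letters and the variable name sorted are arbitrary.

```shell
# In the C locale, sort compares raw byte values, so every uppercase
# ASCII letter (0x41-0x5A) sorts before every lowercase one (0x61-0x7A).
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
echo "$sorted"
```

In a typical UTF-8 locale the same input would usually come out with upper and lower case interleaved, which is the difference the paragraph above refers to.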
You generally run a command with LC_ALL=C to prevent the user's settings from interfering with your script. For instance, if you want [a-z] to match the 26 ASCII characters from a to z, you have to set LC_ALL=C.
On GNU systems, LC_ALL=C and LC_ALL=POSIX (or LC_MESSAGES=C|POSIX) override $LANGUAGE, while LC_ALL=anything-else wouldn't.
A few cases where you typically need to set LC_ALL=C:
- sort -u or sort ... | uniq.... In many locales other than C, on some systems (notably GNU ones), some characters have the same sorting order. sort -u doesn't report unique lines, but one of each group of lines that have equal sorting order. So if you do want unique lines, you need a locale where characters are bytes and all characters have different sorting order (which the C locale guarantees).
- the same applies to the = operator of POSIX compliant expr or the == operator of POSIX compliant awks (mawk and gawk are not POSIX in that regard), which don't check whether two strings are identical but whether they sort the same.
- Character ranges like in grep. If you mean to match a letter in the user's language, use grep '[[:alpha:]]' and don't modify LC_ALL. But if you want to match the a-zA-Z ASCII characters, you need either LC_ALL=C grep '[[:alpha:]]' or LC_ALL=C grep '[a-zA-Z]'. [a-z] matches the characters that sort after a and before z (though with many APIs it's more complicated than that). In other locales, you generally don't know what those are. For instance some locales ignore case for sorting, so [a-z] in some APIs like bash patterns could include [B-Z] or [A-Y]. In many UTF-8 locales (including en_US.UTF-8 on most systems), [a-z] will include the latin letters from a to y with diacritics but not those of z (since z sorts before them), which I can't imagine would be what you want (why would you want to include é and not ź?).
- floating point arithmetic in ksh93. ksh93 honours the decimal_point setting in LC_NUMERIC. If you write a script that contains a=$((1.2/7)), it will stop working when run by a user whose locale has comma as the decimal separator:
    $ ksh93 -c 'echo $((1.1/2))'
    0.55
    $ LANG=fr_FR.UTF-8 ksh93 -c 'echo $((1.1/2))'
    ksh93: 1.1/2: arithmetic syntax error
Then you need things like:
    #! /bin/ksh93 -
    float input="$1" # get it as input from the user in his locale
    float output
    arith() { typeset LC_ALL=C; (($@)); }
    arith output=input/1.2 # use the dot here as it will be interpreted
                           # under LC_ALL=C
    echo "$output" # output in the user's locale
As a side note: the "," decimal separator conflicts with the "," arithmetic operator, which can cause even more confusion.
- When you need characters to be bytes. Nowadays, most locales are UTF-8 based, which means characters can take up from 1 to 4 bytes (up to 6 in the original UTF-8 design). When dealing with data that is meant to be bytes, with text utilities, you'll want to set LC_ALL=C. It will also improve performance significantly because parsing UTF-8 data has a cost.
- a corollary of the previous point: when processing text where you don't know what character set the input is written in, but can assume it's compatible with ASCII (as virtually all charsets are). For instance grep '<.*>' to look for lines containing a <, > pair will not work if you're in a UTF-8 locale and the input is encoded in a single-byte 8-bit character set like iso8859-15. That's because . only matches characters, and non-ASCII characters in iso8859-15 are likely not to form a valid character in UTF-8. On the other hand, LC_ALL=C grep '<.*>' will work because any byte value forms a valid character in the C locale.
- Any time where you process input data or output data that is not intended from/for a human. If you're talking to a user, you may want to use their convention and language, but for instance, if you generate some numbers to feed some other application that expects English style decimal points, or English month names, you'll want to set LC_ALL=C:
    $ printf '%g\n' 1e-2
    0,01
    $ LC_ALL=C printf '%g\n' 1e-2
    0.01
    $ date +%b
    août
    $ LC_ALL=C date +%b
    Aug
That also applies to things like case insensitive comparison (like in grep -i) and case conversion (awk's toupper(), dd conv=ucase…). For instance:
    grep -i i
is not guaranteed to match on I in the user's locale. In some Turkish locales for instance, it doesn't, as upper-case i is İ (note the dot) there and lower-case I is ı (note the missing dot).
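Several of the cases in the list above can be sketched as small wrappers that pin LC_ALL=C for one command only, so the rest of a script keeps the user's locale. This is only an illustration under stated assumptions: the function names are hypothetical, the sample input is made up, and the behaviour on non-ASCII bytes assumes a reasonably recent grep.

```shell
# Unique lines compared byte by byte rather than by collation order.
unique_lines() { LC_ALL=C sort -u; }

# Keep only lines made up entirely of the ASCII letters a-z and A-Z.
ascii_letters_only() { LC_ALL=C grep -x '[a-zA-Z][a-zA-Z]*'; }

# Count lines containing a <...> pair, even when the input is not valid
# UTF-8 (here one byte, octal 351, is "e acute" in iso8859-15).
count_angle_pairs() { LC_ALL=C grep -c '<.*>'; }

printf '%s\n' b a b | unique_lines
printf 'abc\n\303\251t\303\251\nXYZ\n' | ascii_letters_only
printf '<caf\351>\n' | count_angle_pairs
```

Prefixing the assignment to the command, instead of exporting it, limits the change to that single invocation, which is usually what you want when only one step handles machine-oriented data.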