TXT files: when saved as Unicode or UTF-8, what is the difference in compatibility and usage?

Date: 2014-08-01

Source: Internet


When a TXT file is saved as Unicode or UTF-8, what is the actual difference in compatibility and usage?

I found that if I save as ANSI, some Chinese characters turn into "?" question marks.
What is the difference between Unicode and UTF-8?



I consulted the following. Do the two cover the same number of characters?



http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html



[ Last edited by 黄金太过重要 at 2014-7-29 07:28 AM ]

Author: 黄金太过重要   Posted: 2014-08-01

The terms "Unicode" and "Unicode big endian" in this case are misnomers. They really mean UTF-16 (historically UCS-2) little endian and big endian. UCS-2 can only encode about 64k characters (65,536 code points), but the Unicode standard now defines more than 110k, including the latest additions, the emojis; UTF-16 reaches the extra ones by using a pair of 16-bit units per character. The 64k range already covers most common Chinese characters. UTF-8 can encode all the defined Unicode characters with a lot of room to spare.

Bottom line, for true compatibility you should choose UTF-8. Saving as ANSI turns every character that is missing from the system code page into a question mark or a strange character. In practice, if you only use Chinese, there's usually little difference between "Unicode" and "UTF-8". But characters outside the 64k range, such as emojis, can be mishandled by software that still treats a "Unicode" file as plain UCS-2, so UTF-8 is the safer choice.
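A small Python sketch of the point above, for anyone who wants to see the byte counts. It assumes Python's "utf-16-le" codec behaves like the "Unicode" option in the save dialog and uses cp1252 as a stand-in for an "ANSI" code page without Chinese characters; both are assumptions for illustration, not a statement about what Notepad does internally.

```python
# One common Chinese character (inside the 64k range) and one emoji (outside it).
text = "中😀"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# cp1252 stands in for an "ANSI" code page that has no Chinese characters.
# errors="replace" mimics an editor that silently substitutes "?".
ansi = text.encode("cp1252", errors="replace")

print(len(utf8))   # 3 bytes for the Chinese character + 4 bytes for the emoji
print(len(utf16))  # 2 bytes for the Chinese character + 4 bytes for the emoji
print(ansi)        # both characters were replaced by "?"

# Both Unicode encodings round-trip; the ANSI bytes have lost the text for good.
print(utf8.decode("utf-8") == text)
print(utf16.decode("utf-16-le") == text)
```

The emoji needs two 16-bit units in UTF-16, which is exactly the case that UCS-2-only software cannot represent.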

Author: chantaiman44264   Posted: 2014-08-01

Once you use it, Chinese characters no longer come out garbled.

Author: cvguser   Posted: 2014-08-01

Replying to the post by chantaiman44264 of 2014-7-29 02:29 PM:
1. Does it not matter which format I choose when saving, since the software will show a warning if I pick the wrong one?


Reference: http://ez2learn.com/basic/unicode.html#id3

2. Unicode is in the centre of the chart. Is Unicode more standard than UTF-8? Why is there also a UTF-16 in there?


3. Is "ASCII" the same as "ASCI"?


4. Is "Unicode (UTF-8)" the same as "UTF-8"?
Why are there two "Unicode (UTF-8)" entries in the following?


[ Last edited by 黄金太过重要 at 2014-7-29 08:16 PM ]

Author: 黄金太过重要   Posted: 2014-08-01

Quote: Originally posted by 黄金太过重要 at 2014-7-29 20:14
1. Does it not matter which format I choose when saving, since the software will show a warning if I pick the wrong one?
It depends on the software you use: some programs warn you and some do not. Better to be safe and actually choose the correct encoding.
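If you would rather not rely on a warning, the check can be done by hand. Below is a minimal Python sketch; can_save_as is a hypothetical helper name made up for this example, not something from any editor or library.

```python
def can_save_as(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the target encoding.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(encoding)  # strict error handling is the default
    except UnicodeEncodeError:
        return False
    return True


# cp1252 is used here as an example of an "ANSI" code page without Chinese.
print(can_save_as("hello", "cp1252"))
print(can_save_as("中文", "cp1252"))
print(can_save_as("中文", "utf-8"))
```

The first and third checks succeed and the second fails, which is the situation where a careful program would show a warning.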
Quote: 2. Unicode is in the centre of the chart. Is Unicode more standard than UTF-8? Why is there also a UTF-16 in there?
The diagram is a little misleading because it conflates two concepts.

The first is the character set, or code page. Computers deal with numbers, and there are many ways to map alphabets and characters to numbers. Big5 is a popular mapping for traditional Chinese, and GB and HZ are popular ones for simplified Chinese. ASCII is such a map for English. There are other mappings, known as code pages, such as Shift JIS and KS for Japanese and Korean, and some of the maps overlap. Software such as browsers uses hints to determine how to map the numbers back to characters. When the guess is correct you see the right content, and when it is wrong you see garbled characters and question marks. The most universal map is Unicode, which encompasses most other character sets and keeps evolving to include new ones, such as emojis and some very rarely used Chinese characters.
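A minimal Python sketch of a wrong guess, assuming the text was saved as Big5 and the reading program assumes cp1252 (a Western European code page). The choice of these two code pages is only for illustration.

```python
# The numbers actually stored on disk for a Big5 file.
raw = "中文".encode("big5")

# Reading the same numbers back with the right map and with a wrong one.
right = raw.decode("big5")
wrong = raw.decode("cp1252", errors="replace")

print(raw.hex())          # the stored bytes, shown as hex
print(right == "中文")     # the right map recovers the original text
print(ascii(wrong))       # the wrong map yields unrelated symbols
```

Nothing is damaged in the file itself; the bytes are identical in both cases and only the map used to read them differs.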

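The second concept is how those numbers are encoded as bytes to be stored in computers and transmitted over the wire. UTF-8 is the most widely used and accepted encoding of Unicode, and UTF-16 is another encoding of the same numbers, which is why it appears in the chart. If a file doesn't specify an encoding, the bytes are stored as is and the reading software has to guess, which can lead to text not being saved or displayed correctly when there is a mismatch.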
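To make the two concepts concrete, here is a short Python sketch showing one Unicode number and three ways to store it. The character chosen is arbitrary.

```python
ch = "中"

# Concept 1: the Unicode number (code point) assigned to the character.
print(hex(ord(ch)))                  # 0x4e2d

# Concept 2: how that one number is written out as bytes.
print(ch.encode("utf-8").hex())      # e4b8ad
print(ch.encode("utf-16-le").hex())  # 2d4e
print(ch.encode("utf-16-be").hex())  # 4e2d
```

The number stays the same in every case; only the bytes differ, and little endian versus big endian is just the order of the two UTF-16 bytes.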
Quote:
3. Is "ASCII" the same as "ASCI"?
I haven't seen ASCI. ANSI, on the other hand, is the name Windows uses for its family of legacy code pages (for example Windows-1252 for Western European languages or code page 950 for traditional Chinese). They are all supersets of ASCII.
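The "superset of ASCII" part can be checked with a few lines of Python. This is only a sketch: the list of code pages is my own pick, and it tests the printable ASCII range only.

```python
# Printable ASCII: space (0x20) through tilde (0x7E).
ascii_bytes = bytes(range(0x20, 0x7F))
expected = ascii_bytes.decode("ascii")

# A few Windows "ANSI" code pages, plus UTF-8 for comparison.
for codec in ("cp1252", "cp950", "cp936", "utf-8"):
    same = ascii_bytes.decode(codec) == expected
    print(codec, same)
```

This is why plain English text looks fine whichever of these encodings is guessed, and why problems only show up once other characters are involved.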
Quote:
4. Is "Unicode (UTF-8)" the same as "UTF-8"?
Why are there two "Unicode (UTF-8)" entries in the following?
I don't know why it has two. Are you using Windows XP? The world was a very fragmented place in 2001 and XP had to cover all bases, sometimes poorly. Modern versions of Windows are a lot cleaner.

Author: chantaiman44264   Posted: 2014-08-01