MySQL Character Sets & Collations
Check
Table collation...
SHOW TABLE STATUS LIKE 'mytable'\G
Table character set...
SHOW CREATE TABLE mytable\G
Instance default character sets...
status
Connection id: 48486048Current database:Current user: root@localhostSSL: Not in useCurrent pager: stdoutUsing outfile: ''Using delimiter: ;Server version: 8.0.23-commercial MySQL Enterprise Server - CommercialProtocol version: 10Connection: Localhost via UNIX socketServer characterset: utf8mb4Db characterset: utf8mb4Client characterset: utf8mb4Conn. characterset: utf8mb4UNIX socket: /mysql/mysql.sockBinary data as: HexadecimalUptime: 559 days 2 hours 13 min 34 sec
Threads: 5 Questions: 325482610 Slow queries: 7908120 Opens: 1698 Flush tables: 3 Open tables: 1370 Queries per second avg: 6.737
In this case the mismatch is down to the difference in default character set between MySQL 5.7 and 8.0 when the table is created from the same DDL which explictly sets trhe table character set to utf8
Database character set and collation...
USE mydatabase;
SELECT @@character_set_database,
@@collation_database;
SELECT default_character_set_name,
default_collation_name
FROM information_schema.schemata
WHERE schema_name = 'mydatabase';
To see available character sets and their default collation...
SHOW CHARACTER SET;
To see all available collations for a character set...
SELECT collation_name,
pad_attribute
FROM information_scheme.collations
WHERE character_set_name = 'utf8mb4';
SHOW COLLATION WHERE Charset = 'utf8mb4';
Character Sets
Unicode...
utf8mb4
utf8mb3
utf8
ucs2
utf16
utf16le
utf32
uses 1 to 4 bytes per character
uses 1 to 3 bytes per character (deprecated in MySQL 8.0)
alias for utf8mb3 (may eventually become an alias for utf8mb4)
uses 2 bytes per character (deprecated in MuSQL 8.0.28)
use 2 to 4 bytes per character
Little-endian version of utf16
uses 4 bytes per character
Collation
Some notes about collation in MySQL...
In collation names, you will see the following suffixes...
_ai
_as
_ci
_cs
_bin
Accent-insensitive
Accent-sensitive
Case-insensitive
Case-Sensitive
Binary
National Character Set
Default is utf8 in MySQL 5.7 (and utf8 is an alias for utf8mb3)
Default is utf8mb4 in MySQL 8.0 (and utf8mb3 is deprecated. utf8 is still an alias for utf8mb3 but is eventually expected to become an alias for utf8mb4)
These statements are equivalent...
SELECT N'mytext';
SELECT n'mytext';
SELECT _utf8'mytext';
Change
Database
ALTER DATABASE mydatabase CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
Table
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Column
ALTER TABLE mytable CHANGE mycolumn VARCHAR(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Bibliography
8.0https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.htmlhttps://dev.mysql.com/doc/refman/8.0/en/charset-charsets.htmlhttps://dev.mysql.com/doc/refman/8.0/en/charset-restrictions.htmlhttps://dev.mysql.com/doc/refman/8.0/en/charset-collation-compatibility.html
5.7https://dev.mysql.com/doc/refman/5.7/en/charset-general.htmlhttps://dev.mysql.com/doc/refman/5.7/en/charset-mysql.htmlhttps://dev.mysql.com/doc/refman/5.7/en/charset-syntax.htmlhttps://dev.mysql.com/doc/refman/5.7/en/charset-database.htmlhttps://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html