MySQL学习笔记--InnoDB存储引擎的索引的问题

显示全部楼层 · 2011-9-3 18:52:38

本帖最后由 icer_repls 于 2011-11-16 21:22 编辑
最近瞅了下《MySQL性能调优与架构设计》这本书。看到索引部分时，自己对InnoDB存储引擎的索引有些疑问。
原书的内容是（page137）：
在InnoDB中有两种形式的索引，一种是cluster形式的主键索引（primary key），另外一种是和其他存储引擎（如MyISAM引擎）存放形式基本相同的普通b-tree索引，这种形式的索引在InnoDB存储引擎中被称为secondary index。主键索引和secondary index的区别在于叶子节点。在primary key中，叶子节点存放的是表的实际数据，不仅仅包含主键字段的数据，还包括其他字段的数据，整个数据以主键有序的排列。而secondary index则和其他普通的b-tree索引没有太大的差异，只是在叶子节点除了存放索引键的相关信息外，还存放了InnoDB的主键值。
所以，在InnoDB中如果通过主键来访问数据效率是相当高的。而如果通过secondary index来访问数据的话，InnoDB首先通过secondary index的相关信息及相应的索引键检索到叶子节点，然后通过叶子节点中的主键值和主键索引来获取相应的数据行。
接着也去查了下手册，没找到专门介绍索引的章节（谁知道，请告诉我一下），但是在调优章节也算是提到了这两种索引：
For InnoDB,a secondary index might cover the selected columns even if the query also selects the primary key because InnoDB stores the primary key value with each secondary index.

1.主键索引和secondary index（应该可以翻译成二级索引吧）的唯一区别就是在于叶子节点，都是采用b-tree实现的，这个我没什么疑问。
2.主键索引的叶子节点不仅仅包含主键字段的信息，还包括其他非主键的所有字段信息。这么做值得吗？这得浪费多大的存储空间啊？索引文件都可能比数据文件要大啊（索引文件除了叶子节点数据外，还有根节点数据，分支节点数据）。估计设计者是想用空间换时间来提高效率。
3.secondary index，除了存放索引键之外，还存放InnoDB的主键值。利用secondary index的时候，按照作者的意思是先扫描secondary index的索引文件，找到目的叶子节点之后，在利用这个叶子节点上的主键值去扫描主键索引，然后找到这行。这不相当于扫描了两个索引文件吗？这效率会要降低多少啊？ MySQL有没有提供一个定位到具体某一行的信息啊？就像oracle中的rowid一样，这样在时间和空间之间有个更好的折中考虑。
以上就是自己对InnoDB存储引擎的索引的一些疑问，谁对这部分有深刻理解，请说说自己的见解，欢迎讨论......

千问 · 2011-9-3 18:52:38

本帖最后由 justlooks 于 2011-11-8 14:20 编辑
2,就是oracle的 index organized table概念，oracle的设计理念在innodb中无处不在（貌似innodb是来自oracle的？？）

千问 · 2011-9-3 18:52:38

justlooks 发表于 2011-11-8 14:18
2,就是oracle的 index organized table概念，oracle的设计理念在innodb中无处不在（貌似innodb是来自oracl ...
哦，原来主键索引就是IOT的概念啊，那就没多大的疑问了。oracle里面确实有很多不错的理念，值得innodb借鉴

千问 · 2011-9-3 18:52:38

哥们你的理解就很深刻了，呵呵
high performance那本书上有详细说明innodb和myisam索引的
Schema Optimization and Indexing
讲得比较详细
2.没有说叶子节点的数据要拉到内存里吧，求高人解释
3.你说对了secodary就是这样走的。但一般primary只有3,4层，所以降低是肯定的，但是内存级别的相对磁盘还是要快n多。mysql没有定位行的，即便主键只指向一行，取得时候也是按页取得。如有错误请指正

千问 · 2011-9-3 18:52:38

本帖最后由 philip_zhong 于 2011-11-8 15:38 编辑
Non-clustered:
The data is present in random order, but the logical ordering is specified by the index. The data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page. In non-clustered index:
The physical order of the rows is not the same as the index order.
Typically created on column used in JOIN, WHERE, and ORDER BY clauses.
Good for tables whose values may be modified frequently.
Clustered:
Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.
Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s). Create an object where the physical order of rows is same as the index order of the rows and the bottom(leaf) level of clustered index contains the actual data rows.
They are known as "index organized tables" under Oracle database.
详细见：
http://philipzhong.blogspot.com/2011/11/cluster-index-and-non-clustered-index.html

千问 · 2011-9-3 18:52:38

Innodb只有索引没有表，我不久前说的INNODB的致命弱点就是这个！！Oracle把IOT作为一种选择，Innodb把它作为唯一的实现；对大数据量和多列表的性能令人担忧！！！

千问 · 2011-9-3 18:52:38

philip_zhong 发表于 2011-11-8 15:36
Non-clustered:
The data is present in random order, but the logical ordering is specified by the in ...
philip_zhong同学的帖子总是像官方的标准文档，佩服！！

千问 · 2011-9-3 18:52:38

jiwang1980 发表于 2011-11-8 14:37
哥们你的理解就很深刻了，呵呵
high performance那本书上有详细说明innodb和myisam索引的
Schema Optimiz ...
2. 我首先以为所谓的主键索引就是在主键上建立索引（数据文件和索引文件分开），而索引的叶子节点又包含了每一行的全部字段。所以认为浪费了存储空间，这个存储空间是指的物理磁盘的存储空间，不是内存。
但现在知道了主键索引其实就是个聚簇（cluster，IOT实际上就是个聚簇），所以那个叶子节点存全部的字段就很正常了。

千问 · 2011-9-3 18:52:38

@philip_zhong：
cluster和no-cluster这个之前还是有些了解的
cluster：实际上就是表在物理磁盘上的数据存放顺序是和索引顺序一样的，因而任何一个表最多只可能有一种聚簇形式。
no-cluster：当然就是表的物理磁盘上的数据存放顺序不是和索引一样咯，数据的信息是通过额外的索引文件来维护的。
大家如果了解过oracle的话，都会知道rowid这回事，在cluster（也就是IOT中）用的是物理rowid，而no-cluster中用的是逻辑rowid。
@kerlion:
你确定吗？
1.IOT这个其实还是表，只是这个表的数据物理存放顺序有些独特而已（与索引组织顺序一样），但他还是个表的数据文件啊。
2.如果我创建一个表，根本没有建立索引，那么就不存在主键索引吧，这种情况下就是最直白的数据文件了。

千问 · 2011-9-3 18:52:38

本帖最后由 philip_zhong 于 2011-11-8 17:25 编辑
icer_repls 发表于 2011-11-8 16:45
@philip_zhong：
cluster和no-cluster这个之前还是有些了解的
cluster：实际上就是表在物理磁盘上的数据存 ...
@icer_repls,mysql中，mysql innodb的pk默认就是cluster index，并且叶子节点上是挂接数据的，而普通索引是记录pk的真实值的，在mysql中没有oracle的rowid一说，因此使用普通索引的时候，select pk from table where "普通索引"的效率是比oracle高的，默认索引中已经包括了pk的值，这与oracle不同，oracle的普通索引存放的是rowid，因此不存在访问PK索引树的代价，在我们做数据库架构或者设计的时候一定要注意这个点,规避mysql的缺点，使用此优点。
（仔细评下这句：The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s). Create an object where the physical order of rows is same as the index order of the rows and the bottom(leaf) level of clustered index contains the actual data rows.）