求助 oracle like%..%模糊查询优化

显示全部楼层 · 2014-4-16 06:00:13

公司做交通项目，有一功能模糊查询车牌号。
1、需求：
一亿数据模糊查询（like%..%）3秒内得出结果。
2、查询sql：
select t.rowid
from
(select rowid
from clxx
wherejgsj >= to_date('2014-09-15 00:00:00','yyyy/mm/dd HH24:MI:SS')
and jgsj 现在查询走组合索引，用时40秒。
4、问题
一亿数据模糊查询（like%..%）3秒内得出结果，这一需求能不能实现
不管是硬件或oracle本身优化能否提速，
项目经理非让不管什么方案必须拿出一个，小弟刚接触oracle没有工作经验，望各位大神不吝赐教。
再次感谢！

千问 · 2014-4-16 06:00:13

If you allow % to be at the beginning of the string (e.g. "hphm like '%闽..."), then a regular B-Tree index won't work. You need a text index, like this
create index indctx_clxx on clxx (hphm) indextype is ctxsys.context;
The other text index type is ctxsys.ctxcat.

千问 · 2014-4-16 06:00:13

Yong Huang 发表于 2014-9-30 23:37
If you allow % to be at the beginning of the string (e.g. "hphm like '%闽..."), then a regular B-Tre ...
谢谢！过十一回家没有网没来的急看，回复的有点晚，抱歉！
test index效率怎么样？hphm一共就那么9个字符。

千问 · 2014-4-16 06:00:13

大家帮忙看一下，确定下需求是否能实现。

千问 · 2014-4-16 06:00:13

Yong Huang 发表于 2014-9-30 23:37
If you allow % to be at the beginning of the string (e.g. "hphm like '%闽..."), then a regular B-Tre ...
版主您好，我建了全文索引。
但是像“皖DSYQOW”这样的车牌号怎么分词？
像车牌号好多字母和数字的组合，分词怎么分？
希望版主指点一下，非常感谢！

千问 · 2014-4-16 06:00:13

1、分区是什么分区？月分区？周分区？天分区？
2、现在的情况是，可能走索引也可能不走（因为没看到执行计划），通过时间条件查出相应时段的行，然后通过like条件过滤，最后完成排序出结果，这样快不了；另外，like后面字符串（'%闽KWHWTQ%'）开始有必要加个“%”号吗，这是查车牌，我看”闽“字儿已经出现了，一般车牌号第一个字就是汉字？如果能去掉最前面那个”%“号，比这个会快很多。
3、此外，可以通过倒叙扫描索引（或在时间字段 jgsj上建立倒序索引），也可能会快很多，可以试试。

千问 · 2014-4-16 06:00:13

sqysl 发表于 2014-10-9 12:27
1、分区是什么分区？月分区？周分区？天分区？
2、现在的情况是，可能走索引也可能不走（因为没看到执行计 ...
谢谢您的回复！
1、按jgsj分区按月，一月3000w（有的地方9000w）
2、执行计划是走的组合索引，
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=HEXTORAW('878DF6F0E7C3C3FF')AND

SYS_OP_DESCEND("JGSJ")=TO_DATE

(' 2014-01-15 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND

SYS_OP_UNDESCEND(SYS_OP_DESCEND("JGSJ")) 我建了全文索引。但是像“皖DSYQOW”这样的车牌号怎么分词？
Somebody with text indexes on the Chinese language can help. Sorry. My experience has always been with English, or any language with clear word boundaries, e.g. in "Hello world", a text index is built with two tokens "hello" and "world". With Chinese such as “皖DSYQOW”, I'm not sure how it's broken down. Maybe each character is one token? I'd love to be educated.
In the meantime, make sure your query is changed to using the appropriate syntax. For example, if you created a context index, the syntax to use it is
where contains(col,'keyword') > 0
If you created a ctxcat index (which I believe is better and does not need periodic manual synchronization after DML), the syntax is
where catsearch(col,'keyword',null) > 0
The difference between these two types is described at
http://docs.oracle.com/cd/E11882 ... /csql.htm#CCREF0105
> like%..%是必须的因为要模糊查询的时候可能是车牌号中间部分
The data that is inputted must be structured. Allowing users to enter free text causes a relational database to suffer in performance. There's no better solution than stopping the free text input and, if there's already some in the database, cleaning up. Once you do that, follow sqysl's suggestion to drop the leading % in the where-clause.
You can allow users to enter free text in a column such as COMMENT or ADDITIONAL_INFO. But the values in those columns are usually not searched, but simply displayed when *other* columns are searched.