I want to extract and list only the part of each url that comes before ? or #

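In MySQL I have a table named master whose url column holds arbitrary URLs.
The other columns are id, inserted, and name, and multiple records share the same inserted and name values. For example, when the browsing history of name=bob across several sites (url) is registered in one batch, all of those records get the same registration date (inserted).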

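I group the rows by each combination of inserted and name, and for each group take the smallest id that satisfies tag=0 and whose url does not match @aurl, @burl, or @curl (each holds a LIKE pattern string such as 'http://%/default.html'). Since there are multiple combinations of inserted and name, this yields multiple ids, one per group.
For the urls obtained this way, I want a list with everything from ? or # onward removed.
A url may contain neither ? nor #, or it may contain both, in either order. (The number of occurrences is not fixed either.)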

The SQL below works, but is there a more elegant and faster way?
For reference, running it against 450,000 records takes 18.0776 seconds.

id (bigint), url (text), name (text), and inserted (UNIX time, int) are indexed. (For the text columns, the indexes were created with a prefix length of 256.)
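For reference, the table can be pictured roughly as follows. This is only a sketch reconstructed from the description above: the type of tag, the NOT NULL constraints, and the index names are assumptions, and id may well be a primary key rather than a plain index.

```SQL
-- Hypothetical schema; only the column names and the 256-character
-- index prefixes come from the description above.
CREATE TABLE master (
  id       BIGINT  NOT NULL,
  url      TEXT    NOT NULL,
  name     TEXT    NOT NULL,
  inserted INT     NOT NULL,            -- UNIX time
  tag      TINYINT NOT NULL DEFAULT 0,  -- assumed type
  INDEX idx_id (id),                    -- assumed index names
  INDEX idx_url (url(256)),
  INDEX idx_name (name(256)),
  INDEX idx_inserted (inserted)
);
```

Depending on the character set and the InnoDB row format, a 256-character prefix may exceed the index key length limit, in which case a shorter prefix would be needed.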

```SQL
-- 1) urls that contain neither '#' nor '?': return as-is
select url from master
where id in (
  select min(id) from master
  where
    url not like @aurl
    and url not like @burl
    and url not like @curl
    and tag = 0
  group by inserted, name
)
and url not like '%#%'
and url not like '%?%'
union distinct
-- 2) urls that contain '?': cut at the first '?'
select left(url, instr(url, '?')-1) from master
where id in (
  select min(id) from master
  where
    url not like @aurl
    and url not like @burl
    and url not like @curl
    and tag = 0
  group by inserted, name
)
and url like '%?%'
union distinct
-- 3) urls that contain '#' but no '?': cut at the first '#'
select left(url, instr(url, '#')-1) from master
where id in (
  select min(id) from master
  where
    url not like @aurl
    and url not like @burl
    and url not like @curl
    and tag = 0
  group by inserted, name
)
and url like '%#%' and url not like '%?%'
;
```

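Because three results are combined with UNION, it takes about 18 seconds. I imagine it would be faster if everything could be listed in one pass without UNION, but I wrote it this way because replace and regexp cannot be combined for this purpose.
I suspect it could be done by chaining many if expressions around the left function, but that looks ugly and seems hard to maintain (is there a more concise way to write it?).
I also tried merging the not like conditions into a single regexp, but that made it slower (about 3x).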

kunai

2016/09/21 01:07

Is it not an option to use an external program (PHP, Python, etc.) instead of doing everything in SQL?

cnx

2016/09/21 05:33

The only thing I am able to change is the SQL, so I would like to do it without PHP or the like if possible.

2 answers

Best answer

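> I group the rows by each combination of inserted and name, and for each group take the smallest id that satisfies tag=0 and whose url does not match @aurl, @burl, or @curl (each holds a LIKE pattern string such as 'http://%/default.html').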

If you first extract the target ids and then transform the url, there should be no need for UNION.

> For the urls obtained this way, I want a list with everything from ? or # onward removed.

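The url transformation can also be kept simple with a well-chosen expression.
By deliberately appending the search character to url, instr always finds a match, so no if-based case analysis is needed.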

```SQL
SELECT
  left(url, instr(concat(left(url, instr(concat(url, '?'), '?')-1), '#'), '#')-1) url
FROM master
WHERE
  id in (
    SELECT
      min(id)
    FROM
      master
    WHERE
      url not like @aurl
      and url not like @burl
      and url not like @curl
      and tag = 0
    GROUP BY
      inserted, name
  );
```
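To see why the appended characters remove the need for if, the expression can be tried on a few literal URLs. This is a minimal sketch: the derived table samples and the example.com URLs are made up for illustration and have nothing to do with the actual contents of master.

```SQL
-- Hypothetical sample URLs covering: no marker, '?' only, '#' only,
-- '?' before '#', and '#' before '?'.
SELECT
  url,
  -- inner left(): cut at the first '?' (concat guarantees one exists)
  -- outer left(): cut at the first '#' found in that result
  --               (again guaranteed by concat)
  left(url, instr(concat(left(url, instr(concat(url, '?'), '?') - 1), '#'), '#') - 1) AS trimmed
FROM (
  SELECT 'http://example.com/a' AS url
  UNION ALL SELECT 'http://example.com/a?x=1'
  UNION ALL SELECT 'http://example.com/a#frag'
  UNION ALL SELECT 'http://example.com/a?x=1#frag'
  UNION ALL SELECT 'http://example.com/a#frag?x=1'
) AS samples;
-- trimmed is 'http://example.com/a' for every row
```

Because the inner result is always a prefix of url, the position of '#' found in it can be applied directly to url in the outer left().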

Addendum
A CASE expression might be easier to read.

```SQL
SELECT
  CASE
    WHEN url like '%#%' THEN left(url, instr(url, '#')-1)
    WHEN url like '%?%' THEN left(url, instr(url, '?')-1)
    ELSE url END url
FROM master
-- rest omitted
```
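One thing worth checking with the CASE version is a url that contains both markers. Since the '#' branch is tested first, a url in which '?' comes before '#' is cut at '#' and keeps its query string. The sketch below runs the CASE expression on two made-up example.com URLs (the samples derived table is hypothetical) and, in a second column, shows one possible adjustment that cuts at whichever marker comes first.

```SQL
SELECT
  url,
  -- CASE expression exactly as written above
  CASE
    WHEN url like '%#%' THEN left(url, instr(url, '#') - 1)
    WHEN url like '%?%' THEN left(url, instr(url, '?') - 1)
    ELSE url END AS case_as_written,
  -- Possible adjustment (an assumption, not part of the answer above):
  -- when both markers are present, cut at the earlier one
  CASE
    WHEN url like '%#%' AND url like '%?%'
      THEN left(url, least(instr(url, '#'), instr(url, '?')) - 1)
    WHEN url like '%#%' THEN left(url, instr(url, '#') - 1)
    WHEN url like '%?%' THEN left(url, instr(url, '?') - 1)
    ELSE url END AS cut_at_first_marker
FROM (
  SELECT 'http://example.com/a?x=1#frag' AS url
  UNION ALL SELECT 'http://example.com/a#frag?x=1'
) AS samples;
-- Row 1: case_as_written = 'http://example.com/a?x=1',
--        cut_at_first_marker = 'http://example.com/a'
-- Row 2: both columns = 'http://example.com/a'
```

The nested left/instr/concat expression shown earlier handles both orders without the extra branch, so the CASE form is mainly a readability trade-off.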