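Today I was writing Hive SQL (HSQL) again, and once more it was the same repetitive pv and uv calculation (annoying). As usual, I had to compute the per-category numbers first and then the totals. For example: pv and uv for PC, pv and uv for mobile, and then the overall pv and uv. The overall pv is easy, it is just PC plus mobile. The uv, however, has to be deduplicated all over again, because the same user can show up on both PC and mobile. This bothers me every time, because it cannot be handled quickly in a single HSQL statement (maybe I am a little obsessive about it). So I squeezed out some time at work to test a few different ways of writing the query and compare how efficient they are.
Enough talk, on to the code.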
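1. This is how I used to compute the overall pv and uv together with the per-category pv and uv:

    SELECT a.type, a.pv, a.uv
    FROM (
        SELECT type, COUNT(1) AS pv, COUNT(DISTINCT uid) AS uv
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        GROUP BY type
        UNION ALL
        SELECT 'all' AS type, COUNT(1) AS pv, COUNT(DISTINCT uid) AS uv
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
    ) a

Note: DISTINCT is convenient to write, but it was really slow here. In Hive, COUNT(DISTINCT ...) typically pushes the deduplication through very few reducers (a single one when there is no GROUP BY), so I recommend avoiding it on large tables.

2. The statement can then be rewritten so that the deduplication is done by a GROUP BY on uid instead:

    SELECT a.type, SUM(a.pv) AS pv, COUNT(a.uid) AS uv
    FROM (
        SELECT type, COUNT(1) AS pv, uid
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        GROUP BY uid, type
        UNION ALL
        SELECT 'all' AS type, COUNT(1) AS pv, uid
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        GROUP BY uid
    ) a
    GROUP BY a.type

Each inner branch emits one row per (type, uid), so COUNT(uid) in the outer query is the deduplicated uv. This was somewhat faster, and I used it for quite a while, but it still felt wrong, as if it was not getting the full benefit of UNION ALL.

3. Only today did I realize that the GROUP BY should not go inside the UNION ALL branches. It hurts performance badly, and written that way the query also launches more jobs. So I changed it right away: the branches only tag each row, and the grouping happens once, outside the union.

    SELECT b.type, SUM(b.pv) AS pv, COUNT(b.uid) AS uv
    FROM (
        SELECT a.type, SUM(a.pv) AS pv, a.uid
        FROM (
            SELECT type, 1 AS pv, uid
            FROM t1
            WHERE dt = '201410129'
              AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
            UNION ALL
            SELECT 'all' AS type, 1 AS pv, uid
            FROM t1
            WHERE dt = '201410129'
              AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        ) a
        GROUP BY a.uid, a.type
    ) b
    GROUP BY b.type

In my tests this version was indeed noticeably faster.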
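To make the third form easier to check by hand, here is a minimal sketch on toy data. The table name t1_demo and its five rows are hypothetical and not from the original job, the dt and req_url filters are left out for brevity, and it assumes Hive 0.14 or later, where INSERT ... VALUES is available.

```sql
-- Hypothetical toy table: one row per page view (not the real t1).
CREATE TABLE t1_demo (type STRING, uid STRING);

INSERT INTO TABLE t1_demo VALUES
  ('pc',     'u1'),
  ('pc',     'u1'),
  ('pc',     'u2'),
  ('mobile', 'u1'),
  ('mobile', 'u3');

-- Same shape as the third query: tag every row twice (its own type and 'all'),
-- collapse to one row per (uid, type), then aggregate per type.
SELECT b.type, SUM(b.pv) AS pv, COUNT(b.uid) AS uv
FROM (
    SELECT a.type, a.uid, SUM(a.pv) AS pv
    FROM (
        SELECT type, 1 AS pv, uid FROM t1_demo
        UNION ALL
        SELECT 'all' AS type, 1 AS pv, uid FROM t1_demo
    ) a
    GROUP BY a.uid, a.type
) b
GROUP BY b.type;

-- Result rows (row order is not guaranteed without ORDER BY):
--   pc      pv = 3   uv = 2   (u1, u2)
--   mobile  pv = 2   uv = 2   (u1, u3)
--   all     pv = 5   uv = 3   (u1, u2, u3)
```

The 'all' row shows why uv cannot simply be added up: the per-type uv values sum to 4, but u1 visited on both PC and mobile, so the deduplicated total is 3, while pv does add up (3 + 2 = 5).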