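Recently I have received a run of reports of ASM diskgroups being dismounted with the error "Waited 15 secs for write IO to PST". This is an ASM-specific heartbeat timeout check: the ASM instance periodically checks whether every ASM disk responds normally. So I decided to write a short summary of the issue.
The document ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1) contains the following description: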
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generally this kind messages comes in ASM alertlog file on below situations,
Delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup,
thus the ASM instance dismount the diskgroup.By default, it is 15 seconds.
By the way the heart beat delays are sort of ignored for external redundancy diskgroup.
ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
but the heart beat delays do not dismount external redundancy diskgroup directly.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
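The description above can be read as follows:
1. The ASM instance periodically checks the disks of every diskgroup to confirm that they still respond;
2. A delayed heartbeat only dismounts normal and high redundancy diskgroups; an external redundancy diskgroup is not dismounted directly by the delay;
3. The default timeout is 15s: if a disk has still not responded to the ASM instance after 15s, the diskgroup is dismounted.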
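The three points above can be modelled as a small decision rule. This is only a sketch of my reading of the note, not Oracle's implementation: the class `HeartbeatResult` and the function `would_dismount` are hypothetical names, and the rule ignores how many disks are affected.

```python
from dataclasses import dataclass

# Default heartbeat wait quoted in the note (15 seconds).
DEFAULT_HBEAT_IO_WAIT_SECS = 15


@dataclass
class HeartbeatResult:
    """One PST heartbeat observation (hypothetical model, not an Oracle API)."""
    redundancy: str     # "external", "normal" or "high"
    delay_secs: float   # time taken for the PST write to be acknowledged


def would_dismount(result: HeartbeatResult,
                   timeout_secs: float = DEFAULT_HBEAT_IO_WAIT_SECS) -> bool:
    """Return True when the heartbeat delay alone would dismount the diskgroup."""
    if result.redundancy == "external":
        # Per the note, delays do not dismount external redundancy groups directly.
        return False
    # Assumption: reaching the timeout counts as a missed heartbeat.
    return result.delay_secs >= timeout_secs


if __name__ == "__main__":
    for redundancy, delay in [("normal", 3), ("high", 20), ("external", 60)]:
        outcome = would_dismount(HeartbeatResult(redundancy, delay))
        print(f"{redundancy:9s} delay={delay:3d}s dismount={outcome}")
```

Passing a larger `timeout_secs` shows the effect of raising the wait: the same 20s delay on a high redundancy group no longer triggers a dismount in this model.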
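The customers who hit this problem all use Fibre Channel network storage, and the error appears when the storage network has problems. In other words, when ASM issues its periodic check and a disk does not respond within 15s, ASM treats that disk as inaccessible.
I tried to reproduce the error in a test environment. Because the test environment is a VMware virtual machine, removing a disk at the physical level does not trigger it. The reason is that when a disk on the same host is removed abnormally, ASM's read operation immediately gets an OS-level I/O error back, so it never waits long enough to hit the "Waited 15 secs for write IO to PST" timeout.
My conclusion is that this error only appears when the shared ASM disks are not local to the physical host but sit on a storage network, and the checks ASM sends out are not answered in time. The cause may be the storage host, the storage network, or even the storage disks themselves. Either way, ASM did not receive the acknowledgement it needs, so it treats the disk as faulty; and once enough disks are affected to threaten data integrity, ASM dismounts the diskgroup.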
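When investigating which disks were slow, it can help to count these warnings per disk in the ASM alert log. The sketch below assumes the message continues with "disk N in group M" after the text quoted in this post; that suffix and the sample lines are assumptions for illustration, so adjust the pattern to whatever your alert log actually contains. The function name `count_pst_waits` is hypothetical.

```python
import re
from collections import Counter

# Assumed message layout; only the "Waited ... to PST" part is quoted in this post.
PST_WAIT = re.compile(
    r"Waited (?P<secs>\d+) secs for write IO to PST "
    r"disk (?P<disk>\d+) in group (?P<group>\d+)"
)


def count_pst_waits(lines):
    """Count PST write-wait warnings per (group, disk) pair."""
    counts = Counter()
    for line in lines:
        match = PST_WAIT.search(line)
        if match:
            counts[(int(match["group"]), int(match["disk"]))] += 1
    return counts


# Illustrative lines only, not taken from a real alert log.
sample = [
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.",
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "NOTE: some unrelated line",
]

if __name__ == "__main__":
    for (group, disk), hits in sorted(count_pst_waits(sample).items()):
        print(f"group {group} disk {disk}: {hits} warning(s)")
```

To run it against a real file, pass an open file object instead of `sample`, since the function accepts any iterable of lines.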
According to document 1581684.1, the "Waited 15 secs for write IO to PST" message appears from 11.2.0.3.0 onward. The document also describes how to change this timeout manually, through the parameter _asm_hbeatiowait:
alter system set "_asm_hbeatiowait"=
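<ASM/CRS must be restarted for the change to take effect.>
To confirm that this parameter first appeared in 11.2.0.3, I queried every database version I have; the results are below: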
======================10.2=====================
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

SQL> select ksppinm as "hidden parameter", ksppstvl as "value"
     from x$ksppi join x$ksppcv using (indx)
     where ksppinm like '\_%' escape '\' and ksppinm like '%asm%'
     order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_acd_chunks                  1
_asm_allow_only_raw_disks        TRUE
_asm_allow_resilver_corruption   FALSE
_asm_ausize                      1048576
_asm_blksize                     4096
_asm_direct_con_expire_time      120
_asm_disk_repair_time            14400
_asm_droptimeout                 60
_asm_emulmax                     10000
_asm_emultimeout                 0
_asm_fob_tac_frequency           3
_asm_instlock_quota              0
_asm_kfdpevent                   0
_asm_libraries                   ufs
_asm_maxio                       1048576
_asm_skip_resize_check           FALSE
_asm_stripesize                  131072
_asm_stripewidth                 8
_asm_wait_time                   18
_asmlib_test                     0
_asmsid                          asm

21 rows selected.
======================11.2.0.1=====================
sqlplus / as sysdba

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value"
     from x$ksppi join x$ksppcv using (indx)
     where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%'
     order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatwaitquantum            2

======================11.2.0.2=====================
$ sqlplus / as sysdba

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value"
     from x$ksppi join x$ksppcv using (indx)
     where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%'
     order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatwaitquantum            2

The parameter only appears from 11.2.0.3.0 onward, which suggests that the ASM instance's disk timeout check was introduced in 11.2.0.3.

======================11.2.0.3=====================
sys@R11203> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - bit Production

SQL> select ksppinm as "hidden parameter", ksppstvl as "value"
     from x$ksppi join x$ksppcv using (indx)
     where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%'
     order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 15
_asm_hbeatwaitquantum            2

======================11.2.0.4=====================
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - Production

SQL> select ksppinm as "hidden parameter", ksppstvl as "value"
     from x$ksppi join x$ksppcv using (indx)
     where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%'
     order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 15 <<<<<<<<<<<<<<<<<<<<
_asm_hbeatwaitquantum            2

======================12.1.0.1=====================
$ sqlplus / as sysdba

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value"
     from x$ksppi join x$ksppcv using (indx)
     where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%'
     order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 15
_asm_hbeatwaitquantum            2

From 12.1.0.2 onward, the default value of this parameter was changed to 120s.

======================12.1.0.2=====================
$ sqlplus / as sysdba

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value"
     from x$ksppi join x$ksppcv using (indx)
     where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%'
     order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 120
_asm_hbeatwaitquantum            2
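The results of this version survey can be condensed into a small lookup. This is a sketch based only on the versions queried in this post; the function names are hypothetical, and treating every release between or after the tested ones the same way is an assumption, so check the actual value on your own instance.

```python
def parse_version(text):
    """Turn a dotted version such as '11.2.0.3.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def observed_default_hbeat_io_wait(version):
    """Default _asm_hbeatiowait as observed in this post, or None if absent.

    Assumption: releases not tested here behave like their nearest tested
    neighbour below them.
    """
    v = parse_version(version)
    if v < (11, 2, 0, 3):
        return None   # parameter not listed in 10.2, 11.2.0.1, 11.2.0.2
    if v < (12, 1, 0, 2):
        return 15     # 11.2.0.3, 11.2.0.4, 12.1.0.1
    return 120        # 12.1.0.2


if __name__ == "__main__":
    for release in ["10.2.0.5.0", "11.2.0.2.0", "11.2.0.3.0",
                    "11.2.0.4.0", "12.1.0.1.0", "12.1.0.2.0"]:
        print(release, observed_default_hbeat_io_wait(release))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "10.2" sorting after "11.2" character by character in some layouts, or "9" sorting after "12".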
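I hope this summary is useful to you. In day-to-day work I often find myself thinking that a problem looks simple but I am not sure of the answer; so I test it, then write it down for future reference.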