一開始設計這架構時,就排除使用remote agent的想法
因為在專業分工的組織下,要求安裝新軟體於既有系統
是一件不太可行的方案,既然身為DBA
就只能把資料庫instance當作是一個最大的agent
想辦法在資料庫內做到我想做的事情
即使Nagios Core 本身就有NSCA或NPRE等
甚或整合SNMP
我也一律不考慮使用
如果在資料庫session內做不到
就透過Nagios plugin check_by_ssh
來透過ssh 連線到target主機執行我要執行的指令
與其說Ngaios是個監控系統 |
不如說我是把它拿來幫我執行日常檢核作業自動化軟體 |
DB Genres | 監控項目 | 解說 |
Postgres | query over threshold | current_timestamp - pg_stat_activity.query_start |
WAIT QUERY | pg_stat_activitywaiting | |
connection count | count pg_stat_activity | |
wrap transaction id | 200000000 - age(pg_database.datfrozenxid) | |
Disk Usage | check_by_ssh plugin call remote script | |
Log Check | check_by_ssh plugin call remote pgbadger | |
MSSQL | Idle Session With Open TXN | sys.dm_exec_sessions, sys.dm_tran_session_transactions超過5天的idle session,但仍帶有open transaction |
DBDSKSPACE CHECK | MASTER..xp_fixeddrives資料檔所在的磁碟槽位小於5GB | |
FGSPACE CHECK | sys.filegroups, sys.database_filesFILE GROUP 使用率監控 使用率>80% and 可用空間<10G | |
LOG CHECK | master.dbo.xp_readerrorlog掃描偵測系統紀錄是否有Failed、Error等字串 | |
Password Expired CHECK | LOGINPROPERTY(sys.sql_logins.name,'DaysUntilExpiration') 密碼到期日 < 6天 | |
query over threshold | sys.dm_exec_requests,sys.databasesover 10mins | |
Transaction ReplicationLogReader Latency | master.dbo.sp_replcounters | |
Transaction Replication Dist to sub Latencyd | dbo.sp_replmonitorsubscriptionpendingcmds | |
Oracle | Tablespace Check | dba_free_space,dba_data_filesFILE GROUP 使用率監控 使用率>80% and 可用空間<10G |
ASM DiskGroup Usage | check_by_ssh plugin call remote script | |
query over threshold | expect , oratop -bn1 | |
Disk Usage | check_by_ssh plugin call remote script | |
Log Check | check_by_ssh plugin call remote tfactl analyze or adrci | |
RAC CRS or HAS Status | check_by_ssh plugin call remote crsctl command |