Postgres 是一個幾乎把所有訊息都寫到系統記錄檔的RDBMS,所以在日常維運上,不透過工具來分析log內容,我還真不知道該怎麼管理我手上的postgres。我用的工具是pgbager-3.5 ,有興趣的,去問一下谷大哥,它一定會告訴你相關資訊。打開pgbager報表後,最上頭就是可供讀取的報表
pgBadger: PostgreSQL log analyzer
Normalized reports are marked with a "(N)".
每份報表都是用pipe符號來切開,不太可能日檢時查看每個部份的報表,就個人的經驗,只會看底下這幾個報表的資訊 :
- Hourly statistics
- Most frequent waiting queries (N)
- Queries that waited the most
- Slowest queries
- Most frequent events (N)
礙於報表列的部份用截圖的方式像素在這裡太難表達出我想傳達的意念,所以我把上述幾個報表的html code直接抓下來修改後貼上,你用桌機的螢幕看pbagder報表時,當然不會看到這麼鱉的畫面。
我們來看報表吧~~ 很直覺啦 ~~
Most frequent events (N) ^
Rank | Times reported | Error | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 |
5,761
|
FATAL: sorry, too many clients already
| |||||||||||||||
2 |
395
|
ERROR: could not serialize access due to concurrent update
| |||||||||||||||
3 |
343
|
ERROR: duplicate key value violates unique constraint "..."
|
Most frequent waiting queries (N)^
Rank | Count | Total wait time (s) | Min/Max/Avg duration (s) | Query |
---|---|---|---|---|
1 | 2 | 16.695s | 8.321s/8.374s/4.187s |
INSERT into public.trans_log (card_nbr,header_termino,merchant_id,message_type,process_code,prog_name,receive_date,resp_code,rrn,system_ip,term_id,trans_date,id )
VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$10,$10,$10); |
Queries that waited the most^
Rank | Wait time (s) | Query |
---|---|---|
1 | 8.374s |
INSERT into public.trans_log (card_nbr,header_termino,merchant_id,message_type,process_code,prog_name,receive_date,resp_code,rrn,system_ip,term_id,trans_date,id )
VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$10,$10,$10); |
2 | 8.321s |
INSERT into public.trans_log (card_nbr,header_termino,merchant_id,message_type,process_code,prog_name,receive_date,resp_code,rrn,system_ip,term_id,trans_date,id )
VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$10,$10,$10); |
Slowest queries ^
Rank | Duration (s) | Query |
---|---|---|
1 | 14.598s |
SELECT count (* ) FROM (
SELECT DISTINCT s_time, e_time, request_id, STATUS FROM common_status ) WHERE STATUS = 'Success' AND to_char(s_time,'yyyy/MM/dd' ) = to_char(now) - interval'1 days','yyyy/MM/dd' ); |
2 | 14.397s |
SELECT fun_id, count (fun_id )
FROM ( SELECT fun_id, s_time FROM ( SELECT DISTINCT s_time, e_time, request_id, fun_id FROM common_status ) WHERE to_char (s_time,'yyyy/MM/dd' ) = to_char (now () - interval '1 days', 'yyyy/MM/dd' ) ) GROUP BY fun_id; |
Hourly Statistics
我日檢時只看checkpoint sync的數據,因為它跟整體I/O有關,圖我修改過,完整報表不會這麼精簡這是我丟在系統cron跑的一段shell script,為的是自動跑出前一天08:00~今天08:00間的報表,上班的時候就能看系統前一天的運作概況了
LOG_DIR=/data/pg_log
BADGER_HOME=~/steven/ma/pgbadger-3.5
LOG_DATE=`date +%Y%m%d`
LOG_BEGIN="$(date --date='1 days ago' +%Y-%m-%d) 08:00:00"
LOG_END="$(date +%Y-%m-%d) 08:00:00"
PREV_DATE=$(date --date='1 days ago' +%Y-%m-%d)
CURR_DATE=`date +%Y-%m-%d`
$BADGER_HOME/pgbadger -f stderr -b "$LOG_BEGIN" -e "$LOG_END" -G -o ~/steven/log/YOURDB."$LOG_DATE".html $LOG_DIR/*$PREV_DATE*.log $LOG_DIR/*$CURR_DATE*.log
cd ~/steven/log/
HOST=your ftp host
USER=your ftp account
PASSWD=your ftp account password
/usr/bin/ftp -nv «!x
open $HOST
user $USER $PASSWD
prompt
cd pgbager_parse
binary
mput YOURDB.$LOG_DATE.html
bye
!x