f 該怎麼進行Postgres 最基本的例行檢核工作 ~ 迪貝之家

該怎麼進行Postgres 最基本的例行檢核工作


Postgres 是一個幾乎把所有訊息都寫到系統記錄檔的RDBMS,所以在日常維運上,不透過工具來分析log內容,我還真不知道該怎麼管理我手上的postgres。我用的工具是pgbager-3.5 ,有興趣的,去問一下谷大哥,它一定會告訴你相關資訊。打開pgbager報表後,最上頭就是可供讀取的報表

pgBadger: PostgreSQL log analyzer


Normalized reports are marked with a "(N)".

每份報表都是用pipe符號來切開,不太可能日檢時查看每個部份的報表,就個人的經驗,只會看底下這幾個報表的資訊 :

礙於報表列的部份用截圖的方式像素在這裡太難表達出我想傳達的意念,所以我把上述幾個報表的html code直接抓下來修改後貼上,你用桌機的螢幕看pbagder報表時,當然不會看到這麼鱉的畫面。

我們來看報表吧~~ 很直覺啦 ~~

Most frequent events (N) ^

Rank Times reported Error
1
FATAL: sorry, too many clients already

2
ERROR: could not serialize access due to concurrent update

3
ERROR: duplicate key value violates unique constraint "..."


Most frequent waiting queries (N)^

Rank Count Total wait time (s) Min/Max/Avg duration (s) Query
1216.695s8.321s/8.374s/4.187s
INSERT into public.trans_log (card_nbr,header_termino,merchant_id,message_type,process_code,prog_name,receive_date,resp_code,rrn,system_ip,term_id,trans_date,id )
VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$10,$10,$10);

Queries that waited the most^

Rank Wait time (s) Query
18.374s
INSERT into public.trans_log (card_nbr,header_termino,merchant_id,message_type,process_code,prog_name,receive_date,resp_code,rrn,system_ip,term_id,trans_date,id )
VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$10,$10,$10);
28.321s
INSERT into public.trans_log (card_nbr,header_termino,merchant_id,message_type,process_code,prog_name,receive_date,resp_code,rrn,system_ip,term_id,trans_date,id )
VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$10,$10,$10);

Slowest queries ^

Rank Duration (s) Query
114.598s
SELECT count (* ) FROM (
SELECT
DISTINCT s_time,
e_time,
request_id,
STATUS
FROM
common_status )
WHERE
STATUS = 'Success'
AND to_char(s_time,'yyyy/MM/dd' )
= to_char(now) - interval'1 days','yyyy/MM/dd' );
214.397s
SELECT fun_id, count (fun_id )
FROM ( SELECT fun_id,
s_time
FROM (
SELECT
DISTINCT s_time,
e_time,
request_id,
fun_id
FROM
common_status )
WHERE
to_char (s_time,'yyyy/MM/dd' ) = to_char (now () - interval '1 days', 'yyyy/MM/dd' ) )
GROUP BY
fun_id;

Hourly Statistics

我日檢時只看checkpoint sync的數據,因為它跟整體I/O有關,圖我修改過,完整報表不會這麼精簡
checkpoint sync time
這是我丟在系統cron跑的一段shell script,為的是自動跑出前一天08:00~今天08:00間的報表,上班的時候就能看系統前一天的運作概況了

LOG_DIR=/data/pg_log
BADGER_HOME=~/steven/ma/pgbadger-3.5
LOG_DATE=`date +%Y%m%d`

LOG_BEGIN="$(date --date='1 days ago' +%Y-%m-%d) 08:00:00"
LOG_END="$(date  +%Y-%m-%d) 08:00:00"
PREV_DATE=$(date --date='1 days ago' +%Y-%m-%d)
CURR_DATE=`date +%Y-%m-%d`
$BADGER_HOME/pgbadger -f stderr -b "$LOG_BEGIN" -e "$LOG_END" -G  -o ~/steven/log/YOURDB."$LOG_DATE".html $LOG_DIR/*$PREV_DATE*.log $LOG_DIR/*$CURR_DATE*.log

cd ~/steven/log/
HOST=your ftp host
USER=your ftp account
PASSWD=your ftp account password
/usr/bin/ftp -nv «!x
open $HOST
user $USER $PASSWD
prompt
cd pgbager_parse
binary
mput YOURDB.$LOG_DATE.html
bye
!x