The results and ramblings of research

phewww!

PostGRES performance issues

leave a comment »

So I have been working with PostGRES for a while. I am currently using the database to store large amounts of social media data on which I am applying NLP techniques to extract locations. Once locations are extracted I convert them to their corresponding spatial coordinates. However, since social media has high volumes I am ending up with large number of updates every day (250K+ records / day). Over the past year, I have moved from my alma mater to Xerox for a bit. I was unable to monitor the database for about 3 months. At the end of about 3 months I logged on my database server and found that my server was running low on disk (not surprising). Turns out that the database autovacuum daemon was turned off and no one in my group had run vacuum db in a while. I think the first question I was asked was why do we need to run vacuum.

The answer to that question lies in the fact that once a record is updated, Postgres keeps a copy of the old record to maintain consistency (remember ACID for relational DB’s). A novice might say that’s dumb, however, when a record is updated, the DB needs to maintain an old copy so that any outstanding transactions can complete and the ACID guarantees are enforced. The auto-vacuum daemon automatically reclaims space by removing old copies of records, however, since ours was turned off the database keep increasing in size due to the DB maintaining old records. To understand DB space usage use the following commands from the postgres shell (You need to be logged in as the database superuser postgres):

db=# SELECT relname, pg_size_pretty(relpages::bigint * 8 *1024) AS size, CASE WHEN relkind = 't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE pgd.relfilenode::text = SUBSTRING(pg.relname FROM 10)) ELSE (SELECT pgc.relname FROM pg_class pgc WHERE pg.reltoastrelid = pgc.relfilenode) END AS refrelname, relfilenode, relkind, reltuples::bigint, relpages FROM pg_class pg ORDER BY relpages DESC;

relname | size | refrelname | relfilenode | relkind | reltuples | relpages
—————————————————+————+—————————-+————-+———+———–+———-
pg_toast_12317481 | 3059 MB | table1 | 12317489 | t | 8955 | 391493
pg_toast_12317473 | 2847 MB | table2 | 12317476 | t | 86909 | 364427
table2                              | 29 MB      | pg_toast_12317473          |    12317473 | r       |    276011 |     3766
table1                            | 8976 kB    | pg_toast_12317481 | 12317481 | r | 42837 | 1122

The TOAST tables show how much data each tables copies are utilizing. For example, table1 has a toast table that is using 3GB and has 8955 tuples over 391493 pages (each database page is 8KB). Obviously table1 needs to be vacuumed as 8955 tuples really dont need that many pages on disk. So I ran

db=# vacuum full verbose analyse table1;

Once the vacuum finished (It may take a while), I looked at the toasts tables again using the command above.

relname | size | refrelname | relfilenode | relkind | reltuples | relpages
—————————————————+————+—————————-+————-+———+———–+———-
table2                              | 29 MB      | pg_toast_12317473          |    12317473 | r       |    276011 |     3766
pg_toast_12317481                                 | 11 MB      | table1      |    12317489 | t       |      5726 |     1396
pg_toast_12317473                                 | 11 MB      | table2       |    12317476 | t       |      5694 |     1377
table1                            | 8976 kB    | pg_toast_12317481 | 12317481 | r | 42837 | 1122

Saved a lot of space 🙂

UPDATE:For postgresql 9.0 the command is

db# SELECT pgn.nspname, relname, pg_size_pretty(relpages::bigint * 8 * 1024) AS size, CASE WHEN relkind = 't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE pgd.reltoastrelid = pg.oid) WHEN nspname = 'pg_toast' AND relkind = 'i' THEN (SELECT pgt.relname FROM pg_class pgt WHERE SUBSTRING(pgt.relname FROM 10) = REPLACE(SUBSTRING(pg.relname FROM 10), '_index', '')) ELSE (SELECT pgc.relname FROM pg_class pgc WHERE pg.reltoastrelid = pgc.oid) END::varchar AS refrelname, CASE WHEN nspname = 'pg_toast' AND relkind = 'i' THEN (SELECT pgts.relname FROM pg_class pgts WHERE pgts.reltoastrelid = (SELECT pgt.oid FROM pg_class pgt WHERE SUBSTRING(pgt.relname FROM 10) = REPLACE(SUBSTRING(pg.relname FROM 10), '_index', ''))) END AS relidxrefrelname, relfilenode, relkind, reltuples::bigint, relpages FROM pg_class pg, pg_namespace pgn WHERE pg.relnamespace = pgn.oid AND pgn.nspname NOT IN ('information_schema', 'pg_catalog') ORDER BY relpages DESC; 
Advertisements

Written by anujjaiswal

May 29, 2011 at 7:48 pm

Posted in Database, PostGRES

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: