PostgreSQL allows end users to store BLOBs (binary large objects) in the database. Many people use these functions to handle data directly in SQL. There has long been a discussion about whether this is a good thing or not. We expect that particular discussion to go on indefinitely. However, if you are in the “pro BLOB” camp, we want to share some insights into how binary data can be handled in PostgreSQL with maximum efficiency.
Loading files into the database
BLOBs are important: to show how they work, I have created a simple file which we can import into the database later on:
iMac:~ hs$ echo abcdefgh > /tmp/file.txt
To import this file, we can call the lo_import function and pass the name of the file you want to load (as superuser). Here is how it works:
test=# SELECT lo_import('/tmp/file.txt'); lo_import ----------- 98425 (1 row)
What we see here is that PostgreSQL gives us a number (= object ID). Note that the filename has “vanished”, so you have to store it somewhere if you want to remember those values. The way to do this is to create a basic table:
test=# CREATE TABLE t_file (name text, oid_number oid); CREATE TABLE test=# INSERT INTO t_file VALUES ('/tmp/file.txt', lo_import('/tmp/file.txt')) RETURNING *; name | oid_number ---------------+------------ /tmp/file.txt | 98432 (1 row)
PostgreSQL gives you the option of using the OID data type to store object IDs. Storing the filename is easy – a text or varchar column will do the job, in case the path is needed later on. In general, the large object is totally independent of the file in the filesystem – it has nothing to do with it. Storing the filename is therefore merely done in order to remember what we imported.
However, don’t let’s worry about a single file. Let’s import it a million times, and see what happens:
test=# INSERT INTO t_file SELECT '/tmp/file.txt', lo_import('/tmp/file.txt') FROM generate_series(1, 1000000); INSERT 0 1000000 test=# TABLE t_file LIMIT 10; name | oid_number ---------------+------------ /tmp/file.txt | 98432 /tmp/file.txt | 98433 /tmp/file.txt | 98434 /tmp/file.txt | 98435 /tmp/file.txt | 98436 /tmp/file.txt | 98437 /tmp/file.txt | 98438 /tmp/file.txt | 98439 /tmp/file.txt | 98440 /tmp/file.txt | 98441 (10 rows)
The file has been imported nicely. Each file has a new object ID, as shown in the previous listing.
Behind the scenes: How PostgreSQL stores BLOBs
Now that you have seen how to import data, we can take a look at the internals to figure out how data is stored behind the scenes. The way PostgreSQL handles things is by adding the data to pg_largeobject:
test=# \d pg_largeobject Table "pg_catalog.pg_largeobject" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- loid | oid | | not null | pageno | integer | | not null | data | bytea | | not null | Indexes: "pg_largeobject_loid_pn_index" UNIQUE, btree (loid, pageno)
The “loid” contains the object ID we have stored in our table. The real data is stored as a “bytea” (byte array) column. Since some files might be really large, PostgreSQL breaks them up into chunks.
Byte array fields are easy to display. However, in the default setting it is not really human readable (which is pointless anyway, because we are talking about binary data). I have imported some text in my example, so the “escape” format is a bit more readable:
test=# SET bytea_output TO escape; SET test=# SELECT * FROM pg_largeobject WHERE loid = 98437 ORDER BY pageno; loid | pageno | data -------+--------+-------------- 98437 | 0 | abcdefgh\012 (1 row)
What we see here is that the BLOB interface is really a wrapper around the “bytea” data type. That is the real magic.
Now let’s move on to deleting BLOBs again. The way to do that is to call the “lo_unlink” function. It will remove the entry from the system catalog:
test=# SELECT lo_unlink(98432); lo_unlink ----------- 1 (1 row)
Removing a simple entry will be easy. However, let’s try to answer the following question: What happens when millions of BLOBs are removed from the system within the very same transaction?
Removing millions of BLOBs at a time
Running the deletion is straightforward. The result, on the other hand, is fairly unexpected:
test=# BEGIN; BEGIN test=*# SELECT lo_unlink(oid_number) FROM t_file ; ERROR: out of shared memory HINT: You might need to increase max_locks_per_transaction. test=!# ROLLBACK; ROLLBACK
PostgreSQL is not able to run the transaction because we have run out of memory! How can that happen? When a large object is deleted, it has to be locked. The trouble is that the number of locks in shared memory is limited.
NOTE: Please bear in mind that row locks are NOT stored in shared memory. This is
only true for objects (table locks, etc.)