Advanced Rails - Building Industrial-Strength Web Apps in Record Time

(Tuis.) #1

102 | Chapter 4: Database


Database Storage


The DBA types among us might prefer database storage of large objects. From a the-
oretical standpoint, storing binary data in the database is the most clean and straight-
forward solution. It offers some immediate advantages:



  • All of your application data is in the same place: the database. There is only one
    interface to the data, and one program is responsible for managing the data in all
    its forms.

  • You have greater flexibility with access control, which really helps when work-
    ing with large-scale projects. DBMS permitting, different permissions may be
    assigned to different tables within the same database.

  • The binary data is not tied to a physical file path; when using filesystem storage,
    you must update the file paths in the referring database if you move the storage
    location.


There are many practical considerations, though, depending on your DBMS’s imple-
mentation of large objects.


PostgreSQL


PostgreSQL has some downright weird support for binary data. There are two ways to
store binary data in a PostgreSQL database: the BYTEA data type and large objects.


The BYTEA*type is the closest thing PostgreSQL has to a BLOB type—just a
sequence of bytes—but it is really terrible for large amounts of binary data. The pro-
tocol for shuttling BYTEA types back and forth from the database requires escaping
all non-printable bytes, so a single null byte would be encoded as the ASCII string
\000(4 bytes). Needless to say, this causes unnecessary expansion of the data. In
addition, it is impossible to stream data from the database to the web browser with-
out running it through an unescape filter. Pulling a 2 MB binary file from the data-
base usually means streaming somewhere around 6 MB of data through the unescape
code.†The naïve method runs all of the data through Ruby strings, where it balloons
tremendously in memory. A better option would be to have thepostgresC library
handle quoting and unquoting, but this is a lot of work and still suboptimal. Up to 1
GB of data can be stored in a BYTEA column.


The other option is large objects. The large object features in PostgreSQL work well
enough, but they are also a little bit clunky. Files are kept in thepg_largeobjectsystem
catalog in small pages.‡A pointer is kept in the referring table to the OID (object ID) of
the file. Up to 2 GB of data may be stored in a large object. This method is fast, and has
good APIs, but there are drawbacks. There is no per-table or per-object access control;



  • Short for “byte array.”
    † Assuming fairly uniform binary data, the BYTEA quoting rules lead to an average expansion ratio of 1:2.9.
    ‡ The size is defined in LOBLKSIZE. It defaults to 2 KB.

Free download pdf