Advanced Rails - Building Industrial-Strength Web Apps in Record Time

Large/Binary Objects

The pg_largeobject catalog is global to the database, and accessible by anyone with
permission to connect to the database. The large object mechanism is also slightly
deprecated in favor of in-table storage, as the TOAST storage technique allows values of
up to 1 GB in length to be stored directly as attributes within the table.


My recommendation is to use filesystem storage for all binary objects if you use
PostgreSQL. Although the database might be the more proper place for this type of
data, it just does not work well enough yet. If you have to use the database, large
objects actually perform pretty well. Avoid BYTEA at all costs.
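One cost often cited for BYTEA is that values exchanged as text are inflated by encoding. The minimal Ruby sketch below assumes PostgreSQL's hex output format (the default `bytea_output` since PostgreSQL 9.0), in which each byte becomes two hex digits after a `\x` prefix; the older escape format can expand a non-printable byte to a four-character octal escape. `bytea_hex_literal` is a hypothetical helper for illustration, not part of any database driver.

```ruby
# Hypothetical helper: renders binary data the way PostgreSQL's hex
# bytea_output format does, i.e. "\x" followed by two hex digits per byte.
def bytea_hex_literal(bytes)
  "\\x" + bytes.unpack1("H*")
end

# 1,024 bytes of sample binary data (values 0..255 repeated).
data = Array.new(1024) { |i| i % 256 }.pack("C*")
encoded = bytea_hex_literal(data)

puts data.bytesize      # → 1024
puts encoded.bytesize   # → 2050 (2-byte prefix + 2 hex digits per byte)
```

Even in this more compact format, a value roughly doubles in size when handled as text, which matters for the multi-megabyte objects discussed here.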


MySQL


MySQL does a fairly good job with binary data. LOB-type columns (including the
TEXT types) can store up to 4 GB of data, using the LONGBLOB (or LONGTEXT) type.
Actual storage and performance depend on the wire protocol being used, buffer size,
and available memory. Storage is efficient, using up to 4 bytes to store the data
length, followed by the binary data itself. However, MySQL suffers from issues similar
to PostgreSQL's with streaming data, and it is always more awkward for a web
application to stream data from the database than from the filesystem.
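The 4 GB figure follows from the length prefix: MySQL stores a BLOB or TEXT value's length in 1 to 4 bytes depending on the type, so a type with an n-byte prefix can hold at most 2^(8n) - 1 bytes. The Ruby sketch below only does that arithmetic; the constant and method names are illustrative. In practice, the server's `max_allowed_packet` setting also limits how large a value can be sent in one go.

```ruby
# Number of bytes MySQL uses to record a value's length, per BLOB type.
# (The TEXT types use the same prefixes: TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT.)
BLOB_LENGTH_PREFIX_BYTES = {
  "TINYBLOB"   => 1, # max 255 bytes
  "BLOB"       => 2, # max 65,535 bytes
  "MEDIUMBLOB" => 3, # max 16,777,215 bytes
  "LONGBLOB"   => 4  # max 4,294,967,295 bytes (about 4 GB)
}.freeze

# Largest length an n-byte unsigned prefix can encode.
def max_blob_length(type)
  prefix = BLOB_LENGTH_PREFIX_BYTES.fetch(type)
  2**(8 * prefix) - 1
end

BLOB_LENGTH_PREFIX_BYTES.each_key do |type|
  puts format("%-10s %d bytes", type, max_blob_length(type))
end
```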


Oracle


Oracle supports the BLOB data type, for objects up to 4 GB. It is supported by a
fairly mature API, and can be used directly from Rails.


Oracle also provides the BFILE type, which is a pointer to a binary file on disk.
Consider it a formalization of the filesystem storage method discussed below. This may
prove to be of value in some situations.


Filesystem Storage


The reality is that filesystem storage is the best option, as a general rule. Filesystems
are optimized to handle large amounts of binary and/or character data, and they are
fast at it. The Linux kernel has syscalls such as sendfile() that work on physical
files. There are hundreds of third-party utilities that you can only leverage when
using physical files:



  • Image processing is arguably the most popular application for storing binary
    data. Programs like ImageMagick are much easier to use in their command-line
    form, operating on files, rather than getting often-problematic libraries like
    RMagick to work with Ruby.

  • Physical files can be shared with NFS or AFS, put on a MogileFS host, or
    otherwise clustered. Achieving high availability or load balancing with database
    large objects can be tricky.

  • Any other utility that works on files will have to be integrated or otherwise
    modified to work from a database.
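The filesystem approach can be sketched in plain Ruby. This is a minimal sketch under stated assumptions, not the book's implementation: the `FileStore` class, its method names, and the SHA-256 directory fan-out are all hypothetical choices for illustration. The database row would keep only the relative path; the bytes stay on disk, where command-line tools and network filesystems can reach them. Streaming uses Ruby's `IO.copy_stream`, which may use zero-copy syscalls such as sendfile(2) internally when the platform and both endpoints allow it.

```ruby
require "digest"
require "fileutils"
require "stringio"
require "tmpdir"

# Minimal filesystem-storage sketch (hypothetical class, not a Rails API).
# Only the relative path returned by #write would go into a database column;
# the bytes themselves live on disk.
class FileStore
  def initialize(root)
    @root = root
  end

  # Stores data under a path derived from its SHA-256 digest, fanned out
  # into two directory levels (e.g. "ab/cd/abcd...") so that no single
  # directory accumulates too many files. Returns the relative path.
  def write(data)
    digest = Digest::SHA256.hexdigest(data)
    relative = File.join(digest[0, 2], digest[2, 2], digest)
    absolute = File.join(@root, relative)
    FileUtils.mkdir_p(File.dirname(absolute))
    File.binwrite(absolute, data)
    relative
  end

  # Copies a stored file to any writable IO-like object (a socket, a
  # StringIO, ...) and returns the number of bytes copied. IO.copy_stream
  # may use zero-copy syscalls such as sendfile(2) when both ends are
  # real file descriptors on a platform that supports them.
  def stream(relative, io)
    File.open(File.join(@root, relative), "rb") do |file|
      IO.copy_stream(file, io)
    end
  end
end

# Usage: store a payload in a temporary directory, then stream it back.
Dir.mktmpdir do |root|
  store = FileStore.new(root)
  path = store.write("example binary payload".b)
  buffer = StringIO.new("".b)
  puts store.stream(path, buffer)   # → 22
end
```

Content addressing is one design choice among several; keying files by record ID works just as well. In a Rails controller, the absolute path could be passed to `send_file`, letting the web server stack serve the file instead of pulling bytes out of a database row.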
