Advanced Rails - Building Industrial-Strength Web Apps in Record Time

(Tuis.) #1
ActiveRecord Performance | 169

One complication in indexing decisions is that certain DBMSs (MySQL 4 and ear-
lier, and PostgreSQL 8.0 and earlier) will use at most a single index on each table per
query. This means that you should choose your indexes carefully. Consider this
query, which shows a list of a user’s payments, showing the most recent first:


SELECT * FROM payments WHERE user_id = 12345 ORDER BY paid_at DESC;

Even if there are separate indexes defined on(user_id)and(paid_at), the best those
DBMSs can do is to use theuser_idindex and manually sort onpaid_at, without the
help of an index. In order to use an index, you must define one that includes both
columns:(user_id, paid_at). This situation happens often when usingacts_as_list:
many times you want to filter based on a foreign key and then sort by a position
column.


Check your DBMS manual for details. Some database systems have further restric-
tions or hints. For example, on multicolumn indexes, PostgreSQL performs best
when the most restrictive condition applies to the leftmost column in the index.


In PostgreSQL, it is also possible to create an index on an expression such as
LOWER(last_name). This index would be used, for example, to satisfy clauses like
WHERE LOWER(last_name) = 'smith' or ORDER BY LOWER(last_name) ASC. This is a
tradeoff: the expression must be calculated for each row when the index is created,
but that expression is treated as a constant when querying.


You should take a look at your development log to find common queries that are
either being executed too often or take too long. You may find that you have missed
an index. Common places where indexes are omitted are the following:



  • Columns that need to be sorted:position(when usingacts_as_list) or any
    other natural sort field such as last name, transaction date, or price.

  • Lookup fields other than foreign keys: order number, department code, or
    username.

  • Columns that are commonly used in aGROUP BYquery—but be careful that the
    indexed attribute has a high enough cardinality (number of unique values) to
    make it worth an index.


Full-text indexing


I do not recommend using a DBMS’s built-in full-text indexing capability. Many
applications will quickly outgrow the limitations of the built-in indexing features.
Repopulating a large full-text index can be slow, and it is better to have indexing
under your control in situations where the database may grow.


The industry leader in full-text search is Ferret, Dave Balmain’s port of Lucene. It is
available fromhttp://ferret.davebalmain.com/trac/. It has a great reputation and is
almost infinitely configurable. It keeps most of Lucene’s API, so it will be more famil-
iar to Java programmers.

Free download pdf