Database and SQL Guidelines
The following is a set of conventions and guidelines adopted by one company to facilitate database
development by programmers unfamiliar with relational databases and SQL coding. Some of these
guidelines reflect the habits of the author – who would be the first to admit that other conventions may
better suit different circumstances and established styles.
Use descriptive names for databases, tables, columns, variables, and procedures. A column named
application_received_datetime is more descriptive than a column named just "date". Names based on 3 or
4 character codes are not very descriptive.
Spell out name components fully; minimize use of abbreviations. A variable named
@application_received_datetime is inherently easier to read and understand than one named
@app_rcvd_date. Programs should be written for people, not for computers.
When abbreviations are necessary, adopt a standard list, and include it as part of your guidelines.
Use lower-case letters, with underscores separating words in names; alternatively, capitalize the first letter
of each word and forgo the underscores, if a legacy database uses this convention. Do not mix the two
styles. Do not use all capital letters - studies have shown that such text is harder to read.
Table names should be singular: application rather than applications.
Limit table and column names to 60 characters in length.
Suggested convention for lookup tables is to end the table name with _type, and the primary key column
name with _type_code.
The convention for non-lookup tables with a surrogate primary key is to name the primary key column the
same as the table name, with _id appended: consumer_id.
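In practice, these two naming conventions might look like the following sketch (table names, column names, and types are illustrative, not from the original):

```sql
-- Lookup table: name ends in _type, primary key column in _type_code.
create table application_type (
    application_type_code integer not null,
    description           varchar(60) not null
)

-- Non-lookup table with a surrogate primary key named <tablename>_id.
create table consumer (
    consumer_id integer not null,
    last_name   varchar(60) not null,
    first_name  varchar(60) not null
)
```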
Foreign key columns should be named the same as the primary key columns they refer to, unless multiple
foreign keys to the same table exist, in which case they should be distinguished by a usage prefix (for
example, primary_consumer_id and secondary_consumer_id).
The convention for associative tables is to use both parent tables in the name; a table associating
consumers with applications would be named consumer_application.
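Following this convention, the consumer_application table might be declared as below (the column types are assumed):

```sql
-- Associative table named after both parent tables, with foreign key
-- columns named the same as the primary key columns they reference.
create table consumer_application (
    consumer_id    integer not null,
    application_id integer not null
)
```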
Avoid repetitious inclusion of the table name as part of its column names, except when naming the primary
key, or when helpful for clarity.
When naming smalldatetime or datetime columns, append _date to the name when the time is not
important, or _datetime when it is.
Columns used as Yes/No or True/False flags should have _flag appended to the name, should be char(1)
not null, and should be constrained to accept only 'T' or 'F'.
Flag columns which allow for unknowns should have _nflag appended to the name, should be varchar(1)
not null, and accept only 'T', 'F', or '' (empty string).
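A sketch of both flag conventions, with hypothetical columns and constraint names:

```sql
create table application (
    application_id integer not null,
    -- Two-state flag: 'T' or 'F' only.
    approved_flag  char(1) not null,
    -- Three-state flag: 'T', 'F', or '' (empty string) for unknown.
    signed_nflag   varchar(1) not null
)

alter table application
    add constraint application_approved_flag_check
    check (approved_flag in ('T', 'F'))

alter table application
    add constraint application_signed_nflag_check
    check (signed_nflag in ('T', 'F', ''))
```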
Check constraints or rules whose names are not generated automatically should be named like:
<tablename>_<columnname>_check.
Foreign key constraints whose names are not generated automatically should be named like:
<tablename>_<parent_tablename>_fk.
Primary key constraints whose names are not generated automatically should be named like:
<tablename>_pk.
Indexes whose names are not generated automatically should be named like: AK1_<tablename> for
alternate keys (unique indexes), or <columnname(s)>_index, such as ssn_index, or <usage>_index.
Triggers should be named as: <tablename>_<action(s)>_trigger, such as:
application_insert_update_trigger. If the trigger is an "instead of" trigger, rather than the default (after)
trigger, use _instead_of_trigger.
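A minimal skeleton showing the trigger naming convention (the table is hypothetical, and the body is only a placeholder):

```sql
create trigger application_insert_update_trigger
on application
after insert, update
as
begin
    -- Body would enforce a de-normalization, audit the change, etc.
    return
end
```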
Database naming convention is to append _dev for development, _qa for quality assurance, _ct for
customer test regions, and append nothing for production. Logins for programs should be identical to the
database name; additional logins with _ro and _dbo appended are customary for read-only and database
owner access.
Do NOT use character datatypes to hold numbers or dates. (The benefits of using appropriate native types
are numerous, including considerable savings in space, and automatic error checking. To illustrate this
point using a pathological but actual example, in one instance where character datatypes were used for
every field in a table, a file upload to that table succeeded even though an unexpected carriage-return
character was present on every line. All the resulting rows were one character longer than they should have
been, so each row's data was shifted over by as many characters as there were previous rows. A month's
worth of processing was done using the corrupted table, and the customer lost millions of dollars. If some
of the columns had been non-character, the load would have failed, and the situation could have been
corrected immediately.)
Define varchar columns as "not null", and use the empty string in place of null. This helps client
programmers, who can then always surround strings with single quotes, whether empty or not, rather than
having to use the keyword null without quotes.
Make columns "not null" whenever possible; if a value should always be present, or if a default can be used,
there is no reason to allow nulls.
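A sketch of these guidelines together, with hypothetical columns; the defaults let both columns be declared "not null":

```sql
create table application (
    application_id    integer not null,
    -- varchar columns defined not null; clients store '' rather than null.
    notes             varchar(255) not null default '',
    -- A sensible default removes any reason to allow nulls here.
    received_datetime datetime not null default getdate()
)
```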
Do not use the char datatype for columns longer than 5 characters, use varchar instead. NULL values in
char columns take as much space as the full width of the column, unlike varchar columns, where a NULL
value occupies a single byte.
Do not use nchar and nvarchar columns unless there is a specific need for the double-byte character set,
otherwise the contents double in size compared to standard char and varchar storage.
All tables should have a primary key defined, with VERY few exceptions.
Unless demonstrated performance penalties dictate otherwise, make use of foreign key constraints. Do not
rely on client programmers to enforce referential integrity - they are usually too busy, too ignorant, or too lazy
to do it consistently. Foreign key relationships should be enforced in nearly all cases. Use the "alter table
add constraint" statement with a named constraint, to allow dropping and recreating constraints when
necessary. Consider using the cascade or set null options when building the foreign key constraints. Even
though a delete procedure may be programmed to delete children before parents, the server will not know
this, and will check the constraints anyway. It might as well perform the deletes itself while it is performing
the constraint checks.
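For example, a named foreign key constraint with the cascade option might be built like this (the constraint name follows a hypothetical pattern):

```sql
-- Named constraint, so it can be dropped and recreated when necessary;
-- the cascade option lets the server delete child rows itself.
alter table consumer_application
    add constraint consumer_application_consumer_fk
    foreign key (consumer_id)
    references consumer (consumer_id)
    on delete cascade
```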
Use natural keys when appropriate. When a composite natural key becomes too cumbersome, consider
retaining it as an alternate key, at least in the "parent" table.
Try to avoid primary keys whose values are not under your control. Consider using a surrogate key for the
primary key instead, and making the would-be key an alternate key. That way, when the alternate keys
change, it will not affect your relations.
Avoid mnemonic character keys, except where obviously suitable, such as state abbreviations. Instead,
define lookup tables with an integer surrogate key.
Avoid character codes in general, mnemonic or otherwise, whether or not they are used in keys. Character
codes belong in old mainframe systems, not in modern designs. (A real world example: A small database
was built to allow "box numbers" of boxes in which paper records were stored to be looked up by station
name and date. The manager insisted that the first character of the station name be included as the first
character of the box "number". All of the paper records were boxed, and all of the boxes were organized in
alphabetical order on warehouse shelves for the first time ever, a considerable accomplishment. Sadly, the
manager only then noticed that when the last of the "A" boxes was full, storing the next "A" box would require
moving all of the "B" - "Z" boxes down to make room...)
Keep foreign key relationships precise by using composite foreign keys when appropriate. For example, a
pair of fields, application_id and consumer_id, should usually point to a consumer_application row taken
together, rather than the two tables individually, eliminating the possibility of records for consumers not
actually on the application.
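For illustration, assume a hypothetical application_note table recording notes made by a consumer on an application. A composite foreign key points the pair of columns at the associative table as a unit:

```sql
-- Taken together, the pair must match a consumer_application row,
-- so a note cannot reference a consumer who is not on the application.
alter table application_note
    add constraint application_note_consumer_application_fk
    foreign key (application_id, consumer_id)
    references consumer_application (application_id, consumer_id)
```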
Normalize design to 3rd or Boyce-Codd normal form. Explicitly note any de-normalizations for performance
or convenience reasons in table and field comments. Enforce the integrity of de-normalizations made for
convenience reasons with triggers or in stored procedures.
Use a modeling tool such as ERStudio or ERWin to produce a data model for the design, and keep it up to
date.
Insert descriptive comments for all tables and all fields into the model for all new databases. Attempt at
least table-level comments for legacy databases.
Consider data-sharing, usage, and backup needs when designing databases. Sometimes a customer-
supplied portion of the database may warrant a separate backup schedule, or the data will be used by other
projects, and thus deserves its own database.
Never use "select * " syntax in production code (except with the "if exists" or "select into" syntax).
Always use column lists in insert statements, preferably formatted one line per column.
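For example, an insert with an explicit column list, formatted one line per column (the table and variables are illustrative):

```sql
insert into consumer (
    consumer_id,
    last_name,
    first_name
)
values (
    @consumer_id,
    @last_name,
    @first_name
)
```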
Capitalization of SQL keywords is optional.
Ideally, almost all database access should be through stored procedures, and the only SQL in client
programs should be stored procedure calls. One exception is when client web or system code must
handle variable numbers of inserts and needs to group them into a transaction; even then, each individual
insert can still be done with a procedure call. Putting all SQL into stored procedures can often eliminate the
need to change client code, making enhancements and fixes much less of a hassle when the system is in
production.
Formatting style is at the discretion of the coder; however, a style such as the example below is encouraged:
select c.last_name, c.first_name, a.application_id
from application a,
     consumer_application ac,
     consumer c
where a.application_id = ac.application_id
and c.consumer_id = ac.consumer_id
and a.received_datetime > dateadd(dd, -30, getdate())
order by c.last_name, c.first_name
If you require rows to be returned in a specific order, use an order by clause. Multiple-processor servers
mean that the programmer cannot rely on any physical storage order, explicit or implied. If you do not
require a specific order, avoid the order by clause, as sorting can be expensive.
The "distinct" keyword used to eliminate duplicate rows in result sets should seldom be needed. On most
properly written queries it is superfluous, and causes an extra sorting operation which can be quite
expensive for large result sets. Try running the query without distinct. If duplicates are present, look for
unneeded tables in the "from" clause, or add restrictions to the "where" clause.
Never, ever use a cursor to return results to a client program. Use of cursors for "scrolling" back and forth in
result sets is explicitly forbidden, as it results in lock contention and deadlocks. Cursors should be reserved
for server-side processing only.
Avoid the use of the "like" keyword when exact matching should be possible. Especially avoid
"like '%pattern'" with the '%' in front, as it will require a scan of all rows in the table.
Use parentheses when mixing "and" and "or" conditions, and whenever it makes intent clearer.
Do not use outer joins ( *= or =* syntax, or "left join" or "right join" syntax) unless it is actually necessary.
Outer joins take all rows from one table, regardless of whether there is a match from the other table, and
should not ordinarily be needed.
Be aware that the difference in execution time between a good query and a bad one can be milliseconds
versus hours, literally. One common cause is unintentional cross products, i.e. two tables in the "from"
clause, with no restriction on how they match up provided in the "where" clause. For example, if there are
10000 applications and 12000 consumers, then the query:
select application_id, consumer_id from application, consumer
...with no where clause would return 120 million rows, rather than the 12000 probably intended!
Avoid putting quotes around numeric constants. It causes unnecessary type conversion, which has a nasty
side effect, namely translating an empty string to zero when the destination is an integer or a float.
Use ANSI-standard single quotes around character constants, not double quotes. Apostrophes contained
within character data must be escaped with an additional apostrophe (single quote).
Avoid doing date arithmetic, causing implicit type conversions, or using convert( ) on a date column in the
where clause. Instead, perform the arithmetic beforehand to yield a constant value (or two), to which the
column can be directly compared.
For example, instead of the code below…
WHERE convert(char(8), InsertDate, 112) = '20001203'
WHERE convert(char(8), InsertDate, 112) between '20001201' and '20001231'
WHERE convert(char(8), InsertDate, 112) like '200012%'
WHERE dateadd(dd, 90, InsertDate) > getdate( )
WHERE datediff(dd, InsertDate, getdate( )) > 30
…use the "between" syntax without the convert, appending the last possible hh:mm:ss:ttt of the time to the
end date string, or perform the date arithmetic to yield a constant:
WHERE InsertDate between '20001203' and '20001203 23:59:59:999'
WHERE InsertDate between '20001201' and '20001231 23:59:59:999'
WHERE InsertDate between '20001201' and '20001231 23:59:59:999'
WHERE InsertDate > dateadd(dd, -90, getdate( ))
WHERE InsertDate < dateadd(dd, -30, getdate( ))
Note that no arithmetic or type conversion is performed on, or function calls applied to, the InsertDate
column itself; it is simply being compared to constant values. This allows SQL Server to use indexes when
present, instead of performing an operation on the column value for every row in the table to see if it
satisfies the where clause.
Stored procedures should be coded to return a zero to the caller for a successful operation, or a non-zero
value when errors are encountered. This refers to the integer return value of the procedure, not rows
returned to the client in result sets.
A convention for procedures performing an insert where a new primary key is generated is to return the
primary key as a single-row, single-column result set to the client program, with a value of -1 if the insert
was not successful.
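A sketch of both conventions in one procedure, assuming a hypothetical consumer table whose consumer_id is an identity column:

```sql
create procedure consumer_insert
    @last_name  varchar(60),
    @first_name varchar(60)
as
begin
    insert into consumer (last_name, first_name)
    values (@last_name, @first_name)

    if @@error <> 0
    begin
        -- Per the convention above: -1 in the result set marks a failed insert...
        select -1 as consumer_id
        return 1   -- non-zero return value signals an error to the caller
    end

    -- ...and the new surrogate key is returned to the client as a
    -- single-row, single-column result set on success.
    select @@identity as consumer_id
    return 0   -- zero return value signals success
end
```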