Dont Panic!

Oracle SQL Coding Guideline

PreviousPage NextPage

Updated: 19 Feb 16

Main

Contact Me

Software

Guideline, Oracle SQL Date: Jul-2002 Copyright © 2000 Jerry Mattsson / Timehole.com All Rights Reserved. CG_jm1.php

Guideline Oracle SQL Database Access



Written by Jerry Mattsson
This document was originally written a some time ago and has not been published, but why not let it out. I revised it and put in some new comments here and there. The contents is not very version specific and should be valid today as well as then. The purpose of this document is to serve as a practical guide or as a standard for programmers and system designers, writing, generating or specifying program modules that access an Oracle Rdbms. All this to create code and systems that are easy to maintain and that performs well in demanding applications. All this with the primary focus of functionality and performance. It is not intended as a SQL tutorial. There are a lot of these around anyway.
This document may be included in your system design guidelines or standards, as long as it is included as is, or the reference is clearly stated. Your own addition could be placed in an addendum or a preface, stating which parts that are used, changes to the parts used, or the parts that you want to exclude from your design guide.
Guidelines in this document is intended to be independent on what tools, that is actually used for the access. Suggestions and questions are welcomed. Send any questions or comments to me at: jerry.mattsson@timehole.com.
Reformatted Dec-2002. I hope it looks a little bit better.
ORACLE is a registered trademarks of Oracle Corporation.

Contents:

1 Introduction


SQL is a set oriented language and is used to define and change data in a relational database. The SQL functionality together with a well designed database can eliminate a lot of application and client side code.
In a perfect world it could probably possible to put more than 90% of the code on the database side in SQL. You could do data verification, validation, data formatting and data manipulation that also could go into the application side code. But if put in the database it will minimize the amount of code you write in the application. Data will be consistently verified and you could easily use different end-user interfaces to handle the code. If you do, what is left for the application code is normally only user-interaction activities and some formatting and error handling.
The drawback could be that you must know how to write efficient, complex and long SQL functions. And you must have to have a well designed and normalized data structure. An often seen sign of a bad database design is the absence of more complex SQL statement and the view that it is hard to write the SQL-functions that does what you want it to do.
The opposite is also true, a well normalized database makes it easier to write SQL-functions that does a lot of the work that you otherwise would put in your application.

When the design quality and normalization level of the database is questioned, the argument is often that the design is done for performance reasons, witch almost never are true. A well normalized database schema in an Oracle database does not normally degrade performance, the contrary is more often true. The most common error in the design phase is that the performance reasons are introduced to early. This in turn is a little bit strange when we often in the next breath defend our bad design with the argument that we have enough computer power to do it this way anyway.
The logical design of the database should be made in a very idealistic way. If you in implementation phase have to compromise or store data in a non-normalized way that should be a very well thought through decision, with both flexibility and performance in mind.
In addition to this constraint definition is seldom used to the extent that it should. If used, this also means that application code is eliminated. The total number of access to the database decreases and performance increases. Further to this the database normally makes checks faster and more concise than if you put this in the application code, especially primary and foreign key relationship checks.
All this is positive and time saving results if you do a good database design and make use the SQL language in the way that it should.

If we then start to use the procedural capabilities of the database then an extra level of functionality is introduced with much of the same advantages as well. But this is not covered in this document. When we create applications today, we have the possibility to use many different environments and tools to develop, test and execute our code. Actual access to an the database could be hidden from us or done from different interfaces and programs, but the access will always be made through the SQL-language. Different tools is often optimized to generate one or a couple of different types of applications, reports, data-loading or manipulative programs. It is therefore important to use the right type of tool ( forms-generators, report-generators, data loaders, 3-GL etc. ) for the specific problem we are trying to solve. If you are developing large applications, or systems with high performance expectations, it will be important to utilize the power of tool in the best possible way. This is to efficiently use the power of the computers that will handle the database request, both on the client and the server side. Even if it today sometimes feels like we have access to infinite CPU and memory resources, it is easy to try to use more than you have. Using 3-GL-type interfaces ( like C, C++, Fortran, Java etc.) gives us more control and more fine tuning options. It also poses more responsibilities on us, to code efficient and to create good and trouble free multi-user programs.

<Up to Contents>

2 Naming Conventions



2.1 Naming Database Objects.

Names of all database objects, tables, views and synonyms should be consistently constructed and named so it will be possible to "guess" or deduce the name when writing code. It's not really important how this is set up as long as you use it and makes it easy to find and understand what kind of objects you are dealing with. Simple is good.
Hopefully you are using tools to design and create your database schema. These tools has their own naming conventions and it should be wise to accept the tools standard.
If not, define a standard to all objects types that you will use. Make your own list, something like this:

Type

Format / Sample

Description

Tables

Adm_Personnel Persons

Short unique descriptive name. Uniqueness may be enforced with application system prefix.

Views

Adm_Persons

Same as Tables

Synonyms

Adm_Persons

Same as Tables

Table Aliases

PER

Three to five character of initial table name.

Columns

Ename

Short descriptive name. Preferable without special characters. Used as required.

Column Aliases

"Family Name"

If special characters are used this should be a double quoted String. Used as required.

Column Prefix

PER

Use table Alias, Used as required.

Constraints


Composed from the tablename + constraint type + column short name
+ type of constraint or number

- Primary key

Adm_Per_PK


- Foreign key

Adm_Per_FK_Dept_Dname


- Unique key

Adm_Per_UK_Ename


- Check

Adm_Per_CHK_Ename_UC

Check ename is upper case

- Check

Adm_Per_CHK_Ename1


Index

Adm_Per_IX_Dname

Appl. + Table alias or index type + IX + Col names

<Up to Contents>

2.1.1 Synonyms.

Public objects are references to tables, synonyms, packages, procedures, functions and views globally and commonly accessible for users or programs that the user run within a database.

Private objects are references within a schema to objects that are hidden from the other users and used only by programs or views that access the schema directly.

This can be used to hide the actual physical objects from the user or application and give you the possibility to exchange the actual underlying physical object without changing the application.

Create public synonyms for all object that should be commonly available, for reporting or ad-hoq access. The name of the synonym could be the same as the actual object but the importance here is that the synonyms are the easy-to-remember object names.

To create public synonyms requires the specific right to do this. This right is by default only are owned by database administrators ( DBA's ), but this right could be given to the application owner accounts, consult your DBA for advice.
All public database objects must by definition be created with a database unique name. Creating public synonyms will enforce object name uniqueness in a database.

You create your public synonyms with the create synonym statement:

CREATE public synonym EMP for SCOTT.EMP;

All users and programs can there after use the synonym EMP to refer to the table SCOTT.EMP. Recreating the synonym after that for another object will redirect all accesses to that "new" object.

CREATE public synonym EMP for SCOTT.EMP2;

This is not the same thing as to give everybody access to the (public) object. The rights to objects are defined separately and the access right should be granted to the users directly or via roles. The access structure and the object rights should be defined and set up in the development phase in the development system in the same way as it will be in the production system.
This will let you define and test your security structure throughout the development phase.
There might also be a number of private synonyms pointing on other objects with the name of EMP. Private synonyms override public and could be used in the development schema for different users pointing to private temporary objects during development.

<Up to Contents>

2.1.2 Constraints.

Name constraints consistently. This gives you the possibility to track the source of the problem to the underlying objects and to understand what kind of rule that was violated in a program. Whenever there is an error in the database, caused by violating a constraint, the database reports this with a message containing the name of the constraint. The constraint name should be assembled from the table short name and a shortening of the constraint type, plus the column that the constraint is defined for. Some constraint types may be declared multiple times for columns and should therefore be numbered or otherwise further defined.

CREATE TABLE EMP ( EMPNO NUMBER(4,0) Constraint EMP_CHK_EMPNO_NN check(EMPNO NOT NULL), ENAME VARCHAR(10) Constraint EMP_CHK_ENAME_NN check(ENAME NOT NULL), EMP_CHK_ENAME_UC check(ENAME=upper(ENAME)), JOB VARCHAR(9) Constraint EMP_CHK_JOB_NN check(JOB NOT NULL), EMP_CHK_JOB_UC check(JOB = upper(JOB)), MGR NUMBER(4,0) REFERENCES EMP (EMPNO) Constraint EMP_FK_EMP, HIREDATE DATE Constraint EMP_CHK_HIRED_NN check(HIREDATE NOT NULL), SAL NUMBER(7,2) Constraint EMP_CHK_SAL_Range check(SAL > 0), COMM NUMBER(7,2) Constraint EMP_CHK_COMM_Range check(COMM > 0), DEPTNO NUMBER(2,0) Constraint EMP_CHK_DEPTNO_NN check(DEPTNO NOT NULL), FOREIGN KEY (DEPTNO) REFERENCES DEPT (DEPTNO) Constraint EMP_FK_DEPT, PRIMARY KEY (EMPNO) Constraint EMP_PK_EMPNO);

<Up to Contents>

2.2 Naming Files.

2.2.1 General

Programs (source, files, modules) should be named according to your general program naming standard. CASE-tools usually use a naming standard and could generate names like APP0123 for a program module. This is a good practice to adopt for a general programming naming standard. The application system short name would be used as a prefix and added to that, a running numbering of the program or a short description.

Use short program/module names with 8 character or less. If this program is a file containing SQL statements, intended to be run as a ( e.g. maintenance ) program, this should be named APP0123.SQL, where APP is a short identification for the application system, and 0123 the number of the program. All program file names should be short, preferably at the most 8 characters with a 3 letter suffix, to allow portability to simple file systems.
Files used to create or manipulate database objects can be included in or run from a number of different tools. The naming of these files should be in conformance to the tools convention. If this is not appropriate or if the file is intended for general use, the following file name schema could be used:

Type

Name

Suffix

Sample

General SQL command file

AppShortName + Number +

sql

PER0012.sql

Table definitions

AppShortName + Number +

tab

PER0013.tab

Constraints definitions

AppShortName + Number +

con

PER0014.con

Grant commands

AppShortName + Number +

grt

PER0015.grt

<Up to Contents>

2.2.2 Source code comments

All source code should at least contain information about who wrote the code, what it was intended to do, and when the code was created. Header information could look like:

REM Title: Administration of user grants. REM File: Adm0033.SQL REM Version: 1.0 REM REM Description: This program is intended to be used for...... REM Author: Jerry Mattsson / Timehole.com. REM REM Create date: 5-apr-99 REM Modified date: 6-may-00 REM Modification purpose: Added functions in ..... REM Modified by: JM

Scripts that are intended to be used from or within a specific interface or program, should be documented in accordance to that program standard. Like all other source code it should be version controlled and well documented.
If the code is generated from a code generator, the generator can include comments, and history information in the code. Enter the appropriate information in the tools Description, Notes, Modification History, etc. This will then hopefully be used in the documentation part of the generated code. Remember to put your modification information in the tool, and to reverse engineer the code when you are modifying the code manually.

<Up to Contents>


3 The SQL Language



3.1 Data definition and data manipulation

3.1.1 Definitions

The SQL language can be divided into:

DDL-statements - data definition language. DML-statements - data manipulation language. Control-statements - controls the environment or session.

DML-statements are used to process or manipulate data in the database. This type of operations should be used in your application programs. The basic functions for this is:

SELECT - Retrieves data from the database INSERT - Create data in the database UPDATE - Change data in the database DELETE - Remove data from the database

DDL-statements are used to create objects in the database, tables, indexes, procedures etc. This type of operations should NOT be used in your application programs.

DDL-commands are:

CREATE - Create tables, views, procedures, tablespaces..... ALTER - Changes objects like tables, database .... DROP - Removes tables, views, procedures..... GRANT - Gives rights to roles or users. REVOKE - Removes rights from users or roles.

Remember that SQL-scripts containing DDL-statements to create or modify database objects, including tables and views should be regarded as source code and should be treated as such.
Keep the valid DDL-code up-to-date and deliverable when new application code is produced.

<Up to Contents>

3.2 Transactions

3.1.2 Transactions and savepoints

A transaction is a series of data manipulations that we want to regard and handle as one undividable unit. When we have done our changes to one or more tables or set of data, we issue a COMMIT, to finalize and store the changes in the database.
When the COMMIT is accepted by the database, it is confirmed, with a success return code or a message from the database. The transaction is only completed in it's whole or not at all. Changes in the database can not be viewed by other users, until the transaction is committed.
The transaction starts at the last commit or full rollback.

Some tools starts the transaction as the first action after the log-on event, some explicitly set the first transaction, after some initial work, or on user request. Check the tools behavior to be sure.
We always commit the total number of changes to the database, and if one part in this fails, the transaction fails. If the first update in a transaction is possible, but the second one fails, we can undo the changes made, with the ROLLBACK statement. This will back all the changes made since the transaction began and reset data to the state it was when we started our transaction.

COMMIT; End Point for last transaction, start new transaction. DELETE FROM Emp Delete a row in one table. WHERE Empno = 7139; DELETE FROM Dept; Delete of all rows by mistake ROLLBACK; Undo to last commit.

This type of undo function can also be done to return to a earlier defined "bookmark", called savepoints. This could be set somewhere after the start of the transaction and can then be used by us as a return point, if things fails.

UPDATE Emp SET Sal = Sal + 10; Update before savepoint SAVEPOINT BeforeDelete; Set a label to rollback to DELETE FROM Dept Delete some row WHERE Deptno = 10; DELETE FROM Emp Delete some other row WHERE Deptno = 10; ROLLBACK To BeforeDelete; Regret the two previous deletes, but keeps the update.

<Up to Contents>

3.2.2 DML vs. DDL statements in the application code.

DML-statements is transaction dependent requires that the user or program issues a commit or rollback for the transaction, whenever the operation is regarded as complete.
DDL-statements have a number of specific properties that requires special database rights and resource privileges, these should not normally be available to a application users. DDL-statements are not transaction dependent. Each statement makes an implicit COMMIT. This means that it is not reversible as a data manipulating statement is, and that if there are uncommitted transactions pending, they will be committed when a DDL-statement is executed. DDL-statements should normally not exist in the application code.
Do not use DDL-statements in application code.
If you do:

INSERT into emp ( empno, ename, sal, hiredate ) VALUES ( 7303, 'SMITH', 3201, '12-MAY-94'); /* and after this do a */ DROP table dept;

The drop command would commit the inserted row(s), when the drop command is issued. No rollback option exist for the insert statement anymore. DML-commands should be the only type of statements that are used in the application program code.
Exceptions to this is control statements:

SET Set the current role, or set the "transaction read only". ALTER In the "alter session"; command, to set NLS-or trace- functions.

<Up to Contents>

3.3 Read Consistency.

Read consistency is an important concept in the Oracle database. Statement level read consistency guarantee that the information from one operation is consistent, during the time the statement is executed. The data that is acted upon or retrieved to us is the same set of data, and in the same state as when we started the operation.
When we do a:

SELECT sum(sal) FROM large_emp;

the sum is the actual sum from the moment that we started the query. It will not change during that operation, and it is not dependent on what other users do during the time the statement is executing.

If another user is deleting all the rows:

DELETE FROM large_emp;

the information does not disappear in some strange way for the first user, because the other user(s) operation was faster, or the priority was higher than ours. This is always guaranteed for each statement that we execute, and the read consistency, by default works without any "read locking".

To have read consistency on consecutive statements you have to issue the command:

SET transaction read only; SELECT 'some information' FROM any_table; Delete from another_table; Insert into any_table select * from backup_of_another_table; COMMIT;

Read consistency does not cause any locking. It is done automatically but it requires the database to save changed information in the rollback segments, and if the time or size of the reads and writes done simultaneously are large, or the time between the statements is long, these has to hold the active information longer and these areas have to be dimensioned properly.
Always consider the effect of the read-consistency functions in the database.

<Up to Contents>

3.4 Use the power

3.4.1 SQL functionality.

The select statements are the most commonly used and a versatile SQL statement.
There are a number of basic types of select-statements, and you should be familiar with the different variations and the functions that you can use. E.g. outer joins, subqueries, exist functions, group by functions, having clause, union, intersect, minus etc.

Simple one table searches:

SELECT e.ename Give me the name of the person having FROM emp e Empno equal to 7139 WHERE e.empno = 7139;

Simple Join searches:

SELECT e.ename Give me all employee names that has a FROM emp e, dept d department registered to them. WHERE e.deptno = d.deptno; Eliminate the ones that don't have one.

Outer Joins:

SELECT e.ename, e.empno Give me only the name of the employees that FROM emp e, dept d not are assigned any department, WHERE e.deptno = d.deptno(+) or the department has vanished from AND d.deptno is null; the department table.

Subqueries:

SELECT e.ename Give me the name of the highest paid employee. FROM emp e WHERE e.sal = ( SELECT max(e2.sal ) FROM emp e2 );

Correlated Sub query:

SELECT e1.ename Give me all names of the persons that have FROM emp e1 a higher salary than the average on their WHERE e1.sal > own department. ( SELECT avg(e2.sal) FROM emp e2 WHERE e1.deptno = e2.deptno );

Set-oriented queries:

SELECT e1.ename,e1.sal Give me all employees names and salaries that FROM emp e1 are above the average, WHERE e1.sal > but not those with ..... ( SELECT avg(e2.sal) FROM emp e2 ) MINUS SELECT e3.ename, e3.sal FROM emp e3 WHERE e3.sal > ( SELECT avg(e4.sal) + avg( e4.com) FROM emp e4 );

These small examples of select statement shows that the basic functions can be combined and linked with one another to solve quite complex problems or questions ( this also applies to "update, delete and insert" statements). It is easy to understand that when the complexity and the number of tables involved increase, it will be more important to give the database as much information as possible, so that it will perform it's task as good and efficient as possible.

Try to perform as much as possible of the logical work in each SQL-statement.

<Up to Contents>

3.4.2 Oracle Rdbms SQL-functions

In addition to the standard SQL functions that the Rdbms can handle there are a number of functions in the Rdbms. There are about a hundred predefined functions for string handling, numeric and date conversions, calculations etc. Here are some examples of commonly used function, that you may use in your SQL-statement:

decode - a "case-test" construction, nvl - tests and replaces null values, substr - extracts a part of a string, instr - locates a part of a string, add_months - adds a number of months to a date, to_date - converts a string to a date, to_char - converts dates and numbers to strings, To_number - converts strings to numbers

These can be used to manipulate, change or test the data retrieved on rows that are processed. It also makes it possible to perform further data manipulating at the data access layer than with the "pure-SQL" and it could be used instead of putting this code in the application.

This will minimize both the the amount of code that are needed in the application program and the number of round trips that the program has to do to access the database.
E.g. This database function eliminates leading spaces and gives us the initial character of the name:

DECLARE -- a one character variable in your host program to receive the initial l_EnameInitial varchar(1); BEGIN -- some more code ..... SELECT substr(ltrim(P.Ename,' '),1,1) INTO l_EnameInitial FROM Adm_Pers P WHERE P.Empno = 1234;

The alternative should be to write the host application code to perform the string manipulating functions. Receive the string, search it, and move the first character to a separate variable.
In Oracle Rdbms it is possible to define your own functions in PL/SQL or in JAVA, and use these in your SQL statements in addition to the predefined ones.

<Up to Contents>

3.4.3 Type conversion

Automatic type conversion is done whenever it is needed within the Oracle Rdbms-kernel.

Conditions like:

WHERE numeric_column = '123'

works fine although this test does not compare data of the same data type.
The database automatically converts either one of the two values to a common data type before comparison.
But this functionality should not be used by default.
You should always try to do an explicit data conversion in your statements if it is required. This is because we don't know for certain that the database applies the conversion on the "right" element. The result is normally ok, but it could also mean that indexes are not used and we cause the database to chose a slower access path than necessary.
Explicit conversion:

WHERE numeric_column = to_number('123'); or WHERE date_col = to_date('19-may-94'); instead of implicit conversion: WHERE date_col = '19-may-94' or even worse, do not explicitly convert columns that could use indexes in the search: WHERE to_char(date_col,'yymmdd') = '951212'

Be sure that the data types you are comparing are of the same type or do the conversion before you submit the code to the database, or do it explicitly in the statement.

<Up to Contents>

3.5 Comments in the SQL code

SQL statements are created to perform a specific operation. The same operation can be written in a lot of different ways, good or bad and there are few limitations on how you can write your SQL. It should be written both for ease of understanding and efficiency.
Format your SQL so that it will be easily read.
The more complex your SQL is becoming, the more important it will be to include comments in it. Later it will be hard, to figure out what some large and complex SQL- statement actually is supposed to do, especially for someone else than the author.
Use comments whenever it might be tricky to understand, e.g. when there are complex parts or were the SQL-code is hard to get to work in the way you want it to do. Comment the SQL-code, whenever necessary.

We don't want code like:

SELECT substr(text, instr(text,'(')+1, instr(text,')') -instr(text,'(')-1 ) FROM text_table;

without any comment on what this is supposed to do or why.
Comments in SQL are done with C/C++ -like syntax:

/* comment start --- end after this */.


The previous code should look something like this:

SELECT substr ( text, /* search text for position of*/ instr(text,'(')+1, /* left parentheses */ instr(text,')')- instr(text,'(')-1 /* and right parentheses */ ) /* and return the string inside */ FROM text_table;

If there is any doubt or difficulty to identify the precedence of an expression this should be coded with surrounding parentheses.
This is an example, with comments and other functionality, taken from the dictionary view ALL_CATALOG:

create or replace view ALL_CATALOG (OWNER, TABLE_NAME, TABLE_TYPE) as select u.name, o.name, decode(o.type, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER', 4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE', 'UNDEFINED') from sys.user$ u, sys.obj$ o where o.owner# = u.user# and o.type in (2, 4, 5, 6) and o.linkname is null and (o.owner# in (uid, 1) /* public objects */ or obj# in ( select obj# /* directly granted privileges */ from sys.objauth$ where grantee# in ( select kzsrorol from x$kzsro ) ) or ( o.type in (7, 8, 9) /* prc, fcn, pkg */ and exists (select null from v$enabledprivs where priv_number = -144 /* EXECUTE ANY PROCEDURE */ ) ) or ( o.type in (2, 4, 5) /* table, view, synonym */ and exists (select null from v$enabledprivs where priv_number in (-45 /* LOCK ANY TABLE */, -47 /* SELECT ANY TABLE */, -48 /* INSERT ANY TABLE */, -49 /* UPDATE ANY TABLE */, -50 /* DELETE ANY TABLE */) ) ) or ( o.type = 6 /* sequence */ and exists (select null from v$enabledprivs where priv_number = -109 /* SELECT ANY SEQUENCE */)))

<Up to Contents>

4 Style and layout

4.0 Appearance

All base SQL keywords should be written in upper case and indented for ease of readability.
Keywords should start on a new line.

4.1 Select-Statements.


Select statements should be written with the following keywords starting on a new line:

SELECT, INTO, FROM WHERE, AND, OR, GROUP by, HAVING, CONNECT by, FOR UPDATE of, ORDER by. SELECT t1.name, t2.name INTO l_first_name, l_second_name FROM parties t1, party_roles t2, pr_types t3 WHERE t1.code = t3.code AND t2.code = t3.code AND t3.code = l_code;

Indentation and alignment should be used to make the code more readable.
Columns should be prefixed with the table alias, or otherwise with the full table name.

SELECT E.Name "Name", D.Deptno "Department", D.Location "Location", E.Sal + nvl(Comm,0) "TotalSalary" FROM Emp E, Dept D WHERE E.Deptno = D.Deptno;

Fully qualify the columns in all SQL statements. If this is done the database does not have to search all the columns in all tables to find out in which table the column exists.
Use table aliases, column aliases if necessary and always use table prefixing for columns.
Always specify the column list in a select statement. If this is done, it will later be possible to add columns to the table, without re-coding existing applications.

Do not use the asterisk as a shorthand to select all columns in your SQL-code:

SELECT * FROM emp;

This might be used in dynamic programming or in ad-hoc situations, but normally not in your application code. The number of columns returned, must be placed in a the same number of receiving host program variables. If the table changes, the matching of columns against variables will fail.
There are exceptions, it does not apply when you declare %ROWTYPE in PL/SQL, or if you are writing dynamic SQL code in some 3-GL. The efficiency and the resource utilization will also be better if you just select the columns that you need.
Use column name references in the "ORDER BY" clauses, do not use column numbering reference ( unless it is necessary as in the case of unions ).

SELECT p.name, 'Dept:'||p.department FROM people p ORDER by p.name,p.department; /* do not use 1,2 it will not use indexes */

<Up to Contents>

4.2 Insert-Statements.

Insert statements should be written in a way that the following keywords starts on a new line:

INSERT, VALUES

Indentation and alignment should be used to make the code more readable.
Insert statements should always name the columns that the statement is inserting data into.
This will eliminate re-coding of the statement if the table is expanded with new columns or the table is recreated with the columns in a different order. If this is done then you don't have to specify null values for columns that is inserted with a null value or if a default value for the column will be inserted.

INSERT INTO emp ( ename, sal, empno, deptno ) VALUES ( l_ename, l_sal, l_empno, l_deptno);

<Up to Contents>

4.3 Update statements.

Update statements should be written with the following keywords starting on a new line:

UPDATE, SET, WHERE, OR, AND

Indentation and alignment should be used to make the code more readable.

UPDATE emp e SET e.sal = e.sal + 10 WHERE e.sal < 1000 AND e.comm is null;

Update and delete operations is usually done on single rows or on a subset of rows in the table. This implies that there are some search criteria's and a where-clause in the statement. This also implies that most statements of this kind should be optimized for ( read / search) performance as well as the select-statements.

<Up to Contents>

4.4 Delete statements.

Delete statements should be written in a way that the following keywords starts on a new line:

DELETE, WHERE, AND, OR, CASCADE

Indentation and alignment should be used to make the code more readable.

DELETE FROM emp e WHERE e.empno = 7139 AND e.deptno = 10;

To perform this type of operation ( deletes and updates ), you should previously have locked the row(s) with a locking statement. You should have fetched the row-address (the ROWID), and consequently you can use that in your search criteria instead of a value based search criteria again, when deleting or updating.

<Up to Contents>

4.5 Locking.

4.5.1 Automatic and manual locking.

The RDBMS have a number of locking functions. The most important reason is to protect data in the database, changed by one user, from being changed or overwritten by another user.
All SQL-statements that change data in the database will cause locking. This can be implicit (automatic) or manually declared and controlled.
Always try to use explicit ( manual ) locking in programs where data is changed and there ever might be more than one process or user changing the same data. Always consider the locking strategy when you are updating or deleting data.
Locks is acquired when these commands are issued to the database or when a cursor like these are opened:

SELECT ename FROM emp WHERE empno = 7139 FOR UPDATE of sal; UPDATE emp SET sal = sal + 10; DELETE FROM emp WHERE empno = 7139; INSERT INTO emp VALUES ( 1212,'Smith', 1212 ); LOCK TABLE emp IN EXCLUSIVE MODE NOWAIT;

The select, update and delete statements all creates exclusive row-locks on the rows that are hit by the statement. There are both exclusive locks and shared locks. Exclusive locks prevents other user to attain the same level of lock or a lower level of lock on the object. Shared locks may be attained by several users at the same time, and prevents other users to attain exclusive locks on the same object (or row).

<Up to Contents>

4.5.2 Table locks and row locks.

The database can provide and use locks both on the table level and on row level. Programs can attain both types of locks. Your programs should not use or cause exclusive table locking if it can be avoided. Table-locking should only be used after very thorough consideration of the consequences. If done this would increase the possibility of conflicts and decrease the concurrency between users.
Always use row-locking.
All statement causes shared table locks, these can be held by several users at the same time.
These locks are put on the table to prevent the data definition to change during use.
Table locks may be exclusive, but normal data manipulation operation does not require this type of locks.
They could be placed on the table with the:

LOCK TABLE ..... statement,

but this should be avoided.

<Up to Contents>

4.5.2 Key locks

The database automatically ensures the integrity of the "foreign key relation", by locking the master-detail related data. This is to prevent the key values in the relation to change during operations on the related keys. This locking is done on the column level, using the index tree.
The locking mechanism does not prevent any user to query the locked rows.

<Up to Contents>

4.5.3 How to do row locking.

Manual locking of the rows that are to be changed or deleted, should follow a common strategy among the applications, working on a set of tables, or competing of the same data. This is to avoid different locking strategies to cause concurrency problems. The coding should therefore be done in a streamlined and consistent way, in a fashion like Oracle*Forms does:

Step 1, Search and display information ( optional):

SELECT e.rowid, e.ename, e.sal INTO l_ename, l_rowid, l_rowid FROM emp e WHERE e.empno = l_empno1;

Step 2, Lock the row(s):

SELECT e.sal INTO l_sal FROM emp e WHERE e.rowid = l_rowid FOR UPDATE of e.sal NOWAIT;

Rowid is the actual physical address to the row. This is always the fastest way to access a single row, because you actually know where the row is, through the address.
By default the database waits for a lock until it is available, or the user sends a break to cancel the lock request. The NOWAIT option is not default.
The use of this option allows us to control and handle locking conflicts, as desired in our code. If the locking statement fails, the database will return the ORA-0054 error, and a "record locked" message.

Step 3, Update or delete the row:

UPDATE emp e SET e.sal = e.sal + l_AddSal WHERE e.rowid = l_rowid;

Step 4, Commit or rollback to end the transaction and to release the locks:

COMMIT;

If this is coded in PL/SQL or another 3-GL like language then these statements should rather be coded with cursors and "cursor for loops" than as exemplified above.
The "cursor for loop" construct uses an internal addressing of the row, which is as fast as the rowid address, but without any explicit fetching and storing of the address value.
From sample4.sql

/* This program modifies the ACCOUNTS table based on instructions ** stored in the ACTION table. Each row of the ACTION table ....... */ DECLARE CURSOR c1 IS SELECT a.account_id, a.oper_type, a.new_value FROM action a ORDER BY a.time_tag FOR UPDATE of status NOWAIT; BEGIN FOR acct IN c1 LOOP -- process each row one at a time acct.oper_type := upper(acct.oper_type); /* Process an UPDATE. If account does not exist, create a new account */ IF acct.oper_type = 'U' THEN UPDATE accounts a SET a.bal = acct.new_value WHERE a.account_id = acct.account_id; IF SQL%NOTFOUND THEN -- account did not exist. Create it. INSERT INTO accounts ( account_id, bal ) VALUES ( acct.account_id, acct.new_value); UPDATE action a SET a.status = 'Update: ID not found. Value inserted.' WHERE current of c1; ELSE UPDATE action a SET a.status = 'Update:Success.' WHERE current of c1; END IF; END LOOP; EXCEPTION WHEN TIMEOUT_ON_RESOURCE THEN -- locking conflict raise_application_error(-20101,'Record(s) locked by other users'); END;

<Up to Contents>


5 Minimizing system resources



5.1 Performance factors.

5.1.1 General

If the actual SQL-code is hidden from us, generated by some tool then we will not have control over the way code is generated, how it looks, or how it is sent to the server to be executed. We must therefore, try to gain information on how the access is done, and try to learn the important aspects of the tools processing. This knowledge should then be used to write code with the tool and from other interfaces, in a way that co-operates well with the used tools. All this should be done to avoid conflicts (e.g. locking ) or resource exhausting on the server. This means that the tool must generate efficient and well designed code, and that it is well documented so that it is easy to understand and analyze.

It is important that all SQL statement used from different programs are written in the same way, using exactly the same strings, to take advantage of the SQL-caching, done in the Rdbms kernel.
The two statements:

SELECT e.ename FROM emp e WHERE e.empno = 7139; SELECT e.ename FROM emp e WHERE e.empno = 7138;

is regarded as two different statements, the second one will be parsed and analyzed by the Rdbms in the same way as the first one and consume the same amount of both i/o and cpu-power.
Try to create and set up "libraries" of the access code and reference/use this code in your program code. This will guarantee that the same string are used every time you execute a SQL-statement. use parameter variables instead of strings if possible. The database will cache and reuse the information it has on those SQL-statements and consume less resources.
Reuse the actual string of the SQL-statement to take benefit of the SQL-caching.
If the example above instead is coded:

SELECT e.ename FROM emp e WHERE e.empno = :l_empno;

this could be sent to the database with different bind variable values (l_empno is my host variable with the value 7139 and 7138 at different times). This would then be regarded as only one SQL statement by the database, eliminating the analyze and parse overhead the second and subsequent times it is handled by the rdbms.
This also implies that dynamic SQL should be avoided and that all variable information should be communicated via bind variables from your host programs.

<Up to Contents>

5.1.2 Data access performance.

To create efficient applications we must write our SQL-operations as efficient as possible.
A lot of things influence the applications performance. The database-design, the table layout and the construct of indexes. The design of PK- and FK- constraints and the physical data storage is also of great importance. This together with how data is used and actually accessed at the data layer in the database will decide how much power the application will use in the database server.
We have to consider the effect of all this in terms of memory usage, CPU-utilization and i/o efficiency.
Sending a SQL-statement to the database requires the database to analyze the statement in several steps,
for accuracy,
access rights,
and to optimize

that statement for fastest access path.
Analyzing, opening and closing of cursors is therefore a relative expensive process that requires both memory-, i/o and CPU- resource. Access optimization of the SQL statements should always be done. There are several ways to analyze the work done on the database side. By using the explain command, or The database trace functions and program utilities. We should use one of these functions to verify that the statements we are sending to the database are using the expected indexes and that access times are in the expected and required ranges.

<Up to Contents>

5.1.3 The Explain command.

You can use the explain command to examine each SQL access that you submit o the database and look at the result and optimization. To do this you execute the command:

EXPLAIN PLAN SET statement_id = 'jm1' INTO plan_table FOR select * from emp where sal > 1000;

This will results in an access analyze from the database optimizer without actually executing the statement. The result is placed in your plan_table table, and you may then read the result with a SQL-statement.
To use this function regularly you should set up procedures and reports to handle the information efficiently or use Oracle's or some other commercially available tool that does this for you.

<Up to Contents>

5.1.4 The Trace function.

The database have a couple of ways to turn on a SQL trace function and direct commands to turn this on.

One way to do this in your applications is with the SQL command:

ALTER SESSION SET sql_trace = true;

and you turn it off with:

ALTER SESSION SET sql_trace = false;

These commands should be possible to execute from any of your programs if needed.
You should always provide a way to turn on the trace function in the application program. Make sure that a "TRACE" switch is provided ( in the same way ) in all program and if possible, in a similar way in all program environments.
Oracle tools often provides a switch for this purpose, so no extra coding will be required. This is sometimes the STATISTICS switch and can be found among the run options.

What happens is that the database writes a trace file on the server, on behalf of the user. This file may then be analyzed in different ways and with different programs. As a traditionalist I like the " tkprof "-program that is supplied by Oracle since version 4 or something. The tkprof-program is normally executed on the database server, where the trace files are created. The program can give different information, depending on how the database is set up, and depending on the server operating system. It is also possible to ask tkprof to execute the explain-command for all SQL-statements in the trace file. This is really a useful and the simplest way to do an application performance analysis.

Command line syntax for tkprof is:

tkprof tracefile.trc outfile.prf [sort=option] [explain=usr/pwd]

File naming and place of the trace file is system dependent, so ask your system administrator to set this up and give you information on where and how they can be accessed.

<Up to Contents>

5.1.5 The Optimizer.

The database engine have developed significantly during the years and have had a number of different changes to the optimizer. A RULE-based optimizer was the only optimizer in a number of earlier Oracle versions, and a STATISTIC optimizer that was introduced with version 7 of the database.
The rule optimizer had a small number of rules that it applies to the SQL statement to determine the access path. These rules are not obsolete and are sometimes used if forced to or if no other access information is available. It these cases it will happen that several rules end up with the same weight or priority, and that the rule that was found first will be applied. The effect is that the order of table names in the from clause, and the order of the where conditions may be important when the rule optimizer is used.
The statistic optimizer has a more extensive set of rules and does take into account the statistic information stored with the tables and indexes in the database. Using this optimizer requires statistic information to be collected with the "analyze" command. This also means that it is important to have the right amount of data in the tables during development and testing.

<Up to Contents>

5.1.6 Giving hints to the Optimizer.

There are a lot of hints that can be given in the SQL-code to give instructions to the database how to optimize the code. This is done with a special type of comment format in C or ADA style,

"*+ Start of hint ... until *" or "--+ Hint until end of line". SELECT /*+ index ( i_empno ) */ e.ename FROM emp e WHERE e.empno = 7139;

If you think you know the best way to do the actual access to data and that this never will change, then you may use hints. Otherwise it should be avoided. It could case more damage than good. Here are some hints possibilities:

ALL_ROWS The Intention should be to retrieve all rows as fast as possible. AND_EQUAL (tab.idx, tab.idx) Merge index in the search. CLUSTER (table) Read the table via the cluster. FIRST_ROWS The intention is to retrieve the first rows as soon as possible. FULL (table) Do a full scan on table. HASH (table) Do read with a hash-scan. INDEX (tab.index) Read the table though this index INDEX_ASC (tab.index) Read the table through this index, ascending, if range-scan. INDEX_DESC (tab.index) Read the table through this index, descending, if range-scan. ORDERED Use the order of the tables in the from clause to perform join. ROWID (table) Read with the tables row-address. RULE Use the RULE-based optimizer. USE_MERGE (table) Do a sort-merge-join USE_NL (table) Use a nested loop to do the join.

Use your best judgment to decide when to let the Rdbms do all of the optimization and when you know the best way to perform the access.

<Up to Contents>

5.1.7 Indexing

Index are created for several reasons, primarily to increase performance, to give faster access to data for key values that you use and to enforce uniqueness in the tables. The decisions for where, on what and how the indexes should be created should be done in the design phase. The design tool that you use should help you define most indexes that you need but you should always be prepared to examine and evaluate the outcome manually.

Create index to enforce uniqueness and primary keys.
Create index on foreign key columns.
Create indexes on search items in the tables.
Create indexes on columns that has a good selective impact.
Try to keep the number of indexes few ( < 5 ) on tables that are frequently updated.

Defining and creating the right indexes can speed up retrieval operations. It may eliminate sort operations. But if the number of rows that are retrieved via an index is greater than 20 - 30% of the total number of rows, a full table scan is probably faster than a scan via one or more indexes.
Index degrade performance in the database on update, delete and insert operations. The more indexes that the database has to be maintained on a table, the more power and i/o is required in operations that change the index tree.
Indexes takes up database space. The index space is often as large as the actual data in the database.
Avoid creating indexes on columns that contains few different values.

<Up to Contents>

5.1.8 Index Usage

To get the fastest possible speed on searches in the database you must provide good search conditions that the database may use in the optimization. You always have a number of ways to write your SQL-statements, and choose whether you think that a join or a sub query is the best way for the database to access the data. Different ways of writing conditions, although they might be similar in appearance will probably be treated and optimized differently. Every time you create or change a new SQL-statement you have to take the access performance consequences in consideration, what columns that are indexed, what primary key's that exist on the table, how selective your search criteria is etc.
The database has a number of different ways to retrieve the rows requested by a SQL-statement.

In order of efficiency the most important are:
By ROWID.
By hash-key-value.
By index search.
By a full table scan.

The ROWID is the actual address to the row, so if that is known that is a retrieval "without a search". This is a direct fetch of the row from a known place, and therefore the fastest way to fetch a row from the database.
This can often be used in the secondary operations, update and deletes, in your applications. The rowid is stored in the index, so when an index search is done, the index tree is searched for the index value. When this is found the row address is used to retrieve the row with it's associated data.
When there is a hash cluster the key values is used to calculate the place that the row should be stored or retrieved from. This approach does not need any i/o find the address of the row, it is always calculated and will only use CPU-power. This type of search is only efficient when you search for distinct values.

Full table scans ( reading a table in full, sequentially ) are the most efficient way to retrieve data when more than 20 - 30% of the tables rows are to be retrieved. Index searches are most effective when single rows are to be retrieved in an operation.
"Where-clauses" that causes extensive full table scans and other non-expected behaviors should identified in the applications and should be corrected if not intended.

When searches are made for specific values there are several comparison types that you will use:

Equality:
Equality searches ( comparing values that should be equal to a constant or a value in another table ) of indexed columns is very efficient and easy to use is:

WHERE index_column = FIXED_VALUE;

This will make it possible for the database to do a direct index scan search.

Concatenated indexes:

If concatenated indexes are used, make sure that all columns from the beginning of the index is given in the search criteria's.

WHERE index_column1 = FIXED_VALUE1 AND index_column2 = FIXED_VALUE2;

This will use an index if the index is created with index_column1 first and index_column2 next. If the index has a column index_column0 as the first part in the concatenated index the index can not be used. Likewise you will not cause an index scan if you only refer to index_column2 in your search criteria.

Null values:

Be especially careful and avoid search criteria like:

WHERE col_x is null or WHERE col_x is not null

This will often result in a full table scans, because no index entry can exists for a value that does not exist.

Functions:

Do not apply functions on columns that are used in the search criteria's that should use indexes. If done, the database will not be able to use the index, because the index value is modified by the function, and a full table scan will be done. The possibility to create index on functions are seldom used in my experience.

Avoid unnecessary sorting to gain performance in:

Set operations:

Set operations like "union, intersect, and minus" will cause a sort/merge of the sets retrieved. Be sure that the number of rows retrieved from each set operation is small.

Distinct:
The "DISTINCT" operator should be avoided, because it will always result in a sort to ensure that distinct values will be returned.

Order by and Group by:
Order by and group by will use indexes if the columns referenced in the list are indexed. The same rules as for searches applies for these "sort" operations.

Correlated subqueries:
Be careful when using correlated subqueries. The subquery will be executed for each record retrieved in the master query and could cause an extensive amount of database access.

<Up to Contents>

5.1.9 Network considerations

Network communication is an other factor to take into consideration. Every SQL-call causes a number of packages to be sent to and from the program/client to the database server. The more information the SQL-statement contains the more calls have to be made. The number of columns in a select statement might therefore be important in this respect. The size of the data is another factor. The larger the data is the more packages. If Oracle's array interface is used then more rows can be transferred in fewer packages. One SQL-statement normally causes 15 - 20 calls to be executed. In an client/server environment it is essential to try to keep the total number of SQL-statements as low as possible.
This also supports the argument that you should write more complex SQL-functions and try to do as much work as possible in each call to the database. If you split up your database access to several distinct select statements and send them one after one to the database, perform some data manipulation in between and interact with the database a number of times instead of writing a more complex SQL-statement the result is probably a much slower application. A major performance gain can often be archived by this approach.

<Up to Contents>

5.1.10 Memory

One goal when you try to create efficient applications, should be to keep the total number of different SQL operations as low as possible both for the reason explained above and for memory reasons.
If information is needed from two or more tables, then it is usually more efficient to join these tables in one SQL-statement than to code a number of separate SQL cursors read the information from them and combine the data from them in the application.
The database tries to reuse the information it has about the database, and the SQL-statements through a SQL-caching mechanism. When the database receives a SQL statement it compares this string to all SQL statements already in use, to see if it have information on how to perform the SQL-statement. If found it will use the parse and access information already in the cache.
Each SQL statement needs a number of memory data areas to communicate with the database. The "cursor area" holds information about the SQL-statement and the data that are to be transferred between the database and the application. Each SQL-statement has it's own data areas and the total size is typically in the range of 5kb - 50kb. The more cursors that are active and open the more memory we consume, both on the server side and on the client side. This areas are released when the cursors is closed. If your application requires hundreds of simultaneously open cursors, you are probably using a lot more system resources than expected, and this might be constraining in a resource limited system. So by closing the cursors that are no longer needed you free some memory, If you need to reopen the cursor then you will cause the database to reevaluate the statement and use more cpu and memory again. If this is the case you are not using the power of the SQL language optimally.

<Up to Contents>

5.1.11 Use of Cursors

Programs both in PL/SQL and other environments should be written in a way to minimize the number of cursors used. This is to improve performance and optimize the internal communication. To access data in the database one should try to declare and use explicit cursor declaration and not implicit cursors.
Use as few cursors as possible.
Implicit cursors:

SELECT e.ename, d.dname INTO l_ename, l_dname FROM emp e, dept d WHERE e.deptno = d.deptno;

Explicit cursor:

DECLARE CURSOR cr_emp1 is SELECT e.ename, d.dname FROM emp e, dept d WHERE e.detpno = d.deptno; l_ename emp.ename%TYPE; l_dname dept.dname%TYPE; BEGIN OPEN cr_emp1; FETCH cr_emp1 into l_ename, l_dname; CLOSE cr_emp1; END;

Because SQL is a set oriented language it should assume that when an implicit cursor is executed it is supposed to fetch a set of records. If you are coding with implicit cursors, and you only expect to fetch only one row, the database will fetch one row from the table and return that data to you. It will then again try to fetch another row, at least to be sure that there are no more rows to be fetched. If there were, it would return the error code for "TOO MANY ROWS" to you. This means that if the aim is to get only one or no record back to the program, the implicit cursor will do two accesses in the database and more work than if you use an explicit cursor and stopped fetching data after the first row is returned.

<Up to Contents>


6 Designing the system and Testing it.



6.1 General

All non obvious or very simple SQL statements must be analyzed and tested for performance, security and accuracy of the result.

6.2 Security aspects

To be able to do realistic tests we need to test all functions in a production like database. We also need realistic data and a user-like database identity. This identity should not have more privileges than is intended for the user that is going to run the application. Either do the test under a specific user identity, given to you by the database administrator, or let the database administrator give you the same rights as is intended for the user or create a separate user to own, create and manipulate objects and run all applications and do development as a different user.
Views can be used to enforce security and limit the data a user may access.
Views might be seen as the users API to a system and is therefore an important part of an open system.
They can act as a filter for the data in the database to make it easier to access.
There may be performance consequences if the views are created and used in a way that makes the optimizer to perform separate operations to construct the view before the users queries might be applied to the view.

<Up to Contents>

6.3 Database design

Use good tools and methods to design and verify your database design. A bad design is hard to optimize in respect of performance. We should perform both a logical and a physical database design. It is almost always true, that a strictly normalized database perform better in a mixed and evolving system, than a poorly normalized database. Deviation from the ultimate database design can improve performance for certain programs or a set of programs, but the compromises should not be made until the logical design has been made, requirements is known and all aspects of the relationships between data and it's keys has been established.

When the logical design is done, a physical implementation should be done. Design compromises that affect the performance may be implemented in the physical implementation. Do not start to compromise in the analyze phase ( logical database design ), to achieve physical performance gains. The physical storage of data and index will influence performance on critical or large tables in the database. Always provide a storage clause to the table specification and do not accept the default storage parameters.

<Up to Contents>

6.4 Constraints

The declarative constraints can replace much application code if used right. It will help you control data in a very thoroughly way. It allows you to define a lot of checks and consistency controls directly on the table. It may influence performance and increases concurrency problems.
Be sure that primary-key- and foreign-key- constraint columns are indexed, and that the index are constructed in the same way as the constraint declaration. The index created on the constraint columns are used by the database in the process of securing the relational constraints. If an index do not exist the database will issue a table lock, to avoid a scan of the whole table, and this will decrease the concurrency of the system.
The primary- and unique- key constraint declarations creates an index automatically for you, but the foreign key constraint does not.
Check constraint ( and not null ) constraints is less harmful in these aspects and should be defined whenever possible.

<Up to Contents>

6.5 Accuracy

To test for accuracy use a reasonable scaled database, with a realistic spread of data and key values. Be sure that your test database contains data, scaled to at least 10 - 30% of the expected amount of data. This is especially important for tables that will be larger than a couple of hundred of rows.

80% 1000 1

Nice things

images/ora_logo.gif
images/mysql_100x52.gif


And OpenSource Sw

images/samba.gif


I use fedora linux

try it or try

Ubuntu or PcLinuxOs


Share information and knowledge!

images/Nohat-logo-nowords-bgwhite-200px.jpg

I'm based in Jersey, Available for assignments in all of Europe, On-Site or Remote. Call or mail me!

Jersey Information