This introduction and additional information is available as a PDF file
The Chapman & Hall/CRC Chemical Database is a structured database holding information on chemical substances. It includes descriptive and numerical data on chemical, physical and biological properties of compounds; systematic and common names of compounds; literature references; structure diagrams and their associated connection tables. The Dictionary of Organic Compounds Online is a subset of this and includes all compounds contained in the Dictionary of Organic Compounds (Sixth Edition and Supplements), Dictionary of Organophosphorus Compounds and some other publications (with the omission of the majority of natural products, which are available in the companion Dictionary of Natural Products Online).
In general, DOC includes the following compounds:
DOC also covers the principal organic compounds of Se, Te, As, Sb and Bi. There is also selective coverage of organo-B and organo-Si compounds, especially those of importance in synthetic organic chemistry. For a fuller coverage of the organometallic compounds of B, Si, As, Sb and Bi, see the companion Dictionary of Inorganic and Organometallic Compounds Online.
DOC does not in general document isotopically labelled variants except for a few isotopically labelled drug variants used in chemotherapy or tracer work.
In the database, closely related compounds are grouped together to form an
entry. Stereoisomers and derivatives of a parent compound are all listed under
one entry. The compounds in the Dictionary of Organic Compounds are grouped
together into approximately 71,000 entries. The structure of an entry is shown
In a simple entry, there is just one compound, with no derivatives or variants. Variants are commonly stereoisomers, e.g., (R)-form, endo-form. Derivatives may include hydrates, complexes, salts, classical organic derivatives, substitution products and oxidation products, except when the derivative has a relatively extensive literature of its own or where the derivative requires special stereochemical description (e.g. some simple chiral sulfoxides have individual entries and therefore are not shown under the parent sulfide). In such cases the derivative entry is cross-referenced from the parent compound.
The most common functional groups and their commonest derivatives are listed below. The order in which the derivatives are listed here is the order in which they appear in DOC entries under the parent compound.
Carbonyl compounds, RR'C=O
Carboxylic acids, RCOOH
Amines, RNH2, RR'NH, RR'R"N
Disulfide, R-S-S-R (sometimes included; usually a readily obtainable oxidative dimer of the thiol)
Molecular formulae are included in DOC for nearly all of these derivatives and so are readily searchable, whether they are documented as derivatives or have their own individual entry. Molecular formulae are not in general given for salts, hydrates or complexes (e.g. picrates) nor for most 'characterisation' derivatives of carbonyl compounds such as 2,4-dinitrophenylhydrazones and semicarbazones.
Where a derivative appears to have been characterised only as a salt, the properties of the salt may be given under the heading for the derivative. In such cases the data is clearly labelled, e.g. Mp 179° (as hydrochloride).
A completely consistent scheme for covering all such entries is not possible or desirable. Some variation is necessary in the way the appropriate DOC entries are organised in order to cover the various possibilities, but it is hoped and believed that the maximum possible clarity has been achieved. The general principles which have been followed are described here using as an illustration one of the commonest types of tautomerism exhibited by simple organic compounds, which is heterocyclic NH/OH prototropy as exemplified by 2- and 4-hydroxypyridines.
The format of a typical entry is given in Fig.1, and shows the individual types of data that may be present in an entry.
All of the names discussed below can be searched using the Chemical Name field.
The DOC Name is that chosen to head each entry and is that which, in the opinion of the Editors, is most likely to be known by, and of use to, most readers. Systematic DOC Names following IUPAC conventions are used wherever convenient, but trivial names may be used for more complex structures such as pharmaceuticals and natural products. In cases where no one name stands out as being clearly more familiar or convenient than others, the Chemical Abstracts name is normally used as the entry name. The American spelling sulf- is used throughout DOC for organosulfur compounds in preference to the British sulph-. For a fuller treatment of nomenclature principles and details, see The Organic Chemist's Desk Reference (Chapman & Hall/CRC, 1995).
An important function of DOC is to present a wide range of synonyms. In general, the selection is made as useful as possible, but no attempt is made to provide exhaustive lists of proprietary names for pharmaceuticals and other commercial substances.
Archaic systematic names are in general not given, but obsolete synonyms have often been retained where there has been a change in numbering of the parent ring system and these synonyms could assist readers who have to consult the older literature. In a few cases incorrect synonyms from the literature have also been reported. Synonyms in these classes are distinguished as 'obsol' or 'incorrect' respectively. Several obsolescent systems such as the carbinol and hydroxyalkane alcohol nomenclatures have been almost completely discarded, since although they are still occasionally met with, users should have no difficulty in converting these to the normal nomenclature.
Names which are known to be duplicated within the chemical literature (not necessarily within DOC) are marked with the sign †. These are usually duplicate trivial names for natural products or pharmaceuticals, but there are a few cases (of organophosphorus compounds) where two or more compounds of different structure have been allocated the same CAS name.
CAS Registry Numbers are identifying numbers allocated to each distinctly definable chemical substance indexed by the Chemical Abstracts Service since 1965 (plus retrospective allocation of numbers by CAS to compounds from the sixth and seventh collective index periods). The numbers have no chemical significance but they provide a label for each substance independent of any system of nomenclature.
In DOC, much effort has been expended to ensure that accurate CAS numbers are given for as many substances as possible. If a CAS number is not given for a particular compound, it may be (a) because CAS have not allocated one, (b) very occasionally, because an editorial decision cannot be made as to the correct number to cite, or (c) because the substance was added to the DOC database at a late stage in the compilation process, in which case the number will probably be added to the database soon.
At the foot of the DOC entry, immediately before the references, may be
shown additional registry numbers. These are numbers which have been
recognised by the DOC editors or contributors as belonging to the entry
concerned but which cannot be unequivocally assigned to any of the
compounds covered by the entry. Their main use will be in helping those who
need to carry out additional searches, especially online searches in CAS or
other databases, and who will be able to obtain additional hits using these
numbers. Clearly, discretion is needed in their use for this purpose.
Additional registry numbers may arise for a variety of reasons:
In each entry display there is a single diagram which applies to the parent entry. Separate diagrams are not given for variants or derivatives.
Every attempt has been made to present the structures of chemical substances as accurately as possible according to current best practice and IUPAC recommendations. In drawing the formulae, as much consistency as possible between closely related structures has been aimed at. Thus, for example, sugars have been standardised as Haworth formulae and, wherever possible in complex structures, the rings are oriented in the standard Haworth manner so that structural comparisons can quickly be made. In formulae the pseudoatom abbreviations Me, Et and Ac for methyl, ethyl and acetyl respectively, are used only when attached to a heteroatom. Ph is used throughout whether attached to carbon or to a heteroatom. Other pseudoatom abbreviations such as Pri for isopropyl and Bz for benzoyl are not used in DOC.
Care must be taken with the numbering of natural products, as problems may arise due to differences in systematic and non-systematic schemes. Biogenetic numbering schemes which are generally favoured in DOC may not always be contiguous, e.g., where one or more carbon atoms have been lost during biogenesis.
Structures for derivatives can be viewed in Structure Search, but remember
that these structures are generated from connection tables and may not always
be oriented consistently.
Where the absolute configuration of a compound is known or can be inferred from the published literature without undue difficulty, this is indicated. Where only one stereoisomer is referred to in the text, the structural diagram indicates that stereoisomer. Wherever possible, stereostructures are described using the Cahn-Ingold-Prelog sequence-rule (R,S) and (E,Z) conventions but, in cases where these are cumbersome or inapplicable, alternatives such as the α,β-system are used instead. Alternative designations are frequently presented in such cases.
The structure diagrams for compounds containing one or two chiral centres are given in DOC as Fischer-type diagrams showing the stereochemistry unequivocally. True Fischer diagrams in which the configuration is implied by the North-South-East-West positions of the substituents are widespread in the literature; they are quite unambiguous but need to be used with caution by the inexperienced. They cannot be reoriented without the risk of introducing errors.
In the simplest case, the four substituent atoms about a tetrahedral carbon atom are placed in order of decreasing atomic number and the molecule is then viewed from the side remote from the substituent of lowest priority. The configuration is (R) (rectus) if the order of the three other groups from highest to lowest is clockwise, and (S) (sinister) if it is anticlockwise.
If two or more of the four atoms attached to the central atom are identical, the molecule is explored outwards by a process of comparing atom with atom.
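The simplest case described above can be sketched in code. The following is a minimal illustration, not part of DOC: it assumes idealised 3D coordinates for a tetrahedral centre, ranks the four attached atoms by atomic number only, and refuses to assign a descriptor when two attached atoms are identical (the outward exploration needed for ties is not implemented). The function name and input layout are hypothetical.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def cip_descriptor_simple(centre: Vec,
                          substituents: Sequence[Tuple[int, Vec]]) -> str:
    """Return 'R' or 'S' for a centre with four different attached atoms.

    substituents: four (atomic_number, position) pairs (hypothetical layout).
    """
    if len(substituents) != 4:
        raise ValueError("exactly four substituents are required")
    # Rank by decreasing atomic number (highest priority first).
    ranked = sorted(substituents, key=lambda s: s[0], reverse=True)
    if len({z for z, _ in ranked}) != 4:
        raise ValueError("identical attached atoms: ties are not handled here")
    # Bond vectors to the three highest-priority atoms.
    a, b, c = (_sub(pos, centre) for _, pos in ranked[:3])
    # The sign of the triple product gives the sense of rotation a -> b -> c
    # as seen from the side remote from the lowest-priority atom:
    # positive = anticlockwise (S), negative = clockwise (R).
    volume = _dot(a, _cross(b, c))
    if abs(volume) < 1e-9:
        raise ValueError("substituents are coplanar with the centre")
    return "S" if volume > 0 else "R"


# Made-up coordinates: Br, Cl, F above the plane, H pointing away from the viewer.
example = [(35, (1.0, 0.0, 0.33)), (17, (-0.5, 0.866, 0.33)),
           (9, (-0.5, -0.866, 0.33)), (1, (0.0, 0.0, -1.0))]
print(cip_descriptor_simple((0.0, 0.0, 0.0), example))  # prints S
```

The triple product is used so that the molecule does not have to be rotated into a standard viewing direction first; reflecting the coordinates through any plane flips its sign and therefore the descriptor.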
Extensions of the (R,S)-system refer to situations such as axial and planar chirality (biaryls, cyclophanes, etc.) and to molecules with central atoms other than carbon (e.g. chiral sulfoxides).
Where only the relative configuration of a compound containing more than one chiral centre is known, the symbols (R*) and (S*) are used, the lowest-numbered chiral centre being arbitrarily assigned the symbol (R*). For racemic modifications of compounds containing more than one chiral centre the symbols (RS) and (SR) are used, with the lowest-numbered chiral centre being arbitrarily assigned the symbol (RS). The racemate of a compound containing one chiral centre only is described in DOC as (±)-.
In comparing CAS descriptors with those given in DOC, it is important to remember that the order of presentation of the chirality labels in CAS is itself based on the sequence rule priority and not on any numbering scheme. For example in DOC, the following compound:
is (2R,3S)-2-Amino-3-hydroxyphenylpropanoic acid. In CAS it is [S-(R*,S*)]-β-Hydroxyphenylalanine. The relative stereochemical label (R*,S*) is first applied with the R* applying to chiral centre 3 because it has higher priority than centre 2 (OH > NH2). The absolute stereochemical descriptor (S)- is then applied changing R* to S for chiral centre 3 and S* to R for chiral centre 2. For further details, see the current CAS Index Guide. For simplicity, the enantiomers of bridged-ring compounds, such as camphor, are described simply as (+)- and (−)-. Although camphor has two chiral centres, steric restraints mean that only one pair of enantiomers can be prepared.
The (R,S) descriptor system can be extended to describe the configurations of many types of symmetrical compound, e.g. the 1,4-Dimethylcyclohexanes.
At chiral centre 1, an arbitrary choice is made between the two equivalent
sequence chains (a) and (b). Choosing (a) arbitrarily gives
For further information on the (R,S)-system, see Cahn, R.S. et al., J. Chem. Soc., 1951, 612; Experientia, 1956, 12, 81; Angew. Chem. Int. Ed. Engl., 1966, 5, 383.
Where appropriate, alternative stereochemical descriptors may be given using
the D, L or α,β-systems. For a fuller description of these systems, consult The Organic Chemist's Desk Reference (Chapman & Hall, 1995).
The elements in the molecular formula are given according to the Hill convention (C, H, then other elements in alphabetical order). The molecular weights given are formula weights (or more strictly, molar masses in daltons) and are rounded to one place of decimals. In the case of some high molecular mass substances such as proteins the value quoted may be that taken from an original literature source and may be an aggregate molar mass.
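As an illustration of the Hill ordering and one-decimal rounding described above, here is a minimal sketch. It is not DOC's own code: the function names are hypothetical, the small atomic-mass table holds abridged standard atomic weights entered by hand as an assumption, and the rule that carbon-free formulae are ordered purely alphabetically is the usual Hill convention rather than something stated in this introduction.

```python
from typing import Dict

# Abridged standard atomic weights (assumed values, a few elements only).
ATOMIC_MASS: Dict[str, float] = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "S": 32.06, "P": 30.974, "Cl": 35.45, "Br": 79.904,
}


def hill_formula(counts: Dict[str, int]) -> str:
    """Format element counts in Hill order: C, H, then the rest alphabetically."""
    symbols = sorted(counts)
    if "C" in counts:
        rest = [s for s in symbols if s not in ("C", "H")]
        ordered = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # Usual Hill convention for carbon-free formulae: strictly alphabetical.
        ordered = symbols
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in ordered)


def formula_weight(counts: Dict[str, int]) -> float:
    """Sum the atomic masses and round to one decimal place."""
    return round(sum(ATOMIC_MASS[s] * n for s, n in counts.items()), 1)


composition = {"O": 3, "N": 1, "H": 11, "C": 9}
print(hill_formula(composition), formula_weight(composition))  # C9H11NO3 181.2
```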
Care has been taken to make the information given on the importance and uses
of chemical substances as accurate as possible. Data in this field may be
searched under Use/Importance or All Text.
Organic compounds are considered to be colourless unless otherwise stated.
Where the compound contains a chromophore which would be expected to
lead to a visible colour, but no colour is mentioned in the literature, the DOC
entry will mention this fact if it has been noticed by the contributor. An
indication of crystal form and of recrystallisation solvent is often given but
these are imprecise items of data; most organic compounds can be
crystallised from several solvent systems and the crystal form often varies. In
the case of the small number of compounds where crystal behaviour has been
intensively studied (e.g. pharmaceuticals), it is found that polymorphism is a
very common phenomenon and there is no reason to believe that it is not
widespread among organic compounds generally.
Melting points and boiling points
The policy followed in the case of conflicting data is as follows:
These are given whenever possible, and normally refer to what the DOC contributor believes to be the best-characterised sample of highest chemical and optical purity. Where available an indication of the optical purity (op) or enantiomeric excess (ee) of the sample measured now follows the specific rotation value.
Specific rotations are dimensionless numbers and the degree sign which was
formerly universal in the literature has been discontinued.
Densities and refractive indexes
Densities and refractive indexes are now of less importance for the identification of liquids than has been the case in the past, but are quoted for common or industrially important substances (e.g. monoterpenoids), or where no boiling point can be found in the literature.
Densities and refractive indexes are not quoted where the determination
appears to refer to an undefined mixture of stereoisomers.
Solubilities are given only where the solubility is unusual. Typical organic
compounds are soluble in the usual organic solvents such as ether and
chloroform, and virtually insoluble in water. The presence of polar groups (OH,
NH2 and especially COOH, SO3H, NR+) increases water solubility.
pKa values are given for both acids and bases. The pKb of a base can be
obtained by subtracting its pKa from 14.17 (at 20°) or from 14.00 (at 25°).
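The conversion above is simply pKb = pKw − pKa, using the pKw values quoted in the text. A minimal sketch follows; the function name is hypothetical, and only the two temperatures given above are supported.

```python
# pKw of water at the two temperatures quoted in the text (degrees Celsius).
PKW_BY_TEMPERATURE = {20: 14.17, 25: 14.00}


def pkb_from_pka(pka: float, temperature_c: int = 25) -> float:
    """Return the pKb of a base from the pKa of its conjugate acid."""
    try:
        pkw = PKW_BY_TEMPERATURE[temperature_c]
    except KeyError:
        raise ValueError("pKw is only tabulated here for 20 and 25 degrees") from None
    return round(pkw - pka, 2)


print(pkb_from_pka(9.25))      # 4.75 at 25 degrees
print(pkb_from_pka(9.25, 20))  # 4.92 at 20 degrees
```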
Spectroscopic data such as uv wavelengths and extinction coefficients are given only where the spectrum is a main point of interest, or where the compound is unstable and has been identified only by spectroscopic data.
In many other cases, spectroscopic data can be rapidly located through the
Toxicity and hazard information is highlighted by a dedicated sign and has been selected to assist in risk assessments for experimental, manufacturing and manipulative procedures with chemicals.
Physical, reactive and toxic properties all contribute to the hazard associated with a particular chemical. As part of the physical data, flash points, explosive limits and autoignition temperatures have been included (where appropriate). Flammability classifications, which are based on flash point measurements and boiling points, are also mentioned, and the opportunity has been taken to include UK occupational exposure limits, or for some compounds threshold limit values published by the American Conference of Governmental Industrial Hygienists (ACGIH).
For the reactive hazards, a brief comment is made on any explosive (or violent polymerisation) properties and aspects of the chemical reactivity of a substance which are of concern. These include the potential for peroxidation, oxidizing/reducing properties and incompatibility with commonly available chemicals.
Toxicity information has been chosen to show hazardous effects from short-term or long-term exposure. Observations from human exposure are summarised if available (including possible adverse effects of drugs), otherwise experimental (exp.) tests are quoted. Included in the toxicity data are the results of irritancy tests, acute lethality data, target organ toxicity, and carcinogenic and reproductive properties where appropriate. Those chemicals which have been classified by the International Agency for Research on Cancer (IARC) as human carcinogens, probable human carcinogens or possible human carcinogens have been identified in DOC accordingly.
The Publishers cannot be held responsible for any inaccuracies in the reported information, nor does the omission of hazard data in the Dictionary imply an absence of this data from the literature. Widely recognised hazards are included, however, and where possible key toxicity reviews are identified in the references. Further advice on the storage, handling and disposal of chemicals is given in The Organic Chemist's Desk Reference.
Finally, it should be emphasised that any chemical has the potential for
harm if it is carelessly used. For many newly synthesised materials (e.g. new
synthetic reagents), hazardous properties may not be apparent or may not have
been cited in the literature. In addition, the toxicity of some very reactive
chemicals may not have been evaluated for ethical reasons, and these
substances in particular should be handled with caution.
RTECS® Accession Numbers
RTECS® Accession Numbers are compiled and distributed by the National Institute for Occupational Safety and Health Service of the U.S. Department of Health and Human Services of The United States of America. All rights reserved. (1996)
Many entries in DOC contain one or more RTECS® Accession Numbers. Possession of these numbers allows users to locate toxicity information on relevant substances from the NIOSH Registry of Toxic Effects of Chemical Substances, which is a compendium of toxicity data extracted from the scientific literature.
For each Accession Number, the RTECS® database provides the following
data when available: substance prime name and synonyms; date when the
substance record was last updated; CAS Registry Number; molecular weight
and formula; reproductive, tumorigenic, and toxic dose data; and citations to
aquatic toxicity ratings, IARC reviews, ACGIH Threshold Limit Values,
toxicological reviews, existing Federal standards, the NIOSH criteria
document program for recommended standards, the NIOSH current
intelligence program, the NCI Carcinogenesis Testing Program, and the EPA
Toxic Substances Control Act inventory. Each data line and citation is
referenced to the source from which the information was extracted.
The selection of references is made with the aim of facilitating entry into the literature for the user who wishes to locate more detailed information about a particular compound. Thus, in general, recent references are preferred to older ones, particularly for chiral compounds where optical purity and absolute configuration may have been determined relatively recently. The number of references quoted cannot therefore be taken as an indication of the relative importance of a compound, and the references quoted for important substances may not be the most significant historically. For very common compounds which are nowadays readily available from bulk suppliers, long lists of syntheses are not presented, but the emphasis is on references to spectra, chromatography, etc.
References are given in date order except for references to spectroscopic library collections, which sort at the top of the list, and those to hazard/toxicity sources which sort at the bottom.
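The ordering rule above amounts to a two-part sort key: first a group (spectroscopic library collections, then ordinary references, then hazard/toxicity sources), then the date. A minimal sketch, assuming a hypothetical Reference record with a kind label; neither the field names nor the labels come from DOC, and the citations are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    citation: str          # placeholder text, not a real citation
    year: int
    kind: str = "general"  # hypothetical labels: spectral_library, general, hazard


# Lower number sorts earlier in the list.
_GROUP_ORDER = {"spectral_library": 0, "general": 1, "hazard": 2}


def sort_references(refs: List[Reference]) -> List[Reference]:
    """Spectral libraries first, hazard sources last, date order within a group."""
    return sorted(refs, key=lambda r: (_GROUP_ORDER[r.kind], r.year))


refs = [
    Reference("ref A", 1990, "hazard"),
    Reference("ref B", 2001),
    Reference("ref C", 1985),
    Reference("ref D", 1999, "spectral_library"),
]
print([r.citation for r in sort_references(refs)])
# ['ref D', 'ref C', 'ref B', 'ref A']
```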
The content of many references is indicated by means of suffixes. A list of the most common ones is given in Table 1.
Some reference suffixes are now given in boldface type, where the editors consider the reference to be particularly important, for example the best synthesis giving full experimental details and often claiming a higher yield than previously reported methods.
In some entries, minor items of information, particularly the physical
properties of derivatives, may arise from references not cited in the entry.
In general these are uniform with the Chemical Abstracts Service Source Index
(CASSI) listing except for a short list of very common journals:
DOC 21.2 Copyright © 2013 Taylor & Francis Group
All Rights Reserved