The entity relationship diagram gathers all types of records -- from all over the system -- into one place and works out how they are related. It then maps directly into the structure of most DBMSs.
An ERD shows a single database -- or a part of a larger database. There is no official way in an ERD to indicate that data is spread out over several data bases or file systems -- largely because it is a tool for integrating data onto one system. Even when you distribute data over several servers you need to first draw a single data base that ignores the deployment. It is (in my experience) rare -- probably due to some strange non-functional requirement -- that you distribute data over several "systems". And example would be the Internets DNS system -- millions of servers with their own data but linked into a single data base. ERD are not used to describe the placement of data in a system. We use an UML deployment diagram to show how we distribute various parts of a system onto hardware. We use them to show that some data is held on one system and the rest of it elsewhere. We saw these in [ a3.html ] earlier in this course.
_____ is taking class _____ with teacher ____ at time ____Filling in the blanks generates facts. Listing the blanks in a template gives us a table in a data base! A Binary relation would have a sentence with two blanks, a ternary with 3, and so on. The number of columns equals the number of blanks in the template.
RBOTTING CS372
| Teacher | Course |
|---|---|
| RBOTTING | CS372 |
| TYU | CS310 |
| KVOIGT | CS201 |
| ... | ... |
| Course | Section | Room | Time | Teacher |
|---|---|---|---|---|
| CSCI372 | 01 | JBH146 | 4-6pm TuTh | RBOTTING |
| ... |
Diagrams have special ways to show the fields/columns/attributes in a table/file/entity. You need to know them all since all are in use.
Key
Keys are fields or attributes that uniquely identify records in files/tables uniquely.
They are used to connect the different tables in a data base together.
I will show how these are documented in an ERD when we cover normalization.
There are many types of key: primary, candidate, foreign, secondary. They
are the heart of normalization and the most popular type of database: the
relational data base. Relational data bases and normalization are in
[ d4.html ]
and will covered later.
Cardinality and Multiplicity
The data base community talk about Cardinality and the UML community
talk about multiplicity. In both cases the question is
The common answers are: one, many, zero or one, and one or more.
In a relational database we talk about tables and each row in the table is an object or entity. Cardinalities and Multiplicities count how many entities are found in a given table's columns -- is each one unique or are there many of them.
A Conceptual Data Base Design Process
Second, there must be many occurrences of the entity. So, we don't store the name of CSUSB in a CSUSB data base: there is only one CSUSB. However in the Chancellor's office CSUSB will be an occurrence of the entity Campus.
Third, the occurrences should be identifiable and countable. Thus Butter is not an entity, but CaseOfButter would be. On current and/or planned systems is there an ID or number(#)? How many occurrences are there currently? Record this "frequency" because it will determine the speed of the resulting programs!
Fourth, it may be something that has data stored in the current system. It is nice if it is something that will need to have data stored in the future system. So, in a system about students and faculty, Student and Faculty are likely to be entities. Ask: If we lost the data, how much damage would this do to the enterprise? For example, CSUSB doesn't get hurt much if it doesn't know the color of your hair. Notice: you can extract a lot of structure from existing and proposed input and output formats. [ d4.html ]
Fifth, the occurrence should have interesting life cycles that last longer than a single program, function, or application. Ask: when do these entities get created and when are they destroyed?
Finally, most attributes are not entities. For example, Address may be an entity but NumberOnStreet is not. Things that are elementary data -- numbers, measurements, character strings, etc don't make good entities. They will end up as attributes/fields/columns of an entity/class/table. Be careful to distinguish attributes from identifiers and keys which point to entities. Also do not confuse the identifier/key with the type of entity it points at: The ISBN is an identifier for "Book" but is not itself an entity. Book is an entity with attribute ISBN.
Student "J Coyote" has enrolled in section "01" of class "#CSCI372".Given that "Student" and "Section" are entities in your ERD then you have uncovered the relationship: "Enrollment" between them. You can also see another relationship in the above sentence: Each "Section" is for a particular "Course". Exercise: sketch a quick informal ERD showing: Course, Section, Enrollment, and Student.
Here are some classic analysis patterns that I've noticed over the years. Use this list to help discover ERD models:
Master and Detail Pattern Example: A Sale has a number of Items. An Order has a number of order items.
Factoring attributes -- When two entities have a many-to-one relation between them and the entity at the many end has attributes, check to see if they can be moved to the other end. You can do this if all the entities that map into one entity all have the same value for an attribute. It can be shared by putting it in the other end of the association. A common example is in the [ Master and Detail Pattern ] where the same information is recorded in many details when it should only occur once in the master entity.
Pattern -- Actor plays a Role for a Time A Person can be a Student in a Quarter. A Person can Work on campus in a Quarter. They can be a Teacher in another Quarter. Combinations of roles is possible.
Events Time and Place Pattern Event of a particular Type takes place at a Time and a Place. Example: CSCI372 meets 4-6pm in JBH146 during winter quarter 2007.
Description and Instance Pattern Example: A product describes a sales Item. Also good for entities that are grouped by shared attribute values. Example: A Course has many Sections.
Part and Whole Pattern Building has many rooms. A Week has many days. Dog has 4 legs. Project has team members.
Access Point Pattern -- An entity selects a collection of entities needed for a process. Example: DeansList of Students with GPA>3.9.
General and Particular Pattern -- A Student and a Teacher are special types of Person. In an ERD the special types are shown with a 0..1 multiplicity in the General. In general, you can have a whole hierarchy of special types. There are ways to document whether the special types overlap or not (UML Generalization). Avoid this detail in simple ERDs.
The Composite Pattern This a Gang of Four "Object-Oriented Programming" pattern that also turns up in some ERDs. A General entity has two types: Composite and Atomic. The Composite has many Parts and each Part is (in turn) a General entity.
The Enumeration pattern It is very common to define an entity type not by its relationships and attributes, but by listing its possible values. For example: Gender, Race, Boolean, ... Either document the values elsewhere or use the <<enumerating>> stereotype in the UML for the second of these. Do not confuse listing the attributes of an entity: Person has age, gender, eye color, ... with listing the possible values of an enumerated date type: Gender = {male, female, unknown}. Another example: student status = (fresher, sophomore, junior, senior, gradauate).
Any time you find yourself drawing the same diagram for the same reason -- you've found a pattern.
. . . . . . . . . ( end of section ERD Patterns) <<Contents | End>>
Check list of ERD Errors
ERD Hint -- Use Layout to encode structure
I normalize my ERDs so that all relationships have one end with multiplicity "1" and I always
place the boxes so that the "1" end
is above the "many" end. It is a discipline from the SSADM method
that I've found helpful for many years.
ERD Hint -- Entities with too many attributes
Entities that have many attributes can become quite
cumbersome and space consuming in an ERD. When this happens
you should document the details separately and only show a few
important atttributes in the diagram. You should then (informally)
add three dots (". . .") at the end of the attribute list to show you have
hidden something.
ERDs Hint -- High Speed ERDS
How to do ERDs very rapidly:
Later I transcribed it to a napkin using the UML. And even later, I used Visio to get this ERD:
. . . . . . . . . ( end of section ERD Hints) <<Contents | End>>
I use three special
tagged values to quantify
the above data bases.
Table
| tag | aplied to | meaning |
|---|---|---|
| L | attribute | length. Number of bytes needed to store attribute |
| L | class | length. Number of bytes in one record in class/entity |
| N | class | expected number of entities/rows in table |
| V | class | volume in bytes, KB, MB |
| average | role | average number of objects/rows per associated entity |
| document | any | where to find detailed documentation |
Some of the constraints are based on the mathematical principle that if you put 20 pigeons in 10 holes than on average you've got 2 pigeons in each hole.
| Context | Constraint | Note |
|---|---|---|
| class | L = Σ[a:attribute](a.L). | Length of entity is sum of its entity lengths. |
| class | V = L * N | Volume is Length times Number of occurences |
| association | All are one-to-many or one-to-optional, no 1-1 and no many-to-many | |
| association | If class with N=N1 has a one-to-average A role in class with N=N2 then N2=N1*A. | |
| association | If class N=N1 has a one-to-(min..max) role in class with N=N2 then min <= N2/N1 <= max. |
There is a book of design patterns for enterprises:
An analysis pattern [Coad92] is a commonly observed set of real world entities and relations. In most models you'll find several patterns.
Data Base Management Systems
Oracle
[ index.html ]
[ Oracle_%28DBMS%29 ]
[ http://www.oracjobs.com/ ]
, MSAccess
[ FX100487571033.aspx ]
[ Microsoft_Access ]
, MySQL
[ MySQL ]
, Postgres
[ PostGres ]
, etc.
SQL injection attacks to database-driven web services
What is the average security risk level of SQL injection attacks to
database-driven web services? Are there standard security techniques that
help to safeguard against this kind of attack, or are the security
solutions variable depending on the structure of the database design and/or
web interface design?
You've got the usual suspect: extra quotation marks that
fake a stupid script into executing the data as a command. See
the Wikipedia article
[ SQL_injection ]
which includes the usual ways to defend against the threat.
I would also expect there are standard security techniques for them.
Other Application Programmer Interfaces
Examples of working with SQL data bases:
[ 372doc6.pdf ]
(Aaron Smith) and
[ 372doc7.pdf ]
(Nieleh Wilcots).
If you need to see an example of an SQL base API check out the
PHP language that includes special functions
[ ref.mysql.php ]
that use SQL
to drive a MySQL database.
Mapping ERDs to XML
Given all the attributes this is not difficult and there are tools
that can help you:
[ xml_wrangling.html ]
. . . . . . . . . ( end of section Online Resources) <<Contents | End>>
Review Questions
. . . . . . . . . ( end of section Entity Relationship Models) <<Contents | End>>
Also see [ glossary.html ] for more special abbreviations and phrases.