PDF document - New Mexico Institute of Mining and Technology

pycbc: A Python interface to the Christmas Bird Count database
John W. Shipman
2016-03-08 17:52
Abstract
Describes a database to represent data from the Audubon Christmas Bird Counts, and a Python-language interface
to that database.
This publication is available in Web form [1] and also as a PDF document [2]. Please forward any comments to
john@nmt.edu.
Table of Contents
1. Introduction and scope
2. Downloadable files
3. Glossary
3.1. Count
3.2. Circle
3.3. Year number
3.4. Circle-year
3.5. Year key
3.6. Kind of bird
3.7. Count week birds
4. General design notes
4.1. Attributes of the principal entities
4.2. SQL considerations
5. Using the pycbc interface
5.1. The CBCData class
5.2. The Nation class
5.3. The Region class
5.4. The Physio class
5.5. The Circle class
5.6. The Effort class
5.7. The Census class
6. The SQL schema
6.1. pycbc.py: Prologue
6.2. Imports
[1] http://www.nmt.edu/~shipman/z/cbc/pycbc/
[2] http://www.nmt.edu/~shipman/z/cbc/pycbc/pycbc.pdf
New Mexico Tech Computer Center
pycbc: Python interface for the CBC database
6.3. Manifest constants
6.4. class CBCData: The database interface
6.5. The nations table
6.6. The regions table
6.7. The physios table
6.8. The circles table
6.9. The cir_reg table
6.10. The cir_physio table
6.11. The efforts table
6.12. The censuses table
6.13. Object-relational mapping
6.14. CBCData.__init__(): Constructor
6.15. CBCData.genNations()
6.16. CBCData.getNation()
6.17. CBCData.genRegions()
6.18. CBCData.getRegion()
6.19. CBCData.genPhysios()
6.20. CBCData.getPhysio()
6.21. CBCData.getRegionCircle()
6.22. CBCData.genCircles()
6.23. CBCData.genCirclesByName()
6.24. CBCData.genRegionCircles()
6.25. CBCData.genPrimaryRegionCircles()
6.26. CBCData.genCirclesByPhysio()
6.27. CBCData.getCircle(): Retrieve a specific circle
6.28. CBCData.genEfforts()
6.29. CBCData.getEffort(): Retrieve a specific effort record
6.30. CBCData.overlappers(): Find overlapping circles
6.31. CBCData.degMinAdd(): Lat/long arithmetic
6.32. CBCData.overlapCheck(): Do these circles overlap?
6.33. CBCData.__circleSep(): Compute the separation of two circles
6.34. CBCData.__terraCircle(): Convert a circle center to a terrestrial position
7. The staticloader script: Populate the static tables
7.1. staticloader: Prologue
7.2. staticloader: main()
7.3. staticloader: loadNations()
7.4. staticloader: addNation
7.5. staticloader: loadRegions()
7.6. staticloader: addRegion()
7.7. staticloader: loadPhysios()
7.8. staticloader: addPhysio()
7.9. staticloader: check()
7.10. staticloader: Epilogue
8. Conversion from the old MySQL database
8.1. Schema of the 1998 database
8.2. mycbc.py: Interface to the 1998 database
8.3. class MyCBC: Interface to the old database
8.4. MyCBC.__init__()
8.5. MyCBC.__mapTable: Locate and bind a table
8.6. MyCBC.genCirs(): Generate all circles
8.7. MyCBC.genStnds(): Generate all the circle-years for a given circle
8.8. MyCBC.getEff(): Retrieve the eff row for a given circle-year
8.9. MyCBC.getAsPub(): Retrieve the aspub row for a circle-year
8.10. MyCBC.genCens(): Generate census records for one circle-year
9. transloader: Copy over the MySQL database
9.1. transloader: Prologue
9.2. transloader: main()
9.3. transloader: readPassword()
9.4. transloader: dbCopy()
9.5. transloader: copyCir(): Copy data for one circle
9.6. transloader: addCircle()
9.7. transloader: addCircleYear()
9.8. transloader: addCensus()
9.9. transloader: Epilogue
10. Static data files
1. Introduction and scope
The National Audubon Society has been conducting the Christmas Bird Count (CBC) since 1900. The
author has been working with digital representations of this database since 1975. This document represents a complete redesign of a previous version of the database.
• This database is an older version of the current database [3] maintained by the National Audubon Society.
The author's current work is not an attempt to provide a parallel database. It is mainly of interest as
an example of contemporary database design, and it also supports the author's work as the New
Mexico regional editor of the CBC.
• The starting point for the current work was the database design documented in the 1998 database
specification [4]. The current effort is a reimplementation of the data in this older database, with an
improved design based on third normal form database normalization [5].
• The current work is also a case study in database implementation using the Python programming
language [6] and the SQLAlchemy [7] object-relational database mapping system.
It will be implemented using the Postgresql [8] database engine, and the data will be loaded from a
representation of the old database that uses the MySQL [9] database engine. Details of this translation
process are discussed in Section 8, “Conversion from the old MySQL database” (p. 47).
2. Downloadable files
• pycbc.py [10]: The Python module defined here. For the user interface, see Section 5.1, “The CBCData
class” (p. 13). For internals, see Section 6, “The SQL schema” (p. 17).
• nationlist [11]: The file that defines the nation codes; see Section 10, “Static data files” (p. 61).
• regionlist [12]: The file that defines the region (state or province) codes; see Section 10, “Static data
files” (p. 61).
[3] http://www.audubon.org/bird/cbc/index.html
[4] http://www.nmt.edu/~shipman/z/cbc/db_spec.html
[5] http://en.wikipedia.org/wiki/3NF
[6] http://www.python.org/
[7] http://www.sqlalchemy.org/
[8] http://en.wikipedia.org/wiki/Postgresql
[9] http://en.wikipedia.org/wiki/Mysql
[10] http://www.nmt.edu/~shipman/z/cbc/pycbc/pycbc.py
[11] http://www.nmt.edu/~shipman/z/cbc/pycbc/nationlist
[12] http://www.nmt.edu/~shipman/z/cbc/pycbc/regionlist
• physiolist [13]: The file that defines the physiographic region code system of the USGS Bird Banding
Lab. A slightly edited version of the list scraped from the Bird Banding Lab's page [14]. See Section 10,
“Static data files” (p. 61).
• staticloader [15]: A script that initializes the database and loads the three static tables. See Section 7,
“The staticloader script: Populate the static tables” (p. 40).
• transloader [16]: A script that reads the old MySQL database and reformats it as the current Postgresql
database. See Section 9, “transloader: Copy over the MySQL database” (p. 55).
• This document was written using DocBook 4.3 [17], according to the New Mexico Tech Computer
Center's publication, Writing documentation with DocBook-XML 4.3 [18]. You can examine the XML source
for the present document [19].
3. Glossary
Because the selection of good names is so important, let's start by defining some terms, and also discussing
some terms that are problematic.
3.1. Count
This is a highly problematic term. It may refer to: the entire institution of the CBC; only those circles
counted within a given year; only one circle counted in one year; or the number of birds of a given kind.
Consequently, the use of this term without abundant context is discouraged.
3.2. Circle
This term generally refers to one of the 15-mile-diameter circles that are the standard unit of counting.
However, in many cases, especially pelagic transects, the term may refer to areas of some other shape.
3.3. Year number
The first Christmas Bird Count was run on Christmas Day in 1900. Later, the count period was extended
to allow counts on dates from mid-December through early January.
Each yearly cycle is published separately. We will use the term year number to mean 1 for the first year
of the CBC, 2 for the second, and so on. In general, year number N includes dates in December of year
1899+N and January of year 1900+N.
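The year-number rule above can be restated as a short Python sketch. The function name year_no_to_dates is hypothetical and is not part of pycbc; it only encodes the arithmetic given in the preceding paragraph.

```python
def year_no_to_dates(year_no):
    """Return (december_year, january_year) for a CBC year number.

    Hypothetical helper, not part of pycbc. It restates the rule that
    year number N covers December of 1899+N and January of 1900+N.
    """
    if year_no < 1:
        raise ValueError("year numbers start at 1")
    return (1899 + year_no, 1900 + year_no)


# Year number 1 is the first count, held in December 1900.
print(year_no_to_dates(1))     # (1900, 1901)
# Year number 43 covers December 1942 and January 1943.
print(year_no_to_dates(43))    # (1942, 1943)
```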
3.4. Circle-year
In the published data, each year appears in a single periodical. We use the term “circle-year” to
mean one circle counted in one year.
[13] http://www.nmt.edu/~shipman/z/cbc/pycbc/physiolist
[14] http://www.mbr-pwrc.usgs.gov/bbs/physio.html
[15] http://www.nmt.edu/~shipman/z/cbc/pycbc/staticloader
[16] http://www.nmt.edu/~shipman/z/cbc/pycbc/transloader
[17] http://en.wikipedia.org/wiki/DocBook
[18] http://www.nmt.edu/tcc/help/pubs/docbook/
[19] http://www.nmt.edu/~shipman/z/cbc/pycbc/pycbc.xml
3.5. Year key
We use the term year key to mean any value that uniquely identifies a circle within one year number.
• For year numbers 1–90 in the database, circles are usually numbered starting from 1 up to the total
number of circles counted that year. For example, in the 88th CBC, there were 1502 circles counted,
numbered 0001-1502 in the database.
However, there are a few published counts that were added to the sequence late in the publication
cycle. For example, in year number 075, between circles number 0027 and 0028 there are two counts
numbered 0027a and 0027b.
In the actual publication, these numbers started appearing only around the mid-1960s. The author hand-numbered his copies for earlier years.
• For years 91 and later in the database, the year key is a four-letter code of the form RRKK, where RR
is the two-letter region code and KK is a two-letter code unique within that region. For example, the
Zuñi, NM circle has code “NMZU”.
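As an illustration of the two year-key formats, here is a minimal Python sketch that checks whether a key has the shape expected for its year number. The function names and the regular expressions are assumptions inferred only from the description above (four digits with an optional lowercase suffix for years 1–90, four uppercase letters afterwards); they are not taken from pycbc.

```python
import re

# Hypothetical patterns, inferred from the prose description only.
SERIAL_KEY = re.compile(r"[0-9]{4}[a-z]?")   # e.g. "0027" or "0027a"
LETTER_KEY = re.compile(r"[A-Z]{4}")         # e.g. "NMZU"


def year_key_is_plausible(year_no, year_key):
    """Return True if year_key has the form expected for year_no."""
    if year_no < 1:
        return False
    if year_no <= 90:
        return SERIAL_KEY.fullmatch(year_key) is not None
    return LETTER_KEY.fullmatch(year_key) is not None


def region_of(year_key):
    """Return the two-letter region code (RR) of an RRKK year key."""
    if LETTER_KEY.fullmatch(year_key) is None:
        raise ValueError("not an RRKK key: %r" % (year_key,))
    return year_key[:2]


print(year_key_is_plausible(75, "0027a"))   # True
print(year_key_is_plausible(91, "NMZU"))    # True
print(region_of("NMZU"))                    # NM
```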
3.6. Kind of bird
To understand how we represent the kinds of bird seen, see A system for representing bird taxonomy [20], which describes a system of six-letter codes for describing kinds of birds.
That system allows for single kinds of birds, “species pairs” (e.g., we think it was either a Hammond's
or Dusky flycatcher), or hybrids (e.g., it looked like a hybrid of Blue-winged and Cinnamon teal). In
general, the representation is a triple (form, rel, altForm). The form is the first or only six-letter code.
The rel value is the relationship code: blank for single forms, “×” for hybrids, and “/” for species pairs.
If the relationship code is not blank, the altForm value is the second form's six-letter code. If there are
two codes, we stipulate that form < altForm, so we don't have to search twice to pick up a given hybrid
or pair.
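The ordering rule for two-code kinds can be sketched as follows. The helper make_kind is hypothetical and not part of pycbc, it uses the rel codes “/” and “x” listed for the censuses table in Section 4.1.8, and the codes AAAAAA and BBBBBB are placeholders rather than real form codes.

```python
def make_kind(form, rel="", alt_form=None):
    """Return a normalized (form, rel, alt_form) triple.

    Hypothetical helper: for a species pair or hybrid, the two codes
    are swapped if necessary so that form sorts before alt_form, which
    gives each pair or hybrid a single representation.
    """
    if rel == "":
        if alt_form is not None:
            raise ValueError("a single form takes no alt_form")
        return (form, "", None)
    if rel not in ("/", "x"):
        raise ValueError("unknown relationship code: %r" % (rel,))
    if alt_form is None:
        raise ValueError("pairs and hybrids need an alt_form")
    first, second = sorted((form, alt_form))
    return (first, rel, second)


# Both orderings of the same species pair normalize to one triple.
print(make_kind("BBBBBB", "/", "AAAAAA"))   # ('AAAAAA', '/', 'BBBBBB')
print(make_kind("AAAAAA", "/", "BBBBBB"))   # ('AAAAAA', '/', 'BBBBBB')
print(make_kind("AMEROB"))                  # ('AMEROB', '', None)
```

Normalizing at the time a record is built means that a lookup for a given pair or hybrid needs only one comparison against the stored triple.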
Additionally, this database sometimes represents the sighting's age category (adult, immature, or female/immature) or sex category (male or female).
3.7. Count week birds
Born from the frustration of participants who find interesting birds in advance of the official day but
cannot locate them on the official day, Audubon has long been publishing records “seen count week
but not count day”. Careful researchers who use weighted analyses such as “birds per party-hour” will
want to exclude these records from their data sets, but other consumers of the CBC data will be quite
interested in them.
4. General design notes
The first step in redesigning the older database [21] is to represent it as an entity-relationship model [22].
[20] http://www.nmt.edu/~shipman/xnomo
[21] http://www.nmt.edu/~shipman/z/cbc/db_spec.html
[22] http://en.wikipedia.org/wiki/Entity-relationship_diagram
circle
Describes one 15-mile-diameter count circle by the latitude and longitude of its center.
region
One political region: a U.S. state, a Canadian province, or Saint Pierre and Miquelon, the French
islands where the only circle in this database outside the U.S. and Canada is located.
nation
One country.
physio
A physiographic region stratum code as defined by the U.S. Fish & Wildlife Service's Bird Banding
Lab. For the authority file, see Section 2, “Downloadable files” (p. 3). This code is useful for
grouping circles by their biogeographic similarity. Many circle records have not been coded for
physiographic strata. Circles may have up to two physiographic strata codes, and for those that
have two, the first code is the major stratum and the second the minor stratum, so the ordering of
the two codes is important.
effort
This entity describes one year in which there is a published census of the circle.
kind of bird
This entity represents a specific kind of bird seen in one year, and the number of individuals of that
kind that the counters saw. Note that there may be many records for a given species within one
effort entity, differing by several details: age; sex; whether seen count day or only during count
week; or whether the identification is in question.
Note also that on a few occasions an effort has resulted in zero birds (mainly in remote parts of
Alaska and Canada), but this is still considered a valid count.
4.1. Attributes of the principal entities
The tables that represent the entities described above will carry the names of those entities, as plurals.
Here are the attributes of these tables, and some discussion of how they are derived from the old database.
4.1.1. Attributes of the nations table
For the script that loads this table, see Section 7, “The staticloader script: Populate the static
tables” (p. 40).
nation_code
Three-character code for the country.
nation_name
Full name of the country.
4.1.2. Attributes of the regions table
For the script that loads this table, see Section 7, “The staticloader script: Populate the static
tables” (p. 40).
reg_nation
National code for this region, defined in Section 4.1.1, “Attributes of the nations table” (p. 6).
reg_code
Two-character postal code, e.g., HI or YT.
reg_name
Conventional name of the region, e.g., “West Virginia” or “Province Quebec”.
4.1.3. Attributes of the circles table
lat
North latitude of the circle's center in degrees and minutes as ddmm.
lon
West longitude of the circle's center in degrees and minutes as dddmm.
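For readers who need decimal degrees, the ddmm and dddmm encodings can be decoded as in the sketch below. The function name ddmm_to_degrees is hypothetical, and the sketch assumes the values are held as zero-filled digit strings; the schema's actual column type is not stated at this point in the document.

```python
def ddmm_to_degrees(code):
    """Convert a "ddmm" or "dddmm" string to decimal degrees.

    Hypothetical helper: the last two digits are minutes and the
    digits before them are whole degrees.
    """
    if not code.isdigit() or len(code) not in (4, 5):
        raise ValueError("expected ddmm or dddmm, got %r" % (code,))
    degrees = int(code[:-2])
    minutes = int(code[-2:])
    if minutes >= 60:
        raise ValueError("minutes out of range in %r" % (code,))
    return degrees + minutes / 60.0


# 34 degrees 30 minutes north latitude.
print(ddmm_to_degrees("3430"))    # 34.5
# 106 degrees 54 minutes west longitude, as a positive number of degrees west.
print(ddmm_to_degrees("10654"))
```

Because lon is stored as west longitude, a caller that wants the common signed convention (east positive) would negate the converted longitude.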
water
Describes whether salt water occurs in the circle. This attribute is not always properly encoded, so
the lack of a code does not imply a lack of salt water. Codes are:
(blank)
Unknown or no salt water.
p
Pelagic: the entire circle is in open ocean.
o
Some open ocean is included in the circle.
e
Some ocean estuary is included in the circle, but no open ocean.
odd
Code to indicate an area that is not the standard 15-mile-diameter circle. As with the water attribute,
not all circles were properly encoded. Code values may be any of:
(blank)
Standard circle or unknown shape.
p
Pelagic-only transect.
x
Not a pelagic-only transect, and not a standard circle.
cir_name
The published name of the circle. Many circles have changed their names; generally the attribute
in the circles table is the last name used for that particular center. A few standard abbreviations
are used:
M.A.
Management Area
N.M.
National Monument
N.P.
National Park
N.W.R.
National Wildlife Refuge
P.P.
Provincial Park
S.P.
State Park
W.M.A.
Wildlife Management Area
4.1.4. Attributes of the physios table
physio_code
Two-digit code for the physiographic stratum, with left zero fill.
physio_name
Description of this physiographic stratum, e.g., “Southern Rockies”.
4.1.5. Attributes of the cir_reg table
This table represents the many-to-many relation between circles and regions.
lat, lon
Link to the circles table.
reg_pos
Position of this region within the list of regions for the circle. This value is necessary because the
regions are ordered. Values are 0 for the first or only region; 1 for the second region; 2 for the third
region.
reg_code
Link to the regions table.
4.1.6. Attributes of the cir_physio table
This table represents the many-to-many relation between circles and physiographic strata.
lat, lon
Link to the circles table.
physio_pos
Position of this stratum code: 0 for the first or only stratum, 1 for the second.
4.1.7. Attributes of the efforts table
Each row of this table represents the censusing of one circle in a given year number.
lat
Latitude, encoded as in Section 4.1.3, “Attributes of the circles table” (p. 7).
lon
Longitude, encoded as in Section 4.1.3, “Attributes of the circles table” (p. 7).
year_no
The year number, three digits, with left zero fill. For example, “043” for the Forty-third CBC
(December 1942 and January 1943).
year_key
A five-character key that uniquely identifies an effort within a year. In particular, this field can help
a researcher rapidly find the published data in the original periodical. See Section 3.5, “Year
key” (p. 5) for a discussion of what actually appeared in the published data.
For year numbers 1–90, this column has format NNNNX, where NNNN is the serial number within the
year, and X is either blank or a lowercase letter.
For years 91 through the present, this column always holds the four-character RRKK code described in
Section 3.5, “Year key” (p. 5).
See Section 8, “Conversion from the old MySQL database” (p. 47) for more information about the
origin of this field.
yyyymmdd
The date of the count, if known. For many records this is the date of Christmas because the true
date was not recorded in the old database.
as_lat
Published latitude, if known. May be null.
as_lon
Published longitude, if known. May be null.
as_name
Circle name as published. The old database also tracked the region codes as published, but there is
no strong reason to retain these data.
n_obs
Number of observers; an integer, one or greater.
ph_tot
Total party-hours, to tenths.
Note
This attribute and all the remaining attributes in this table may be null.
ph_foot
Party-hours on foot.
ph_car
Party-hours by car.
ph_other
Party-hours by means other than foot or car.
h_fd
Hours (not party-hours) by feeder-watchers.
h_owl
Hours “owling” or, as it was later known, nocturnal birding.
pm_tot
Total party-miles, to tenths.
pm_other
Party-miles by means other than foot or car.
m_owl
Miles owling or other nocturnal birding.
4.1.8. Attributes of the censuses table
lat
Latitude, encoded as in Section 4.1.3, “Attributes of the circles table” (p. 7).
lon
Longitude, encoded as in Section 4.1.3, “Attributes of the circles table” (p. 7).
year_no
The year number, encoded as in Section 4.1.7, “Attributes of the efforts table” (p. 8).
seq_no
Sequence number of this record within the circle and year. Audubon chose to discard this field in
their database, but it is vital to the Christmas Bird Count Database corrections project [23], because
the records must be in the original order to be proofread efficiently against the original publication.
form
First or only form code describing the type of bird. Example: AMEROB for American Robin. These
codes are defined in the nomenclature system specification [24].
rel
(blank)
Not a species pair or hybrid.
/
Species pair; the code for the second alternative is in the alt_form attribute.
x
Hybrid; the code for the second assumed parent form is in the alt_form attribute.
alt_form
Second form code when the rel attribute is not blank; null when rel is blank.
Because, for example, “Downy Woodpecker/Hairy Woodpecker” is the same kind of bird as “Hairy
Woodpecker/Downy Woodpecker”, the form codes are always ordered such that, lexically, the
form code is less than the alt_form code so that a given species pair or hybrid will always have
the same representation.
age
Age code.
(blank)
Unknown age class.
a
Adult.
i
Immature, subadult, or juvenal plumage.
p
Female or immature. Yes, female is a sex and not an age class, but we arbitrarily place
this category under age.
sex
Sex code.
(blank)
Sex unknown.
m
Male.
f
Female.
plus
Count-week indicator (see Section 3.7, “Count week birds” (p. 5)). Normally blank; contains “+”
for count-week birds that were not seen count day.
q
Questionable ID flag. Normally blank; contains “q” if the editor indicated some doubt as to
whether this species occurred in the circle at all. Records that are in question due to abnormally
high numbers are not flagged here.
[23] http://www.nmt.edu/~shipman/z/cbc/proof.html
[24] http://www.nmt.edu/~shipman/xnomo
census
Number of individuals. This is often encoded as “-1” when the number is unknown. Audubon's
version of the database uses zero when the number is unknown. For count week birds, this attribute
is generally -1, but a few such records contain actual numbers.
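Section 3.7 notes that weighted analyses such as “birds per party-hour” should exclude count-week records, and the census attribute uses -1 for unknown numbers. The sketch below shows one way a consumer might apply both rules. The function name and the representation of rows as plain dictionaries are assumptions for illustration; only the column names plus, census, and ph_tot come from this document.

```python
def birds_per_party_hour(census_rows, ph_tot):
    """Return individuals per party-hour for one circle-year, or None.

    Hypothetical helper. Rows are dicts with "plus" and "census" keys.
    Count-week-only rows (plus == "+") and rows whose number is unknown
    (census == -1) are left out of the total.
    """
    if ph_tot is None or ph_tot <= 0:
        return None               # effort not recorded; no rate possible
    total = 0
    for row in census_rows:
        if row["plus"] == "+":    # seen count week but not count day
            continue
        if row["census"] < 0:     # -1 marks an unknown number
            continue
        total += row["census"]
    return total / float(ph_tot)


# Invented sample rows; the form codes are placeholders.
rows = [
    {"form": "AAAAAA", "plus": "", "census": 12},
    {"form": "BBBBBB", "plus": "+", "census": 3},
    {"form": "CCCCCC", "plus": "", "census": -1},
]
print(birds_per_party_hour(rows, 8.0))    # 1.5
```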
4.2. SQL considerations
Each of the major tables shown in the diagram in Section 4, “General design notes” (p. 5) will become
a table in the SQL representation.
Two more tables are required to manage many-to-many relationships.
• Table cir_reg will represent the many-to-many relation between circles and region codes. The left-hand key, linking to a circle, is lat + lon; the right-hand key, linking to a region, is reg_code.
Furthermore, the relation includes ordering information: for a circle that overlaps multiple regions,
there is a primary region, a secondary region, and possibly a tertiary region.
Hence, the intermediate table that defines this relation must carry an additional column: 0 for the
primary region, 1 for secondary region, and 2 for the tertiary region. If anyone ever censuses a circle
that overlaps four or more states, we can use the numbers 3 or more. We'll call this column reg_pos,
the region's position.
The primary key for this table will be the concatenation of lat + lon + reg_pos, so that retrieval
will produce the region codes in the correct order.
We'll need to support two other queries on this table.
• In order to produce a report showing all the circles that are listed primarily under a given state,
we'll index on reg_code + reg_pos.
• That same index, with reg_pos a wild card, will produce a report showing all the circles that occur
even partially in a given state.
• Table cir_physio will represent the many-to-many relation between circles and physiographic
strata. The left-hand key is lat + lon and the right-hand key is physio_code.
Again, there is an ordering: some circles have two stratum codes, but the first one is the principal
stratum code. We will add a column named physio_pos to indicate the position of the
physiographic stratum code for a given circle, with values of 0 or 1. The primary key for this table
will be lat + lon + physio_pos, which produces the physiographic stratum codes with the
primary code first.
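The key and index choices for cir_reg described above can be tried out with the following sketch. It uses Python's standard sqlite3 module with an in-memory database purely so that the example is self-contained; the actual database uses Postgresql through SQLAlchemy. The column types, the index name, the helper function names, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cir_reg (
        lat      TEXT    NOT NULL,
        lon      TEXT    NOT NULL,
        reg_pos  INTEGER NOT NULL,
        reg_code TEXT    NOT NULL,
        PRIMARY KEY (lat, lon, reg_pos)
    );
    -- Secondary index for the two per-region reports.
    CREATE INDEX cir_reg_by_region ON cir_reg (reg_code, reg_pos);
""")

# Invented sample rows: one circle spanning two regions, one in a single region.
conn.executemany(
    "INSERT INTO cir_reg VALUES (?, ?, ?, ?)",
    [("3500", "10900", 0, "NM"),
     ("3500", "10900", 1, "AZ"),
     ("3404", "10654", 0, "NM")])


def regions_of(lat, lon):
    """Region codes for one circle, primary region first."""
    cur = conn.execute(
        "SELECT reg_code FROM cir_reg WHERE lat = ? AND lon = ? "
        "ORDER BY reg_pos", (lat, lon))
    return [row[0] for row in cur]


def circles_in(reg_code, primary_only=False):
    """Circle keys (lat, lon) lying in a region, optionally primary only."""
    sql = "SELECT lat, lon FROM cir_reg WHERE reg_code = ?"
    if primary_only:
        sql += " AND reg_pos = 0"
    sql += " ORDER BY lat, lon"
    return conn.execute(sql, (reg_code,)).fetchall()


print(regions_of("3500", "10900"))           # ['NM', 'AZ']
print(circles_in("AZ"))                      # [('3500', '10900')]
print(circles_in("AZ", primary_only=True))   # []
```

The same pattern would apply to cir_physio, with physio_code and physio_pos in place of reg_code and reg_pos.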
Here is a revision of our original entity-relationship model showing the final tables and their relations.
Primary key columns are indicated with an asterisk “*”, and the arrows show the foreign key relations.
For the censuses table we have a choice of two unique keys. The concatenation of lat + lon +
year_no + year_key + seq_no is the unique primary key.
Note
It would be nice if every circle were counted exactly once in a given year number, but there are hundreds
of exceptions. Audubon stipulated starting in the mid-1930s that counts within one year number should
not overlap, but exceptions persisted until the 55th year. This is why the year_key must be part of the
primary key.
We'll need another index to search for kinds of birds. Within a given circle-year, we must concatenate
seven columns to ensure uniqueness: form + rel + alt_form + age + sex + plus + q. So
the secondary index will include these seven fields, plus lat + lon + year_no + year_key +
seq_no.
5. Using the pycbc interface
To access this database from a Python script, import the pycbc module and call the CBCData() constructor like this:
import pycbc
db = pycbc.CBCData(password)
If you pass the correct database password as an argument, you will be granted read-write access; otherwise any attempt to retrieve data will fail.
5.1. The CBCData class
Here are the attributes and methods on this class.
.engine
The sqlalchemy.engine.Engine instance connecting to the Postgresql engine where the CBC
database lives.
.meta
The schema as a sqlalchemy.schema.MetaData instance.
This is not documented in the SQLAlchemy reference materials, but a MetaData instance contains
a .tables attribute that is a dictionary whose keys are the names of mapped tables, and each
corresponding value is the actual Table instance for that table.
.Session
A constructor for an SQLAlchemy session.
.s
A Session instance. Use this only for single-threaded applications. For multi-threaded applications,
create a new Session for each thread.
.Nation
The mapped class for the nations table. See Section 5.2, “The Nation class” (p. 14).
.Region
The mapped class for the regions table. See Section 5.3, “The Region class” (p. 15).
.Physio
The mapped class for the physios table. See Section 5.4, “The Physio class” (p. 15).
.Circle
The mapped class for the circles table. See Section 5.5, “The Circle class” (p. 15).
.Effort
The mapped class for the efforts table. See Section 5.6, “The Effort class” (p. 16).
.Census
The mapped class for the censuses table. See Section 5.7, “The Census class” (p. 17).
.nations_table, .regions_table, .physios_table, .circles_table, .efforts_table,
.censuses_table
The actual Table instances.
.genNations()
Generate a sequence of the Nation instances in ascending order by nation name.
.getNation(nationCode)
Return the Nation instance with a given nation code. Will raise KeyError if there is no such code.
.genRegions()
Generate a sequence of the Region instances in ascending order by region name.
.getRegion(regionCode)
Return the Region instance with a given region code, or raise KeyError if there is no such code.
.genPhysios()
Generate the Physio instances in self, in ascending order by code.
.getPhysio(code)
Return the Physio instance for a given physiographic stratum code, or raise KeyError if it is not
found.
.getRegionCircle(regionCode, cirName)
If there is a circle whose name is exactly cirName and occurs all or partly in the region whose code
is regionCode, return the corresponding Circle object, otherwise raise KeyError.
.genCircles()
Generates the Circle instances in ascending order by latitude + longitude.
.genCirclesByName(prefix)
Generates all the Circle instances whose names begin with the given prefix, in ascending order
by circle name.
.genRegionCircles(regionCode)
Generate Circle instances for all the circles that occur in any part of the region with a given code,
in ascending order by circle name.
.genPrimaryRegionCircles(regionCode)
Generate Circle instances for all the circles that are listed under the region with a given code; that
is, the circles for which the given region code is the first one displayed. To find all the circles that
occur even partially in a given region, use the .circles attribute of a Region instance.
.genCirclesByPhysio(physioCode)
Generate all the Circle instances that are associated with the physiographic stratum whose code
is physioCode.
.getCircle(lat, lon)
Given a latitude as a string "ddmm" and a longitude as "dddmm", return the corresponding Circle
instance. If that center is not in the database, the method will raise a KeyError.
.genEfforts()
Generate all the Effort records in self in primary key order.
.overlappers(fromCircle)
Use this method to find other circles that overlap a given Circle instance fromCircle. The return
value is a list of tuples (pct, c) where pct is the percentage of their areas that overlap, in the
open interval (0.0,100.0), and c is the overlapping Circle instance, and the list is in descending
order by the pct value.
.getEffort(year_no, year_key)
Returns the Effort instance for the given year number and year key.
5.2. The Nation class
An instance of this class represents one nation.
.nation_code
National code, e.g., “USA”.
.nation_name
Full name of the nation, e.g., “United States of America”.
.regions
An iterator that produces all the Region instances in this nation, in ascending order by region code.
5.3. The Region class
An instance of this class represents one state or province or the islands of St. Pierre et Miquelon.
.reg_code
Region code, e.g., “NM” for New Mexico.
.reg_name
Region name, e.g., “New Mexico”.
.nation
The Nation instance for the nation containing this region.
.circles
An iterator that produces all the circles that have at least some area in this region.
5.4. The Physio class
Mapped to the table of physiographic strata.
.physio_code
Two-digit code for this stratum, as a string.
.physio_name
Full description of the stratum, e.g., “Closed Boreal Forest”.
.circles
An iterator that produces Circle instances for all the circles that contain the stratum.
5.5. The Circle class
Each instance represents a circle with its center at a specific latitude and longitude, to the nearest minute.
.lat
Latitude as a string of four digits, "ddmm".
.lon
Longitude as a string of five digits, "dddmm".
.water
Water-body code: " ", "p", "o", or "e".
.odd
Odd-shape code, " ", "p", or "x".
.cir_name
Full name of the circle, as Unicode.
.regions
An iterator that produces the Region instances for this circle's regions, in standard order as published.
.physios
An iterator that produces the Physio instances for this circle's physiographic strata. If there are
two, the first is the major stratum and the second is the minor stratum.
.efforts
An iterator that produces the Effort instances for the circle-years this circle was counted, in
chronological order.
.allRegions()
Returns a string of the form “R0[-R1[-R2]]”, where each Ri is one of the region codes for this
circle.
.fullName()
This method returns a string of the form “ddmmn dddmmw R0[-R1[-R2]]: cir_name ”. Example:
"3843n 08528w IN-KY: Hanover-Madison".
.unicode()
Returns a string like “38°43′N 085°28′W IN-KY: Hanover-Madison”.
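For geographic calculations it can help to turn the "ddmm" and "dddmm" strings into decimal degrees. The helper below is a hypothetical sketch, not part of pycbc; it assumes north latitude and west longitude, as the "n" and "w" suffixes in .fullName() suggest.

```python
def center_to_degrees(lat, lon):
    """Convert 'ddmm' and 'dddmm' strings to signed decimal degrees.

    Assumes north latitude (positive) and west longitude (negative).
    """
    lat_deg = int(lat[:2]) + int(lat[2:]) / 60.0
    lon_deg = int(lon[:3]) + int(lon[3:]) / 60.0
    return lat_deg, -lon_deg

# The Hanover-Madison example from above.
lat_deg, lon_deg = center_to_degrees("3843", "08528")
print(lat_deg, lon_deg)
```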
5.6. The Effort class
There is one instance for each circle-year.
.lat, .lon
These two fields are the composite key used to relate a circle-year to a circle.
.year_no
The year number as a string of three digits with left zero fill, e.g., “008”. See Section 3.3, “Year
number” (p. 4).
.year_key
The year key for this circle-year, left-justified and blank-filled to length 5. See the discussion of
the year_key attribute in Section 4.1.7, “Attributes of the efforts table” (p. 8). Example values
(where _ represents a space):
0001_
0027_
0027b
NMZU_
.yyyymmdd
Date as a datetime.date object.
.n_obs
Number of observers as an int.
.ph_tot, .ph_foot, .ph_car, .ph_other, .h_fd, .h_owl, .pm_tot, .pm_foot, .pm_car,
.pm_other, and .m_owl
All these hour- and mile-based quantities use Python's decimal.Decimal type25, with a precision
of one digit after the decimal point.
Note that quantities in this type can be formatted using a “%f” format, as in this conversational
example.
25
http://docs.python.org/library/decimal.html
>>> import decimal
>>> d1=decimal.Decimal('1.4')
>>> d2=decimal.Decimal('3.47')
>>> d3=d1+d2
>>> d3
Decimal('4.87')
>>> "%6.1f" % d3
'   4.9'
>>> "%6.3f" % d3
' 4.870'
.circle
The Circle instance for this circle-year.
.censuses
An iterator that produces all the Census instances for this circle-year.
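When building keys for lookups, the fixed-width formats of .year_no and .year_key have to be reproduced exactly. The two helper functions below are hypothetical and only sketch the padding rules described above.

```python
def year_no_field(n):
    """Format a year number as three digits with left zero fill."""
    return "%03d" % n

def year_key_field(key):
    """Left-justify a year key and blank-fill it to five characters."""
    if len(key) > 5:
        raise ValueError("year key too long: %r" % key)
    return key.ljust(5)

print(repr(year_no_field(8)))          # three-digit year number
print(repr(year_key_field("0027")))    # padded with one trailing blank
print(repr(year_key_field("0027b")))   # already five characters
```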
5.7. The Census class
Each instance represents one kind of bird seen within a circle-year.
.lat, .lon, .year_no, .year_key
The concatenation of these columns relates each instance to the effort table.
.seq_no
An integer value that orders census records within a year according to their published order.
.form, .rel, .alt_form
These three values identify the type of bird. See Section 3.6, “Kind of bird” (p. 5). The .form value
is always uppercase and right-blank-padded to a length of six. The .rel value is " ", "x", or "/".
If the .rel is not blank, the .alt_form attribute is the second bird code, also uppercased and
right-blank-padded to six characters.
.age
Age code: " ", "a", or "i".
.sex
Sex code: " ", "m", or "f".
.plus
Count-week flag: " " or "+".
.q
Questionable status: " " or "q".
.census
Count of birds as an int. The value may be −1 to signify an unknown count. There used to be at
least one row in this table that had a zero census, which is an error. (Audubon's database, by contrast,
uses zero for count-week birds and never shows a value of −1.)
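Two details above are easy to get wrong in client code: the blank padding of bird codes, and the −1 sentinel for unknown counts. The sketch below uses hypothetical helper names and a made-up bird code; it is not part of pycbc.

```python
UNKNOWN_COUNT = -1   # sentinel for an unknown count, as described above

def pad_form(code):
    """Uppercase a bird code and right-pad it with blanks to six characters."""
    if len(code) > 6:
        raise ValueError("bird code too long: %r" % code)
    return code.upper().ljust(6)

def total_known(counts):
    """Sum census values, skipping the unknown-count sentinel."""
    return sum(c for c in counts if c != UNKNOWN_COUNT)

print(repr(pad_form("mall")))
print(total_known([3, UNKNOWN_COUNT, 5]))
```

Summing the raw values without this check would silently subtract one for every unknown count.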
6. The SQL schema
Here we begin the actual code inside pycbc.py. The code is presented in lightweight literate programming26 style, with the name of the destination file displayed above the top right side of each code block.
26
http://www.nmt.edu/~shipman/soft/litprog/
6.1. pycbc.py: Prologue
The pycbc.py file starts with a brief comment that points back at this documentation.
pycbc.py
'''pycbc.py: SQLAlchemy Postgres model for Christmas Bird Count
For complete documentation:
http://www.nmt.edu/~shipman/z/cbc/pycbc/
'''
6.2. Imports
pycbc.py
# - - - - -   I m p o r t s
SQLAlchemy has a number of sub-modules that we'll need.
• The sqlalchemy.schema module supplies schema classes such as Table and Column.
• The sqlalchemy.types module defines column types.
• The sqlalchemy.orm module controls the object-relational mapper (ORM).
• The sqlalchemy.exc module defines the exception classes thrown by SQLAlchemy.
• The sqlalchemy.engine module has the create_engine() function necessary for connecting
to the database backend.
pycbc.py
from sqlalchemy import schema, types, orm, engine, exc
To handle geographical calculations, we will use the author's Python mapping package27, as well as the
standard math package for trig functions.
pycbc.py
import math
import terrapos
6.3. Manifest constants
pycbc.py
# - - - - -   M a n i f e s t   c o n s t a n t s
6.3.1. WATER_BLANK
The value of the Circle.water field when no ocean is involved.
pycbc.py
WATER_BLANK = ' '
6.3.2. WATER_PELAGIC
Value of Circle.water for purely pelagic transects.
27
http://www.nmt.edu/~john/tcc/python/mapping/doc/
pycbc.py
WATER_PELAGIC = 'p'
6.3.3. WATER_OCEAN
Value of Circle.water when some open ocean is included.
pycbc.py
WATER_OCEAN = 'o'
6.3.4. WATER_ESTUARY
Value of Circle.water when some salt-water estuary is included but no open ocean.
pycbc.py
WATER_ESTUARY = 'e'
6.3.5. ODD_BLANK
The value of Circle.odd when the circle has a normal or unknown shape and size.
pycbc.py
ODD_BLANK = ' '
6.3.6. ODD_PELAGIC
Value of Circle.odd for pelagic-only transects.
pycbc.py
ODD_PELAGIC = 'p'
6.3.7. ODD_NONSTANDARD
Value of Circle.odd for non-pelagic circles not having a standard size and shape, where this is known.
pycbc.py
ODD_NONSTANDARD = 'x'
6.3.8. AGE_UNK
The value of the Census.age field when the age class is unknown.
pycbc.py
AGE_UNK = ' '
6.3.9. AGE_ADULT
Value of Census.age for adults.
pycbc.py
AGE_ADULT = 'a'
6.3.10. AGE_IMM
Value of Census.age for immatures.
pycbc.py
AGE_IMM = 'i'
6.3.11. AGE_PHI
Value of Census.age for the female or immature age class.
pycbc.py
AGE_PHI = 'p'
6.3.12. SEX_UNK
The value of the Census.sex field when the sex is unknown.
pycbc.py
SEX_UNK = ' '
6.3.13. SEX_M
Value of Census.sex for males.
pycbc.py
SEX_M = 'm'
6.3.14. SEX_F
Value of Census.sex for females.
pycbc.py
SEX_F = 'f'
6.3.15. PLUS_CW
The value of the Census.plus field for count-week records.
pycbc.py
PLUS_CW = '+'
6.3.16. Q_Q
The value of the Census.q field for questionable records.
pycbc.py
Q_Q = 'q'
6.3.17. URL_FORMAT
SQLAlchemy uses a URL to connect to the database engine.
pycbc.py
URL_FORMAT = "%s://%s:%s@%s/%s"
#             ^    ^  ^  ^  ^
#             |    |  |  |  +-- Database name
#             |    |  |  +-- Host name
#             |    |  +-- Password
#             |    +-- User name
#             +-- Protocol
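As a hedged illustration of how the five fields combine, the sketch below fills in the format with made-up values. The user, host, and database names are hypothetical, and build_url() is not part of pycbc. A password containing characters such as "@" or "/" needs to be URL-encoded before it goes into the URL, which the helper does with quote_plus().

```python
try:
    from urllib.parse import quote_plus   # Python 3
except ImportError:
    from urllib import quote_plus         # Python 2

URL_FORMAT = "%s://%s:%s@%s/%s"

def build_url(protocol, user, password, host, db_name):
    """Assemble a database URL, escaping special characters in the password."""
    return URL_FORMAT % (protocol, user, quote_plus(password),
                         host, db_name)

# All values here are invented for illustration.
url = build_url("postgresql", "alice", "p@ss/word",
                "db.example.org", "birds")
print(url)
```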
6.3.18. PROTOCOL
The protocol part of the URL for Postgresql.
pycbc.py
PROTOCOL = "postgresql"
6.3.19. DB_USER
Database user name.
pycbc.py
DB_USER = "john"
6.3.20. DB_HOST
pycbc.py
DB_HOST = "dbhost.nmt.edu"
6.3.21. DB_NAME
Name of the database.
pycbc.py
DB_NAME = "john"
6.3.22. DEGREE
Unicode for the degree (°) symbol.
pycbc.py
DEGREE = u'\u00b0'
6.3.23. PRIME
Unicode for the prime (′) symbol, used for minutes in latitudes and longitudes.
pycbc.py
PRIME = u'\u2032'
6.3.24. CIRCLE_DIAMETER
Diameter of a CBC circle in miles.
pycbc.py
CIRCLE_DIAMETER = 15.0
6.3.25. FEET_PER_MILE
pycbc.py
FEET_PER_MILE = 5280.0
6.3.26. OVERLAP_MINUTES
This constant defines how close two circles have to be to each other, in minutes of latitude or longitude,
before they are in danger of overlapping. This constant is used in Section 6.30, “CBCData.overlappers(): Find overlapping circles” (p. 35).
pycbc.py
OVERLAP_MINUTES = 14
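The source does not derive the value 14, but one plausible reading is the following arithmetic. Two circles of the same diameter can overlap only if their centers are less than one diameter apart, and a minute of latitude is roughly one nautical mile. The conversion constants below are standard definitions; treating a minute of latitude as exactly one nautical mile is an approximation.

```python
import math

CIRCLE_DIAMETER = 15.0               # miles, as defined above
METERS_PER_MILE = 1609.344           # statute mile
METERS_PER_NAUTICAL_MILE = 1852.0    # roughly one minute of latitude

# Statute miles spanned by one minute of latitude (approximate).
miles_per_minute = METERS_PER_NAUTICAL_MILE / METERS_PER_MILE

# Centers farther apart than this, in minutes of latitude, cannot overlap.
minutes_apart = CIRCLE_DIAMETER / miles_per_minute
print(round(minutes_apart, 2), math.ceil(minutes_apart))
```

A minute of longitude covers less ground as latitude increases, so the same number of minutes is a tighter bound east-west than north-south.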
6.4. class CBCData: The database interface
The actual class declaration begins here. All the metadata—schema, tables, and object-relational
mappings—are declared inside this class. The mapped classes are also inside this class; the author found
this somewhat disturbing at first, but it makes perfect sense. For example, if a user has an instance db
of class CBCData, they can refer to the circles table as db.circles_table or as CBCData.circles_table. Similarly, they can refer to the constructor of the class mapped to that table
either as db.Circle or CBCData.Circle.
pycbc.py
# - - - - -   c l a s s   C B C D a t a

class CBCData(object):
    '''Represents the entire Christmas Bird Count database.
    '''
6.5. The nations table
This table defines the nation codes used in Section 6.6, “The regions table” (p. 23). The script used to
load (or reload) this table is shown in Section 7, “The staticloader script: Populate the static
tables” (p. 40).
Next we create an instance of the MetaData class to hold all the schema definitions.
pycbc.py
#================================================================
# Table declarations
#----------------------------------------------------------------

    meta = schema.MetaData()

    nations_table = schema.Table('nations', meta,
        schema.Column('nation_code', types.CHAR(3), primary_key=True),
        schema.Column('nation_name', types.VARCHAR(30)))

    class Nation(object):
        '''Class within a class: mapped class for the nations table.
        '''
        def __init__(self, nation_code, nation_name):
            self.nation_code = nation_code
            self.nation_name = nation_name
        def __repr__(self):
            return ( "<Nation(%s: %s)>" %
                     (self.nation_code, self.nation_name) )
6.6. The regions table
For each region code, this table gives the full name of the associated state or province, as well as the
nation code, which is defined in Section 6.5, “The nations table” (p. 22).
pycbc.py
    regions_table = schema.Table('regions', meta,
        schema.Column('reg_code',   types.CHAR(2), primary_key=True),
        schema.Column('reg_nation', types.CHAR(3),
            schema.ForeignKey(nations_table.c.nation_code)),
        schema.Column('reg_name',   types.VARCHAR(30)))

    class Region(object):
        def __init__(self, reg_nation, reg_code, reg_name):
            self.reg_nation = reg_nation
            self.reg_code = reg_code
            self.reg_name = reg_name
        def __repr__(self):
            return ( "<Region(%s(%s)%s)>" %
                     (self.reg_code, self.reg_nation, self.reg_name) )
6.7. The physios table
This table defines the codes for physiographic strata.
pycbc.py
    physios_table = schema.Table('physios', meta,
        schema.Column('physio_code', types.CHAR(2), primary_key=True),
        schema.Column('physio_name', types.VARCHAR(30)))

    class Physio(object):
        def __init__(self, physio_code, physio_name):
            self.physio_code = physio_code
            self.physio_name = physio_name
        def __repr__(self):
            return ( "<Physio(%s=%s)>" %
                     (self.physio_code, self.physio_name) )
6.8. The circles table
This table's name will be circles_table. For attributes, see Section 4.1.3, “Attributes of the circles
table” (p. 7).
pycbc.py
    circles_table = schema.Table('circles', meta,
        schema.Column('lat',      types.CHAR(4)),
        schema.Column('lon',      types.CHAR(5)),
        schema.Column('water',    types.CHAR(1)),
        schema.Column('odd',      types.CHAR(1)),
        schema.Column('cir_name', types.VARCHAR(80), nullable=False),
        schema.PrimaryKeyConstraint('lat', 'lon'))
Instances of class Circle will represent these rows in SQLAlchemy.
pycbc.py
    class Circle(object):
        def __init__(self, lat, lon, water, odd, cir_name):
            self.lat = lat
            self.lon = lon
            self.water = water
            self.odd = odd
            self.cir_name = cir_name
        def __repr__(self):
            return ( "<Circle(%sn %sw %s)>" %
                     (self.lat, self.lon, self.cir_name) )
        def __cmp__(self, other):
            return cmp(self.cir_name, other.cir_name)
The .fullName() function returns a string representation with the region codes filled in. Note that
the region codes are available only through the .regions attribute that is added in Section 6.13, “Object-relational mapping” (p. 28). Just the region part is available as .allRegions(), and a Unicode rendering
of the full name, with degree and prime symbols, is available as .unicode().
pycbc.py
        def allRegions(self):
            return '-'.join ( [ reg.reg_code
                                for reg in self.regions ] )
        def fullName(self):
            return ( "%sn %sw %s: %s" %
                     (self.lat, self.lon, self.allRegions(),
                      self.cir_name) )
        def unicode(self):
            return ( u"%s%s%s%sN %s%s%s%sW %s: %s" %
                     (self.lat[:2], DEGREE, self.lat[2:], PRIME,
                      self.lon[:3], DEGREE, self.lon[3:], PRIME,
                      self.allRegions(), self.cir_name) )
6.9. The cir_reg table
This table is the intermediate table in the many-to-many relation between circles and regions.
pycbc.py
    cir_reg_table = schema.Table('cir_reg', meta,
        schema.Column('lat',      types.CHAR(4)),
        schema.Column('lon',      types.CHAR(5)),
        schema.Column('reg_pos',  types.SMALLINT, nullable=False),
        schema.Column('reg_code', types.CHAR(2),
            schema.ForeignKey("regions.reg_code", name="cir_reg_reg_x")),
        schema.PrimaryKeyConstraint('lat', 'lon', 'reg_pos'),
        schema.ForeignKeyConstraint(('lat', 'lon'),
            ('circles.lat', 'circles.lon')))

    class CirReg(object):
        def __init__(self, lat, lon, reg_pos, reg_code):
            self.lat = lat
            self.lon = lon
            self.reg_pos = reg_pos
            self.reg_code = reg_code
        def __repr__(self):
            return ( "<CirReg(%sn %sw [%d] %s)>" %
                     (self.lat, self.lon, self.reg_pos, self.reg_code) )
6.10. The cir_physio table
This table is the intermediate table in the many-to-many relation between circles and physios.
pycbc.py
    cir_physio_table = schema.Table('cir_physio', meta,
        schema.Column('lat',         types.CHAR(4)),
        schema.Column('lon',         types.CHAR(5)),
        schema.Column('physio_pos',  types.SMALLINT, nullable=False),
        schema.Column('physio_code', types.CHAR(2),
            schema.ForeignKey('physios.physio_code')),
        # The key uses physio_pos, not physio_code, so that key order
        # gives the principal stratum first.
        schema.PrimaryKeyConstraint('lat', 'lon', 'physio_pos'),
        schema.ForeignKeyConstraint(('lat', 'lon'),
            ('circles.lat', 'circles.lon')))

    class CirPhysio(object):
        def __init__(self, lat, lon, physio_pos, physio_code):
            self.lat = lat
            self.lon = lon
            self.physio_pos = physio_pos
            self.physio_code = physio_code
        def __repr__(self):
            return ( "<CirPhysio(%sn %sw[%d]%s)>" %
                     (self.lat, self.lon, self.physio_pos,
                      self.physio_code) )
6.11. The efforts table
Each row in this table represents a published count for a specific circle in a specific year number. The
lat and lon columns have a foreign-key relation to the circles table.
Many of the columns will have null values, especially in the early years. However, the year number
(year_no), date (yyyymmdd), and number of observers (n_obs) are always present.
• The measures of effort are variously prefixed with “ph_” for party-hours, “h_” for hours, “pm_” for
party-miles, and “m_” for miles.
• Column suffixes are “tot” for total, “foot” for observers on foot, “car” for observers in vehicles,
“other” for observers not on foot or in vehicles; “fd” for feeder watchers; and “owl” for nocturnal
birding.
• Quantities in hours or miles use fixed-precision representations with one digit to the right of the
decimal point.
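A sketch of preparing such quantities before storing them follows. The helper name and the round-half-up rule are assumptions for illustration; the source does not say how values are rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

TENTH = Decimal("0.1")

def to_tenths(value):
    """Convert a number or string to a Decimal with one decimal place.

    The rounding mode is an assumption, not taken from the source.
    """
    return Decimal(str(value)).quantize(TENTH, rounding=ROUND_HALF_UP)

ph_foot = to_tenths("3.25")    # party-hours on foot
ph_car = to_tenths(4)          # party-hours by car
ph_tot = ph_foot + ph_car      # Decimal arithmetic keeps one decimal place
print(ph_foot, ph_car, ph_tot)
```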
pycbc.py
    efforts_table = schema.Table('efforts', meta,
        schema.Column('lat',      types.CHAR(4)),
        schema.Column('lon',      types.CHAR(5)),
        schema.Column('year_no',  types.CHAR(3), nullable=False),
        schema.Column('year_key', types.CHAR(5), nullable=False),
        schema.Column('yyyymmdd', types.DATE, nullable=False),
        schema.Column('as_lat',   types.CHAR(4)),
        schema.Column('as_lon',   types.CHAR(5)),
        schema.Column('as_name',  types.VARCHAR(80)),
        schema.Column('n_obs',    types.SMALLINT, nullable=False),
        schema.Column('ph_tot',   types.NUMERIC(6,2)),
        schema.Column('ph_foot',  types.NUMERIC(6,2)),
        schema.Column('ph_car',   types.NUMERIC(6,2)),
        schema.Column('ph_other', types.NUMERIC(6,2)),
        schema.Column('h_fd',     types.NUMERIC(6,2)),
        schema.Column('h_owl',    types.NUMERIC(6,2)),
        schema.Column('pm_tot',   types.NUMERIC(6,2)),
        schema.Column('pm_foot',  types.NUMERIC(6,2)),
        schema.Column('pm_car',   types.NUMERIC(6,2)),
        schema.Column('pm_other', types.NUMERIC(6,2)),
        schema.Column('m_owl',    types.NUMERIC(6,2)),
        schema.PrimaryKeyConstraint('lat', 'lon', 'year_no', 'year_key'),
        schema.ForeignKeyConstraint(('lat', 'lon'),
            ('circles.lat', 'circles.lon')))
    schema.Index('eff_key_x',
        efforts_table.c.year_no, efforts_table.c.year_key)

    class Effort(object):
        def __init__(self, lat, lon, year_no, year_key, yyyymmdd,
                     as_lat, as_lon, as_name, n_obs,
                     ph_tot=None, ph_foot=None, ph_car=None, ph_other=None,
                     h_fd=None, h_owl=None,
                     pm_tot=None, pm_foot=None, pm_car=None, pm_other=None,
                     m_owl=None):
            self.lat = lat
            self.lon = lon
            self.year_no = year_no
            self.year_key = year_key
            self.as_lat = as_lat
            self.as_lon = as_lon
            self.as_name = as_name
            self.yyyymmdd = yyyymmdd
            self.n_obs = n_obs
            self.ph_tot = ph_tot
            self.ph_foot = ph_foot
            self.ph_car = ph_car
            self.ph_other = ph_other
            self.h_fd = h_fd
            self.h_owl = h_owl
            self.pm_tot = pm_tot
            self.pm_foot = pm_foot
            self.pm_car = pm_car
            self.pm_other = pm_other
            self.m_owl = m_owl
        def __repr__(self):
            return ( "<Effort(%sn %sw[%s-%s])>" %
                     (self.lat, self.lon, self.year_no, self.year_key) )
6.12. The censuses table
Represents a number of birds of the same kind seen in a circle-year.
pycbc.py
    censuses_table = schema.Table('censuses', meta,
        schema.Column('lat',      types.CHAR(4)),
        schema.Column('lon',      types.CHAR(5)),
        schema.Column('year_no',  types.CHAR(3), nullable=False),
        schema.Column('year_key', types.CHAR(5), nullable=False),
        schema.Column('seq_no',   types.SMALLINT, nullable=False),
        schema.Column('form',     types.CHAR(6), nullable=False),
        schema.Column('rel',      types.CHAR(1)),
        schema.Column('alt_form', types.CHAR(6)),
        schema.Column('age',      types.CHAR(1)),
        schema.Column('sex',      types.CHAR(1)),
        schema.Column('plus',     types.CHAR(1)),
        schema.Column('q',        types.CHAR(1)),
        schema.Column('census',   types.INTEGER, nullable=False),
        schema.PrimaryKeyConstraint('lat', 'lon', 'year_no',
            'year_key', 'seq_no'),
        schema.ForeignKeyConstraint(
            ('lat', 'lon', 'year_no', 'year_key'),
            ('efforts.lat', 'efforts.lon', 'efforts.year_no',
             'efforts.year_key'),
            name='cen_eff_x'))
    schema.Index('cen_form_x',
        censuses_table.c.form, censuses_table.c.rel,
        censuses_table.c.alt_form)

    class Census(object):
        def __init__(self, lat, lon, year_no, year_key, seq_no,
                     form, rel, alt_form, age, sex, plus, q, census):
            self.lat = lat
            self.lon = lon
            self.year_no = year_no
            self.year_key = year_key
            self.seq_no = seq_no
            self.form = form
            self.rel = rel
            self.alt_form = alt_form
            self.age = age
            self.sex = sex
            self.plus = plus
            self.q = q
            self.census = census
        def __repr__(self):
            formKey = ( self.form + (self.rel or "") +
                        (self.alt_form or "") )
            suffixes = ((self.age or "") + (self.sex or "") +
                        (self.plus or "") + (self.q or ""))
            return ( "<Census(%sn %sw[%s-%s.%s] %s%d%s)>" %
                     (self.lat, self.lon, self.year_no, self.year_key,
                      self.seq_no, formKey, self.census, suffixes) )
6.13. Object-relational mapping
This section of the pycbc.py file describes the mapping between objects and database tables, and the
relations between the tables.
A number of new attributes are added to the mapped classes in this section:
Table       Attribute     Function
nations     .regions      Iterates over the Region instances for this nation.
regions     .nation       The related Nation instance for this region.
            .circles      Iterates over the Circle instances that are all or partly within
                          this region.
            .cir_regs     Iterates over the CirReg instances that are related to this region.
                          This attribute is necessary because the CirReg instance has a
                          column reg_pos that is not found in either the circles or
                          region table.
cir_reg     .region       The related Region instance for this row.
            .circle       The related Circle instance for this row.
physios     .circles      Iterates over the circles that are related to this stratum.
            .cir_physios  Iterates over the CirPhysio instances for this stratum.
cir_physio  .circle       The related Circle.
            .physio       The related Physio.
circles     .regions      Iterates over the Region instances for this circle.
            .cir_regs     Iterates over the CirReg instances for this circle.
            .physios      Iterates over the Physio instances for this circle.
            .cir_physios  Iterates over the CirPhysio instances for this circle.
            .efforts      Iterates over the Effort instances for this circle.
The first table to be mapped is the nations table, which has a one-to-many relation with the regions
table.
pycbc.py
#================================================================
# Mapper configuration
#----------------------------------------------------------------

    orm.mapper(Nation, nations_table,
        properties={
            'regions': orm.relation(Region, backref='nation')})
The regions table has a many-to-many relation to the circles table, through the secondary table
cir_reg.
pycbc.py
    orm.mapper(Region, regions_table,
        properties={
            'cir_regs': orm.relation(CirReg, backref='region'),
            'circles': orm.relation(Circle,
                secondary=cir_reg_table, backref='regions')})
    orm.mapper(CirReg, cir_reg_table)
The physios table has a many-to-many relation with the circles table, through the secondary table
cir_physio.
pycbc.py
    orm.mapper(Physio, physios_table,
        properties={
            'circles': orm.relation(Circle,
                secondary=cir_physio_table, backref='physios'),
            'cir_physios': orm.relation(CirPhysio, backref='physio')})
    orm.mapper(CirPhysio, cir_physio_table)
The circles table has a one-to-many relation with the efforts table. (Its two many-to-many relations were
set up above.)
pycbc.py
    orm.mapper(Circle, circles_table,
        properties={
            'cir_regs': orm.relation(CirReg, backref='circle'),
            'cir_physios': orm.relation(CirPhysio, backref='circle'),
            'efforts': orm.relation(Effort, backref='circle')})
The one-to-many relation between the efforts table and the censuses table is mapped here.
pycbc.py
    orm.mapper(Effort, efforts_table,
        properties={
            'censuses': orm.relation(Census, backref='effort')})
    orm.mapper(Census, censuses_table)
6.14. CBCData.__init__(): Constructor
Here is the constructor for the CBCData class, which establishes a connection to the Postgresql engine.
pycbc.py
# - -   C B C D a t a . _ _ i n i t _ _

    def __init__ ( self, password ):
        '''Constructor: connect to the database.
        '''
        #-- 1 --
        # [ password is a string ->
        #     if the Postgresql server is available ->
        #       self.engine  :=  an sqlalchemy.engine.Engine
        #           instance connected to that engine
        #       CBCData.meta  :=  CBCData.meta bound to that engine
        #     else -> raise an sqlalchemy.exc.SQLAlchemyError ]
        url = ( URL_FORMAT %
                (PROTOCOL, DB_USER, password, DB_HOST, DB_NAME) )
        self.engine = engine.create_engine ( url )
        CBCData.meta.bind = self.engine
Then we create the Session constructor. The autoflush=True option makes the session flush pending
changes to the database before each query is issued. The autocommit=False option means that nothing is
committed until the application calls the session's .commit() method. The expire_on_commit=True option
causes cached attribute values to be expired after each commit so that subsequent
operations go out to the database.
pycbc.py
        #-- 2 --
        # [ session  :=  a class constructor that creates a
        #       new session using self.engine
        #   s  :=  an instance of that class ]
        self.Session = orm.sessionmaker(bind=self.engine,
            autoflush=True, autocommit=False, expire_on_commit=True )
        self.s = self.Session()
6.15. CBCData.genNations()
pycbc.py
# - - -   C B C D a t a . g e n N a t i o n s

    def genNations ( self ):
        '''Generate the nations in self, ascending by name.
        '''
        #-- 1 --
        for row in self.s.query(self.Nation).order_by(
                self.Nation.nation_name):
            yield row
6.16. CBCData.getNation()
pycbc.py
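# - - -   C B C D a t a . g e t N a t i o n

    def getNation(self, nationCode):
        '''Look up a nation_code.
        '''
        try:
            # Restrict the query to the requested code; without the
            # filter, .one() fails whenever the table has more than
            # one row.
            result = self.s.query(self.Nation).filter_by(
                nation_code=nationCode).one()
            return result
        except exc.SQLAlchemyError as detail:
            raise KeyError("No such nation code, '%s': %s" %
                           (nationCode, detail))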
FIXME: The current SQLAlchemy exception raised in this case is exc.NoResultFound. However, on
infohost, an older version is there that does not have that exception, so we use the more generic
exc.SQLAlchemyError, which is less specific. Fix this also in Section 6.18, “CBCData.getRegion()” (p. 31) and Section 6.20, “CBCData.getPhysio()” (p. 32).
6.17. CBCData.genRegions()
pycbc.py
# - - -   C B C D a t a . g e n R e g i o n s

    def genRegions ( self ):
        '''Generate the regions in self, ascending by region name.
        '''
        #-- 1 --
        for row in self.s.query(self.Region).order_by(
                self.Region.reg_name):
            yield row
6.18. CBCData.getRegion()
pycbc.py
# - - -   C B C D a t a . g e t R e g i o n

    def getRegion(self, regionCode):
        '''Look up a region code.
        '''
        try:
            result = self.s.query(self.Region).filter_by(
                reg_code=regionCode).one()
            return result
        except exc.SQLAlchemyError as detail:
            raise KeyError("No such region code, '%s': %s" %
                           (regionCode, detail))
6.19. CBCData.genPhysios()
pycbc.py
# - - -   C B C D a t a . g e n P h y s i o s

    def genPhysios ( self ):
        '''Generate the physiographic strata in self, ascending by code.
        '''
        #-- 1 --
        # An explicit ORDER BY is needed for the documented ordering.
        for row in self.s.query(self.Physio).order_by(
                self.Physio.physio_code):
            yield row
6.20. CBCData.getPhysio()
pycbc.py
# - - -   C B C D a t a . g e t P h y s i o

    def getPhysio(self, physioCode):
        '''Look up a physiographic stratum code.
        '''
        try:
            result = self.s.query(self.Physio).filter_by(
                physio_code=physioCode).one()
            return result
        except exc.SQLAlchemyError as detail:
            raise KeyError("No such physio code, '%s': %s" %
                           (physioCode, detail))
6.21. CBCData.getRegionCircle()
pycbc.py
# - - -   C B C D a t a . g e t R e g i o n C i r c l e

    def getRegionCircle(self, regionCode, cirName):
        '''Get the circle in the given region with the given name.
        '''
        circleList = list(self.genCirclesByName(cirName))
        for circle in circleList:
            # genCirclesByName() matches name prefixes, so insist on
            # an exact match here.
            if circle.cir_name != cirName:
                continue
            for region in list(circle.regions):
                if region.reg_code == regionCode:
                    return circle
        raise KeyError("No circle named '%s' in region '%s'." %
                       (cirName, regionCode))
6.22. CBCData.genCircles()
pycbc.py
# - - -   C B C D a t a . g e n C i r c l e s

    def genCircles(self):
        '''Generate all the circles, ascending by latitude + longitude.
        '''
        for row in self.s.query(self.Circle).order_by(
                self.Circle.lat, self.Circle.lon):
            yield row
6.23. CBCData.genCirclesByName()
pycbc.py
# - - -   C B C D a t a . g e n C i r c l e s B y N a m e

    def genCirclesByName(self, prefix):
        '''Generate circles whose names start with (prefix),
        ascending by circle name.
        '''
        q = ( self.s.query(self.Circle)
              .filter(self.Circle.cir_name.like("%s%%" % prefix))
              .order_by(self.Circle.cir_name) )
        for row in q:
            yield row
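One caveat with prefix matching through LIKE is that a "%" or "_" inside the prefix acts as a wildcard. The sketch below shows one way to escape them, using sqlite3 as a stand-in engine. The helper like_prefix() and all circle names except Hanover-Madison are hypothetical. SQLAlchemy's .like() method accepts an escape argument for the same purpose.

```python
import sqlite3

def like_prefix(prefix, escape="\\"):
    """Build a LIKE pattern that matches names starting with prefix literally."""
    # Escape the escape character first so it is not doubled later.
    for ch in (escape, "%", "_"):
        prefix = prefix.replace(ch, escape + ch)
    return prefix + "%"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE circles (cir_name VARCHAR(80))")
conn.executemany("INSERT INTO circles VALUES (?)",
                 [("Hanover-Madison",), ("Hanover_East",),
                  ("HanoverXEast",)])

# Without escaping, "Hanover_" would also match "Hanover-Madison"
# and "HanoverXEast", because "_" matches any single character.
names = [row[0] for row in conn.execute(
    "SELECT cir_name FROM circles "
    "WHERE cir_name LIKE ? ESCAPE '\\' ORDER BY cir_name",
    (like_prefix("Hanover_"),))]
print(names)
```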
6.24. CBCData.genRegionCircles()
pycbc.py
# - - -   C B C D a t a . g e n R e g i o n C i r c l e s

    def genRegionCircles(self, regionCode):
        '''Generate circles that use the given regionCode
        '''
This query requires that we refer to columns in a related table, cir_reg, and filter out circles that are
not related to the given regionCode. The .join() method adds the columns from the cir_reg table,
and the .filter_by() method includes only those rows in the joined table whose reg_code column
equals regionCode. The .order_by() method sorts the resulting rows by circle name.
pycbc.py
        q = ( self.s.query(self.Circle)
              .join(self.Circle.cir_regs)
              .filter_by(reg_code=regionCode)
              .order_by('cir_name') )
        for circle in q:
            yield circle
6.25. CBCData.genPrimaryRegionCircles()
pycbc.py
# - - -   C B C D a t a . g e n P r i m a r y R e g i o n C i r c l e s

    def genPrimaryRegionCircles(self, regionCode):
        '''Generate circles that have the given regionCode first.
        '''
This method is only slightly different from Section 6.24, “CBCData.genRegionCircles()” (p. 33): it finds only those circles for which the given regionCode is the first listed region.
For example, if regionCode is "KY", it will find the circle “KY-TN-VA: Cumberland Gap”, but not “IN-IL-KY: Posey County”.
In database terms, we need to know if the cir_reg row that relates a region to a circle has a reg_pos
(region position) value 0. For example, there are two records for “IN-KY: Evansville”. The first has
reg_pos==0 and reg_code=="IN"; the second has reg_pos==1 and reg_code=="KY".
pycbc.py
        q = ( self.s.query(self.Circle)
              .join(self.Circle.cir_regs)
              .filter_by(reg_code=regionCode)
              .filter_by(reg_pos=0)
              .order_by('cir_name') )
        for circle in q:
            yield circle
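The difference between the two region queries can be seen in plain SQL. The sketch below uses the standard sqlite3 module instead of SQLAlchemy; the integer cir_id key and the two-table layout are simplifications invented for this example, while the circle names and region positions follow the Evansville and Cumberland Gap examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE circles (cir_id INTEGER PRIMARY KEY, cir_name TEXT);
    CREATE TABLE cir_regs (cir_id INTEGER, reg_pos INTEGER, reg_code TEXT);
    INSERT INTO circles VALUES (1, 'Evansville'), (2, 'Cumberland Gap');
    INSERT INTO cir_regs VALUES
        (1, 0, 'IN'), (1, 1, 'KY'),
        (2, 0, 'KY'), (2, 1, 'TN'), (2, 2, 'VA');
""")

def region_circles(conn, reg_code, primary_only=False):
    """Names of circles in reg_code; optionally only where it is listed first."""
    sql = ("SELECT c.cir_name FROM circles c "
           "JOIN cir_regs r ON r.cir_id = c.cir_id "
           "WHERE r.reg_code = ?")
    if primary_only:
        # Keep only rows where the region is in position 0.
        sql += " AND r.reg_pos = 0"
    sql += " ORDER BY c.cir_name"
    return [row[0] for row in conn.execute(sql, (reg_code,))]

print(region_circles(conn, "KY"))                     # every circle touching KY
print(region_circles(conn, "KY", primary_only=True))  # KY listed first
```

With this data, the first call returns both circles and the second returns only Cumberland Gap, because Evansville lists IN in position 0.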
6.26. CBCData.genCirclesByPhysio()
pycbc.py
# - - -   C B C D a t a . g e n C i r c l e s B y P h y s i o

    def genCirclesByPhysio(self, physioCode):
        '''Find all circles that contain a given stratum.
        '''
        q = self.s.query(self.CirPhysio).filter_by(
            physio_code=physioCode)
        for cirPhysio in q:
            yield cirPhysio.circle
6.27. CBCData.getCircle(): Retrieve a specific circle
pycbc.py
# - - -   C B C D a t a . g e t C i r c l e

    def getCircle(self, lat, lon):
        '''Retrieve a specific circle row.
        '''
        row = self.s.query(self.Circle).get((lat, lon))
        if row is None:
            raise KeyError("Unknown circle center: %sn %sw" %
                           (lat, lon))
        else:
            return row
6.28. CBCData.genEfforts()
pycbc.py
# - - -   C B C D a t a . g e n E f f o r t s

    def genEfforts(self):
        '''Generate all the effort records in primary key order.
        '''
        q = self.s.query(self.Effort)
        for eff in q:
            yield eff
6.29. CBCData.getEffort(): Retrieve a specific effort record
pycbc.py
# - - -   C B C D a t a . g e t E f f o r t

    def getEffort(self, year_no, year_key):
        '''Retrieve one effort record.
        '''
        try:
            row = (self.s.query(self.Effort)
                   .filter_by(year_no=year_no, year_key=year_key)
                   .one())
        except exc.NoResultFound:
            raise KeyError("Unknown effort: %s-%s" % (year_no, year_key))
        except exc.MultipleResultsFound:
            raise KeyError("Not unique: %s-%s" % (year_no, year_key))
        return row
6.30. CBCData.overlappers(): Find overlapping circles
pycbc.py
# - - -   C B C D a t a . o v e r l a p p e r s

    def overlappers(self, fromCircle):
        '''Find circles that overlap fromCircle.

          [ fromCircle is a Circle instance ->
              return a list of tuples (pct,c) representing all the
              circles in self that overlap fromCircle, such that c is
              the overlapping Circle instance and pct is the
              percentage of their areas that overlap in the interval
              (0.0, 100.0) ]
        '''
For performance reasons, we would like to avoid comparing fromCircle with every circle in the database. To reduce the number of candidates, we will start by considering how big a difference in latitude or longitude is necessary to guarantee no overlap, that is, how many minutes of latitude or longitude are guaranteed to be greater than 15 miles at any location on the globe.
A given difference in degrees of latitude is always the same distance in miles. However, for a given difference in degrees of longitude, the distance in miles is largest at the equator. The author has written a package to perform mapping calculations; see A Python mapping package [28] for particulars. Here is a conversational session using this package. Point zero is the intersection of the 0° meridian and the equator. Point e15 is fifteen miles east along the equator, and point n15 is fifteen miles north along the meridian. The .offsetFeet() method returns the new location: its first argument is the bearing in radians, and its second argument is the distance along that bearing in feet.
>>> from math import pi
>>> from terrapos import *
>>> zero=LatLon(0.0, 0.0)
>>> fifteen=5280.0 * 15                  # Fifteen miles in feet
>>> e15=zero.offsetFeet(pi/2, fifteen)   # Fifteen miles east
>>> print e15.lonDeg*60                  # Longitude in minutes
13.025057227
>>> n15=zero.offsetFeet(0, fifteen)      # Fifteen miles north
>>> print n15.latDeg*60                  # Latitude in minutes
13.025057227
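Hence, we can guarantee that any circle whose latitude is 14 or more minutes away cannot overlap fromCircle. The same figure is used for longitude, although there it is exact only at the equator: a minute of longitude shrinks with the cosine of the latitude, so at higher latitudes two centers more than 14 minutes of longitude apart can still be less than 15 miles apart. This value is defined in Section 6.3.26, “OVERLAP_MINUTES” (p. 22).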
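The figure of roughly 13 minutes can be cross-checked without the mapping package. The sketch below is an approximation only: it assumes a spherical Earth with a mean radius of 3958.8 miles (an assumption of this example, not a value from terrapos), and the function name is invented here.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # assumed mean radius of a spherical Earth

def minutes_for_miles(miles, lat_deg=0.0, along="lon"):
    """Minutes of arc that span the given distance at latitude lat_deg."""
    # Length of one minute of arc along a great circle.
    miles_per_minute = EARTH_RADIUS_MILES * math.radians(1.0 / 60.0)
    if along == "lon":
        # Parallels of latitude shrink with the cosine of the latitude.
        miles_per_minute *= math.cos(math.radians(lat_deg))
    return miles / miles_per_minute

for lat in (0, 35, 60):
    print(lat, round(minutes_for_miles(15.0, lat), 2))
```

At the equator this gives about 13 minutes for fifteen miles, in line with the session above. The number of minutes of longitude needed to span fifteen miles grows with latitude, while the figure for latitude does not change.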
To convert this condition into a range test of latitude and longitude in our database query, we must find the circles whose latitude and longitude are within range. See Section 6.31, “CBCData.degMinAdd(): Lat/long arithmetic” (p. 37) for the method that does arithmetic on latitudes and longitudes in character form, which always returns the new quantity as “dddmm”. Because latitudes are stored as “ddmm”, the leading digit of each latitude result is dropped with a [1:] slice.
pycbc.py
        #-- 1 --
        # [ loLat  :=  fromCircle.lat minus OVERLAP_MINUTES, as "ddmm"
        #   loLon  :=  fromCircle.lon minus OVERLAP_MINUTES, as "dddmm"
        #   hiLat  :=  fromCircle.lat plus OVERLAP_MINUTES, as "ddmm"
        #   hiLon  :=  fromCircle.lon plus OVERLAP_MINUTES, as "dddmm" ]
        loLat = self.degMinAdd(fromCircle.lat, -OVERLAP_MINUTES)[1:]
        loLon = self.degMinAdd(fromCircle.lon, -OVERLAP_MINUTES)
        hiLat = self.degMinAdd(fromCircle.lat, OVERLAP_MINUTES)[1:]
        hiLon = self.degMinAdd(fromCircle.lon, OVERLAP_MINUTES)
Now we can do a query on the circle table and constrain it as a range of latitudes and longitudes.
pycbc.py
        #-- 2 --
        # [ candidates  :=  Circle instances in self whose lat and lon
        #       are in the range [loLatLon, hiLatLon]
        #   result  :=  a new, empty list ]
        candidates = ( self.s.query(self.Circle)
                       .filter(self.Circle.lat >= loLat)
                       .filter(self.Circle.lon >= loLon)
                       .filter(self.Circle.lat <= hiLat)
                       .filter(self.Circle.lon <= hiLon)
                       .all() )
        result = []
Next we add to result tuples (pct, c) for circles that actually overlap. This check is performed in
Section 6.32, “CBCData.overlapCheck(): Do these circles overlap?” (p. 38).
[28] http://www.nmt.edu/~john/tcc/python/mapping/py-mapping.pdf
pycbc.py
        #-- 3 --
        # [ result  +:=  tuples (pct, c) for circles in candidates
        #       that overlap fromCircle to any nonzero degree ]
        for toCircle in candidates:
            #-- 3 body --
            # [ if toCircle and fromCircle are distinct circles that
            #   overlap to any nonzero degree ->
            #     result  +:=  a tuple (pct, toCircle) where pct is the
            #         percentage area overlap
            #   else -> I ]
            if fromCircle is not toCircle:
                pct = self.overlapCheck(fromCircle, toCircle)
                if pct > 0.0:
                    result.append ( (pct, toCircle) )
Finally, sort the list in descending order by percent overlap, and return it.
pycbc.py
        #-- 4 --
        result.sort()
        result.reverse()
        return result
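Sorting a list of (pct, circle) tuples compares the second element whenever two percentages are equal. Under Python 3 that comparison raises TypeError for instances that define no ordering. A hedged alternative, shown here with an invented FakeCircle stand-in rather than the real mapped class, is to sort on the percentage alone.

```python
def sort_by_overlap(pairs):
    """Return (pct, circle) tuples in descending order of pct."""
    # The key looks only at pct, so circle objects are never compared.
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)

class FakeCircle(object):
    """Hypothetical stand-in for a mapped Circle row."""
    def __init__(self, name):
        self.name = name

a, b, c = FakeCircle("A"), FakeCircle("B"), FakeCircle("C")
ranked = sort_by_overlap([(12.5, a), (80.0, b), (12.5, c)])
print([(pct, circle.name) for pct, circle in ranked])
```

Because sorted() is stable, circles with equal percentages keep the order in which they were found.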
6.31. CBCData.degMinAdd(): Lat/long arithmetic
pycbc.py
# - - -   C B C D a t a . d e g M i n A d d

    def degMinAdd(self, ddmm, plusMinutes):
        '''Arithmetic on 'ddmm' and 'dddmm' quantities.

          [ (ddmm is a string of four or five digits such that
            the last two are minutes and the rest are degrees) and
            (plusMinutes is an int) ->
              return a string of the form "dddmm" representing
              ddmm + plusMinutes as degrees and minutes ]
        '''
This function is used by Section 6.30, “CBCData.overlappers(): Find overlapping circles” (p. 35)
to compute the range of latitudes and longitudes that are within a given distance of a circle's center.
pycbc.py
        #-- 1 --
        # [ dd  :=  degrees part of ddmm as an int
        #   mm  :=  minutes part of ddmm as an int ]
        dd = int(ddmm[:-2])
        mm = int(ddmm[-2:])

        #-- 2 --
        # [ minutes  :=  sum of dd degrees, mm minutes, and plusMinutes ]
        minutes = dd*60 + mm + plusMinutes

        #-- 3 --
        # [ return minutes as degrees and minutes in "dddmm" form ]
        ddNew, mmNew = divmod(minutes, 60)
        return "%03d%02d" % (ddNew, mmNew)
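To see the arithmetic on concrete values, here is the same logic as a standalone function. The function name and the sample coordinates are chosen only for this illustration.

```python
def deg_min_add(ddmm, plus_minutes):
    """Add minutes to a 'ddmm' or 'dddmm' string; return 'dddmm'."""
    dd = int(ddmm[:-2])                    # degrees: all but the last two digits
    mm = int(ddmm[-2:])                    # minutes: the last two digits
    minutes = dd * 60 + mm + plus_minutes  # work in total minutes
    dd_new, mm_new = divmod(minutes, 60)
    return "%03d%02d" % (dd_new, mm_new)

# Longitude 106 degrees 54 minutes, plus 14 minutes: carries into degrees.
print(deg_min_add("10654", 14))
# Latitude 34 degrees 04 minutes, minus 14 minutes: borrows from degrees.
# The [1:] slice trims the result back to the four-digit "ddmm" form.
print(deg_min_add("3404", -14)[1:])
```

The first call yields "10708" and the second yields "3350".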
6.32. CBCData.overlapCheck(): Do these circles overlap?
pycbc.py
# - - -   C B C D a t a . o v e r l a p C h e c k

    def overlapCheck ( self, fromCircle, toCircle):
        '''Do these circles overlap?

          [ fromCircle and toCircle are Circle instances ->
              if the circles overlap ->
                return the percentage of area that they overlap
                in (0.0, 100.0)
              else -> return 0.0 ]
        '''
This will require a bit of applied geometry. We don't care about the diameter of the circles, just the degree
to which they overlap, so we'll use unit circles of radius 1. Here is a picture of two overlapping circles.
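[Figure: two overlapping unit circles with centers C and D. The circles intersect at points F and G; E is the midpoint of CD, where chord FG crosses it. The shaded portion is the segment of one circle cut off by chord FG.]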
In this figure, C and D are the centers of the two circles of radius 1. If the length of CD is 2 or greater,
there is no overlap. If there is overlap, the area of overlap is twice the area of the shaded portion of the
figure. This area is called a segment of a circle, meaning the area bounded by a chord and the circle's
perimeter. The CRC Standard Mathematical Tables gives this formula for the area of the segment subtended
by a given angle θ.
$$A = \frac{1}{2} R^2 (\theta - \sin\theta) \tag{1}$$
Here R is 1 by definition, so this formula simplifies to:
$$A = \frac{1}{2} (\theta - \sin\theta) \tag{2}$$
The angle θ is angle FCG in the figure, which is twice angle FCE. We know that length CF is 1 because it is the radius of a unit circle. We also know that length CE is exactly half of length CD. By simple trigonometry, the cosine of angle FCE is the adjacent side (CE) divided by the hypotenuse (CF), which is 1. The diameter of a unit circle is 2, so a separation of S diameters makes CD equal to 2S and CE equal to S. So, if S is the separation between the circles in diameters, θ is given by:
$$\theta = 2 \cos^{-1} S \tag{3}$$
Now, on to the code. First, find the separation between the two circles in terms of the standard circle
diameter. This is handled in Section 6.33, “CBCData.__circleSep(): Compute the separation of two
circles” (p. 39). The result is expressed in diameters, so a result of 1.0 or greater means no overlap.
pycbc.py
        #-- 1 --
        # [ sep  :=  separation between fromCircle and toCircle
        #       as a fraction of CIRCLE_DIAMETER ]
        sep = self.__circleSep ( fromCircle, toCircle )

        #-- 2 --
        if sep >= 1.0:
            return 0.0
The value of sep is the S in the formula above, so now we can compute θ, then the area of the segment. The area of the overlap is twice the area of the segment.
pycbc.py
        #-- 3 --
        # [ theta  :=  angle subtending the segment where two
        #       unit circles overlap if their separation is
        #       (sep) diameters ]
        theta = 2.0 * math.acos(sep)

        #-- 4 --
        # [ overlapArea  :=  twice the area of the segment of a
        #       unit circle subtended by angle theta ]
        overlapArea = theta - math.sin(theta)
Finally, the percentage of overlap is the area of the overlap divided by the area of a unit circle, which is exactly π, multiplied by 100.
pycbc.py
        #-- 5 --
        return 100.0 * overlapArea / math.pi
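The whole calculation fits in a few lines once the separation is known. This standalone sketch takes the separation in diameters directly, so it needs neither the database nor the mapping package; the function name is invented for the example.

```python
import math

def overlap_percent(sep):
    """Percent of a unit circle's area shared with an equal circle
    whose center is sep diameters away."""
    if sep >= 1.0:
        return 0.0                         # centers a diameter apart: no overlap
    theta = 2.0 * math.acos(sep)           # equation (3)
    overlap_area = theta - math.sin(theta) # twice the segment area, equation (2)
    return 100.0 * overlap_area / math.pi  # unit circle has area pi

for sep in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(sep, round(overlap_percent(sep), 2))
```

Coincident circles (sep 0.0) overlap completely, and circles whose centers are half a diameter apart share a little over 39 percent of their area.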
6.33. CBCData.__circleSep(): Compute the separation of two circles
pycbc.py
# - - -   C B C D a t a . _ _ c i r c l e S e p

    def __circleSep ( self, fromCircle, toCircle):
        '''How many circle diameters separate these circles?

          [ fromCircle and toCircle are Circle instances ->
              return the distance between their centers as a
              fraction of CIRCLE_DIAMETER ]
        '''
For the interface to the author's mapping package, refer to Section 6.2, “Imports” (p. 18). The terrapos.LatLon() constructor accepts two arguments, each of which is a tuple (D, M) where D is degrees and M is minutes. For the conversion from string coordinates to positions in the terrapos package, see Section 6.34, “CBCData.__terraCircle(): Convert a circle center to a terrestrial position” (p. 40).
pycbc.py
        #-- 1 --
        # [ fromPos  :=  fromCircle's center as a terrapos.LatLon
        #   toPos  :=  toCircle's center as a terrapos.LatLon ]
        fromPos = self.__terraCircle ( fromCircle )
        toPos = self.__terraCircle ( toCircle )
In the terrapos package, the method LatLon.crowFeet() gives the distance in feet between two
positions. We divide that by the number of feet in a circle diameter to normalize the value to diameters.
pycbc.py
        #-- 2 --
        # [ sep  :=  (distance between fromPos and toPos in miles) /
        #            CIRCLE_DIAMETER ]
        sep = (fromPos.crowFeet(toPos) / FEET_PER_MILE)/CIRCLE_DIAMETER

        #-- 3 --
        return sep
6.34. CBCData.__terraCircle(): Convert a circle center to a terrestrial position
pycbc.py
# - - -   C B C D a t a . _ _ t e r r a C i r c l e

    def __terraCircle(self, circle):
        '''Convert a circle center to a terrestrial position.

          [ circle is a Circle instance ->
              return circle's center as a terrapos.LatLon instance ]
        '''
        latDeg = int(circle.lat[:2])
        latMin = int(circle.lat[2:])
        lonDeg = int(circle.lon[:3])
        lonMin = int(circle.lon[3:])
        return terrapos.LatLon ( (latDeg, latMin), (lonDeg, lonMin) )
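The slicing above can be tried on its own. This sketch leaves out terrapos entirely; the helper names and the sample center are invented here, and the conversion to decimal degrees is added only to make the result easy to check.

```python
def split_deg_min(text, deg_digits):
    """Split 'ddmm' (deg_digits=2) or 'dddmm' (deg_digits=3) into ints."""
    return int(text[:deg_digits]), int(text[deg_digits:])

def to_decimal_degrees(deg, minutes):
    """Convert whole degrees and minutes to decimal degrees."""
    return deg + minutes / 60.0

# Hypothetical circle center: 34 degrees 04 minutes N, 106 degrees 54 minutes W.
lat = split_deg_min("3404", 2)
lon = split_deg_min("10654", 3)
print(lat, lon)
print(to_decimal_degrees(*lat), to_decimal_degrees(*lon))
```

The two tuples, (34, 4) and (106, 54), have the (D, M) shape that the text above says the LatLon constructor accepts.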
7. The staticloader script: Populate the static tables
Warning
This script will destroy the entire database and rebuild it. Exercise caution with live databases!
This standalone script starts out by dropping all the tables in the Postgresql database and recreating
them according to the new schema. Then it loads up the nations, regions, and physios tables from
the static files displayed in Section 10, “Static data files” (p. 61).
Warning
The current design assumes that these tables are essentially static. However, keep in mind that one
province, Nunavut, was added as recently as 1999.
Once the rest of the database has been loaded, this script cannot be rerun: dropping the nations and
regions tables would break foreign key constraints. If a nation, region or physiographic stratum must
be added or changed, it will be necessary to write either a quick one-off script to do that, or perhaps
write a GUI application to maintain these tables.
7.1. staticloader: Prologue
The script starts off with the usual line to make it self-executing under Linux. The sys module is imported
for input and output, and the pycbc module to connect to the database.
staticloader
#!/usr/bin/env python
#================================================================
# staticloader: Load the 'nations' and 'regions' tables.
#   For documentation, see:
#     http://www.nmt.edu/~shipman/z/cbc/pycbc/
#----------------------------------------------------------------

#================================================================
# Imports
#----------------------------------------------------------------

from timer import Timer
t0 = Timer('Imports')
import sys
import pycbc
print t0
Constants include the name of the file where the password lives, and the names of the data files for the nations, regions, and physios tables.
staticloader
#================================================================
# Manifest constants
#----------------------------------------------------------------

PASS_FILE = 'pspass'
NATIONS_FILE = 'nationlist'
REGIONS_FILE = 'regionlist'
PHYSIOS_FILE = 'physiolist'
7.2. staticloader: main()
staticloader
# - - -   m a i n

def main():
    '''Main program.

      [ (the Postgresql server is available) and
        (PASS_FILE names a readable file containing the database
        password for the CBC database) and
        (NATIONS_FILE names a readable, valid data file for
        the nations table) and
        (REGIONS_FILE names a readable, valid data file for
        the regions table whose nation codes are all defined
        in NATIONS_FILE) ->
          that database  :=  that database with the nations
              table and regions table dropped and recreated
              with data from NATIONS_FILE and REGIONS_FILE,
              respectively ]
    '''
The main program starts by connecting to the database. The password is stored in a file readable only
by the author, so that it does not appear here.
staticloader
    #-- 1 --
    # [ (the Postgresql server is available) and
    #   (PASS_FILE names a readable file containing the database
    #   password for the CBC database) ->
    #     db  :=  a pycbc.py.CBCData instance connected
    #             to that database ]
    t0 = Timer('Connecting')
    passFile = file ( PASS_FILE )
    password = passFile.readline().strip()
    passFile.close()
    db = pycbc.CBCData(password)
    print t0
Next we drop all the tables and recreate them according to the schema.
staticloader
    #-- 2 --
    # [ db  :=  db with all tables dropped ]
    t1 = Timer('Dropping and recreating tables')
    db.meta.drop_all(checkfirst=True)

    #-- 3 --
    # [ db  :=  db with all tables created according to db.meta ]
    db.meta.create_all()
    print t1
Loading of the nations file is handled in Section 7.3, “staticloader: loadNations()” (p. 43); for the regions file, see Section 7.5, “staticloader: loadRegions()” (p. 44); for the physios file, see Section 7.7, “staticloader: loadPhysios()” (p. 45).
staticloader
    #-- 4 --
    # [ db  :=  db with the nations table populated from the
    #           file named by NATIONS_FILE ]
    t2 = Timer('Loading static tables')
    loadNations ( db )

    #-- 5 --
    # [ db  :=  db with the regions table populated from the file
    #           named by REGIONS_FILE ]
    loadRegions ( db )

    #-- 6 --
    # [ db  :=  db with the physios table populated from the file
    #           named by PHYSIOS_FILE ]
    loadPhysios ( db)
    print t2
To check that the database was properly loaded, Section 7.9, “staticloader: check()” (p. 46) prints a report showing all the nations, regions, and physiographic strata.
staticloader
    #-- 7 --
    # [ sys.stdout  +:=  report showing nation and region
    #       tables from db ]
    check ( db )
7.3. staticloader: loadNations()
This function handles loading of the nations table. It is pretty straightforward: it opens the input file,
converts each line of that file into a Nation object, and adds it to the database.
staticloader
# - - -   l o a d N a t i o n s

def loadNations ( db ):
    '''Load the nations table.

      [ (db is a CBCData instance) and
        (NATIONS_FILE names a readable, valid data file for
        the nations table) ->
          db  :=  db with the nations table populated from
                  the file named by NATIONS_FILE ]
    '''
    #-- 1 --
    # [ inFile  :=  a readable file for NATIONS_FILE ]
    inFile = file ( NATIONS_FILE )

    #-- 2 --
    # [ db.s  +:=  new nations rows made from the lines of inFile ]
    for rawLine in inFile:
        #-- 2 body --
        # [ rawLine is a valid nations file line ->
        #     db.s  +:=  a new nations row made from rawLine ]
        addNation ( db, rawLine )

    #-- 3 --
    # [ db  :=  db with the transaction in db.s committed ]
    db.s.commit()
    inFile.close()
7.4. staticloader: addNation
staticloader
# - - -   a d d N a t i o n

def addNation ( db, rawLine ):
    '''Add one row to the nations table.

      [ (db is a CBCData instance) and
        (rawLine is a valid nations file line) ->
          db.s  +:=  a new nations row made from rawLine ]

      Line format:
                0         1         2         3
                0123456789012345678901234567890
                CAN Canada
    '''
    #-- 1 --
    # [ nation_code  :=  the nation code field from rawLine
    #   nation_name  :=  the nation name field from rawLine ]
    nation_code = rawLine[:3]
    nation_name = unicode(rawLine[4:].strip())

    #-- 2 --
    # [ db.s  +:=  a new Nation row added using nation_code and
    #       nation_name ]
    db.s.add ( db.Nation ( nation_code, nation_name) )
7.5. staticloader: loadRegions()
staticloader
# - - -   l o a d R e g i o n s

def loadRegions ( db ):
    '''Load the regions table.

      [ (db is a CBCData instance) and
        (REGIONS_FILE names a readable, valid data file for
        the regions table) ->
          db  :=  db with the regions table populated from the file
                  named by REGIONS_FILE ]
    '''
This function is quite similar to Section 7.3, “staticloader: loadNations()” (p. 43).
staticloader
    #-- 1 --
    # [ inFile  :=  a readable file for REGIONS_FILE ]
    inFile = file ( REGIONS_FILE )

    #-- 2 --
    # [ db.s  +:=  new regions rows made from the lines of inFile ]
    for rawLine in inFile:
        #-- 2 body --
        # [ rawLine is a valid regions file line ->
        #     db.s  +:=  a new regions row made from rawLine ]
        addRegion ( db, rawLine )

    #-- 3 --
    # [ db  :=  db with the transaction in db.s committed ]
    db.s.commit()
    inFile.close()
7.6. staticloader: addRegion()
Similar to Section 7.4, “staticloader: addNation” (p. 44).
staticloader
# - - -   a d d R e g i o n

def addRegion ( db, rawLine ):
    '''Add one row to the regions table.

      [ (db is a CBCData instance) and
        (rawLine is a valid regions file line) ->
          db.s  +:=  a new regions row made from rawLine ]

      Line format:
                0         1         2         3
                0123456789012345678901234567890
                CAN NT North West Territories
    '''
    #-- 1 --
    # [ reg_nation  :=  the nation code field from rawLine
    #   reg_code  :=  the region code field from rawLine
    #   reg_name  :=  the region name field from rawLine ]
    reg_nation = rawLine[:3]
    reg_code = rawLine[4:6]
    reg_name = unicode(rawLine[7:].strip())

    #-- 2 --
    # [ db.s  +:=  a new Region row added using reg_nation,
    #       reg_code, and reg_name ]
    db.s.add ( db.Region ( reg_nation, reg_code, reg_name) )
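The fixed-column slicing used by addNation() and addRegion() can be checked against the sample lines from their docstrings. This sketch parses only; it does not touch the database, and the two function names are invented for the example.

```python
def parse_nation_line(raw_line):
    """Split a nations file line into (nation_code, nation_name)."""
    # Columns 0-2 hold the code; the name starts at column 4.
    return raw_line[:3], raw_line[4:].strip()

def parse_region_line(raw_line):
    """Split a regions file line into (reg_nation, reg_code, reg_name)."""
    # Columns 0-2: nation code; 4-5: region code; 7 onward: region name.
    return raw_line[:3], raw_line[4:6], raw_line[7:].strip()

print(parse_nation_line("CAN Canada\n"))
print(parse_region_line("CAN NT North West Territories\n"))
```

The .strip() call removes the trailing newline that each line carries when read from a file.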
7.7. staticloader: loadPhysios()
staticloader
# - - -   l o a d P h y s i o s

def loadPhysios ( db ):
    '''Load the table of physiographic strata.

      [ (db is a CBCData instance) and
        (PHYSIOS_FILE names a readable, valid data file for
        the physios table) ->
          db  :=  db with the physios table populated from that file ]
    '''
    #-- 1 --
    # [ inFile  :=  a readable file for PHYSIOS_FILE ]
    inFile = file ( PHYSIOS_FILE )

    #-- 2 --
    # [ db.s  +:=  new physios rows made from the lines of inFile ]
    for rawLine in inFile:
        #-- 2 body --
        # [ rawLine is a valid physios file line ->
        #     db  :=  db with a new physios row made from rawLine ]
        addPhysio ( db, rawLine )

    #-- 3 --
    # [ db  :=  db with the transaction in db.s committed ]
    db.s.commit()
    inFile.close()
7.8. staticloader: addPhysio()
staticloader
# - - -   a d d P h y s i o

def addPhysio ( db, rawLine ):
    '''Add one row to the physios table.

      [ (db is a CBCData instance) and
        (rawLine is a valid physios file line) ->
          db  :=  db with a new physios row made from rawLine ]

      Line format:
                0         1         2
                012345678901234567890
                ---------------------
                05 Mississippi Alluvial Plain
    '''
    #-- 1 --
    # [ physio_code  :=  the physio code field from rawLine
    #   physio_name  :=  the physio name field from rawLine ]
    physio_code = rawLine[:2]
    physio_name = unicode(rawLine[3:].strip())

    #-- 2 --
    # [ db.s  +:=  a new Physio row added using physio_code and
    #       physio_name ]
    db.s.add ( db.Physio ( physio_code, physio_name ) )
7.9. staticloader: check()
staticloader
# - - -   c h e c k

def check ( db ):
    '''Display regions by nation.

      [ sys.stdout  +:=  report showing nation and region
            tables from db ]
    '''
    for nation in db.genNations():
        print "\n=====", nation.nation_name
        for region in nation.regions:
            print "    ", region.reg_code, region.reg_name
    print "\n===== physiographic strata"
    for physio in db.genPhysios():
        print physio.physio_code, physio.physio_name
7.10. staticloader: Epilogue
The last lines of the script initiate execution of the main program.
staticloader
#================================================================
# Epilogue
#----------------------------------------------------------------

if __name__ == "__main__":
    main()
8. Conversion from the old MySQL database
The previous database schema has been unchanged since 1998. This database currently exists as a
MySQL database, and will provide the initial values for the current (Postgresql) version.
The initial setup of the Postgresql database proceeds in these steps.
1. The script described in Section 7, “The staticloader script: Populate the static tables” (p. 40) drops any existing tables, recreates the database, and then populates the nations and regions tables.
2. Section 9, “transloader: Copy over the MySQL database” (p. 55) describes the script that populates the rest of the tables from the MySQL database.
8.1. Schema of the 1998 database
The old MySQL database [29] had a very different structure, and was poorly normalized. In particular, there was an unnecessary level of relation in the stnd table, which mapped a key called count ID to a circle, and three other tables used the count ID to link to the stnd table, rather than directly to the circle table. The count ID is a composite of two different fields, and has two different formats depending on the route the data took to get into the database originally:

Count year     Count ID format
001–090        YYYNNNNX
091–present    YYYSSKK

[29] http://www.nmt.edu/~shipman/z/cbc/db_spec.html
The YYY portion is the count year, with left zero fill. The rest of this field is the “year key” discussed in
Section 3.5, “Year key” (p. 5).
The part of this key after the YYY part is preserved in the new database, in the year_key column of
the efforts table.
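Splitting a count ID into its two parts is a single slice at position 3. The sketch below illustrates this; the function name and both sample count IDs are made up for the example and are not taken from the database.

```python
def split_count_id(count_id):
    """Split an old-style count ID into (YYY count year, year key)."""
    # The first three characters are the zero-filled count year;
    # everything after them is the year key, in either format.
    return count_id[:3], count_id[3:]

# Hypothetical IDs, one in each format.
print(split_count_id("0901234A"))   # YYYNNNNX form
print(split_count_id("0913507"))    # YYYSSKK form
```

Both parts are returned as strings here; whether the new schema stores the year number as an integer is not assumed.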
Here, then, is an entity-relationship model for the old database. The count ID is used as the key for all
the relations shown here, except that the relation from stnd to cir uses the lat_lon column.
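[Figure: entity-relationship diagram of the old database, with entities cir, stnd, aspub, eff, and cen. Each cir row relates to one or more stnd rows (1 to 1,n); each stnd row relates to exactly one aspub row and one eff row (1 to 1), and to zero or more cen rows (1 to 0,n).]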
Note the two one-to-one relationships. There is one stnd row for each circle-year, and there is no
compelling reason to distribute the attributes of a circle-year over two other tables (aspub and eff).
So, converting the old schema to the new will lump the old stnd, aspub, and eff tables into the new
efforts table.
Refer to the 1998 database specification [30] for a general description of the older schema. Here is a table showing the actual MySQL column names and types.

Table name   Column name   Column type
cir          lat_lon       CHAR(9)
             physio        VARCHAR(4)
             water         CHAR(1)
             odd           CHAR(1)
             regions       VARCHAR(6)
             name          VARCHAR(80)
stnd         lat_lon       CHAR(9)
             count_id      CHAR(8)
aspub        count_id      CHAR(8)
             as_lat_lon    CHAR(9)
             as_regions    VARCHAR(6)
             as_name       VARCHAR(80)
eff          count_id      CHAR(8)
             yyyymmdd      CHAR(8)
             n_obs         INT
             ph_tot        DECIMAL(5,1)
             ph_foot       DECIMAL(5,1)
             ph_car        DECIMAL(5,1)
             ph_o          DECIMAL(5,1)
             h_fd          DECIMAL(5,1)
             h_owl         DECIMAL(5,1)
             pm_tot        DECIMAL(5,1)
             pm_f          DECIMAL(5,1)
             pm_c          DECIMAL(5,1)
             pm_o          DECIMAL(5,1)
             m_owl         DECIMAL(5,1)
cen          count_id      CHAR(8)
             seq_no        CHAR(3)
             form          CHAR(6)
             rel           CHAR(1)
             alt_form      CHAR(6)
             age           CHAR(1)
             sex           CHAR(1)
             plus          CHAR(1)
             q             CHAR(1)
             census        INT

[30] http://www.nmt.edu/~shipman/z/cbc/db_spec.html
8.2. mycbc.py: Interface to the 1998 database
Here we begin file mycbc.py, a module that interfaces to the 1998 MySQL database. The interface is
generally similar to the new interface described in Section 5, “Using the pycbc interface” (p. 12), so
the comments here will be minimal. The main difference is that the schema is determined through reflection, that is, letting SQLAlchemy probe that database for its structure.
mycbc.py
'''mycbc.py: Interface to the 1998-format CBC database

  For complete documentation:
    http://www.nmt.edu/~shipman/z/cbc/pycbc/
'''
#================================================================
# Imports
#----------------------------------------------------------------

from sqlalchemy import schema, types, orm, engine
The protocol is MySQL instead of Postgresql, and the host is the TCC general database host.
mycbc.py
#================================================================
# Manifest constants
#----------------------------------------------------------------

#--
# The engine is mysql at dbhost.nmt.edu.
#--
URL_FORMAT = "%s://%s:%s@%s/%s"
PROTOCOL = "mysql"
DB_USER = "john"
DB_HOST = "dbhost.nmt.edu"
DB_NAME = "john"

#--
# Names of the tables.
#--
CIR_NAME = "cir"
STND_NAME = "stnd"
AS_PUB_NAME = "aspub"
EFF_NAME = "eff"
CEN_NAME = "cen"
8.3. class MyCBC: Interface to the old database
In this database, all the exported attributes are the same as in Section 5, “Using the pycbc interface” (p. 12), except for the table names. Also, because it is intended only for use in one single-threaded
application, the Session() class constructor is not exported.
mycbc.py
# - - - - -   c l a s s   M y C B C

class MyCBC(object):
    '''Interface to the 1998 MySQL CBC database

      Exports:
        MyCBC(password):
          [ password is a string ->
              if password is the MySQL CBC database password ->
                return a new MyCBC instance giving read-write
                access to that database
              else ->
                return a new MyCBC instance giving read-only
                access to that database ]
        .engine:
          [ an sqlalchemy.engine.Engine instance connected to
            the database ]
        .meta:
          [ the metadata as sqlalchemy.schema.MetaData instance ]
        .s:
          [ a Session connected to self.engine ]
        .Cir:
          [ class mapped to the cir table ]
        .Stnd:
          [ class mapped to the stnd table ]
        .AsPub:
          [ class mapped to the aspub table ]
        .Eff:
          [ class mapped to the eff table ]
        .cir_table, .stnd_table, .aspub_table, .eff_table:
          [ the actual Table instances for these classes ]
Since the only purpose of this module is to drive the extraction of data from the old database described in Section 9, “transloader: Copy over the MySQL database” (p. 55), rather than set up table relations in the ORM, we'll just define a few methods that run simple queries that generate the circle records and then dig down to retrieve all the related rows from the other tables. Note that none of these retrieval methods does any error checking, on the assumption that all the foreign key constraints on the MySQL database hold. This database was built when MySQL had no foreign key constraints, but the software that loaded it ensured them.
mycbc.py
        .genCirs():
          [ generate a sequence of Cir instances representing
            the rows of the cir table ]
        .genStnds(lat_lon):
          [ lat_lon is a lat_lon column value ->
              generate a sequence of Stnd instances that use
              that lat_lon ]
        .getEff(count_id):
          [ count_id is a count_id column value ->
              return the Eff instance for that count_id ]
        .getAsPub(count_id):
          [ count_id is a count_id column value ->
              return the AsPub instance for that count_id ]
        .genCens(count_id):
          [ (count_id is a count_id column value) ->
              generate the Cen instances for count_id ]
    '''
Here are the definitions of the tables and mapped classes, which are all inside the MyCBC class.
mycbc.py
    #================================================================
    # Tables and mapped classes
    #----------------------------------------------------------------

    meta = schema.MetaData()

    class Cir(object):
        def __init__(self, lat_lon, physio, water, odd, regions, name ):
            self.lat_lon = lat_lon
            self.physio = physio
            self.water = water
            self.odd = odd
            self.regions = regions
            self.name = name
        def __repr__(self):
            return ( "<Cir(%s %s: %s)>" %
                     (self.lat_lon, self.regions, self.name) )

    class Stnd(object):
        def __init__(self, lat_lon, count_id):
            self.lat_lon = lat_lon
            self.count_id = count_id
        def __repr__(self):
            return ( "<Stnd(%s=%s)>" %
                     (self.lat_lon, self.count_id))

    class AsPub(object):
        def __init__(self, count_id, as_lat_lon, as_regions, as_name):
            self.count_id = count_id
            self.as_lat_lon = as_lat_lon
            self.as_regions = as_regions
            self.as_name = as_name
        def __repr__(self):
            return ( "<AsPub(%s %s %s: %s)>" %
                     (self.count_id, self.as_lat_lon,
                      self.as_regions, self.as_name) )

    class Eff(object):
        def __init__(self, count_id, yyyymmdd, n_obs,
                     ph_tot, ph_foot, ph_car, ph_o, h_fd, h_owl,
                     pm_tot, pm_f, pm_c, pm_o, m_owl):
            self.count_id = count_id
            self.yyyymmdd = yyyymmdd
            self.n_obs = n_obs
            self.ph_tot = ph_tot
            self.ph_foot = ph_foot
            self.ph_car = ph_car
            self.ph_o = ph_o
            self.h_fd = h_fd
            self.h_owl = h_owl
            self.pm_tot = pm_tot
            self.pm_f = pm_f
            self.pm_c = pm_c
            self.pm_o = pm_o
            self.m_owl = m_owl
        def __repr__(self):
            return ( "<Eff(%s %s %d)>" %
                     (self.count_id, self.yyyymmdd, self.n_obs) )

    class Cen(object):
        def __init__(self, count_id, seq_no, form, rel, alt_form,
                     age, sex, plus, q, census):
            self.count_id = count_id
            self.seq_no = seq_no
            self.form = form
            self.rel = rel
            self.alt_form = alt_form
            self.age = age
            self.sex = sex
            self.plus = plus
            self.q = q
            self.census = census
8.4. MyCBC.__init__()
mycbc.py
# - - -   M y C B C . _ _ i n i t _ _

    def __init__(self, password):
        '''Constructor for MyCBC
        '''
        #-- 1 --
        # [ if MySQL database is available and accepts
        #   password (password) ->
        #     self.engine := an sqlalchemy.engine.Engine instance
        #       that connects to the database with (password)
        #   else -> raise sqlalchemy.exc.SQLAlchemyError ]
        url = ( URL_FORMAT %
                (PROTOCOL, DB_USER, password, DB_HOST, DB_NAME) )
        self.engine = engine.create_engine(url)
        #-- 2 --
        # [ self.meta := metadata reflected from self.engine
        #   as a schema.MetaData instance ]
        self.meta = schema.MetaData(bind=self.engine, reflect=True)
        #-- 3 --
        # [ self.Session := a constructor for sessions using
        #     self.engine
        #   self.s := an instance of that constructor ]
        self.Session = orm.sessionmaker ( bind=self.engine,
            autoflush=True, autocommit=False, expire_on_commit=True )
        self.s = self.Session()
The constructor for this class is generally similar to the one in Section 6.14, “CBCData.__init__():
Constructor” (p. 30). The primary difference is that the schema is obtained by reflection, so we don't
need to declare the Table instances.
Once the metadata is connected to the engine with the reflect=True option, we will use an undocumented feature of SQLAlchemy: the MetaData instance has an attribute .tables, which is a dictionary
whose keys are the names of the reflected tables, and each related value is the corresponding Table
instance. See Section 8.5, “MyCBC.__mapTable: Locate and bind a table” (p. 54).
mycbc.py
        #-- 4 --
        # [ tables CIR_NAME, STND_NAME, AS_PUB_NAME, EFF_NAME,
        #   and CEN_NAME are in self.meta ->
        #     self.Cir := a class mapped to table CIR_NAME
        #     self.Stnd := a class mapped to table STND_NAME
        #     self.AsPub := a class mapped to table AS_PUB_NAME
        #     self.Eff := a class mapped to table EFF_NAME
        #     self.Cen := a class mapped to table CEN_NAME ]
        self.__mapTable(CIR_NAME, self.Cir)
        self.__mapTable(STND_NAME, self.Stnd)
        self.__mapTable(AS_PUB_NAME, self.AsPub)
        self.__mapTable(EFF_NAME, self.Eff)
        self.__mapTable(CEN_NAME, self.Cen)
8.5. MyCBC.__mapTable: Locate and bind a table
mycbc.py
# - - -   M y C B C . _ _ m a p T a b l e

    def __mapTable(self, tableName, className):
        '''Map one table class

          [ (tableName is a table name found in self.meta) and
            (className is a Class) ->
              orm := orm with className mapped to that table ]
        '''
        #-- 1 --
        # [ if self.meta.tables has a key (tableName) ->
        #     table := the related value
        #     self.(tableName) := the related value
        #   else ->
        #     raise IOError ]
        try:
            table = self.meta.tables[tableName]
            setattr(self, tableName, table)
        except KeyError, detail:
            raise IOError("MySQL database does not contain a table "
                          "named %s: %s." % (tableName, detail) )
        #-- 2 --
        # [ orm := orm with class (className) mapped to table ]
        orm.mapper(className, table)
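The lookup-and-translate step in this method can be tried without a database. The following is only a minimal sketch: a plain dictionary stands in for the reflected .tables mapping, and the names find_table and fake_tables are hypothetical, not part of mycbc.py.

```python
def find_table(tables, table_name):
    """Return tables[table_name]; raise IOError if the name is missing."""
    try:
        return tables[table_name]
    except KeyError as detail:
        # Translate the low-level KeyError into the error the caller expects.
        raise IOError("Database does not contain a table named %s: %s."
                      % (table_name, detail))

# A plain dict stands in for the MetaData.tables dictionary (assumption).
fake_tables = {"cir": "<Table cir>", "stnd": "<Table stnd>"}
cir_table = find_table(fake_tables, "cir")
```

Converting the KeyError lets callers handle a missing table as a problem with the database rather than as a programming error.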
8.6. MyCBC.genCirs(): Generate all circles
The techniques used in this and the remaining methods are all fairly basic uses of the SQLAlchemy Query
class.
mycbc.py
# - - -   M y C B C . g e n C i r s

    def genCirs(self):
        '''Generate all circles, ascending by lat-lon.
        '''
        #-- 1 --
        # [ q := a Session.Query to retrieve all rows of self.Cir ]
        q = self.s.query(self.Cir)
        #-- 2 --
        # [ generate the rows in q ]
        for row in q:
            yield row
8.7. MyCBC.genStnds(): Generate all the circle-years for a given circle
mycbc.py
# - - -   M y C B C . g e n S t n d s

    def genStnds(self, lat_lon):
        '''Generate all stnd rows for a given lat-lon.
        '''
        q = self.s.query(self.Stnd).filter_by(lat_lon=lat_lon)
        for row in q:
            yield row
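As a rough illustration of the generator pattern these methods share, here is a self-contained sketch that uses the standard sqlite3 module in place of SQLAlchemy and MySQL. The two-column table layout, the sample key values, and the function name gen_stnds are assumptions made for the example only.

```python
import sqlite3

def gen_stnds(conn, lat_lon):
    """Generate (lat_lon, count_id) rows for one circle, one at a time."""
    cursor = conn.execute(
        "SELECT lat_lon, count_id FROM stnd "
        "WHERE lat_lon = ? ORDER BY count_id",
        (lat_lon,))
    for row in cursor:
        yield row

# Build a tiny in-memory stand-in for the stnd table (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stnd (lat_lon TEXT, count_id TEXT)")
conn.executemany(
    "INSERT INTO stnd VALUES (?, ?)",
    [("340710654", "098a"), ("340710654", "099a"), ("350010600", "098b")])

one_circle = list(gen_stnds(conn, "340710654"))
```

Because the function is a generator, the caller can process each row as it arrives instead of building the whole result list first.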
8.8. MyCBC.getEff(): Retrieve the eff row for a given circle-year
mycbc.py
# - - -   M y C B C . g e t E f f

    def getEff(self, count_id):
        '''Retrieve the eff row for count_id.
        '''
        eff = self.s.query(self.Eff).get(count_id)
        return eff
8.9. MyCBC.getAsPub(): Retrieve the aspub row for a circle-year
mycbc.py
# - - -   M y C B C . g e t A s P u b

    def getAsPub(self, count_id):
        '''Retrieve the aspub row for a circle-year.
        '''
        asPub = self.s.query(self.AsPub).get(count_id)
        return asPub
8.10. MyCBC.genCens(): Generate census records for one circle-year
mycbc.py
# - - -   M y C B C . g e n C e n s

    def genCens(self, count_id):
        '''Generate the census rows for one circle-year.
        '''
        q = self.s.query(self.Cen).filter_by(count_id=count_id)
        for row in q:
            yield row
9. transloader: Copy over the MySQL database
This script is run after staticloader (see Section 7, “The staticloader script: Populate the static
tables” (p. 40)) to convert the MySQL database to the new form.
9.1. transloader: Prologue
transloader
#!/usr/bin/env python
# transloader: Copy MySQL CBC database to Postgresql
#   For documentation, see:
#     http://www.nmt.edu/~shipman/z/cbc/pycbc/
Input is from the mycbc module: see Section 8.2, “mycbc.py: Interface to the 1998 database” (p. 49).
Output is to the pycbc module; see Section 5, “Using the pycbc interface” (p. 12).
transloader
#================================================================
# Imports
#----------------------------------------------------------------

from timer import Timer
t0 = Timer('Imports')
import sys
import mycbc, pycbc
print t0
The passwords are kept in external files readable only by the author, so the actual passwords don't need
to appear here.
transloader
#================================================================
# Manifest constants
#----------------------------------------------------------------

MY_PASS = "mypass"     # MySQL password file
PS_PASS = "pspass"     # Postgresql password file
9.2. transloader: main()
transloader
# - - - - -   m a i n

def main():
    '''Main program: Copy MySQL CBC database to Postgresql.

      [ (MY_PASS and PS_PASS name files containing passwords) and
        (MySQL and Postgresql CBC databases are available) and
        (all Postgresql tables except regions and nations are
        empty) ->
          Postgresql CBC database := MySQL CBC database ]
    '''
For the function that reads passwords from a file, see Section 9.3, “transloader: readPassword()” (p. 57).
transloader
    #-- 1 --
    # [ myPassword := first line of file MY_PASS, stripped
    #   psPassword := first line of file PS_PASS, stripped ]
    tTotal = Timer("Entire database loaded")
    t0 = Timer("Connect to mysql")
    myPassword = readPassword(MY_PASS)
    psPassword = readPassword(PS_PASS)
    #-- 2 --
    # [ my := a mycbc.MyCBC instance representing the MySQL
    #       CBC database with password (myPassword)
    #   db := a pycbc.CBCData instance representing the
    #       Postgresql CBC database with password (psPassword) ]
    my = mycbc.MyCBC(myPassword)
    print t0
    t1 = Timer("Connect to postgresql")
    db = pycbc.CBCData(psPassword)
    print t1
The main copying and reformatting logic is in Section 9.4, “transloader: dbCopy()” (p. 57).
transloader
    #-- 3 --
    # [ db := db with all data added from my ]
    dbCopy(my, db)
    print tTotal
9.3. transloader: readPassword()
transloader
# - - -   r e a d P a s s w o r d

def readPassword(fileName):
    '''Read a password from a named file

      [ fileName is a string naming a readable file ->
          return the first line of that file, stripped ]
    '''
    passFile = file(fileName)
    password = passFile.readline().strip()
    passFile.close()
    return password
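For comparison, here is a sketch of the same idea written with a with statement, so the file is closed even if reading raises an exception. The name read_password and the throwaway file contents are illustrative assumptions; this is not the script's own code.

```python
import os
import tempfile

def read_password(file_name):
    """Return the first line of the named file, stripped of whitespace."""
    with open(file_name) as pass_file:
        return pass_file.readline().strip()

# Demonstration with a temporary file; the password text is made up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("  s3cret  \nignored second line\n")
demo_password = read_password(tmp.name)
os.remove(tmp.name)
```

Only the first line is used, so a password file may carry comments or notes on later lines without affecting the result.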
9.4. transloader: dbCopy()
transloader
# - - -   d b C o p y

def dbCopy(my, db):
    '''Copy all the MySQL data to Postgresql.

      [ (my is a MyCBC instance) and
        (db is a CBCData instance) ->
          db := db with all data added from my ]
    '''
This process is driven by the old cir table. For each circle, we add a new row to the circles table. Then,
for each region, we add a cir_reg row relating the circle and region, and for each physiographic
stratum code, we add a cir_physio row relating the circle and physiographic region.
There is one additional wrinkle: the original cir table has a dummy entry for lat-lon “000000000”,
which should not be propagated to the new database.
transloader
    cirCount = 0
    for cir in my.genCirs():
        cirCount += 1
        if cir.lat_lon != '000000000':
            copyCir(my, db, cir)
9.5. transloader: copyCir(): Copy data for one circle
transloader
# - - -   c o p y C i r

def copyCir(my, db, cir):
    '''Copy the data for one circle.

      [ (my is a MyCBC instance) and (db is a CBCData instance) and
        (cir is a MyCBC.Cir instance) ->
          db := db + (all data for cir) ]
    '''
To preserve all the data for one circle in the old database, we need to add rows to up to five tables:
circles, cir_reg, cir_physio, efforts, and censuses. The first three are built in Section 9.6, “transloader:
addCircle()” (p. 59).
transloader
    #-- 1 --
    # [ db.s +:= a circle row and related cir_reg and cir_physio
    #       rows added, made from cir ]
    t0 = Timer('Adding circle %s' % cir.lat_lon)
    addCircle(my, db, cir)
That takes care of all the one-per-circle items. Items related to circle-years are processed in Section 9.7,
“transloader: addCircleYear()” (p. 60).
transloader
    #-- 2 --
    # [ db.s +:= all circle-year data from my for cir ]
    for stnd in my.genStnds(cir.lat_lon):
        addCircleYear(my, db, stnd)
So far all those added rows are in the session, db.s; now commit them.
transloader
    #-- 3 --
    # [ db := db with transactions in db.s committed ]
    db.s.commit()
    print t0
    sys.stdout.flush()
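The difference between rows that are merely pending and rows that have been committed can be seen in a small experiment. This sketch uses the standard sqlite3 module as a stand-in for the SQLAlchemy session; the circles table layout and the sample values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE circles (lat TEXT, lon TEXT)")
conn.commit()

def count_circles(connection):
    """Return the number of rows currently visible in circles."""
    return connection.execute("SELECT COUNT(*) FROM circles").fetchone()[0]

# A pending row is discarded if the transaction is rolled back.
conn.execute("INSERT INTO circles VALUES (?, ?)", ("3407", "10654"))
conn.rollback()
rows_after_rollback = count_circles(conn)

# Once committed, the row survives a later rollback.
conn.execute("INSERT INTO circles VALUES (?, ?)", ("3407", "10654"))
conn.commit()
conn.rollback()
rows_after_commit = count_circles(conn)
```

Committing once per circle, as copyCir() does, means that a failure partway through the copy loses at most the circle being processed.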
9.6. transloader: addCircle()
transloader
# - - -   a d d C i r c l e

def addCircle(my, db, cir):
    '''Add rows to the circles, cir_reg, and cir_physio tables.

      [ (my is a MyCBC instance) and (db is a CBCData instance) and
        (cir is a MyCBC.Cir instance) ->
          db := db with a circle row and related cir_reg
                and cir_physio rows added, made from cir ]
    '''
First we assemble a db.Circle instance. For the constructor, see Section 6.8, “The circles
table” (p. 24).
transloader
    #-- 1 --
    # [ db +:= a db.Circle instance made from cir ]
    lat = cir.lat_lon[:4]
    lon = cir.lat_lon[4:]
    name = unicode(cir.name)
    circle = db.Circle(lat, lon, cir.water, cir.odd, name)
    db.s.add ( circle )
    db.s.commit()
Next, create cir_reg instances for each region code. The number of region codes is the length of the
old database's regions field divided by two. For the constructor, see Section 6.9, “The cir_reg
table” (p. 24).
transloader
    #-- 2 --
    # [ db +:= db.CirReg instances made from cir's regions,
    #       if any ]
    nRegs = len(cir.regions.strip())/2
    for regx in range(nRegs):
        regCode = cir.regions[regx*2:regx*2+2]
        cirReg = db.CirReg(lat, lon, regx, regCode)
        db.s.add ( cirReg )
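The unpacking of a field of concatenated two-character codes can be sketched on its own. The helper name split_codes is hypothetical, and the sketch assumes the codes are left-justified with any padding on the right; a trailing odd character is ignored, matching the integer division in the code above.

```python
def split_codes(packed, width=2):
    """Split a string of concatenated fixed-width codes into a list."""
    packed = packed.strip()
    # Stop early enough that a trailing partial code is never produced.
    return [packed[i:i + width]
            for i in range(0, len(packed) - width + 1, width)]

# The position of each code plays the role of regx in the loop above.
numbered = list(enumerate(split_codes("AZNM")))
```

An empty or all-blank field yields an empty list, so a circle with no region codes simply adds no rows.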
Similarly, create cir_physio instances for the physiographic strata, if any. For the CirPhysio constructor, see Section 6.10, “The cir_physio table” (p. 25).
transloader
    #-- 3 --
    # [ db.s +:= db.CirPhysio instances made from cir's
    #       physio codes, if any ]
    nPhysios = len(cir.physio.strip())/2
    for physx in range(nPhysios):
        physioCode = cir.physio[physx*2:physx*2+2]
        cirPhysio = db.CirPhysio(lat, lon, physx, physioCode)
        db.s.add ( cirPhysio )
    #-- 4 --
    db.s.commit()
9.7. transloader: addCircleYear()
transloader
# - - -   a d d C i r c l e Y e a r

def addCircleYear(my, db, stnd):
    '''Copy all the data for a given circle-year.

      [ (my is a MyCBC instance) and (db is a CBCData instance) and
        (stnd is a my.Stnd instance) ->
          db := db + (all circle-year data for stnd) ]
    '''
This function handles the copying of new rows to the efforts and censuses tables.
The first order of business is to retrieve the Eff and AsPub instances for this count ID.
transloader
    #-- 1 --
    # [ lat := latitude from stnd
    #   lon := longitude from stnd
    #   year_no := year number from stnd
    #   year_key := year key from stnd
    #   eff := Eff instance from (my) for stnd.count_id
    #   asPub := AsPub instance from (my) for stnd.count_id ]
    lat = stnd.lat_lon[:4]
    lon = stnd.lat_lon[4:]
    year_no = stnd.count_id[:3]
    year_key = stnd.count_id[3:]
    eff = my.getEff(stnd.count_id)
    asPub = my.getAsPub(stnd.count_id)
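The slicing above relies on the fixed layout of the old keys: a lat-lon key is a four-character latitude ('ddmm') followed by a five-character longitude ('dddmm'), and a count ID is a three-character year number followed by the year key. A small sketch, with hypothetical helper names and made-up sample values:

```python
def split_lat_lon(lat_lon):
    """Split a lat-lon key into (latitude 'ddmm', longitude 'dddmm')."""
    return lat_lon[:4], lat_lon[4:]

def split_count_id(count_id):
    """Split a count ID into (year number 'nnn', year key)."""
    return count_id[:3], count_id[3:]

# Sample values are invented for illustration only.
lat, lon = split_lat_lon("340710654")
year_no, year_key = split_count_id("098a")
```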
Now create the new Effort instance. For the constructor, see Section 6.11, “The efforts table” (p. 26).
transloader
    #-- 2 --
    # [ db.s +:= a new db.Effort instance representing eff and asPub ]
    asLat = asPub.as_lat_lon[:4]
    asLon = asPub.as_lat_lon[4:]
    effort = db.Effort(lat, lon, year_no, year_key, eff.yyyymmdd,
        asLat, asLon, asPub.as_name, eff.n_obs,
        eff.ph_tot, eff.ph_foot, eff.ph_car, eff.ph_o,
        eff.h_fd, eff.h_owl,
        eff.pm_tot, eff.pm_f, eff.pm_c, eff.pm_o, eff.m_owl)
    db.s.add ( effort )
Copying of records to the censuses table is handled in Section 9.8, “transloader: addCensus()” (p. 61).
transloader
    #-- 3 --
    # [ db.s +:= new db.Census instances representing rows from
    #       the cen table in my for count_id (count_id) and
    #       year (year_no) ]
    for cen in my.genCens(stnd.count_id):
        #-- 3 body --
        # [ db.s +:= a new db.Census instance representing cen ]
        addCensus(db, lat, lon, year_no, year_key, cen)
9.8. transloader: addCensus()
transloader
# - - -   a d d C e n s u s

def addCensus(db, lat, lon, year_no, year_key, cen):
    '''Add one row to the censuses table.

      [ (db is a CBCData instance) and
        (lat is a latitude as 'ddmm') and
        (lon is a longitude as 'dddmm') and
        (year_no is a year number as 'nnn') and
        (year_key is a year key) and
        (cen is a MyCBC.Cen instance) ->
          db.s +:= a new db.Census instance representing cen ]
    '''
For the Census constructor, see Section 6.12, “The censuses table” (p. 27).
transloader
    census = db.Census(lat, lon, year_no, year_key, cen.seq_no,
        cen.form, cen.rel, cen.alt_form, cen.age, cen.sex,
        cen.plus, cen.q, cen.census)
    db.s.add ( census )
9.9. transloader: Epilogue
transloader
#================================================================
# Epilogue
#----------------------------------------------------------------

if __name__ == "__main__":
    main()
10. Static data files
Here are the files used by Section 7, “The staticloader script: Populate the static tables” (p. 40) to
populate the static tables, including the nations and regions tables. For downloads, see the links in
Section 2, “Downloadable files” (p. 3).
The nationlist file:
nationlist
FRA France
CAN Canada
USA United States of America
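A file in this shape is easy to read back. The sketch below assumes, as the listing suggests but does not state, that each line is a nation code followed by whitespace and then the nation's name; parse_nation_line is a hypothetical helper, not part of staticloader.

```python
def parse_nation_line(line):
    """Split one line into (nation code, nation name)."""
    # Split on the first run of whitespace only, so names keep their spaces.
    code, name = line.split(None, 1)
    return code, name.strip()

NATION_LINES = [
    "FRA France\n",
    "CAN Canada\n",
    "USA United States of America\n",
]
nations = dict(parse_nation_line(line) for line in NATION_LINES)
```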
The regionlist file:
regionlist
FRA FR Saint Pierre et Miquelon
CAN AB Alberta
CAN BC British Columbia
CAN MB Manitoba
CAN NB New Brunswick
CAN NF Newfoundland
CAN NL Newfoundland and Labrador
CAN NS Nova Scotia
CAN NT North West Territories
CAN NU Nunavut
CAN ON Ontario
CAN PE Prince Edward Island
CAN QC Quebec
CAN SK Saskatchewan
CAN YT Yukon Territory
USA AL Alabama
USA AK Alaska
USA AZ Arizona
USA AR Arkansas
USA CA California
USA CO Colorado
USA CT Connecticut
USA DE Delaware
USA DC District of Columbia
USA FL Florida
USA GA Georgia
USA HI Hawaii
USA ID Idaho
USA IL Illinois
USA IN Indiana
USA IA Iowa
USA KS Kansas
USA KY Kentucky
USA LA Louisiana
USA ME Maine
USA MD Maryland
USA MA Massachusetts
USA MI Michigan
USA MN Minnesota
USA MS Mississippi
USA MO Missouri
USA MT Montana
USA NE Nebraska
USA NV Nevada
USA NH New Hampshire
USA NJ New Jersey
USA NM New Mexico
USA NY New York
USA NC North Carolina
USA ND North Dakota
USA OH Ohio
USA OK Oklahoma
USA OR Oregon
USA PA Pennsylvania
USA RI Rhode Island
USA SC South Carolina
USA SD South Dakota
USA TN Tennessee
USA TX Texas
USA UT Utah
USA VT Vermont
USA VA Virginia
USA WA Washington
USA WV West Virginia
USA WI Wisconsin
USA WY Wyoming
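Each regionlist entry ties a region to its nation, so a reader of the file will usually want the regions grouped by nation code. The sketch below assumes each record is a single line holding a nation code, a region code, and a region name separated by whitespace; that layout and the helper name parse_region_line are assumptions, and only a few sample records are shown.

```python
from collections import defaultdict

def parse_region_line(line):
    """Split one record into (nation code, region code, region name)."""
    nation, region, name = line.split(None, 2)
    return nation, region, name.strip()

SAMPLE_LINES = [
    "FRA FR Saint Pierre et Miquelon",
    "CAN AB Alberta",
    "CAN BC British Columbia",
    "USA NM New Mexico",
]

# Group (region code, region name) pairs under their nation code.
by_nation = defaultdict(list)
for line in SAMPLE_LINES:
    nation, region, name = parse_region_line(line)
    by_nation[nation].append((region, name))
```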
Here is the physiolist file. Codes 43 and 44 were added because there are old circle files that use
them.
physiolist
01 Subtropical
02 Floridian
03 Coastal Flatwoods
04 Upper Coastal Plain
05 Mississippi Alluvial Plain
06 Coastal Prairies
07 South Texas
08 East Texas Prairies
09 Glaciated Coastal Plain
10 Northern Piedmont
11 Southern Piedmont
12 Southern New England
13 Ridge and Valley
14 Highland Rim
15 Lexington Plain
16 Great Lakes Plain
17 Driftless Area
18 St. Lawrence River Plain
19 Ozark-Ouachita Plateau
20 Great Lakes Transition
21 Cumberland Plateau
22 Ohio Hills
23 Blue Ridge Mountains
24 Allegheny Plateau
25 Open Boreal Forest
26 Adirondack Mountains
27 Northern New England
28 N. Spruce-Hardwoods
29 Closed Boreal Forest
30 Aspen Parklands
31 Till Plains
32 Dissected Till Plains
33 Osage Plain-Cross Timbers
34 High Plains Border
35 Rolling Red Prairies
36 High Plains
37 Drift Prairie
38 Glaciated Missouri Plateau
39 Great Plains Roughlands
40 Black Prairie
43 Unknown stratum 43
44 Unknown stratum 44
53 Edward's Plateau
54 Rolling Red Plains
55 Staked Plains
56 Chihuahuan Desert
61 Black Hills
62 Southern Rockies
63 Fraser Plateau
64 Central Rockies
65 Dissected Rockies
66 Sierra Nevada
67 Cascade Mountains
68 Northern Rockies
80 Great Basin Deserts
81 Mexican Highlands
82 Sonoran Desert
83 Mojave Desert
84 Pinyon-Juniper Woodlands
85 Pitt-Klamath Plateau
86 Wyoming Basin
87 Intermountain Grasslands
88 Basin and Range
89 Columbia Plateau
90 S. California Grasslands
91 Central Valley
92 California Foothills
93 S. Pacific Rainforests
94 N. Pacific Rainforests
95 Los Angeles Ranges
96 S. Alaska Coast
98 Willamette Lowlands
99 Tundra