Thursday, August 14, 2008

Data Issues: Crosslisted courses

We're actually dealing with this one right now. There's a meeting scheduled on Monday for it. The problem is the difficulty of identifying crosslisted courses and then knowing which department is the primary sponsor of the crosslisted course.

A crosslisted course is a single class that students can register for using two different CRNs (Course Reference Numbers). For example, Asian Studies 608 and Political Science 645C are the same class, with the same teacher, hours, etc. Two students registering one under AS and the other under PS will find themselves in the same class.

The problem we are encountering for eCAFE is that there doesn't seem to be a way to distinctly identify a crosslist course. There is a field in the database named crosslist_group which gives a two character code. This code consists of either two letters or a letter and a number. The code is used to group sections together as either crosslisted or concurrent (another story). We were initially told that two letters meant that a course was crosslisted and a character-letter code meant it was concurrent. This turned out not to be true. Apparently other campuses use a different convention, and to top it off, the outreach college at UHM has their own convention as well.

So we asked if there was another method of getting this data, and were told that there was. We were given access to an internal-use table that contained the information we needed. However, when I went to look at it, there was no information for the semester that starts in a few weeks. Upon further inquiry, it turns out that table isn't filled with the current semester's data until a month into the semester. Far too late for our needs. Blocked again.

Let's hope the meeting on Monday provides some illumination to the problem.

Data Issues: changing ids

After discussing my woes about ODS (where our class/student/instructor data comes from), a developer pointed out I should be noting all this stuff down. It's a good reference for explaining why our project took some of the turns that it did, and it may save another developer some headaches. So, here's the first of these.

When we first started designing eCAFE, we were told that the person_uid and id_number fields were the primary values used to identify a person in the database. I don't recall if someone told us this explicitly, or we just assumed it, but it was our belief that the numbers were unchanging. This turned out not to be the case.

While either number was fairly constant, during our updates each semester, we would inevitably find a few that changed. Since we built our tables using person_uid as the foreign key that all other tables referenced, this caused big problems. I finally had to write a script to do the following:

1.) Remove the unique index from the person table on the id_number field.
2.) Add a new record with the new person_uid (with same id_number, hence #1).
3.) Go to all dependent tables and change the person_uid to the new value.
4.) Remove the original record in the person table.
5.) Restore the unique index.

What a pain. In the redesigned eCAFE we autogenerate a unique key for each person and use that as the FK in the dependent tables. Now we can change the person_uid field with a simple update statement.