05 January 2006

Occam's Razor

Gah! Just Gah! Data access on our campus is somewhat of a sticky question. It's not that we don't want people to be able to get into data they need and create their own reports, it's that our database is truly not relational or normalized in *any* way. It has.. personality. *charming smile* It's a mainframe database originally coded in COBOL that got wrapped up in enough SQL and bubble gum to dump into an Oracle database structure some four or five years ago.

But even though it's in a relational database structure, it's *not* relational. There are no common primary keys. There are no automatic joins between tables. Some tables that should be able to talk to each other, can't. Fields can be named identically in different tables and mean entirely different things, and fields that are named entirely differently in different tables might contain the same data. Codes that might have meant one thing ten years ago have been co-opted and used to mean different things now, often without documentation (or consistency, but that's a data-entry problem, not a data-structure problem). To add to the confusion, it's an operational database designed to support operational business functions, not a datamart or datawarehouse designed to support analytical functions.

This is no secret. I was warned when I interviewed that the database was only so loosely speaking. I've been here slightly over a year and I'm only just scratching the surface of learning the quirks and ins-and-outs of our database. And I work *daily* with our data and our database, which ought to say something about the level of complexity we're dealing with.

So when people on campus want to get direct query access to our database so they don't have to go through one of the gatekeepers, it causes all kinds of hullaballoo. Trust me, I'd *love* to be able to set up access for common requesters of data and teach them to fetch their own data. It would save me quite a bit of time and make them happier to not have to wait (even though the average wait time on a routine data request tends to be less than 48 hours, but that's an issue for another day). But given the complexity of our data structure, it's not feasible.

The support required to train them on even the basics would be beyond our current training capacity, to say nothing of trying to support them on more complex queries, which they'd inevitably want to dive into right away. And trying to explain to people that while they might be getting *an* answer, because of the intricacies of our database it may not the *right* answer just results in unnecessary confusion and an even larger expenditure of time to untangle, compounded exponentially if they've gone crying to the four winds that they've got *the* answer.

(Data use, in general, is also something that the majority of campus could use some education on. For instance, when a form asks for the number of first-time, first-degree seeking new freshmen for a given year, the answer will *never* be over 7,000 for our campus. Never.)

So the answer when someone - even someone who could otherwise legitimately have data access - requests carte blanche data access is almost invariably "no". Which usually results in a backlash of conspiracy theories about how the administration is hiding large pools of money (it's all always about money, isn't it?) and they won't let people access the data because someone will find them out. *sigh*

This has been my week. Happy New Year..?

(Two work-related posts in as many days, with no new knitting/spinner/fiber content in either. Sorry about that!)


mamacate said...

Yeesh, that's bad. I mean, no worse than I've seen elsewhere (ahem), but bad, yeah. Have you thought about setting up a mini-data mart? (Data 7-11?) Like create tables in excel of IPEDS populations at the census date for each of the last 5 years or something, and then create a freeze file every 10/15 going forward, and then people can do pivot tables off that data? You could do something similar with admission data (cycle apps/accepts/matrics as of 9/1 or whenever you calculate admission numbers). Despite converting to Datatel at one job, we wound up doing this because of the transactional/reporting data issue. And actually at the other place, a Banner shop, my predecessor had IR "census files" that were basically the same thing. The FOCUS programs that created those files from 10/1 freeze files are now missing. Ask me how fun THAT is. :)

Teej said...

Actually, some of that's been done, but it's not publicly accessible. Census files exist (at the moment on my harddrive (which is backed up to a separate drive every week) and as soon as I remember when I have time, will also be burned to CDs by fiscal year) as raw data flat files that I could dump into Excel or Access; I've created historical flat files from these in SPSS that I use for data requests all the time. Making them available wouldn't be hard, but would require more clean up and time than I have for this type of stop-gap project at the moment.

(I'll rant about *my* data vs. IPEDS data another time.. suffice it to say for now that we're part of a System that "cleans" our data before submitting to IPEDS on our behalf.. yeah..)

I will be working with an software engineering student next term who's doing an independent study with one of our faculy in database design to layout a design for an IR datamart, which is the first step in what will hopefully be a fruitful project. The hardest part there is the ETL step - the student can't really do that and our IT shop is stretched pretty thin already, so it may take a couple years to get there.

So yeah, we are working on solutions, it's just not happening fast enough for the impatient folks (myself included) on campus. *shrug*