CGenericXSLT channels, parameters, and Local Connection Contexts

I’m a bit strapped for time today, but I did take a quick look at this to see if it looks doable. In a nutshell: I’d like to use a local connection context to do legacy authentication and obtain an encrypted string (“encr”) to pass to various legacy backend services. This would allow me to create RSS-type channels that link to authenticated services, so I don’t need to use web proxy channels for everything. Initially I’d use it to connect to external services like MAP/DN, but eventually I could have the legacy Perl code handle the rendering for stuff like registration, and just re-skin it to look like uPortal.

I started out by seeing if I could pass an “encr” into the RSS and have it display conditionally somehow (we don’t necessarily want it appended to every link in the RSS feed). I came up with the somewhat hackish idea of using the RSS <category> element. If I give the item a category of “myumbcauth”, I can tweak the XSLT to look for that and append extra data to the link. Then I can pass the actual encrypted string into the XSLT as a stylesheet parameter. This all works fine. The next challenge is getting the portal to set the appropriate parameter in the stylesheet. It looks like all of the channel runtime parameters are also passed in as stylesheet parameters (in fact, I was able to read one of them, baseActionURL); the question is whether I can somehow add my own arbitrary param in there. Obviously this would have to be done somewhere in the local connection context code. Anyhow, I got as far as that and now I have to run off and fight other fires, so I’ll have to come back to this later.
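For the record, the XSLT tweak looks roughly like this. This is a sketch, not the exact stylesheet: “encr” is my parameter name, and the template structure is my own simplification rather than the stock CGenericXSLT RSS stylesheet.

<xsl:param name="encr"/>

<xsl:template match="item">
  <a>
    <xsl:attribute name="href">
      <xsl:value-of select="link"/>
      <!-- only items tagged with the magic category get the token appended -->
      <xsl:if test="category = 'myumbcauth'">
        <xsl:text>?encr=</xsl:text>
        <xsl:value-of select="$encr"/>
      </xsl:if>
    </xsl:attribute>
    <xsl:value-of select="title"/>
  </a>
</xsl:template>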

Today’s database tweak…

Well, one thing our ongoing uPortal launch has illustrated is that, contrary to popular belief, our Oracle database server does not have unlimited resources. To that end, a lot of my recent effort has gone toward making our installation more “database friendly”. The centerpiece of this is the connection pooling we set up on Monday. Of course, once you’ve got a nice, manageable connection pooling setup, you want to use it whenever possible. And until today, there was one big piece of the portal that still wasn’t using the pool: the “glue” that interfaces the uPortal web proxy channels to the legacy portal’s authentication scheme. uPortal calls this a local connection context, and ours goes by org.jasig.portal.security.UmbcLegacyLocalConnectionContext.

The legacy portal’s session information is all database driven, so this code needs to connect to the database and create a valid legacy portal session for the user, so the web proxy channels will work and the kiddies can see their schedules and drop all their classes. This code was doing an explicit connect to the ‘myumbc’ user in the UMBC instance, and every channel of this type needs to do it; some of our portal tabs contain several such channels. I’m not sure exactly how many times this code was getting invoked, or how many connections it was generating, because I didn’t do any profiling. But it definitely had an impact.

Anyhow, I’ve modified the code so that it pulls a connection from the pool (using RDBMServices.getConnection) and uses that instead. I needed to modify the LegacyPortalSession code a bit to support this. Also, since our connection pool uses the ‘uportal’ user (not ‘myumbc’), I needed to get our DBA to do a couple of grants so that ‘uportal’ would have access to the tables it needs.

For better or for worse, it’s in production now, so we’ll see how it goes.

The plan for tomorrow: fix all of the missing or broken links that people have reported. Create a new channel exclusively for DN/MAP. And look into local connection context usage with CGenericXSLT-type channels. I recently discovered that this type of channel can use a local connection context. Depending on how it works, I may be able to eliminate a couple more web proxy channels and replace them with RSS-type channels. We’ll see.

Legacy myUMBC ACLs as PAGS Groups

I think I’ve found a way (two ways, actually) to import program ACLs (from the BRCTL.PROG_USER_XREF SIS table) into uPortal as PAGS groups, so that we can publish uPortal channels with the exact same access lists as the respective areas in the legacy myUMBC. This would be a big win, particularly for an app like Degree Navigation/MAP. In the old portal, we control access to DN/MAP using a big, looong list of individual usernames. If the user isn’t on the list, they don’t even see a link to DN/MAP. However, with uPortal, we currently don’t have access to this list, so we have to present the DN/MAP link to a much larger set of users (basically anyone who is faculty or staff), or we’re faced with totally replicating the access list in uPortal, and maintaining two lists. Not what we want.

Fortunately, we designed the old portal with a bit of forward thinking, and made its ACL mechanism totally database driven. That is, all ACL info is stored in the Oracle database, so some future portal could theoretically extract that data and use it down the road. The challenge, then, is to figure out how to get uPortal to do that.

uPortal provides a very nice group store called PAGS (the Person Attributes Group Store), which allows us to create arbitrary groups based on what uPortal calls Person Attributes. The attributes themselves can be pulled directly from LDAP, as well as from the results of an arbitrary RDBM query, and uPortal presents them as one seamless collection, regardless of the actual backend datasource for each individual attribute. It’s really very nice.

My first thought, then, was to just have uPortal query the legacy myUMBC ACL table to get a list of each app a particular user can access, and map the results to “Person Attributes”. I tested this and it works just fine, but there’s one problem: the legacy ACL table is indexed by UMBC username, but the way we have uPortal configured, it does its attribute queries by LDAP GUID. So, to do this the right way (that is, without hacking the uPortal code), we’d need a table that maps the GUID to the username, so that we could join against it to get our results. Currently, we don’t have LDAP GUID data anywhere in our Oracle database. I don’t think getting it there would be a huge issue (we’re already doing nightly loads of usernames from LDAP to Oracle), but it still needs to happen before we could use this method.

The second method would be to import the user’s legacy ACL data into the LDAP database as an additional attribute. Then I could just pull the data directly out of LDAP, without having to worry about an RDBM query at all. This seems like the simpler solution, if it’s possible. More later…

Note: Configuration of Person Attributes is done in the file /properties/PersonDirs.xml. When specifying an RDBM attributes query, the SQL statement must include a bind variable reference, or the code will crap out. I learned this when I tried to remove the bind variable and hardcode my own username: no dice. To test this stuff out, subscribe to the “Person Attributes” channel, which is under the “Development” group, then look for the attributes you defined in the config file. If they’re there, it worked. If not, not. A rough sketch of what the config for method one might look like is below.
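For illustration only: the element names here are from memory (check the stock PersonDirs.xml in the distro before trusting them), the LDAP_GUID_MAP table is the hypothetical piece we’d still have to build, and the PROG_USER_XREF column names are guesses.

<PersonDirInfo>
  <url>jdbc:oracle:thin:@dbhost.umbc.edu:1521:UMBC</url>
  <driver>oracle.jdbc.driver.OracleDriver</driver>
  <logonid>uportal</logonid>
  <logonpassword>********</logonpassword>
  <!-- the bind variable (?) is mandatory; uPortal plugs the user's key in here -->
  <uidquery>
    SELECT X.PROG_CODE
      FROM BRCTL.PROG_USER_XREF X, LDAP_GUID_MAP G
     WHERE G.GUID = ? AND X.USERNAME = G.USERNAME
  </uidquery>
  <attributes>
    <attribute>
      <name>PROG_CODE</name>
      <alias>myumbcApps</alias> <!-- the Person Attribute a PAGS group would test -->
    </attribute>
  </attributes>
</PersonDirInfo>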

Connection pooling crash course

Just spent the whole day tweaking our new uPortal installation and trying to get it to stay up reliably under load. It’s coming along, but not quite there yet. First lesson: under any kind of load, you must, absolutely must, enable database connection pooling. That’s because if you don’t, it will open enough database connections to, let’s just say, really screw things up. Now, setting up connection pooling is not supposed to be that hard. But in our case, it was a huge pain. The default uPortal 2.4.3 configuration includes a file, uPortal.xml, which is used to specify the connection pooling info to Tomcat. Great, I set it up with our connection parameters, and tried it out. Hmm, doesn’t seem to work. Look a little further… apparently in portal.properties, I need to set the flag org.jasig.portal.RDBMServices.getDatasourceFromJndi to “true”, or it bypasses the whole connection pooling thing and just opens direct connections. I set it, and tried again. Major bombage.

More poking around, and I found a page describing the mechanics of Tomcat connection pooling. Apparently, the config file format (as well as the factory class name) changed from Tomcat 5.0.x to Tomcat 5.5.x. We’re running 5.5.x, and the uPortal distro’s config file is in the 5.0.x format. So, I updated the config file. Plus, for good measure, I dropped a copy of the Oracle JDBC jar file into tomcat-root/common/lib. Not sure if it really needs to be there or not (it stands to reason, though: Tomcat itself instantiates the pool, so the driver has to be visible to Tomcat’s common classloader, not just the webapp’s). But, once I jumped through all those hoops, the connection pooling finally seems to work.
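For anyone hitting the same wall: the 5.5.x style collapses the old <Resource>/<ResourceParams> pair into a single element with attributes, something like the sketch below. The resource name is per the stock uPortal.xml, if memory serves; the connection values are placeholders, not our real settings, and 5.5 will default the factory if you omit it.

<Context path="/uPortal" docBase="uPortal">
  <Resource name="jdbc/PortalDb" auth="Container"
            type="javax.sql.DataSource"
            factory="org.apache.tomcat.dbcp.dbcp.BasicDataSourceFactory"
            driverClassName="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@dbhost.umbc.edu:1521:UMBC"
            username="uportal" password="********"
            maxActive="50" maxIdle="10" maxWait="5000"/>
</Context>

The portal.properties side is just the one flag: org.jasig.portal.RDBMServices.getDatasourceFromJndi=true.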

Now we’re dealing with memory issues causing slowness, as well as a couple of lingering database issues with logins to the ‘myumbc’ user…

I hope I don’t have too many more days like this…

Update 1/12/2006: Well, it appears that the connection pooling breaks any ant targets that use the database; this includes pubchan, pubfragments, etc. This is kinda bogus, but rather than tweaking portal.properties every time I want to publish a channel or fragment, it looks like I can just run these from the test tree (which uses the same set of database tables).

Big Portal Launch Today…

Today’s the day we launch our new myUMBC web portal, essentially turning it loose on the unwashed masses and making the world (well, the campus at least) our big, happy beta-test community. As part of this, we’re kindly leaving the old portal around for a while, because we anticipate stuff will be broken. The new portal will live at http://my.umbc.edu, which the old portal currently occupies. That means that if we want to keep the old portal running, we have to move it to an alternate URL.

Now, our old portal has been active at its current URL since 1999. It’s a big, old, bloated beast, and it’s very happy staying where it is. Getting this thing moved is somewhat akin to booting a 35-year-old freeloading kid out of the house. That is, you can be sure it will resist.

In this case, it was a tedious matter of chasing down all the references to the portal’s top URL and making sure each one got changed everywhere it needed to. Then restart, wonder why it doesn’t work, and determine that the web server no longer has read access to the Webauth cookie. Then fix logout (when making any change like this, it’s absolutely mandatory that logout stops working; it’s like death and taxes).

The great news is, it appears to work now. Off to fix some other stuff.

FastCGI Weirdness

Getting some strange behavior from FastCGI regarding signal handling…

Platform is SunOS 5.10 on Intel, Perl 5.8.6, FCGI (the Perl module) version 0.67. It seems like the FastCGI accept routine is somehow blocking the delivery of signals. If I set a handler for SIGTERM and then call FCGI::accept(), the signal is ignored until the accept routine exits (which happens when a new request comes in). So basically, when I send SIGTERM to the process, it ignores the signal until I go to my browser and hit the app URL. Then, the signal handler is invoked.

The consequence is that basically none of my shutdown scripts work right, because they all work by sending SIGTERM to the FastCGI processes.

The really weird thing here is, if I don’t set a signal handler at all, the SIGTERM immediately terminates the process. It’s only when a handler is set that I have problems. I’ve tried a couple of ways of coding the FastCGI loop:

while (FCGI::accept() >= 0) { ... }

vs.

my $request = FCGI::Request();
while ($request->Accept() >= 0) { ... }

Same results with either method. I have no problems using an old-and-crusty version of FCGI (0.49) on our old-and-crusty SGI hardware. I’ve glanced at the new code that does the accept, and there’s nothing there that looks like it’s holding or blocking signals. Could this be an OS thing? I dunno, but if I can’t fix it, I’m going to have to come up with some kind of workaround to kill and restart the processes…
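One theory I want to test (so treat this as an untested sketch, not a verified fix): Perl 5.8 introduced “safe” signals, which defer the handler until the interpreter reaches a safe point, and a process blocked inside FCGI::accept()’s C code never reaches one. That would explain the difference from the old SGI setup, which presumably ran a pre-5.8 Perl. POSIX::sigaction() installs the handler at the C level, bypassing the deferral:

use strict;
use POSIX;
use FCGI;

my $exiting    = 0;
my $in_request = 0;

# sigaction-installed handlers are "unsafe" (immediate) by default in 5.8,
# so this should fire even while accept() is blocked in C code.
sigaction(SIGTERM, POSIX::SigAction->new(sub {
    $exiting = 1;
    exit 0 unless $in_request;   # idle in accept(): just die (old-school risky)
})) or die "sigaction: $!";

my $request = FCGI::Request();
while (!$exiting && $request->Accept() >= 0) {
    $in_request = 1;
    # ... handle the request ...
    $in_request = 0;
}

The other knob worth knowing about is the PERL_SIGNALS=unsafe environment variable (Perl 5.8.1 and later), which restores the old pre-5.8 signal behavior wholesale, with all its old caveats.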

My Fun Day.

Today was lotsa fun. It started out with the National Student Clearinghouse. I decided to get a “real” development instance going where I could connect to them as a student, demo it to Academic Services, etc. I ended up wrestling with their stupid referrer-based security scheme again. I took my existing clearinghouse script, which was working fine, and added Webauth authentication to it. The idea: have the script verify the user’s Webauth credentials, then do an LDAP query to get the student ID, then pass that to the remote site. That way, my local script has some authentication built in.

Well, that broke it. On the initial authentication attempt, Webauth adds a query parameter called WebAuthExtAction (which the client is supposed to decode, and use the result to set a cookie). Great, but that changes the HTTP referrer string, which breaks the clearinghouse crap. Hey, but they changed their site so it actually tells you what’s going on now, rather than just booting you out. Have to at least give them props for that; it saved me some head-scratching.

OK, first attempt at fixing this: check for a WebAuthExtAction parameter, and if it exists, append it to the initial referrer string that I send them. Nope, that makes the referrer string too long, and the clearinghouse code can’t deal with it. Second attempt: look for the WebAuthExtAction parameter, and if it’s there, redirect the browser back to the same script, omitting the parameter. Bloody convoluted, but it works. Fortunately, we won’t have to deal with this in production, because the prod code will run from the same web server as the portal, and the user will always have valid creds when they come to the site. Aargh.
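The second attempt boils down to something like this (a sketch; the real script also does the Webauth verification and LDAP lookup, and I’m glossing over whatever cookie-setting work the WebAuthExtAction decode would normally drive):

use strict;
use CGI;

my $q = CGI->new;

# If Webauth bounced us back with its extra parameter, strip it and
# redirect to ourselves so the referrer the clearinghouse sees stays clean.
if (defined $q->param('WebAuthExtAction')) {
    $q->delete('WebAuthExtAction');
    print $q->redirect($q->url());   # come back without the parameter
    exit 0;
}

# ... verify Webauth creds, LDAP query for student ID, post to NSC ...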

Then there was fun with myUMBC itself. In an attempt to speed things up on the myUMBC web server, I decided to redo the Webauth ticket-logout script that it was using, and make it part of the myUMBC app itself. That way, logouts will go to the FastCGI processes, reducing overhead (the script needs to connect to the database, among other things) and hopefully speeding the machine up. This actually worked OK eventually, but of course, it broke things at first. Turns out I was short-circuiting the FastCGI loop without resetting certain global variables, which of course is a big no-no. But, that was good for a few choice expletives.
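For posterity, the pitfall looks like this in miniature (an illustrative sketch; variable names are made up, and the point is just where the reset has to happen):

use strict;
use FCGI;

my %session;   # "global" state: persists across requests in a FastCGI process

my $request = FCGI::Request();
while ($request->Accept() >= 0) {
    %session = ();                               # reset FIRST, every pass
    if (($ENV{PATH_INFO} || '') eq '/logout') {
        # ... fast-path ticket-logout handling ...
        next;   # an early 'next' is now safe; nothing stale leaks forward
    }
    # ... normal request handling populates %session ...
}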

When does Christmas break start again?

Shedding the Schedule of Classes Page Albatross…

I’m working on redoing UMBC’s Online Schedule of Classes page. The current version is generated by a big, messy Perl script that reads the raw data uploaded from the HP3000, formats it, generates pages for all the individual disciplines, and then generates the top-level page (which contains links for each semester along with links to various informational pages). What’s the problem, you ask? Well, all of the HTML (except for a few PHP includes for headers, footers, and style info) is hardcoded into the Perl script. You can make modifications to the generated HTML pages, but they get immediately overwritten the next morning when the script runs to regenerate them. Any time someone needs a permanent change (which happens frequently, because the Academic Services folks are always tweaking their informational links, particularly during advance registration), it has to go through me. This is a hassle both for me, and for the folks who need the changes made.

The Perl script is old. It dates to 1996 or so. The model it uses is outdated. It needs to be rewritten so that the Academic Services folks can manage the content themselves. The whole thing is just begging to be rewritten in PHP or some other embedded language, but unfortunately I don’t have time (or staff) to sit down and make a major project out of this right now. So, for now I’ll settle for slow, incremental improvements.

My first tweak was to rework the script so that the generated top-level page pulls its auxiliary links in via PHP includes (see the sketch below). That way, the AS folks can manage the auxiliary links themselves, and changes show up instantly (rather than having to wait until the script runs overnight). This is a quick win, and should eliminate the bulk of the busy-work I’ve been doing to support this page. However, I’m not sure it’ll work perfectly, because the AS staff use Dreamweaver to edit all of their HTML content, and I’m not sure whether Dreamweaver can edit partial HTML files, or whether it’ll try to add its own tags all over the place. As with everything around here, time will tell, and we’ll tweak things down the road as necessary.
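In miniature, the change inside the nightly generator looks something like this (file names are made up, and the variables stand in for the script’s existing HTML chunks):

# when writing the top-level schedule page...
open(my $out, '>', 'schedule/index.php') or die "can't write page: $!";
print $out $header_html;
# auxiliary links now come from a separate file the AS staff maintain,
# so their edits show up on the next page load, not the next nightly run
print $out "<?php include('aux_links.inc.html'); ?>\n";
print $out $semester_links_html;
print $out $footer_html;
close($out);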

Fun with the National Student Clearinghouse

Can I just say that I hate configuring passthrough authentication to outsourced sites on remote web servers?

Rule 1: The company’s step-by-step setup instructions will not apply to your particular system configuration.
Rule 2: It never works the first time.
Rule 3: See rule 2.
Rule 4: Debugging is impossible, because the error logs are on the remote site.

It’s kind of like playing pin-the-tail-on-the-donkey: Put on blindfold, spin around a few times, try some stuff, and maybe you can get it to work. If not, call the rep. The rep is invariably non-technical and will have to pass the request on to a developer. The developer may or may not get back to you with useful info. Or maybe an error log. Or something. Then, you try again.

Oh, and I forgot…
Rule 5: Once you eventually do get it working, it will work for awhile, then break at the most inconvenient time possible, when the vendor decides to make some change to the remote site without telling anyone. Thus, it becomes a perpetual maintenance-hassle-waiting-to-happen that hangs over your head for, well, forever.

I just, finally, got the passthrough authentication to work with the National Student Clearinghouse. Basically, their stuff works like this: post several magic variables to their web server. The server sends back a form that includes an encrypted token. The form uses JavaScript to auto-post itself back to the NSC web server, which brings up the student’s clearinghouse view.

To make a long story short: It’s very picky about the HTTP_REFERER. We gave them the URL of our development server ahead of time. In the initial request, the referrer string that the script sends must begin with the URL we provided to them. Fine, I can use LWP::UserAgent to post the initial request, and set the referrer to whatever I want. I was still missing one piece though: In the second request (the auto-post form which is sent to the browser), the referrer must exactly match the referrer that I sent in the first request. Of course, NSC’s documentation conveniently doesn’t mention this.

Example: if I give them ‘http://devel.umbc.edu’ as the development URL, then my initial referrer must begin with that. I can send ‘http://devel.umbc.edu/cgi-bin/blah.pl’ and it will accept it. On the second request, I have to send exactly the same referrer string as I did in the first request, or it won’t accept it, even if it still matches the development URL. Got that?
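Here’s the shape of it in code. A sketch only: the NSC endpoint and form-field names below are placeholders, not their real interface; the referrer handling is the part that matters.

use strict;
use LWP::UserAgent;

my $student_id = 'AB12345';   # would come from the Webauth/LDAP lookup
my $referer    = 'http://devel.umbc.edu/cgi-bin/blah.pl';   # begins with the registered URL

my $ua   = LWP::UserAgent->new;
my $resp = $ua->post(
    'https://clearinghouse.example/servlet/login',          # placeholder endpoint
    { schoolCode => '9999', studentId => $student_id },     # hypothetical fields
    'Referer' => $referer,
);

# $resp->content is the auto-post form with the encrypted token.  The page we
# serve it from must have *exactly* the URL in $referer, so the browser's
# second-request referrer matches the first one.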

I probably made this harder on myself than it needed to be, by trying to get cute and fudge the referrer header from a URL other than the development URL I gave them. But it’s a bit annoying to have to jump through these hoops to make the thing happy about the referrer, given that it’s a really bad idea to rely on the referrer for any kind of security purposes in the first place.

Oh well, let’s hope I don’t have to do this again for awhile.