Code From Away: November 2009

Monday, November 30, 2009

Setting Up Redmine in A Shared Hosting Environment

Let's face it: providing issue tracking and a nice looking web interface to your code is a must if you want to get your clients involved in your development process.
Redmine is an awesome issue tracker that's written in Ruby on Rails: it's extremely pretty (which everyone loves) and has some fantastic plugins to integrate with all sorts of things, including your source control system (more on that some other time!) While Ruby is not a language that I'm familiar with, I have never been one to turn my back on a solution solely for the language that it was written in. To be perfectly honest, if I did this I probably never would have gotten into PHP in the first place!

Redmine is very similar to Trac (if you have any experience with it), except for the fact that it supports multiple projects. When I first saw Redmine I was working with a shared hosting service and had never deployed a Ruby application in my life. What follows is essentially a step-by-step process for setting this up in your own shared hosting environment. It discusses version 0.8.5, but mainly because that was the version that was available when I created my notes: I doubt newer versions are massively different to set up.

Prerequisites
Unfortunately there are a couple of prerequisites for you to get Redmine working with your hosting environment. Firstly, and probably most naturally, it needs to support Ruby on Rails applications. It also needs access to some sort of source control system: I have my installation linked to Subversion, but it does support a number of systems and so you should be able to get something working, at least. Finally, you need to have SSH access to your account. Luckily, my hosting company provided all three of these features with my account, so I was really lucky :)

Finally, there is the prerequisite that at the end of this process, you want to use a subdomain to host your Redmine installation. I have no doubt it's simple to set this up in a subdirectory instead, but the point is that this was my goal and this was what I did to get there.

Anyway, with that out of the way, let's get started...

Getting Started...
We start our process in the cPanel interface. Head into the database management area:
1) create a database for Redmine. As per the usual cPanel result, I called my database 'redmine' and ended up with a database named 'myusername_redmine'.
2) create a database user. Again, I named my 'redmine', which resulted in a user actually named 'myusername_redmine'.
3) give the new user full rights to the new database
4) in your domain management area, create a subdomain (I used code.mydomain.com, which resulted in a directory being created at ~/public_html/code)
5) in your Rails applications area, create a Rails application named 'redmine'
6) if the newly created application is already running, stop it
7) download the file redmine-0.8.5.tar.gz from http://rubyforge.org/frs/?group_id=1850
8) upload the redmine-0.8.5.tar.gz file to your home directory on the hosting server

Cracking the shell...
Open your SSH interface and keep going...
9) switch to the ~/etc/rails_apps/redmine directory
10) delete everything in the directory: rm -rf *
11) copy redmine-0.8.5.tar.gz from your home directory to ~/etc/rails_apps/redmine
12) extract the file: tar -zxvf redmine-0.8.5.tar.gz
13) switch into the redmine-0.8.5 directory and move everything to the parent directory: mv * ..
14) switch back to the parent directory and remove the now empty redmine-0.8.5 directory: rmdir redmine-0.8.5
15) copy the example database YML file: cp config/database.yml.example config/database.yml
16) open config/database.yml in your editor of choice
17) Find the section that looks like:
production:
adapter: mysql
database: redmine
host: localhost
username: redmine
password: my_password
and set the correct value for the 'database' line, the 'username' line and the 'password' line.
18) save the file and exit back to the command line
19) create the database structure: rake db:migrate RAILS_ENV="production"
20) add the default configuration data: rake redmine:load_default_data RAILS_ENV="production"

Back to cPanel
Head back to the cPanel interface for a moment.
21) add a rewrite for the Redmine application: on the Rails application page, click 'Create a Rewrite' for the Redmine application. On the next page, change the dropdown so that it has the correct subdomain selected. Leave the text box empty and click 'Save'.
22) start the Redmine application

Back to SSH One Last Time...
Back in your SSH interface:
23) edit the .htaccess file in the directory that was originally created for your subdomain. In my case, this is the ~/public_html/code/.htaccess file
24) after the line:
RewriteRule ^/?$ "http\:\/\/127\.0\.0\.1\:12012%{REQUEST_URI}" [P,QSA,L]

add the line:
RewriteRule ^(.*)$ "http\:\/\/127\.0\.0\.1\:12012%{REQUEST_URI}" [P,QSA,L]
NOTE: in the URLs above there is a port number: that port points to the Rails app and it may be different for your particular Redmine installation. Maybe it would be better to say copy the first line, paste it and then change the '^/?$' to '^(.*)$' in that newly pasted line. (Thanks to memebenegas for pointing this out).
25) save the file and exit back to the command line
26) send your browser to the subdomain you created and see your Redmine application running. Default username is 'admin', default password is 'admin'.

And that's that: you now have your own installation of Redmine. Your co-developers will appreciate you and your clients will love you.

Tuesday, November 17, 2009

Iterating with mysql_data_seek()

Looking back, it makes sense... but I've only just realised that the mysql_data_seek() function can be a real drag on your code.

Sometimes things just hit you like a brick: this happened to me just recently while looking through some code. Before I go too far, though, maybe a little back story is necessary.

There comes a time, it seems, that every PHP developer wants to write a DB access class. It almost seems to be a right of passage: everyone codes one and eventually ends up using another library written by someone else.

I was working on a piece of code that used such a class. For the longest time, this class worked well. It worked across a number of projects and became one of those "pieces of code that you just use all the time". In a relatively intense piece of code, we were butting up against a pretty heavy time cost: we were looping through a little over 100,000 rows, but this loop was taking around 15 minutes to complete.

We looked to make sure there was nothing wrong with the inner workings of the loop. Were we doing some other loop inside it, even accidentally? Maybe there was a function call that was taking a long time to complete? We worked through each possibility, ignoring the way we were actually grabbing the data.

The way we would grab the data is similar to what (I believe) a lot of other, homegrown access classes does: pass in a row number, seek to that row and fetch the result. This would let us skip rows, short circuit out of a loop or jump results more easily if the PHP logic needed it. The way we would loop over it would be to get a row count at the start then do a typical, for ($i = 0; ($i < $row_count); $i++) loop, calling a GetDataRow($i) within the loop. As I mentioned earlier, GetDataRow($i) would call mysql_data_seek($i) then mysql_fetch_assoc().

As we drilled down into our code, we started to realise our problem: things were just taking longer and longer to process, the further into the row count we went. This is probably why we never noticed anything: we had just never called this function on such a large result set.

Just to be sure, I tried starting the loop with $i equal to 50,000. The loop ran a bit quicker, but still that last 50,000 took a lot longer than the first 50,000. We had found our problem.

But what to do?

I did the same thing any sane PHP developer would do: hit the documentation at php.net. One, maybe two minutes later, I had all the data I needed to at least test something. It turns out that mysql_fetch_assoc() automatically moves the row pointer ahead one. Instead of explicitly calling the mysql_data_seek($i) for each $i, we only needed to continually call mysql_fetch_assoc() until it returned false.

While I found that interesting I couldn't see why this would be a problem. I mean surely the mysql_data_seek call would have a pretty good Big-O implementation... especially if we were meant to just be seeking to the row we were already on!

Unfortunately the data proved otherwise. Eventually it became obvious that in reality, we have a resource totally separated from the PHP process itself: there is no real way for us to just jump into a particular row.

So what kind of increase in performance did we see? To our absolute dismay, the 15 minute loop completed in approximately 4.5 seconds.

I know that that sounds incredible, but it's true. I've attached some code that should help to prove my point. If, however, you'd rather just see the numbers for yourself, they're here, too :)

Let me explain the numbers in the following table: we iterate through 100,000 rows, so l only show times for every 10,000 rows. Each number in the table is a count in seconds, from the start. That is to say, the number at the 30,000 row point is how long it took to get to those 30,000 rows. To see how long it took to get from one row to the next, you can see the number with the plus sign. The three different counts we do represent the three different tests we ran. "Iterating" is the time it took to get through the rows, without running mysql_data_seek for each row. "Seeking" is what happened when we did call mysql_data_seek and "Seeking (starting at 50000)" is, unsurprisingly, the times it took when calling mysql_data_seek, but starting at 50,000 rows. This test was run to see if the time was related to how many times we had called mysql_data_seek, or if it was related to which row number we were actually seeking to. As you can see, it's the latter.

At row number...	Iterating	Seeking	Seeking (starting at 50000)
10000	0.029 +0.029	0.358 +1.157	N/A
20000	0.058 +0.029	1.514 +3.481	N/A
30000	0.087 +0.029	4.995 +7.563	N/A
40000	0.116 +0.029	12.558 +10.838	N/A
50000	0.145 +0.029	23.396 +12.973	N/A +12.445
60000	0.175 +0.031	36.369 +14.185	12.445 +14.425
70000	0.206 +0.029	50.554 +16.671	26.869 +17.510
80000	0.235 +0.030	67.225 +19.212	44.379 +20.559
90000	0.264 +0.030	86.437 +21.122	64.938 +21.273
100000	0.294	107.559	86.211

If you're more of a visual person, here's a really basic graph:

For what it's worth, yes, the "Iterating" line is essentially flat. That's because of how much faster it was.

If you'd like to test this yourself, I've uploaded some code to http://files.codefromaway.net/mysql_data_seek.tar.gz : it can prepare some random data for you then run the tests so that you can compare the numbers yourself.

So that's that, and that's why I don't think I'll ever stop learning: it doesn't matter what you take for granted, some things will always work differently to what you expect...