Skip to main content
Home
The Baheyeldin Dynasty
The journey for wisdom starts with knowledge
  • Home
  • About
  • Site Map
  • Contact

Solution To Drupal 4.6 Site Slowness Caused By URL Aliases

  1. Home

By Khalid on 2006/04/22 - 11:17, last updated 2006/04/24 - 09:53

A friend's site that runs Drupal 4.6 is becoming more popular by the month. With increased popularity, comes increased traffic, and the need for more resources.

Problem: Slow Site

After moving to a dedicated server, things were better for a while, then in late March to early April, users were reporting page load times that are too slow. I mean really slow, as in 25 seconds per page per request.

Hardware is not the issue here. This is a Debian Linux server with a dual Pentium 4 2.8GHz with 2GB or RAM.

Investigation: CPU Bound

After logging in and doing some investigation, I saw the following output from vmstat:

 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
10  0 126820 380848 127940 938080    0    0   110   239  386   498 97  3  0  0
 8  0 126820 357348 127976 938544    0    0    25   133  340   412 97  3  0  0
11  0 126820 362136 128004 938680    0    0     7    91  283   280 98  2  0  0
 5  0 126820 389988 128032 939192    0    0    32   137  342   392 98  2  0  0
 6  0 126820 417436 128060 939504    0    0    15   112  293   391 93  2  5  0
 2  0 126820 400424 128088 939816    0    0     9   133  327   444 98  2  0  0
 6  0 126820 365852 128132 940684    0    0    71   111  324   410 97  3  0  0
 0  0 126820 382328 128160 941568    0    0    70   121  345   368 89  2  8  0
 
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 5  0 126820 362820 129520 967064    0    0    73   149  395   324 97  3  0  0
 5  0 126820 348348 129572 967500    0    0    35    93  331   268 86  3 11  0
 1  0 126820 378004 129596 968004    0    0    36   128  326   299 83  3 14  0
 1  0 126820 408256 129632 968580    0    0    46    94  343   281 78  2 21  0
 3  0 126820 418516 129660 968840    0    0    16   107  392   339 66  3 30  0
 7  0 126820 348000 129696 969572    0    0    56   106  369   418 95  3  1  0
 5  0 126820 355796 129720 969840    0    0    13   109  307   259 93  3  5  0
 3  0 126820 387436 129748 970480    0    0    51   126  335   333 76  3 21  0
 8  0 126820 364508 129772 970908    0    0    33    96  335   281 96  3  1  0

As you can see, there is a large number of processes that are in the run queue, waiting for the CPU to be available. Meanwhile, the CPU idle time is close to non-existent.

This is confirmed by the output from this ps command:

 ps -el| grep '^. R' 
  R   999 26312 10623  2  78   0 -  6489 -      ?        00:00:08 libhttpd.ep
5 R   999 26349 10623  4  80   0 -  6761 -      ?        00:00:12 libhttpd.ep
5 R   999 26432 10623 11  80   0 -  6559 -      ?        00:00:20 libhttpd.ep
5 R   999 26487 10623  2  78   0 -  6800 -      ?        00:00:03 libhttpd.ep
5 R   999 26553 10623  2  78   0 -  6419 -      ?        00:00:01 libhttpd.ep
5 R   999 26559 10623 10  80   0 -  6191 -      ?        00:00:06 libhttpd.ep
5 R   999 26573 10623  8  79   0 -  6443 -      ?        00:00:04 libhttpd.ep
5 R   999 26579 10623 11  79   0 -  6481 -      ?        00:00:06 libhttpd.ep
5 R   999 26583 10623 10  80   0 -  6493 -      ?        00:00:05 libhttpd.ep

After I tailed the Apache log, I found that the amount of hits to node/feed and other */feed is amazing (several a second). This is due to the fact that the site's audience are mainly Mac users,  and they use all kinds of widgets to subscribe to RSS feeds. These widgets poll the site every so often, and when you have a lot of users doing this, the load on the site increases dramatically.

Step 1: Disabling Feeds

So, the first thing I did was modify modules/node.module to force node_feed() to return early, thus saving the load from the feeds. This improved things a bit, but not by a whole lot. Here is the vmstat output.

Tue Apr 18 20:31:08 PDT 2006
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 101232 708352 147608 663464    0    0     0    96  217   155  8  0 92  0
 1  0 101232 708148 147612 663548    0    0     6    61  212   172 20  1 80  0
 1  0 101232 718048 147612 663632    0    0     2    70  207   185 28  0 71  0
 1  0 101232 710224 147612 663732    0    0     0    97  224   207 69  2 30  0
 1  0 101232 689008 147612 663836    0    0     0    88  229   294 56  1 43  0
 0  0 101232 646712 147620 663920    0    0     0    88  269   278 46  2 52  0
 0  0 101232 680540 147624 664016    0    0     0    91  215   240 52  1 47  0
 0  0 101232 701752 147624 664096    0    0     0    94  225   223 48  1 51  0
 3  0 101232 701444 147624 664144    0    0     0   111  220   204 36  1 64  0

Step 2: Zeroing In On the Real Issue

Next, we noticed that another site using Drupal 4.6 on the same server is running relatively fast, so it had to be something specific to the slow site (either code or database), and not hardware, operating system, or system software.

Then, we noticed that a local copy of the same Drupal installation with the same database on a single user machine is also very slow. So, the problem is not load related (number of users accessing the site simultaneously).

When using the devel.module on a local install using a Pentium III 550 MHz, it took the front page 25 to 30 seconds to load on a local LAN. The inner pages took 13 to 19 seconds. This is page generation time as reported by devel.module.

Step 3: Lots of Aliases

Upon checking the database, I found that there were 8479 aliases in the url_alias table.

The problem is that the url_alias table, which contains the search engine friendly URLs for all nodes and feeds (e.g. the_article_title instead of node/3), was getting loaded in memory and indexed as an associative array for each request.

This is fine when one has only a few hundred nodes (my sites have 600+ and 700+ with no problems). However, in this case, the 8400+ aliases in the table are too much processing per request. For every page load,  those were a) read from the database, and b) indexed as an array in memory, which caused the severe slow down for each  request.

There is an enhancement that was done for 4.7 to not load all the aliases, but get them from the database on-demand. This was backported to 4.6 in this issue, and specifically in this patch by Matt Westgate.

After applying this patch, which applies cleanly to 4.6.6 with an offset, the site was flying fast again. On the test Pentium III, the front page took 2.2 to 2.8 seconds, and inner pages took 1.2 to 1.5 seconds.

I applied the patch to the live site, and asked the site owner to run update.php.

Here is the vmstat output after the patch on the live server (dual Pentium 4).

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 214300 133892 124580 1309908    1    3     2     2    4     1 41  1 58  0
 2  0 214300 129780 124592 1309976    0    0     2    82  230   387 17  1 82  0
 0  0 214300 120592 124596 1310036    0    0     1    77  215   420 20  1 79  0
 0  0 214300 117864 124596 1310104    0    0     0   103  242   486  7  1 92  0
 0  0 214300 117932 124604 1310148    0    0     0    64  202   328 16  1 82  0
 0  0 214300 107216 124608 1310184    0    0     0   102  223   272 12  1 87  0
 0  0 214300 109792 124608 1310188    0    0     0    98  208    77  0  0 99  0
 0  0 214300 109776 124612 1310200    0    0     0    39  202    94  1  0 99  0
 0  0 214300 117316 124616 1310204    0    0     0    37  204    58  0  0 99  0
 1  0 214300 123028 124616 1310212    0    0     0    35  198   125 13  0 87  0


procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 214300 119920 124692 1311136    1    3     2     2    4     1 41  1 58  0
 0  0 214300 119852 124692 1311168    0    0     0    61  258   237  4  0 96  0
 0  0 214300 145376 124692 1311188    0    0     0    38  292   169  3  0 96  0
 0  0 214300 132276 124692 1311228    0    0     0    94  256   239 34  1 66  0
 1  0 214300 119664 124700 1311268    0    0     0    76  229   289  5  1 94  0
 1  0 214300 110760 124708 1311344    0    0     0   208  339   451 11  1 88  0
 0  0 214300 103772 124712 1311404    0    0     0    67  246   447  7  1 92  0
 1  0 214300  97004 124716 1311448    0    0     0    59  250   318 12  1 88  0
 1  0 214300 106728 124720 1311508    0    0     0    78  274   374  8  1 91  0
 0  0 214300 121628 124720 1311556    0    0     0    91  248   264 12  0 87  0

And the ps command shows no processes at all in the run queue:

 ps -el | grep "^. R"
0 R 59556 21340 3669 0 76 0 - 745 - pts/0 00:00:00 ps

Voila! Success at last!

Conclusion

If we ever roll Drupal 4.6.7, then this patch should be part of that release. It seems that  4.6 will live on for sometime, even after the imminent release of 4.7, perhaps in a few weeks.

Contents: 
Drupal
  • Add comment

Comments

adrian (not verified)

Definitely should be rolled in.

Mon, 2006/04/24 - 05:31

We patched our system at Bryght with this before we even released the first 4.6 version.

The only way we could concievably enable pathauto for our clients.

  • reply

Ramdak (not verified)

Rolling the patch into a 4.6 release

Tue, 2006/04/25 - 10:12

Well, despite several requests, Killes is on record saying he will not roll it in for 4.6.

Is this one of those things that requires a greater level of consensus to happen? I am not trying to stir the pot or any such thing, but it is really unfortunate this was deemed not necessary for 4.6.6.

  • reply

greggles (not verified)

Disabling feeds

Mon, 2006/04/24 - 09:42

Was disabling feeds was that just a test and you have re-enabled them or did you change the function in a way that makes it faster while keeping the functionality? If you just disable feeds I imagine that some of the site's users would be upset.

Great investigation and great article!

  • reply

Khalid

Thanks

Mon, 2006/04/24 - 09:56

Thanks for the typo. I fixed the article, and removed your comment about it so no one would get confused.

As for the feeds, I did not make it faster. I just disabled it. If I could make it faster, I would have contributed a patch. I think we should think of flat file caching for feeds or something of that sort. It is too much work done several times a second on some sites.

The site owner was desparate at that point, and we announced on the front page that RSS feeds will be disabled for a while while we trouble shoot the problem.

  • reply

mike3k (not verified)

Redirecting feeds

Mon, 2006/04/24 - 11:16

I use FeedBurner for my feeds. Over 75% of my traffic comes from RSS feeds, since NetNewsWire includes my feed as one of the defaults. I ended up creating a private feed URL which calls node_feed() and which only FeedBurner subscribes to, while node/feed is redirected to the feedburner feed.

At first I had node.module do the redirect itself, but I found I could reduce the load even further by using a redirect permanent in .htaccess.

  • reply

Current

Pandemic

  • COVID-19
  • Coronavirus

Search

Site map

Contents

  • Family
    • Khalid
    • Ancestry
    • Extended
  • Friends
  • Nokat نكت
  • Writings
    • Cooking
    • Culture
    • Science
    • History
    • Linguistics
    • Media
    • Literature
    • Politics
    • Humor
    • Terrorism
    • Business
    • Philosophy
    • Religion
    • Children
  • Technology
    • Linux
    • Arabization
    • Drupal
      • Association
    • Software
    • Internet
    • Technology in Society
    • Digital Archeology
    • NCR History
    • MidEast Internet
    • Programming
    • Saudi ISPs
    • Miscellaneous
  • Places
    • Canada
      • Weather
    • Egypt
      • Cuisine
      • Alexandria
      • E.G.C.
    • USA
    • Saudi Arabia
  • Interests
    • Astronomy
    • Fishing
    • Photography
    • Snorkeling
    • Nature
    • Photomicroscopy
  • Miscellany

In Depth

  • al-Hakim bi Amr Allah: Fatimid Caliph of Egypt الحاكم بأمر الله
  • Alexandria, Egypt
  • Arabic on the Internet
  • Articles on the history of Muslims and Arabs in the Iberian Peninsula تاريخ المسلمين و العرب في الأند
  • DIY GOTO Telescope Controller With Autoguiding and Periodic Error Correction
  • E.G.C. English Girls College in Alexandria, Egypt
  • Egyptian Cuisine, Food and Recipes مأكولات مصرية
  • George Saliba: Seeking the Origins of Modern Science?
  • Internet Scams and Fraud
  • Mistaken for an Arab or Muslim: Absurdities of being a victim in the War on Terror
  • Mistaken Identity: How some people confuse my site for others
  • One People's Terrorist Is Another People's Freedom Fighter
  • Overview of Google's Technologies
  • Photomicroscopy
  • Pseudoscience: Lots of it around ...
  • Resources for using Google Adsense with Drupal
  • Rockwood Conservation Area, Southern Ontario
  • Selected Symbolic Novels And Movies
  • Snorkeling the Red Sea near Jeddah
  • Updates and Thoughts on the Egyptian Revolution of 2011

Recent Content

Most recent articles on the site.

  • Origin Of COVID-19: Natural Spillover, Lab Leak Or Biological Weapon?
  • Kamal Salibi and the "Israel from Yemen" theory
  • How To Upgrade HomeAssistant Core In A Python Venv Using uv
  • Ancestry - Paternal Side
  • Review of Wait Water Saver For Whole House Humidifiers
more

Most Comments

Most commented on articles ...

  • Another scam via Craigslist: offering more than asking price
  • Warning to female tourists thinking of marrying Egyptians
  • Craigslist classified for used car: Cheque fraud scam
  • Winning the lottery scam email: World Cup South African lottery
  • Email Scam: BMW 5 Series car and lottery winning
more

About Khalid

Various little bits of information ...

  • Khalid Baheyeldin: brief biography
  • Presentations and Talks
  • Youtube Videos
  • GitHub Projects
  • Drupal.org Profile
  • Astrophotography @ Flickr

Sponsored Links

Your Link Ad Here

Tags

Android Mobile Ubuntu Sony OnStep OpenWRT Router Ericsson COVID-19 Rogers Coronavirus Arabic Kubuntu Home Assistant GSM Telescope tablet Spectrum Scam Python 419 Laptop Firefox DIY CPU Conspiracy Comet Balkanization backup App
More

© Copyright 1999-2025 The Baheyeldin Dynasty. All rights reserved.
You can use our content under the Terms of Use.
Please read our privacy policy before you post any information on this site.
All posted articles and comments are copyright by their owner, and reflect their own views and opinions, which may not necessarily be consistent with the views and opinions of the owners of The Baheyeldin Dynasty.

Web site developed by 2bits.com Inc.