Optimized for Speed

posted on May 5th, 2009 by Greg in Greg's Posts on Fedorable

News travels fast. This has always been the rule, even when fast was “on horseback.”

Now, fast is measured in milliseconds.

To keep up with our recent traffic increase, I’ve been taking steps to keep the site loading as fast as possible. Yahoo has developed a Firefox extension called YSlow. It analyzes all of the traffic from a website and scores it in a few categories where improvements can be made.

I decided to start with making the style sheets and javascript files load faster since they are an easy target for optimization. The framework that we use for IvyLees is django and a few people have created an application that we can plug in to help us compress some files.

After setting up django-compress, a website will have css and js files that are minified (excess white space and characters are removed to reduce file size). The application will also give the files version numbers so that they can be cached by the web browser and won’t need to be downloaded again until a change is made and a new version of the file is created.

For the site, this means faster general usage. For news releases, it means the first one a visitor opens will load faster, and every one after that will load faster still because the style sheets and scripts are already cached.

We’ve also upgraded and installed some things on the server to increase performance. I’ve written about the technical details of setting up nginx, memcached, and gzip compression over at Code Spatter, so take a look over there for more tech-heavy info.

Designing a web framework: Django’s design decisions

posted on May 4th, 2009 by PyromanX in Greg's Bookmarks on Delicious

Pinax: a platform for rapidly developing websites

posted on May 4th, 2009 by PyromanX in Greg's Bookmarks on Delicious

Multistage Django deployments: Part 1 – sharebear.co.uk

posted on May 4th, 2009 by PyromanX in Greg's Bookmarks on Delicious

Testmaker .002 (Even easier automated testing in Django) | Surfing in Kansas

posted on May 4th, 2009 by PyromanX in Greg's Bookmarks on Delicious

Big list of Django tips (and some python tips too) | Surfing in Kansas

posted on May 4th, 2009 by PyromanX in Greg's Bookmarks on Delicious

How to Speed up Your Django Sites with NginX, Memcached, and django-compress

posted on April 23rd, 2009 by Greg Allard in Greg's Posts on Code Spatter

A lot of these steps will speed up any kind of application, not just django projects, but there are a few django specific things. Everything has been tested on IvyLees, which is running in a Debian/Ubuntu environment.

These three simple steps will speed up your server and allow it to handle more traffic.

Reducing the Number of HTTP Requests

Yahoo has developed a Firefox extension called YSlow. It analyzes all of the traffic from a website and scores it in a few categories where improvements can be made.

It recommends combining all of your css files into one file and all of your js files into one file, or into as few files as possible. There is a pluggable, open source django application available to help with that task. After setting up django-compress, a website will have css and js files that are minified (excess white space and characters are removed to reduce file size). The application will also give the files version numbers so that they can be cached by the web browser and won’t need to be downloaded again until a change is made and a new version of the file is created. How to set up the server to send a far future expiration is shown below in the lightweight server section.
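
To make the two ideas concrete, here is a minimal Python sketch of minifying a style sheet and putting a version into its file name. It is not how django-compress is implemented: the function names (minify_css, versioned_name) are made up for illustration, the minifier is deliberately naive, and using a content hash as the version is just one possible scheme.

```python
import hashlib
import os
import re


def minify_css(css):
    """Naive minifier: drop comments and squeeze out extra white space."""
    css = re.sub(r'/\*.*?\*/', '', css, flags=re.S)  # remove /* comments */
    css = re.sub(r'\s+', ' ', css)                   # collapse runs of white space
    css = re.sub(r'\s*([{};])\s*', r'\1', css)       # no spaces around { } ;
    return css.strip()


def versioned_name(filename, content):
    """Put a short hash of the content into the file name."""
    digest = hashlib.sha1(content.encode('utf-8')).hexdigest()[:8]
    stem, ext = os.path.splitext(filename)
    return '%s.%s%s' % (stem, digest, ext)


css = minify_css("body {\n    color: red;  /* brand color */\n}\n")
print(css)
print(versioned_name('site.css', css))
```

Because the name changes whenever the content changes, the browser can cache each version for as long as it likes and will still fetch the new file after an edit.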

Setting up Memcached

Django makes it really simple to set up caching backends and memcached is easy to install.

sudo aptitude install memcached python-setuptools

We need setuptools because it provides the easy_install command used next.

sudo easy_install python-memcached

Once that is done you can start the memcached server by doing the following:

sudo memcached -d -u www-data -p 11211 -m 64

-d will start it in daemon mode, -u is the user for it to run as, -p is the port, and -m is the maximum number of megabytes of memory to use.

Now open up the settings.py file for your project and add the following line:

CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

Find the MIDDLEWARE_CLASSES section and add this to the beginning of the list:

    'django.middleware.cache.UpdateCacheMiddleware',

and this to the end of the list:

    'django.middleware.cache.FetchFromCacheMiddleware',
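
Put together, the relevant part of settings.py might look like the sketch below. The three middleware entries in the middle are only typical defaults and your list may differ; the CACHE_MIDDLEWARE_SECONDS and CACHE_MIDDLEWARE_KEY_PREFIX values are assumptions to tune for your own site.

```python
# settings.py (fragment)
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

# Assumed values: how long to cache each page, and a prefix that keeps
# keys apart if several sites share one memcached instance.
CACHE_MIDDLEWARE_SECONDS = 600
CACHE_MIDDLEWARE_KEY_PREFIX = 'myproject'

MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',     # first in the list
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',  # last in the list
)
```

The order matters because middleware runs top to bottom on the request and bottom to top on the response, so the update step at the top is the last thing to see the finished response.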

For more about caching with django see the django docs on caching. You can reload the server now to try it out.

sudo /etc/init.d/apache2 reload

To make sure that memcached is set up correctly you can telnet into it and get some statistics.

telnet localhost 11211

Once you are in, type stats and it will show some information (press ctrl ] and then ctrl d to exit). If there are too many zeroes, either it isn’t working or you haven’t visited your site since the caching was set up. See the memcached site for more information.
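
If you would rather check from a script than from telnet, the same stats command can be sent over a plain socket. This is a rough sketch, assuming memcached is listening on 127.0.0.1:11211 as configured above; parse_stats and fetch_stats are names I made up, not part of any library.

```python
import socket


def parse_stats(raw):
    """Turn 'STAT name value' lines into a dict, stopping at END."""
    stats = {}
    for line in raw.splitlines():
        if line == 'END':
            break
        parts = line.split(' ', 2)
        if len(parts) == 3 and parts[0] == 'STAT':
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host='127.0.0.1', port=11211):
    """Send the stats command to memcached and return the parsed reply."""
    conn = socket.create_connection((host, port), timeout=5)
    try:
        conn.sendall(b'stats\r\n')
        data = b''
        # the reply ends with a line containing only END
        while not data.endswith(b'END\r\n'):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    finally:
        conn.close()
    return parse_stats(data.decode('ascii'))


# Example use on the server itself:
# stats = fetch_stats()
# print(stats.get('get_hits'), stats.get('get_misses'))
```

A get_hits count that grows as you click around the site is a good sign that pages are being served from the cache.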

Don’t Use Apache for Static Files

Apache carries overhead that is worthwhile for serving php, python, or ruby applications, but you do not need it for static files like your images, style sheets, and javascript. There are a few options for lightweight servers that you can put in front of apache to handle the static files. Lighttpd (lighty) and nginx (engine x) are two good options. Putting this layer in front of your application also means apache is no longer exposed directly to the internet, so there is a security bonus on top of the speed bonus.

There is this guide to install a django setup with nginx and apache from scratch. If you followed my guide to set up your server or already have apache set up for your application, then there are a few steps to get nginx handling your static files.

sudo aptitude install nginx

Edit the config file for your site (sudo nano /etc/apache2/sites-available/default) and change the port from 80 to 8080 and change the ip address (might be *) to 127.0.0.1. The lines will look like the following

NameVirtualHost 127.0.0.1:8080
<VirtualHost 127.0.0.1:8080>

Also edit the ports.conf file (sudo nano /etc/apache2/ports.conf) so that it will listen on 8080.

Listen 8080

Don’t restart the server yet; you want to configure nginx first. Edit the default nginx config file (sudo nano /etc/nginx/sites-available/default) and find where it says

        location / {
               root   /var/www/nginx-default;
               index  index.html index.htm;
        }

and replace it with

location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_redirect off;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    client_max_body_size 10m;
    client_body_buffer_size 128k;
    proxy_connect_timeout 90;
    proxy_send_timeout 90;
    proxy_read_timeout 90;
    proxy_buffer_size 4k;
    proxy_buffers 4 32k;
    proxy_busy_buffers_size 64k;
    proxy_temp_file_write_size 64k; 
}
location /files/ {
    root /var/www/myproject/;
    expires max;
}

/files/ is where I’ve stored all of my static files and /var/www/myproject/ is where my project lives and it contains the files directory.

Set static files to expire far in the future

expires max; will tell your users’ browsers to cache the files from that directory for a long time. Only use it if you are sure those files won’t change. You can use expires 24h; if you aren’t sure.
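
One way to see what a browser is told is to look at the Cache-Control header on a static file. The helper below is a small sketch (max_age_seconds is a made-up name) that pulls the max-age value out of that header. As far as I know, expires 24h; shows up as max-age=86400 and expires max; as a max-age of roughly ten years.

```python
def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    for directive in cache_control.split(','):
        name, _, value = directive.strip().partition('=')
        if name.lower() == 'max-age' and value.isdigit():
            return int(value)
    return None


print(max_age_seconds('max-age=86400'))

# Against a live server (hypothetical URL; use urllib2.urlopen on Python 2):
# from urllib.request import urlopen
# response = urlopen('http://example.com/files/site.css')
# print(max_age_seconds(response.headers.get('Cache-Control', '')))
```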

Configure gzip

Edit the nginx configuration to use gzip on all of your static files (sudo nano /etc/nginx/nginx.conf). Where it says gzip on; make sure it looks like the following:

    gzip  on;
    gzip_comp_level 2;
    gzip_proxied any;
    gzip_types      text/plain text/css application/x-javascript text/xml application/xml application/rss+xml text/javascript;
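
To get a feel for what gzip buys you on text files, you can compress a sample in Python. This is only an illustration with made-up CSS; gzipped_size is a helper name I invented, and the savings on your own files will vary.

```python
import gzip
import io


def gzipped_size(data, level=2):
    """Return the size in bytes of data after gzip at the given level."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb', compresslevel=level) as f:
        f.write(data)
    return len(buf.getvalue())


# Repetitive text such as css compresses very well
css = b'.button { color: #336699; padding: 4px 8px; }\n' * 200
print(len(css), gzipped_size(css, 2), gzipped_size(css, 9))
```

A low level like 2 already gets most of the savings on text while costing little cpu time, which is the usual argument for not cranking it up to 9.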

The servers should be ready to be restarted.

sudo /etc/init.d/apache2 reload
sudo /etc/init.d/nginx reload

If you are having any problems I suggest reading through this guide and seeing if you have something set up differently.

Speedy Django Sites

Those three steps should speed up your server and allow for more simultaneous visitors. There is a lot more that can be done, but getting these three easy things out of the way first is a good start.

How to Add Locations to Python Path for Reusable Django Apps

posted on April 10th, 2009 by Greg Allard in Greg's Posts on Code Spatter

In my previous post I talk about reusable apps, but I don’t really explain them much. If you have an app that might be useful in another project, it’s best not to refer to the project name in the application, so you don’t have to search for and remove it when adding the app to another project. To never refer to your project name in your app’s code, you will need to put your app on the python path. I usually use project_folder/apps/app_folder, so apps needs to be a location that python checks when importing. Imports then look like the following:

from appname.filename import foo

There are a few places you might need to add an apps folder to the pythonpath.

Add to settings.py

This will add the apps directory in your project directory to the beginning of the path list. This will allow manage.py syncdb and manage.py runserver to know that the apps folder should be added.

import os
import sys
PROJECT_ROOT = os.path.dirname(__file__)
sys.path.insert(0, os.path.join(PROJECT_ROOT, "apps"))
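
If settings.py can be imported more than once, or __file__ turns out to be a relative path, a slightly more defensive variant may help. This is a sketch, not something the setup above requires; add_to_path is a name I made up.

```python
import os
import sys


def add_to_path(directory):
    """Put directory at the front of sys.path, making sure it appears once."""
    directory = os.path.abspath(directory)
    if directory in sys.path:
        sys.path.remove(directory)
    sys.path.insert(0, directory)
    return directory


# In settings.py this would be called as:
# add_to_path(os.path.join(os.path.dirname(__file__), 'apps'))
```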

That should be all you need to do to get most django projects working with your reusable apps, but if for any reason you need to add to the path with mod_python or mod_wsgi, the following should work.

Apache mod_python

In the setting-up-everything post I show an example httpd.conf file. In your apache configuration you will probably see something similar to what is below. To add the location /var/www/myproject/apps to the PythonPath I added it to the list.

SetHandler python-program
PythonHandler django.core.handlers.modpython
SetEnv DJANGO_SETTINGS_MODULE myproject.settings
PythonOption django.root /myproject
PythonDebug On
PythonPath "['/var/www','/var/www/myproject/apps'] + sys.path"

Apache mod_wsgi

If you use mod_wsgi instead of mod_python, your apache config will be loading a wsgi file with a line like WSGIScriptAlias / /var/www/myproject/myproject.wsgi. You will need to edit that file to add to the path (django’s site has an example file).

sys.path.append('/var/www/myproject/apps')

You never know when you might want to use an app in another project, so always try to keep from mentioning the project name anywhere in the applications.

Using Sitemaps to Help Google Index new Pages

posted on April 1st, 2009 by Greg in Greg's Posts on Fedorable

Google finds new pages by following links when it crawls a currently indexed page. To get new pages showing in Google search results faster, websites can provide a sitemap.xml file that provides a link to every page on the site that should be in google’s index. In addition to that, websites can ping Google whenever the sitemap file is updated so that Google will know to check back and update its index.

We use the framework Django for ivylees.com and it provides an easy way to create sitemap files. Since the framework knows about all of our pages already, we only need to add a little bit of code to tell it how to generate the sitemap.xml file for us automatically. Django also makes it simple to ping google when there are updates.

With django we were able to make it as simple as possible for google to index new news releases as soon as possible.
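
For anyone curious what ends up in the file, here is a tiny sketch that builds a sitemap by hand with Python’s standard library. It is not Django’s sitemap framework, which generates this from your models; build_sitemap is a made-up helper and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'


def build_sitemap(urls):
    """Return sitemap XML (as bytes) with one <url><loc> entry per URL."""
    urlset = ET.Element('urlset', {'xmlns': SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, 'url')
        ET.SubElement(entry, 'loc').text = url
    return ET.tostring(urlset, encoding='utf-8')


# Placeholder URLs for illustration
xml = build_sitemap(['http://example.com/', 'http://example.com/news/1/'])
print(xml.decode('utf-8'))
```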

A URL Shortening Service for UCF

posted on January 27th, 2009 by Greg in CDWS Projects

The idea for this site came when some co-workers and I were collecting our W-2s and found letters attached with some HR information. One of them had a ridiculously long URL at the bottom of it (…Enroll%20in%20UCF%20Retirement%20Plans…). Before seeing that URL I hadn’t thought of the convenience of services like TinyURL outside of the internet. We realized it would be simple enough to write one that is specific to UCF.

We decided on a few simple features and jumped into pair programming the site with django. Since Jason was new to Django, he observed while I drove. It took less than the rest of the day to finish the site.

Features

Only allows domains that we specify

We created a custom Form Field to accomplish this and made it able to accept a tuple of allowed domains. If anyone needs this on their site they can use the following code.

from urlparse import urlparse

from django import forms


class URLDomainField(forms.URLField):
    default_domains = ('gregallard.com', 'isthemarketdown.com', 'codespatter.com')

    def __init__(self, *args, **kwargs):
        # take our own argument out of kwargs before the parent sees it,
        # falling back to the default; endswith needs a tuple
        self.domain = tuple(kwargs.pop('domain', self.default_domains))

        # call parent init
        super(URLDomainField, self).__init__(*args, **kwargs)

    def clean(self, value):
        # call parent clean
        value = super(URLDomainField, self).clean(value)

        # an optional field may come back empty
        if not value:
            return value

        hostname = urlparse(value).hostname or ''

        # endswith accepts a tuple, tries each entry, and returns False if none match
        if not hostname.endswith(self.domain):
            raise forms.ValidationError('%s is not a valid url! %s domains only.' % (value, ', '.join(self.domain)))

        return value

The code to use this would look like this:

class LinkForm(forms.Form):
    url   = URLDomainField(domain=('ivylees.com', 'ucf.edu'))
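
One thing to watch with a plain endswith check is that a host like evilucf.edu also ends with ucf.edu. For a prototype that may be fine, but a stricter test is easy to sketch. The helper below is not part of the field above; host_allowed is a made-up name, and it only accepts the exact domain or a real subdomain of it.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def host_allowed(url, domains):
    """True if the URL's host is one of the domains or a subdomain of one."""
    hostname = (urlparse(url).hostname or '').lower()
    return any(hostname == d or hostname.endswith('.' + d) for d in domains)


print(host_allowed('http://www.ucf.edu/hr', ('ivylees.com', 'ucf.edu')))
print(host_allowed('http://evilucf.edu/', ('ivylees.com', 'ucf.edu')))
```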

Automatically creates 5 character alphanumeric string

This method in the model creates a string and makes sure it isn’t in use yet:

    def make_short(self):
        import random
        import string
        chars = string.letters + string.digits
        while True:
            # sample() draws 5 distinct characters, so none repeats
            self.short = ''.join(random.sample(chars, 5))
            # skip strings that collide with the site's own URLs
            if self.short in ('admin', 'thank'):
                continue
            try:
                Link.objects.get(short=self.short)
            except Link.DoesNotExist:
                # not in use yet, so keep it
                return

Allows for custom strings

By default it will create a 5 character alphanumeric string to go at the end of the URL. However, we added a form field that lets users specify their own string so that the URL can carry more meaning. To replace characters that aren’t alphanumeric, an underscore, or a hyphen, we created a simple clean method in the model:

    def clean_short(self):
        import re
        # ^ as first character inside [] negates the set
        # replace everything that isn't a word character (letter, digit, _) or a -
        # with an underscore; a | inside [] would be a literal |, not "or"
        self.short = re.sub(r'[^\w\-]', '_', self.short)
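
Here is the same substitution as a standalone function so you can try it on a few inputs. Note that inside square brackets a | is just a literal character rather than “or”, so this sketch leaves it out of the set and a | in the input gets replaced like any other punctuation.

```python
import re


def clean_short(text):
    """Replace anything that isn't a letter, digit, underscore, or hyphen."""
    return re.sub(r'[^\w\-]', '_', text)


print(clean_short('spring sale!'))
print(clean_short('hr-forms_2009'))
```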

Won’t create more short links

If a URL has been submitted before, the site will not create an extra short link for it. Instead, it will return the existing one to the user. To do this, we added some functionality to the save method:

    def save(self, **kwargs):
        link = Link.objects.filter(url=self.url)[:1]
 
        # if one exists, return it, otherwise save it
        if link:
            # there should be a better way to do this
            # but self = link doesn't work
            self.url   = link[0].url
            self.short = link[0].short
            self.created = link[0].created
            self.id = link[0].id
        else:
            if self.short == '':
                self.make_short()
            else:
                self.clean_short()
            super(Link, self).save(**kwargs)

Just a Prototype

We just wanted to create something simple as a prototype so that hopefully some of the higher-ups will like the idea and we can put it into production.

Pair Programming

This was the first time I had any experience with pair programming and I definitely think it’s a great idea. Jason learned a lot about django, caught my mistakes, and pointed out other things. I solidified my knowledge by showing him what I knew and we both learned some valuable things. For example: anything you output with print foo is displayed in the command window when you are using the django development server. I foresee more pair programming in my future.

Update 2009-01-28

Tim pointed out that a randomly generated string could spell out profanity and suggested removing all vowels so that no words can form. This is the line I added to achieve that:

letters = re.sub('[aeiouAEIOU]', '', string.letters)
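
Pulled out of the model, the whole generator can be sketched like this. It is a simplified stand-in, not the site’s actual code: make_code is a made-up name, the set of taken codes stands in for the database lookup, and string.ascii_letters is used because it works on newer Pythons as well.

```python
import random
import re
import string

# letters with the vowels removed, plus digits
CONSONANTS = re.sub('[aeiouAEIOU]', '', string.ascii_letters)
ALPHABET = CONSONANTS + string.digits


def make_code(taken, length=5, reserved=('admin', 'thank')):
    """Draw random codes until one is neither reserved nor already taken."""
    while True:
        # sample() never repeats a character within one code
        code = ''.join(random.sample(ALPHABET, length))
        if code not in reserved and code not in taken:
            return code


print(make_code(taken=set()))
```

With the vowels gone, the reserved words can no longer come up by chance, but keeping the check costs nothing and still protects against them if the alphabet changes later.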