Author Archive

Relentlessly Resourceful

posted on March 9th, 2009 by Greg in Personal Projects

I read Paul Graham’s essay “Be Relentlessly Resourceful” and liked the term so much that I bought relentlesslyresourceful.com and pointed it here. I don’t think that is what he meant by relentlessly resourceful; it’s more an example of domain hoarding. I have way too many domains.

A URL Shortening Service for UCF

posted on January 27th, 2009 by Greg in CDWS Projects

The idea for this site came when some co-workers and I were collecting our W-2s and found letters attached with some HR information. One of them had a ridiculously long URL at the bottom (…Enroll%20in%20UCF%20Retirement%20Plans…). Before seeing that URL, I hadn’t thought about how convenient services like TinyURL could be outside the web, such as on printed paper. We realized it would be simple enough to write one specific to UCF.

We decided on a few simple features and jumped into pair programming the site with Django. Since Jason was new to Django, he observed while I drove. We finished the site before the end of the day.

Features

Only allows domains that we specify

We created a custom form field to accomplish this and made it accept a tuple of allowed domains. If anyone needs this on their site, they can use the following code.

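from django import forms

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


class URLDomainField(forms.URLField):
    """A URLField that only accepts URLs on a tuple of allowed domains."""

    default_domains = ('gregallard.com', 'isthemarketdown.com', 'codespatter.com')

    def __init__(self, *args, **kwargs):
        # pull our custom 'domain' argument out of kwargs (falling back to the
        # defaults) so the parent __init__ never sees an unexpected keyword
        self.domain = kwargs.pop('domain', self.default_domains)

        # call parent init
        super(URLDomainField, self).__init__(*args, **kwargs)

    def clean(self, value):
        # let URLField do its normal validation first
        value = super(URLDomainField, self).clean(value)
        if not value:
            # empty value on a field with required=False
            return value

        hostname = urlparse(value).hostname or ''

        # accept an allowed domain or any subdomain of it; a bare
        # hostname.endswith('ucf.edu') would also accept 'notucf.edu'
        if not any(hostname == d or hostname.endswith('.' + d) for d in self.domain):
            raise forms.ValidationError('%s is not a valid url! %s domains only.'
                                        % (value, ', '.join(self.domain)))

        return value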

The code to use this would look like this:

from django import forms

class LinkForm(forms.Form):
    url = URLDomainField(domain=('ivylees.com', 'ucf.edu'))
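The domain test itself doesn’t need Django. Here is a standalone sketch of the same check using only Python 3’s standard library; the name host_allowed is just for illustration. Note that a plain hostname.endswith('ucf.edu') test would also accept a look-alike host such as notucf.edu, so this version only accepts an exact match or a subdomain (a dot in front of the allowed domain).

```python
from urllib.parse import urlparse


def host_allowed(url, domains):
    """Return True if url's host is one of `domains` or a subdomain of one.

    Hypothetical helper for illustration; urlparse lowercases .hostname.
    """
    hostname = urlparse(url).hostname or ''
    return any(hostname == d or hostname.endswith('.' + d) for d in domains)


allowed = ('ivylees.com', 'ucf.edu')
print(host_allowed('http://www.ucf.edu/news', allowed))  # True
print(host_allowed('http://ucf.edu', allowed))           # True
print(host_allowed('http://notucf.edu', allowed))        # False
print(host_allowed('http://example.com', allowed))       # False
```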

Automatically creates a 5-character alphanumeric string

This method in the model creates a string and makes sure it isn’t in use yet:

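    def make_short(self):
        import random
        import string
        # characters a generated short string may contain
        chars = string.ascii_letters + string.digits
        # strings that must never be handed out
        reserved = ('admin', 'thank')
        while True:
            # 5 distinct random characters
            self.short = ''.join(random.sample(chars, 5))
            if self.short in reserved:
                continue
            try:
                Link.objects.get(short=self.short)
            except Link.DoesNotExist:
                # nobody is using this string yet, so keep it
                return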
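The same generate-and-check loop can be tried without a database. In this sketch a plain Python set stands in for the short values already stored in the Link table, and make_short_string is a hypothetical name. Like make_short, it picks characters with sample, so the five characters are always distinct.

```python
import random
import string

CHARS = string.ascii_letters + string.digits  # 62 possible characters
RESERVED = {'admin', 'thank'}                  # never hand these out


def make_short_string(taken, length=5, rng=random):
    """Return a random alphanumeric string not in `taken` or RESERVED.

    `taken` is a stand-in for the short strings already in the database.
    """
    while True:
        candidate = ''.join(rng.sample(CHARS, length))
        if candidate not in taken and candidate not in RESERVED:
            return candidate


taken = {'abc12', 'XyZ98'}
short = make_short_string(taken)
taken.add(short)
print(short)
```

With sample, there are 62 × 61 × 60 × 59 × 58 = 776,520,240 possible strings, so for a prototype-sized table the loop almost always finishes on its first pass.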

Allows for custom strings

By default, the site creates a 5-character alphanumeric string to go at the end of the URL. However, we added a form field that lets users specify their own string so the URL can be more meaningful. To keep custom strings URL-safe, we created a simple clean method in the model that replaces anything other than letters, digits, underscores, and hyphens with an underscore:

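    def clean_short(self):
        import re
        # replace every character that isn't a letter, digit, underscore, or
        # hyphen with an underscore; ^ as the first character inside []
        # negates the set (a '|' inside [] would be a literal pipe)
        self.short = re.sub(r'[^A-Za-z0-9_-]', '_', self.short)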
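A quick demonstration of a character-class pitfall worth knowing when writing a cleaner like clean_short: inside square brackets, | is a literal pipe rather than alternation, so a pattern such as [^\w|\-] lets pipes through. The sample input below is made up.

```python
import re

sample = 'ucf|news 2009!'

# '|' inside [] is a literal character, so this pattern keeps pipes
print(re.sub(r'[^\w|\-]', '_', sample))        # ucf|news_2009_

# only letters, digits, '_' and '-' survive; everything else becomes '_'
print(re.sub(r'[^A-Za-z0-9_-]', '_', sample))  # ucf_news_2009_
```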

Won’t create duplicate short links

If a URL has been submitted before, the site won’t create another short link for it; instead, it returns the existing one to the user. To do this, we added some logic to the save method:

    def save(self, **kwargs):
        existing = Link.objects.filter(url=self.url)[:1]

        # if this URL was shortened before, reuse that row; otherwise save it
        if existing:
            # copy the stored row's fields onto this instance; writing
            # self = existing[0] would only rebind the local name 'self'
            existing = existing[0]
            self.id = existing.id
            self.url = existing.url
            self.short = existing.short
            self.created = existing.created
        else:
            # blank means the user didn't ask for a custom string
            if not self.short:
                self.make_short()
            else:
                self.clean_short()
            super(Link, self).save(**kwargs)
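Stripped of the ORM, this save method is a get-or-create keyed on the long URL. Here is an in-memory sketch of that behavior; the links dict stands in for the Link table, shorten is a hypothetical helper, and a counter replaces make_short so the results are predictable. As in the post, a custom string is ignored when the URL already has a short link.

```python
import itertools

links = {}                     # long URL -> short string (stand-in for Link)
_counter = itertools.count(1)  # predictable stand-in for make_short()


def shorten(url, custom=''):
    """Return the existing short string for url, or create and store one."""
    if url in links:
        # already submitted: hand back the existing short link
        return links[url]
    short = custom or 'link%d' % next(_counter)
    links[url] = short
    return short


first = shorten('http://www.ucf.edu/a/very/long/path')
again = shorten('http://www.ucf.edu/a/very/long/path', custom='retire')
print(first, again)  # link1 link1
```

Django’s QuerySet.get_or_create() covers the same lookup-or-insert pattern and returns an (object, created) tuple, although the short string for a new row would still need to be generated before the call.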

Just a Prototype

We just wanted to create something simple as a prototype so that hopefully some of the higher-ups will like the idea and we can put it into production.

Pair Programming

This was my first experience with pair programming, and I definitely think it’s a great idea. Jason learned a lot about Django, caught my mistakes, and pointed out other issues. I solidified my own knowledge by showing him what I knew, and we both learned some valuable things. For example, output from print foo shows up in the console window when you are running the Django development server. I foresee more pair programming in my future.

Update 2009-01-28

Tim recommended that I remove the chance of automatically generating profanity in the URL, and suggested removing all vowels so that real words are unlikely to appear. This is the line I added to achieve that:

letters = re.sub('[aeiouAEIOU]', '', string.ascii_letters)
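Dropping vowels shrinks the pool of characters. A quick check of the numbers, assuming the digits stay in and sample is still used to pick 5 distinct characters (math.perm needs Python 3.8 or later):

```python
import math
import re
import string

letters = re.sub('[aeiouAEIOU]', '', string.ascii_letters)
chars = letters + string.digits

print(len(letters), len(chars))  # 42 52
print(math.perm(62, 5))          # 776520240 possible strings with vowels
print(math.perm(52, 5))          # 311875200 possible strings without vowels
```

That still leaves over 300 million possible strings, roughly 40% of the original space.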

How I Made isthemarketdown.com

posted on January 21st, 2009 by Greg in Personal Projects

The idea came on Inauguration Day, when someone was wondering how the stock market was doing. I had visited http://isobamapresident.com/ earlier in the day, and the idea of a single-serving site for the stock market came to mind.

It only took about 3 hours to create after I had the idea.

First 30 minutes: Look for free data. Is there a feed or API?
Next 30: I couldn’t find anything useful, so I decided to scrape a website using BeautifulSoup.
Next 60: I set up a Django project to put it all together. I created a model to hold the scraped data so I wouldn’t have to scrape on every page load, then created a simple view and laid out a template file and a simple stylesheet.
Next 15: I figured out how to add a custom command to manage.py so that I could run the scraper from the command line: python manage.py market_parse
Next 15: Debugging and adjusting. I hadn’t looked at the site yet, but there weren’t many bugs.
Last 30 minutes: Added a crontab entry to run the scraper every 20 minutes on weekdays from 9 to 5. It took a while to get working because I needed the full Python path in the crontab.
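The post doesn’t show the crontab entry itself. Assuming “9 to 5” means runs at :00, :20, and :40 past each hour from 9:00 through 4:40 pm, Monday to Friday, the schedule expression would be */20 9-16 * * 1-5. The hypothetical helper below spells out which times that expression matches; the crontab line in its comment uses placeholder paths, written out in full because cron runs jobs with a minimal environment.

```python
from datetime import datetime

# Hypothetical crontab entry (paths are placeholders):
# */20 9-16 * * 1-5 /usr/bin/python /path/to/project/manage.py market_parse


def market_parse_due(now):
    """True if `now` matches the cron schedule */20 9-16 * * 1-5.

    That is: minute 0, 20 or 40, hours 9 through 16, Monday to Friday.
    """
    is_weekday = now.weekday() < 5  # Monday == 0 ... Sunday == 6
    in_hours = 9 <= now.hour <= 16
    on_the_20s = now.minute % 20 == 0
    return is_weekday and in_hours and on_the_20s


print(market_parse_due(datetime(2009, 1, 21, 9, 0)))    # Wednesday 9:00 -> True
print(market_parse_due(datetime(2009, 1, 21, 16, 40)))  # Wednesday 4:40 pm -> True
print(market_parse_due(datetime(2009, 1, 21, 17, 0)))   # after hours -> False
print(market_parse_due(datetime(2009, 1, 24, 10, 20)))  # Saturday -> False
```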

So, is the market down?

Using Beautiful Soup for Screen Scraping

posted on November 12th, 2008 by Greg in Personal Projects

I’ve been curious to learn more about screen scraping for some time. Then I heard about Beautiful Soup, a Python module that is great for parsing HTML. Since I’ve also been learning Python, I thought now was the perfect time to explore some scraping.

In the past, I had some trouble using PHP to parse the official Magic: The Gathering site for new card info when working on my MTG card database. I didn’t spend much time trying to figure that out, but with Python I didn’t have a problem.

After copying Beautiful Soup to my Python path, I started typing some Python at the interactive prompt:

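# Python 2 with Beautiful Soup 3 (the single BeautifulSoup.py module)
from BeautifulSoup import BeautifulSoup as BSoup
import urllib

url  = 'http://ww2.wizards.com/gatherer/Index.aspx?setfilter=Shards%20of%20Alara&output=Spoiler'
html = urllib.urlopen(url).read()
soup = BSoup(html)
# print the text of the first cell in every table row that has one
for tr in soup.findAll('tr'):
    if tr.td:
        print tr.td.string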

This outputs all of the Magic card names on the page (and some other stuff). Here is another example: getting image URLs when you know the value of the id attribute on the img tags.

url  = 'http://ww2.wizards.com/gatherer/CardDetails.aspx?&id=175000'
html = urllib.urlopen(url).read()
soup = BSoup(html)
for img in soup.findAll(id='_imgCardImage'):
    print img['src']
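The Gatherer pages above may not serve the same markup anymore, and Beautiful Soup isn’t part of the standard library. For comparison, here is a sketch of both lookups (the first cell of each row, and img src by id) using Python 3’s built-in html.parser instead of Beautiful Soup, run against a small made-up snippet. FirstCellParser, ImageSrcParser, first_cells, and image_sources are hypothetical names, and the id lookup is restricted to img tags. Unlike Beautiful Soup, html.parser is event-based and doesn’t build a tree, so this only suits simple, fairly well-formed pages.

```python
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in every <tr>."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_row = False
        self._seen_td = False
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._in_row, self._seen_td, self._capturing = True, False, False
        elif tag == 'td':
            if self._in_row and not self._seen_td:
                # first cell of this row: start collecting its text
                self._seen_td = self._capturing = True
                self.cells.append('')
            else:
                self._capturing = False

    def handle_endtag(self, tag):
        if tag in ('td', 'tr'):
            self._capturing = False
        if tag == 'tr':
            self._in_row = False

    def handle_data(self, data):
        if self._capturing:
            self.cells[-1] += data


class ImageSrcParser(HTMLParser):
    """Collect the src of every <img> whose id attribute equals image_id."""

    def __init__(self, image_id):
        super().__init__()
        self.image_id = image_id
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == 'img' and attrs.get('id') == self.image_id and attrs.get('src'):
            self.sources.append(attrs['src'])
    # self-closing <img ... /> tags go through handle_startendtag, which by
    # default calls handle_starttag, so they are covered too


def first_cells(html):
    parser = FirstCellParser()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


def image_sources(html, image_id):
    parser = ImageSrcParser(image_id)
    parser.feed(html)
    parser.close()
    return parser.sources


sample = ('<table><tr><th>Name</th><th>Cost</th></tr>'
          '<tr><td>Ajani Vengeant</td><td>2RW</td></tr>'
          '<tr><td>Akrasan Squire</td><td>W</td></tr></table>'
          '<img id="_imgCardImage" src="/images/card.jpg" />')
print(first_cells(sample))                     # ['Ajani Vengeant', 'Akrasan Squire']
print(image_sources(sample, '_imgCardImage'))  # ['/images/card.jpg']
```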

With a little more time I could get all the cards and their images and fill up my database. I just have to find the time now.

Update 12/28/8

I just heard about Scrapy. Now I need to try it out with a project.

Going with Slicehost Instead of AWS EC2

posted on October 14th, 2008 by Greg in Personal Projects

I ran into some trouble with Python 2.4 and the Django code I was using. The previous server had 2.5 and I hadn’t noticed any problems there, so I tried upgrading to 2.5 and changing which version of Python Debian uses by default (this was on Debian Etch). I had difficulty getting a few of the site-packages (like mod_python) to work with 2.5 as the default, so I decided to move to Debian Lenny even though it wasn’t yet the stable release. While doing that, I ran into a problem with Lenny, XFS, and Amazon’s Elastic Block Store not working well together. Amazon is looking into the matter, but while trying to figure it out, I realized that AWS doesn’t come with support. There is an extra package you have to purchase, which starts at $100 a month.

That made Amazon look less awesome, since I know I am going to need some support at some point. I decided to shop around again and compare prices and features. I ended up revisiting Slicehost, since I now knew a lot more about setting up a server than I did before.

I posted the steps I took to set up Apache, MySQL, Django, and a few other things on a clean Ubuntu machine on Code Spatter.

Now I have a WebFaction account for testing and subversion hosting and I’m using the Slicehost account for the live version of the site.

Subversion makes it easy to commit on one server and update on the other once the code is stable. I should explore a distributed version control system like Git, since it might help with this in the future.

Update October 21, 2008

The AWS developer community seems to be a good alternative to direct support from Amazon. The people there are knowledgeable, and Amazon reps post frequently. Here is a quote from someone at Amazon about the issue I was having:

We are still investigating the issue and will post an analysis a little later and a workaround.  Basically the problem revolves around the interaction between very specific kernel versions, XFS and our version of Xen.

Even though my slice is running fine, I will still be keeping AWS in mind.

Update May 7, 2009

Some people have posted some solutions on the developer community. I haven’t tested them, but I will look into it if I need to use Amazon again.

Main Page Updater for Emergencies

posted on October 3rd, 2008 by Greg in CDWS Projects

At a large institution like UCF, it is good to have a plan for emergencies. I set up a simple form that updates the main page at http://ucf.edu in an emergency so that important information can be released as fast as possible.

The main page is an HTML file that is copied every few minutes from our database-driven application. This speeds up the website and cuts processor utilization considerably. We added a simple check to our cron job that looks at whether the site is in emergency mode and, if so, pulls from our emergency page instead. The emergency page is created with a simple form and a simple template file.

I built this page updater to be reliable and simple so that there is little turnaround time between an emergency and the information being available. The form edits files in the filesystem instead of using a database, which would add complexity. There is a place to enter the important information, which is then put into the pre-built template when the user clicks preview. Once the user is satisfied with how it looks, there is a button to enable or disable the page. It updates the status that the cron job checks, and the main page changes in under a minute.
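The post doesn’t show the cron job or say what language it is written in. As a rough sketch of the file-based switch described above, here is a hypothetical Python version: a status file holds “on” or “off” (assumed to be written by the form’s enable/disable button), and each run copies either the normal page or the emergency page over the live one. All function names, file names, and demo page contents are made up.

```python
import shutil
import tempfile
from pathlib import Path


def emergency_mode_enabled(status_file):
    """Read the flag the enable/disable button is assumed to write."""
    try:
        return Path(status_file).read_text().strip() == 'on'
    except FileNotFoundError:
        # no status file yet: stay in normal mode
        return False


def publish_main_page(status_file, normal_page, emergency_page, live_page):
    """Copy whichever page should be live over live_page (run from cron)."""
    use_emergency = emergency_mode_enabled(status_file)
    source = emergency_page if use_emergency else normal_page
    shutil.copyfile(source, live_page)
    return source


if __name__ == '__main__':
    # demo in a throwaway directory
    work = Path(tempfile.mkdtemp())
    (work / 'normal.html').write_text('<h1>UCF</h1>')
    (work / 'emergency.html').write_text('<h1>Campus closed</h1>')
    (work / 'status').write_text('on')
    publish_main_page(work / 'status', work / 'normal.html',
                      work / 'emergency.html', work / 'index.html')
    print((work / 'index.html').read_text())  # <h1>Campus closed</h1>
```

Keeping the switch in a plain file means the cron job has nothing to connect to during an emergency, which matches the post’s reason for avoiding a database.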

AWS, EBS, S3, EC2, Debian, Django, Apache, and mod_python

posted on September 23rd, 2008 by Greg in Personal Projects

Yesterday I dove into Amazon Web Services to check them out as a solution for a project I’m working on. To start off, I followed a guide to set up the Django development server on a default Amazon Machine Image. Then I decided to go with a Debian AMI and build a full production server. I used apt-get to install the newest versions of Apache, Python, MySQL, mod_python, svn, and some others. Debian turned out to be a lot easier than some other flavors of Linux I have used.

After getting the instance configured the way I wanted it, I saved an image of it to my S3 storage bucket so I could bring it up at any time instead of paying ten cents an hour while it sits idle.

A more recent post continues the Amazon adventure.

Social Network Built with Django

posted on August 27th, 2008 by Greg in Personal Projects

Earlier, I was learning Python and Django in order to build a social network. So far, I have created the ability for users to

  • create an account with e-mail activation
  • log in and out
  • add other users as friends and confirm friend requests from other users
  • send/reply/forward messages

This was the base for a niche social network to be built upon.

Soon after completing those features, I discovered Elgg, an open source social network written in PHP. It can do all of those things and more. I am now looking into using it and modifying it for the original goal.

We’ve gone back to Django, since Elgg wasn’t the easiest thing to modify. I was hoping it might have used a common PHP framework like CakePHP or CodeIgniter. More on the Django developments in another post soon. On Code Spatter, I have posted about what I learned about Python, PIL, and Django working together.

Update November 12, 2008

If you are looking for an open source social network written in Django, Pinax is looking really good right now. It combines many reusable Django apps into one slick project. Cloud27 is set up as an example of all the features included in Pinax. The contact importing feature is one I will be adding to the social app I built before I knew about Pinax.

UCF.edu v4 Middle End

posted on June 23rd, 2008 by Greg in CDWS Projects

I say middle end (even though it’s not an end) because I didn’t work on the front-end skin or on the content-management back end. The new site launched a few days ago (6/21/8) and uses an installation of InQuira Information Manager as the content management system. The CMS comes with a JSP tag library, which I used to extract data from the CMS and format it in the front-end layout. I was also responsible for designing the structure of the channels, categories, and schema in InfoManager.

Information and Knowledge Engineering 2008

posted on April 16th, 2008 by Greg in Class Projects

With the help of Dr. Orooji, I have written a paper about the programming team website I created. It has been accepted to the Information and Knowledge Engineering Conference, which is part of the larger World Congress in Computer Science, Computer Engineering, and Applied Computing (WorldComp). This year’s event (2008) will be held in Las Vegas from July 14 to July 17.

Along with getting the paper published, I will have a 20-minute slot for a presentation.

Links and more information will be posted here as it develops.

Updates

The presentation times have been set. There is a PDF file listing all of the conferences and their presentations. I am on page 132, which shows me presenting on the first day at 4 pm.