Archive for the 'Personal Projects' Category
The idea came on inauguration day when someone was wondering how the stock market was doing. I had been to http://isobamapresident.com/ earlier in the day, and the idea of a single-serving site for the stock market came to mind.
It only took about 3 hours to create after I had the idea.
First 30 minutes: Find free data. Is there a feed or API?
Next 30: I couldn’t find anything useful so I decided to scrape a website using BeautifulSoup.
Next 60: I set up a Django project to put it all together. Created a model to hold the scraped data so I wouldn’t have to scrape on every page load. Created a simple view and laid out a template file and simple stylesheet.
Next 15: I found out how to add a custom command to manage.py so that I could call the scraping from the command line: python manage.py market_parse
Next 15: Debugging and adjusting. I hadn’t looked at the site yet, but there weren’t that many bugs.
Last 30 minutes: Added a crontab entry to run the scrape every 20 minutes on weekdays from 9 to 5. It took a while to get going because I needed the full path to python in the crontab.
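The split between scraping on a schedule and serving stored data can be sketched without Django. This is only an illustration: it swaps the Django model for the standard-library sqlite3 module, and the table name, column names, function names, and sample numbers are all hypothetical placeholders, not what the site actually used.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    """Create the (hypothetical) table that holds scraped values."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshot ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " fetched_at TEXT NOT NULL,"
        " index_value REAL NOT NULL,"
        " change REAL NOT NULL)"
    )


def save_snapshot(conn, index_value, change, fetched_at=None):
    """Called by the scheduled scrape job: store one scraped result."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO snapshot (fetched_at, index_value, change) VALUES (?, ?, ?)",
        (fetched_at.isoformat(), index_value, change),
    )
    conn.commit()


def latest_snapshot(conn):
    """Called by the page view: read the newest row, no scraping involved."""
    return conn.execute(
        "SELECT fetched_at, index_value, change FROM snapshot "
        "ORDER BY id DESC LIMIT 1"
    ).fetchone()


# Placeholder numbers only; a real job would pass in the scraped values.
conn = sqlite3.connect(":memory:")
init_db(conn)
save_snapshot(conn, 8000.0, -50.0)
save_snapshot(conn, 7950.0, -100.0)
row = latest_snapshot(conn)
print(row[1], row[2])
```

The point of the design is that a page load only ever runs the cheap read in latest_snapshot, while the slow network request happens in the scheduled job.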
I’ve been curious to learn more about screen scraping for some time. And then I heard about a python script that is great for parsing html. Since I’ve also been learning python, I thought now was the perfect time to explore some scraping.
In the past I had some trouble using PHP to parse the official Magic: The Gathering site for new card info when working on my MTG card database. I didn’t spend much time trying to figure that out, but using Python I didn’t have a problem.
After copying Beautiful Soup to my python path I started typing in some python at the command line.
    from BeautifulSoup import BeautifulSoup as BSoup
    import urllib

    url = 'http://ww2.wizards.com/gatherer/Index.aspx?setfilter=Shards%20of%20Alara&output=Spoiler'
    html = urllib.urlopen(url).read()
    soup = BSoup(html)
    for tr in soup.fetch('tr'):
        if tr.td:
            print tr.td.string
This would output all of the magic card names on the page (and some other stuff). Here is another example: getting image urls when knowing the value of the id attribute on the img tags.
    url = 'http://ww2.wizards.com/gatherer/CardDetails.aspx?&id=175000'
    html = urllib.urlopen(url).read()
    soup = BSoup(html)
    for img in soup.findAll(id='_imgCardImage'):
        print img['src']
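The snippets above are Python 2 with the old BeautifulSoup module. For comparison, here is a rough sketch of the same two lookups (the first cell of each table row, and the src of an image with a known id) in Python 3 using only the standard-library html.parser. It runs on a made-up HTML string rather than the live site, so the class name, sample markup, and card names are assumptions for illustration, and it does not try to handle nested tables.

```python
from html.parser import HTMLParser


class CardPageParser(HTMLParser):
    """Collect first-cell text per table row and src of images with a given id."""

    def __init__(self, image_id):
        super().__init__()
        self.image_id = image_id
        self.first_cells = []   # text of the first <td> in every <tr>
        self.image_urls = []    # src of every <img> whose id matches image_id
        self._cells_seen = 0    # number of <td> tags seen in the current <tr>
        self._in_first_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._cells_seen = 0
        elif tag == "td":
            self._cells_seen += 1
            if self._cells_seen == 1:
                self._in_first_cell = True
                self._buffer = []
        elif tag == "img" and attrs.get("id") == self.image_id and attrs.get("src"):
            self.image_urls.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "td" and self._in_first_cell:
            self._in_first_cell = False
            text = "".join(self._buffer).strip()
            if text:
                self.first_cells.append(text)

    def handle_data(self, data):
        if self._in_first_cell:
            self._buffer.append(data)


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td>Example Card One</td><td>Creature</td></tr>
  <tr><td>Example Card Two</td><td>Instant</td></tr>
</table>
<img id="_imgCardImage" src="/images/example-card.jpg">
<img id="other" src="/images/ignored.jpg">
"""

parser = CardPageParser(image_id="_imgCardImage")
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.first_cells)
print(parser.image_urls)
```

A parsing library like Beautiful Soup hides most of this bookkeeping, which is a large part of why the original one-liners are so short.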
With a little more time I could get all the cards and their images and fill up my database. I just have to find the time now.
I just heard about Scrapy. Now I need to try it out with a project.
I ran into some trouble with Python 2.4 and the Django code I was using. The previous server had 2.5 and I didn’t notice any problems, so I tried upgrading to 2.5 and changing which version of Python Debian uses as default (this was on Debian Etch). I was having some difficulty getting a few of the site-packages to work with 2.5 by default (like mod_python), so I decided to move to Debian Lenny even though it isn’t as well supported. While doing that I ran into a problem where Lenny doesn’t work well with XFS and Amazon’s Elastic Block Store. Amazon is looking into the matter, but while trying to figure that out, I realized that AWS doesn’t come with support. There is an extra package you have to purchase, which starts at $100 a month.
That made Amazon look less awesome since I know I am going to need some support at some point. I decided to compare prices and features around again. I ended up revisiting Slicehost since I knew a lot more about setting up a server than I did before.
I posted the steps that I took to set up apache, mysql, django, and a few other things on a clean ubuntu machine on Code Spatter.
Now I have a WebFaction account for testing and subversion hosting and I’m using the Slicehost account for the live version of the site.
Subversion makes it easy to commit on one server and update on the other once it is stable. I should explore a distributed version control system like git since it might help out with this in the future.
Update October 21, 2008
The AWS developer community seems to be a good alternative to having direct support from Amazon. The people there are knowledgeable and Amazon reps post frequently. Here is a quote from someone at Amazon about the issue I was having:
We are still investigating the issue and will post an analysis a little later and a workaround. Basically the problem revolves around the interaction between very specific kernel versions, XFS and our version of Xen.
Even though my slice is running fine, I will still be keeping AWS in mind.
Update May 7, 2009
Some people have posted some solutions on the developer community. I haven’t tested them, but I will look into it if I need to use Amazon again.
Yesterday I dove into Amazon’s web services to check them out as a solution for a project I’m working on. I followed a guide to set up a Django development server on a default Amazon machine image to start off. Then I decided to go with a Debian AMI and do a full production server. I used apt-get to install the newest versions of apache, python, mysql, mod_python, svn, and some others. Debian turned out to be a lot easier than some other flavors of Linux I have used.
After getting the instance configured the way I wanted it, I saved an image of it to my storage bucket so I could bring it up at any time instead of paying ten cents an hour until I need it.
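To put the ten cents an hour in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it uses the hourly figure quoted above, assumes a 30-day month, and ignores storage and bandwidth charges. The function name is made up for the example.

```python
HOURLY_RATE_CENTS = 10  # "ten cents an hour", the figure quoted in the post


def running_cost_dollars(hours):
    """Cost in dollars of leaving one instance running for the given hours."""
    return HOURLY_RATE_CENTS * hours / 100


print(running_cost_dollars(24))       # one full day
print(running_cost_dollars(24 * 30))  # an assumed 30-day month
```

Left running around the clock, the instance comes to $72 for a 30-day month at that rate, which is the cost that saving an image and shutting the instance down avoids.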
I was learning python and django earlier to build a social network. So far, I have created the ability for users to
- create an account with e-mail activation
- add other users as friends and confirm friendship that other users requested
- send/reply/forward messages
This was the base for a niche social network to be built upon.
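The request-then-confirm friendship flow from the list above can be sketched in plain Python. This is not the Django code from the project; the class, method names, and usernames are hypothetical, and a real app would keep this state in database tables rather than in memory.

```python
class FriendGraph:
    """In-memory sketch of friend requests that need confirmation."""

    def __init__(self):
        self.pending = set()  # (requester, recipient) pairs awaiting confirmation
        self.friends = set()  # frozenset({a, b}) for each confirmed friendship

    def request(self, requester, recipient):
        """Record that requester asked to be friends with recipient."""
        if requester == recipient:
            raise ValueError("a user cannot befriend themselves")
        if not self.are_friends(requester, recipient):
            self.pending.add((requester, recipient))

    def confirm(self, recipient, requester):
        """Recipient accepts a pending request that requester sent earlier."""
        if (requester, recipient) not in self.pending:
            raise KeyError("no pending request to confirm")
        self.pending.remove((requester, recipient))
        self.friends.add(frozenset((requester, recipient)))

    def are_friends(self, a, b):
        """Friendship is symmetric, so the order of a and b does not matter."""
        return frozenset((a, b)) in self.friends


graph = FriendGraph()
graph.request("alice", "bob")
print(graph.are_friends("alice", "bob"))  # still pending at this point
graph.confirm("bob", "alice")
print(graph.are_friends("bob", "alice"))  # confirmed, and symmetric
```

Keeping pending requests as ordered pairs and confirmed friendships as unordered sets means only the person who received a request can confirm it.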
Soon after completing those features, I discovered elgg. It’s an open source social network written in php. It can do all of those features and more. I am now looking into using that and modifying it for the original goal.
We’ve gone back to django since elgg wasn’t the easiest thing to modify. I was hoping they might have used a common php framework like cake or code igniter. More on the django developments in another post soon. On CodeSpatter I have posted about what I learned about Python, PIL, and Django working together.
Update November 12, 2008
If you are looking for an Open Source Social Network written in Django, Pinax is looking really good right now. They have combined many reusable django apps into one slick project. Cloud27 is set up as an example of all the features included in Pinax. The contact importing feature is one that I will be adding to my social app that I built before having knowledge of Pinax.
Code Spatter is a personal project that I started when I thought it would be useful to have a weblog about projects and other web development topics for myself and my co-workers. It was also a chance to use CyTE for a practical application and start development on MorfU. Both are open source projects that I develop for.
Tragedy was a guild in World of Warcraft that had up to 40 members in a single raid event as often as 4-5 nights a week. There was a lot of information that needed to be saved from the raids. It was important to know which members attended them and which monsters were defeated that evening. The monsters would drop loot and it was necessary to know who received the loot. There was a game modification that would store all of this data, but there wasn’t an easy way to get this information onto the website.
All of my web development experience started with Pyrodius.com. I learned PHP and MySQL so that the site could add movie reviews dynamically and let users post their own reviews. I created a blog to display the site’s news before I had even heard the word “blog”.
At the moment the site doesn’t have any activity, but I will still use it to learn and test out new software or ideas. I have installed a few versions of phpBB and MediaWiki to test out various ideas. CyTE is also installed there for testing.
There are many things I aspire to do with the site, but other projects have taken priority for the time being.
MorfU is a project that I conceived that will combine all of the features of wikis, blogs, and forums. The name is an anagram of forum and is pronounced like morph you.
Currently the only development that has been done on it is with Code Spatter, which has only limited blog functionality.
This is a module for CyTE that should package seamlessly with any installation. There hasn’t been much else to test this with as of yet.