Wednesday, July 20, 2011

Using real data to test

For the past two weeks or so I've been working on an idea for the micro-vendor space. As part of that project I needed to build a database of products that can typically be found at such stores. For this version of the script I only looked at Makro and Woolworths; unfortunately, Pick 'n Pay just looked horrible. As a first run I used the following script (a Django management command):
from django.core.management.base import BaseCommand, CommandError

from BeautifulSoup import BeautifulSoup
import urllib2, re, json

from decimal import Decimal, ROUND_DOWN

WHOLESALERS = {
  "Makro" : {
    "Groceries": {
      "Carbonated Soft Drinks" : "http://www.makro.co.za/live/content.php?SortBy=1&ItemsPerPage=9999&Region=1&Action=catalog&Cat=58&Gifts=&catId=&Start=0&Images=0&Query=&ShowAll=1&Brand=&Extended=&Reduced=&Promo=&Session_ID=f9850a12c1942df9c77866b3bbf22654",
      "Confectionary & Beverage" : {
        "Snack" : "http://www.makro.co.za/live/content.php?SortBy=1&ItemsPerPage=9999&Region=1&Action=catalog&Cat=82&Gifts=&catId=&Start=0&Images=0&Query=&ShowAll=&Brand=&Extended=&Reduced=&Promo=&Session_ID=f9850a12c1942df9c77866b3bbf22654",
        "Confectionery": "http://www.makro.co.za/live/content.php?SortBy=1&ItemsPerPage=9999&Region=1&Action=catalog&Cat=84&Gifts=&catId=&Start=0&Images=0&Query=&ShowAll=1&Brand=&Extended=&Reduced=&Promo=&Session_ID=f9850a12c1942df9c77866b3bbf22654",
      }
    }
  },
  "Woolworths" : {
    "Food & Household" : {
      "Beverages" : {
        "Carbonated Drinks" : {
          "Cans" : "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat420030&addFacet=9004%3Acat420030&howMany=99999&q_pageNum=1&viewAll=false",
        }
      },
      "Snacks, Sweets & Biscuits" : {
        "Chips & Other Snacks" : {
          "Chips / Crisps" : "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat420218&addFacet=9004%3Acat420218&howMany=99999&q_pageNum=1&viewAll=false",
          "Snack Bars": "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat420220&addFacet=9004%3Acat420220&howMany=99999&q_pageNum=1&viewAll=false",
        },
        "Chocolate Bars & Boxes" : {
          "Boxes" : "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat420226&addFacet=9004%3Acat420226&howMany=99999&q_pageNum=1&viewAll=false",
          "Chocolate Bars" : "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat420224&addFacet=9004%3Acat420224&howMany=99999&q_pageNum=1&viewAll=false",
        },
        "Dried Fruit" : "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat200032&addFacet=9004%3Acat200032&howMany=99999&q_pageNum=1&viewAll=false",
        "Nuts" : "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat200026&addFacet=9004%3Acat200026&howMany=99999&q_pageNum=1&viewAll=false",
        "Popcorn": "http://www.woolworths.co.za/store/browse/category.jsp?q_docSort=&categoryId=cat200024&addFacet=9004%3Acat200024&howMany=99999&q_pageNum=1&viewAll=false",
      },
    }
  }
}

class Command(BaseCommand):
  args = '<output_file>'
  help = 'This command generates a json file which will eventually turn into a fixture file for import into database'

  def __init__(self, *args, **kwargs):
    super(Command, self).__init__(*args, **kwargs)
    self.opener = urllib2.build_opener()
    self.opener.addheaders = [('User-agent', 'Mozilla/5.0')]

  def parseMakroPage(self, url):
    soup = None
    import httplib
    while soup is None:
      # For some reason I kept getting IncompleteRead errors, this fixed it.
      try:
        page = self.opener.open(url)
        soup = BeautifulSoup(page.read())
        page.close()
      except (httplib.IncompleteRead, httplib.BadStatusLine), err:
        from time import sleep
        print "Read error occurred, sleeping for 1s then I will try again!"
        sleep(1)
    products = soup.findAll('table', attrs = { "background" : "/live/images/product_back.gif"})
    suffix = "http://www.makro.co.za"
    data = []
    for product in products:
      try:
        brand = product.find(attrs={'class' : 'style4'}).contents[0]
        # At least one product data entry is broken
        if len(brand) > 0:
          brand = brand.contents[0].strip()
        else:
          brand = ""
        variation = product.find(attrs={'class' : 'style4'}).contents[1].strip()
        sku = str(Decimal(product.find(attrs={"class" : "style20"})['href'].split('Sku=')[1].split('|')[0]))
        product_id = product.find(attrs={"class" : "style20"})['href'].split('ProdId=')[1].split('&')[0]
        link = "%s/%s" % (suffix, product.find(attrs={'class' : 'style4'})['href'].split('&')[0][1:])
        price = product.find(attrs={'class' : 'style5'}).contents[0].strip().split(' ')[1].strip()
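      except IndexError:
        # Skip products whose markup does not match the expected layout,
        # otherwise price and the other fields would be undefined below.
        print "Skipping a product entry that could not be parsed"
        continue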
      price = str(Decimal(price).quantize(Decimal('.01'), rounding=ROUND_DOWN))
      print "%s [%s]:%s - %s R %s (%s)" % (product_id, sku, brand, variation, price, link) 
      data.append({
        'brand' : brand,
        'variation' : variation,
        'sku' : sku,
        'product_id' : product_id,
        'link' : link,
        'price' : price,
      })
    return data
      
  def parseWoolworthsPage(self, url):
    page = self.opener.open(url)
    soup = BeautifulSoup(page.read())
    page.close()
    products = soup.findAll('div', attrs = { "class" : "itemcontainerWW" })
    suffix = "http://www.woolworths.co.za"
    data = []
    for product in products:
      name = product.find(attrs = { "class" : "itemheader" }).a.contents[0].strip()
      link = "%s/%s" % (suffix, product.find(attrs = { "class" : "itemheader" }).a['href'][1:])
      product_id = link.split('=')[1]
      price = product.find(attrs = { "class" : "itemprice_strike" }).contents[0].strip().split(' ')[1].strip()
      price = str(Decimal(price).quantize(Decimal('.01'), rounding=ROUND_DOWN))
      print "%s: %s R %s (%s)" % (product_id, name, price, link) 
      data.append({
        'name' : name,
        'link' : link,
        'product_id' : product_id,
        'price' : price,
      })
    return data

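  def recurseWholesalers(self, obj, parse_callback, categories=None, products=None):
    # Avoid mutable default arguments, which would be shared between calls
    if categories is None:
      categories = []
    if products is None:
      products = []
    if isinstance(obj, dict):
      # A dict is a category level: recurse into each sub-category
      for k in obj.keys():
        new_categories = list(categories)
        new_categories.append(k)
        self.recurseWholesalers(obj[k], parse_callback, new_categories, products)
    else:
      # Anything else is a URL: parse the page and tag each product
      newProducts = parse_callback(obj)
      for product in newProducts:
        product['categories'] = categories
      products.extend(newProducts)
    return products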

  def handle(self, *args, **options):
    if len(args) == 1:
      woolworthsProducts = []
      makroProducts = []
      self.recurseWholesalers(WHOLESALERS['Woolworths'], self.parseWoolworthsPage, [], woolworthsProducts)
      self.recurseWholesalers(WHOLESALERS['Makro'], self.parseMakroPage, [], makroProducts)
      products = {
        'Woolworths': woolworthsProducts,
        'Makro': makroProducts,
      }
      filename = args[0]
      with open(filename, mode='w') as f: 
        json.dump(products, f, indent=2)
    else:
      print "You need to specify the output filename"

It was a fun exercise and perhaps it will help someone out there. There are still a couple of difficulties, such as identifying the same product at different wholesalers and handling product variations (flavour, size, etc.).
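The heart of the script is the recursive walk over the nested WHOLESALERS dictionary, where each level is a category and each leaf is a URL. Here is a minimal standalone sketch of that idea. It is written in Python 3 syntax rather than the Python 2 used above, the example.com URLs are made up, and flatten_categories is a hypothetical helper name, not something from the original command:

```python
def flatten_categories(tree, path=()):
    """Yield (category_path, url) pairs from a nested dict of categories."""
    for name, child in tree.items():
        new_path = path + (name,)
        if isinstance(child, dict):
            # A dict is another category level, so keep walking down.
            yield from flatten_categories(child, new_path)
        else:
            # Anything else is treated as the URL for this category path.
            yield new_path, child


# Made-up example data with the same shape as WHOLESALERS.
EXAMPLE = {
    "Groceries": {
        "Soft Drinks": "http://example.com/cat/58",
        "Snacks": {
            "Chips": "http://example.com/cat/82",
        },
    },
}

pairs = list(flatten_categories(EXAMPLE))
for categories, url in pairs:
    print(" > ".join(categories), "->", url)
```

Each pair carries the full category path, which is what ends up in the 'categories' field of every scraped product.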

Tuesday, August 31, 2010

UCT Python Course - Part 3

What I really wanted to discuss about the Python course when I started these blog posts was a really cool experience I had when I set a couple of rowdy students a challenging but fun task. On the last day of the course I noticed two groups of students who were throwing paper around and looking generally bored. So I decided to be a bit proactive and manage the situation so that they weren't a distraction to the other students.

In order to assess the progress of the one group I asked them to implement a simple email spam proofing scheme where they had to replace "@" with " AT " and "." with " DOT ". Of course the students responded that they didn't want to do it now; however, after a bit of coaxing I got them to do it. I fully expected them to struggle a bit with this task, but I was pleasantly surprised to find they had no real trouble completing it and easily produced something like:
print 'email_address@domain.co.za'.replace(
  '@', ' AT ').replace('.', ' DOT ')

So this is where things get interesting. I had been joking with the other tutors about giving some of the advanced students hangman to do as a challenge (I assume everyone is familiar with the game). So I decided to get this group of students to do it instead. Now I was a bit worried that it was too difficult a problem for them - but then the gears in my head started turning and I began thinking of how I could break the problem down into a set of simpler problems or steps if you prefer.

So to start off I asked them to create the ASCII art of the final "hanged man". After much grumbling (the students thought I was punishing them for being noisy for some reason) I got them to start creating the man in their Python file - this is when they started to get excited! It ended up looking something like this:
------
|   |
|   O
|  /|\
|  / \
|
---------

Once they had the man drawn in the file without any code I asked them to run it. At first they were a bit confused about why the program didn't work, but they soon figured out they had to add print statements, and with a bit of help they got the character escaping working as well. (Yes, I know we should have used raw strings, but they didn't work properly in the Wing IDE.) So their program now looked something like this:

print '------   '
print '|   |    '
print '|   O    '
print '|  /|\\  '
print '|  / \\  '
print '|        '
print '---------'

Now that the students could draw the final version I asked them to create all the intermediate diagrams, from the empty gallows to the final diagram. After that I pointed out that we only want to display one diagram at a time and asked them if there was any way to decide which version to display. After a bit of discussion they suggested using if statements. I suggested they use a variable called incorrectGuess which they could update to choose which diagram to display. I also showed them how to turn it into a function called drawMan; since they were a bit vague on how functions worked, I thought this would be a good chance to reinforce the concept. I illustrated how raw_input worked and explained that it would be nice not to have to rewrite all those if statements every time we wanted to draw the man; we could instead create a special program that did just that. After a bit of help they produced the following drawMan function:

def drawMan(incorrectGuess):
  if incorrectGuess == 0:
    print '------   '
    print '|        '
    print '|        '
    print '|        '
    print '|        '
    print '|        '
    print '---------'
  elif incorrectGuess == 1:
    print '------   '
    print '|   |    '
    print '|        '
    print '|        '
    print '|        '
    print '|        '
    print '---------'
#--------------------------- SNIP -------------------------

  elif incorrectGuess == 6:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|  /|\\  '
    print '|  /     '
    print '|        '
    print '---------'

  elif incorrectGuess == 7:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|  /|\\  '
    print '|  / \\  '
    print '|        '
    print '---------'
I asked them to test the function to see if it produced the correct diagrams when called with different incorrectGuess values. I think this really served to tie together the whole concept of a function, as they finally understood why we use them.
drawMan(0)
drawMan(3)
drawMan(7)
Now that they had a function to do the drawing we had to start working on implementing the game logic - since we were running low on time I decided we could take a couple of shortcuts and hard code some values.

Firstly they created a variable to hold the word that had to be guessed. Next I explained how we could use lists (I tried to use all the tools they had been taught in this example) to keep track of all the letters that had been guessed, as well as all the correct letters. I also explained how strings behave like a special type of list where the items are letters. With a bit of help and a reminder about the len function they were able to produce the following code to keep track of how well the player was doing:

word = 'introduction'
guessedLetters = []
correctLetters = []
incorrectLetters = 0
while incorrectLetters < 8 and len(correctLetters) < 8:
  incorrectLetters = len(guessedLetters) - len(correctLetters)
  drawMan(incorrectLetters)
  guess = raw_input('Enter a letter: ')
  if guess not in guessedLetters:
    guessedLetters.append(guess)
  if guess in word and guess not in correctLetters:
    correctLetters.append(guess)
    print 'Yes'
  else:
    print 'No'
  print correctLetters

Along the way I made some suggestions about adding print statements so they could see what was going on, and after a couple of attempts the above was produced. Unfortunately we were now quite low on time, with about 3 minutes left in the session, so I let them get away with hardcoding the 'len(correctLetters) < 8' condition instead of calculating the number of unique letters in the word. (Yes, in hindsight this is not the easiest way to do things; I should have just made them keep track of the missing letters, but it was a long weekend, I was under a bit of time pressure and I was making it up as I went along.)
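For anyone wanting to drop the hardcoded 8, here is a small sketch of the two alternatives mentioned above: counting the unique letters, or tracking the missing letters. It uses Python 3 print syntax, and the names letters_needed, missing_letters and game_won are my own, not from the students' code:

```python
word = 'introduction'

# A set keeps one copy of each letter, so its size is the number of
# distinct letters the player has to find.
letters_needed = len(set(word))
print(letters_needed)  # prints 8 for 'introduction'

# Alternative: track the letters that are still missing instead.
correct_letters = ['i', 'n', 't']
missing_letters = set(word) - set(correct_letters)
game_won = len(missing_letters) == 0
print(sorted(missing_letters))
```

With the second approach the loop condition no longer depends on the word at all: the game is won as soon as missing_letters is empty.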

So for all intents and purposes they had a working hangman game - the only thing that was really missing was drawing the word that they had to guess with underscores for the hidden characters. I debated whether to let them complete that at home but decided I wanted them to have something they could take away, show to their friends and get excited about. So I decided to help them write the last function and this is what I managed to come up with in the minute I had left:
def drawGuess(word, correctLetters):
  for i in word:
    if i in correctLetters:
      print i,
    else:
      print '_',
  print ''
Putting everything together we get a very simple but COOL hangman game:
def drawMan(incorrectGuess):
  if incorrectGuess == 0:
    print '------   '
    print '|        '
    print '|        '
    print '|        '
    print '|        ' 
    print '|        '
    print '---------'
  elif incorrectGuess == 1:
    print '------   '
    print '|   |    '
    print '|        '
    print '|        '
    print '|        ' 
    print '|        '
    print '---------'
  elif incorrectGuess == 2:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|        '
    print '|        ' 
    print '|        '
    print '---------'
  elif incorrectGuess == 3:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|   |    '
    print '|        ' 
    print '|        '
    print '---------'
  elif incorrectGuess == 4:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|  /|    '
    print '|        ' 
    print '|        '
    print '---------'
  elif incorrectGuess == 5:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|  /|\\  '
    print '|        ' 
    print '|        '
    print '---------'
  elif incorrectGuess == 6:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|  /|\\  '
    print '|  /     ' 
    print '|        '
    print '---------'
  elif incorrectGuess == 7:
    print '------   '
    print '|   |    '
    print '|   O    '
    print '|  /|\\  '
    print '|  / \\  ' 
    print '|        '
    print '---------'

def drawGuess(word, correctLetters):
  for i in word:
    if i in correctLetters:
      print i,
    else:
      print '_',
  print ''

word = 'introduction'
guessedLetters = []
correctLetters = []
incorrectLetters = 0
while incorrectLetters < 8 and len(correctLetters) < 8:
  incorrectLetters = len(guessedLetters) - len(correctLetters)
  drawMan(incorrectLetters)
  drawGuess(word, correctLetters )
  guess = raw_input('Enter a letter: ')
  if guess not in guessedLetters:
    guessedLetters.append(guess)
  if guess in word and guess not in correctLetters:
    correctLetters.append(guess)

Now of course there are far better ways to code the above, but I think it was an amazing effort for kids who had never programmed before this weekend. Unfortunately I didn't have time to set them any extensions, but I would have liked to suggest adding a list of words and picking one at random instead of hardcoding 'introduction'. This would also mean fixing up all the other hardcoded values (the looping condition, for example). Another good idea would be to print out all the letters which have not been tried yet. I also think adding some sort of score table, perhaps with a multiplayer round-robin style format, would be a great challenge question. I just hope the students feel confident enough to mess around with the example and have a bit of fun, because in the end that is all this is really about.
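To give an idea of what the first two extensions might look like, here is a rough sketch in Python 3 syntax. The word list, the names WORDS, MAX_WRONG, untried_letters and play, and the choice to feed the guesses in as a sequence (so the logic can be tried without typing) are all my own assumptions, not something the students wrote:

```python
import random
import string

WORDS = ['introduction', 'python', 'hangman']  # hypothetical word list
MAX_WRONG = 7  # drawMan above has pictures for 0 to 7 wrong guesses


def untried_letters(guessed):
    """Return the lowercase letters that have not been guessed yet."""
    return [c for c in string.ascii_lowercase if c not in guessed]


def play(word, guesses):
    """Play one round from a sequence of guesses; return True on a win."""
    guessed = []
    wrong = 0
    missing = set(word)  # letters the player still has to find
    for guess in guesses:
        if guess in guessed:
            continue  # repeated guesses are ignored
        guessed.append(guess)
        if guess in missing:
            missing.remove(guess)
        else:
            wrong += 1
        if not missing:
            return True
        if wrong >= MAX_WRONG:
            return False
    return False


word = random.choice(WORDS)  # pick a word instead of hardcoding one
print(word, play(word, string.ascii_lowercase))
print(untried_letters(['a', 'e', 'i']))
```

In an interactive version the guesses would come from the keyboard one at a time, with drawMan and drawGuess called inside the loop as before.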

Till next time ...

James Saunders' Blog: Fractals folding out new lands

Check out James' cool HTML5 Fractals!

James Saunders' Blog: Fractals folding out new lands: "In keeping with my apparent intense dislike of doing anything constructive (like say my work or my masters), here is another fractal image...."

James Saunders' Blog: The Mandlebrot Set: "So I have been reading allot about fractals lately (you should start with this cool book), and of course the Everest of fractals is the Mand..."

Monday, August 30, 2010

UCT Python Course - Part 2

So this post is really more of an internal brain dump to all the tutors who were present during the course - everyone is welcome to comment and add their input but if you want to get to the exciting stuff skip to the next post.

So guys the main change I think we should make next time is to limit the extent to which we use the live terminal display. Instead of having it up the whole time I think we should only put it up when showing the students examples as this served as a bit of a crutch for the lecturers and a bit of a distraction for the students.

I also think that a lot of the students struggled initially with the idea of a variable. You must remember that a lot of the kids are quite young and haven't really come to grips with basic algebra yet. Perhaps we should use a more visual demonstration of what a variable is - for example a jar or a box (with a label of course) which can only hold one value at a time. We could then illustrate different values as cut-out paper strips with 1, 2, 3, "Hello World", etc. written on them and visualize setting a variable's value by replacing the contents of the box with one of the paper strips. Likewise, looking up the value of a variable just becomes a problem of finding the box with the correct label and seeing what is inside it.

If we handle variables that way, then lists are just a special type of variable that can store more than one value at a time. We could ask students to create a list of their favourite food, movies, etc. and get some of them to write them up on the board. We could follow that up by asking them if the item at the "top" of the list is their least or most favourite one. We can then explain the "front" and "back" of a list by telling them that computers just rotate the list onto its side, so instead of "top" and "bottom" we have "front" and "back". Yeah, I know it is a bit simple, but I think it may go down better. What do you guys think?
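On the board, the box and list metaphors could be paired with a few lines like these. This is just a sketch in Python 3 syntax, and the variable names and values are made up for illustration:

```python
# A variable is a labelled box that holds one value at a time.
favourite_number = 1   # put the paper strip "1" into the box
favourite_number = 2   # replace it; the old strip is gone
greeting = "Hello World"
print(favourite_number, greeting)

# A list is a box that holds several values in order.
favourite_foods = ["pizza", "sushi", "biltong"]
front = favourite_foods[0]    # the "top" of the written list
back = favourite_foods[-1]    # the "bottom" of the written list
print(front, back)
```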

We also slightly confused the issue by introducing different types (string, float, int) as well as records (yeah, I know, tuples in Python). We should have just kept it simple - remember the KISS principle guys!

In terms of the sequence of topics, I would introduce functions really early on (bear with me) even before we get to conditionals. So something along the lines of output, input, functions, conditionals and the rest. I think treating functions as a bit of magic is a mistake as it just confuses them later on.

I think we can describe them as a special type of program that lives inside their main program and doesn't do anything until they call it (I'm sure we can come up with a good metaphor for this - several spring to mind). To reinforce the idea of a function we could ask the students to create a special raw_input function called my_raw_input which prints 3 stars, asks the user for a number and then prints 3 more stars, so something like this:
def my_raw_input_1():
   print "*" * 3
   # Keep what the user typed so the caller can use it
   number = raw_input("enter a number: ")
   print "*" * 3
   return number


We can extend this idea by getting them to add a parameter to control how many stars are printed e.g.:
def my_raw_input_2(numStars):
   print "*" * numStars
   # Keep what the user typed so the caller can use it
   number = raw_input("enter a number: ")
   print "*" * numStars
   return number


Another one to control what the user prompt is, etc. until they are comfortable with the concept. We should do this in an interactive fashion i.e. give them the task and then go through a solution on the board and ask for questions or alternatives.
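For example, the prompt version might look something like the sketch below. It is written in Python 3 syntax, so input stands in for raw_input, and the name my_raw_input_3 and its parameter names are just my suggestion following on from the two versions above:

```python
def my_raw_input_3(num_stars, prompt):
    """Print a row of stars, ask the question in prompt, print stars again."""
    print("*" * num_stars)
    answer = input(prompt)  # Python 3 name for Python 2's raw_input
    print("*" * num_stars)
    return answer
```

Calling my_raw_input_3(5, "enter your age: ") would then print five stars, ask for the age, print five more stars and hand back whatever the student typed.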

Another thing I think we should consider is getting the students to group together for these interactive tasks; hopefully this will encourage peer-learning.

In terms of the tutoring process I think we do need to micromanage students a bit more. A lot of kids are embarrassed to ask for help, so they fall behind and then start to become a bit rowdy. We should get tutors to be a bit more proactive with students that have blank screens and not only help the students who ask questions. I know varsity students somehow learn one of life's key lessons, that asking questions when you don't understand something doesn't make you stupid (in fact it makes you smart), but for some reason school tends to teach the opposite (WTF?)

I also want to throw the idea out for discussion of having some sort of fun unifying example for students to work on. Something we can develop throughout the course where they can apply all the tools they learn in a fun way. I would like to suggest something like my hangman exercise which went down really well (I had students arrive at the group asking if we were REALLY making a hangman game - they were so excited!) Anyway more on that in the next post!

As we all know this course is a bit of an ongoing process (I think we have now run it 3 times?) and each time we learn something new about how to teach programming - which for me is one of the really exciting parts.

On a similar note I was also chatting to Stefano and a couple of guys in the lab and it seems that there is a lot of interest at UCT in offering a Python boot camp. Yes Marco, I know we also chatted about this previously - but I know Rudy, for example, was looking for people to tutor Python to the Geomatics crowd, and I am sure the Bioinformatics and Chemistry crowd are also quite keen. Perhaps we should approach the Science/Engineering faculty about offering a Vac or Summer school style boot camp in Python "literacy" (enough programming in Python to do useful things - but not all the theory we cover in CS) - so more like practical programming?

Anyway stay tuned to this post - I may be updating it with a couple more ideas once I've worked them out.

Till next time ...

UCT Python Course - Part 1

So this weekend Project Umonya (Umonya means Python in Zulu) hosted a Python programming boot camp at the University of Cape Town for Grade 6 - 12 students who have never programmed before.

There were roughly 110 High School students signed up for the course with over 90 students attending for all 3 days.

The course could be called a Depth First introduction to Python as we really throw students in at the deep end. Not only do we cover all the basics such as sequential, conditional and repetition statements but we also cover more advanced topics such as lists, string formatting and functions. What is amazing is how well all the students cope. I saw students come away from our course with a greater understanding of programming than the entire South African High School IT syllabus provides.

I think this can largely be attributed to our wonderful team of passionate tutors (some of whom are past South African Olympiad winners or top students in their degree programmes) as well as the ease with which Python exposes these powerful programming concepts.

Speaking of our tutors, we had over 15 Tutors on hand for the entire weekend. They ranged from High School to Masters students, and without them I don't think the course could have been the success it was. I must give a special shout out to Flora, Nina and Rizmari for showing all the young ladies who attended our course that Computer Science isn't a male-only profession. I must also thank Stefano Rivera who took care of all the technical details and made sure everything just worked. He did all this while helping tutor, working on a pygame submission and hosting the Cape Town Ubuntu Global Jam (@mshuttleworth - Why haven't you hired him yet?) Also a big thank you to Marco Gallota who did most of the organization FROM Zurich - he even made the reservation for the Tutor dinner - and of course Michiel and Robert who ran things on the ground at UCT.

The course followed a fairly loose structure with the idea being that we give students a 15 minute or so lecture followed by some exercises. Obviously we have lots of room for improvement and I think there are a couple of topics we should have covered a bit better but on the whole I think this worked out rather well.

Overall the event was a huge success. I think we received a lot of positive feedback from the kids, the parents and from Ewald from S1 (the event sponsor), who in his own personal capacity was on hand helping tutor the students and answering their questions about a career in software development - Big thank you to Ewald.

In Part 2 and Part 3 of this blog post I will mention some of the things we could have done better and my most positive experience for the whole weekend (getting some kids to program a hangman game!)

Till next time ...

Saturday, September 27, 2008

5FM iPhone Streaming

Well I've finally dived into the world of blogging, so I thought for my first post instead of introducing myself, I will give you guys in South Africa some useful information.

For those lucky enough to get hold of an iPhone 3G, here is an invaluable tip (at least for me):
  1. If you haven't already signed up for an iTunes store account do it now (there are loads of free apps for the iPhone, so sign up even if you never plan to spend any money).
  2. To enable 5FM streaming, you need to download the excellent FStream application, it's free so don't worry. To do this just search for FStream in the application store.
  3. After you have synced your iPhone with iTunes and the application is installed, you need to set up the 5FM stream.
  4. So start FStream up and tap the "Favorites" button at the bottom of the screen. Hit edit at the top left and of course select "Add new webradio".
  5. Enter whatever you like for the Name field e.g. "5FM".
  6. Enter the following for URL field, minus the quotation marks of course: "mms://196.35.64.36/5fm_22"
  7. Finally hit "Save" (top right) and go back to the "Play" menu by tapping the "Play" button at the bottom of the screen.
  8. Select the entry which matches the Name you previously created and 5FM should start buffering.
  9. Optional: increase the Buffer size to large. This can be done by tapping the "More" button at the bottom of the screen.
Hope this helps someone.