Wednesday 2 April 2014

Sorting Big Os

    Today, we talk not about how to achieve a certain task, or how to implement a certain data type. We don't talk about execution at all! Because today, we talk about something practical rather than intellectual, something that matters to everything, be it coursework, studying, or computer science. Bear with me for a while, as apparently my class section sucks at this unit. Today is about efficiency.

    We are revisiting a topic that we have encountered before - given a list of values, say, numbers, we want to sort it such that the new list is in order (ascending, descending, etc.). This is a very simple task - when faced with this sort of problem we would naturally use something like insertion or selection sort, and for the sake of simplicity we can use an algorithm like bubble sort, which is very easy to implement. These methods, however, are extremely inefficient - on each pass of the algorithm, the computer ends up comparing values against a large chunk of the rest of the list. This doesn't really matter for smallish lists, where our brains and computers have more than enough processing power to sort the list with no trouble at all. But sometimes, we have BIG lists. Like, very big. Let's say with n = (a million), or n = (ten million), or anything so big and incomprehensible that we start to play with the number by adding and knocking off zeroes here and there.
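To make "very easy to implement" concrete, here is a minimal bubble sort sketch in Python (the function name and the test list are just my own illustration): repeatedly sweep the list, swapping adjacent out-of-order pairs, until a full pass makes no swaps.

```python
def bubble_sort(lst):
    """Return a sorted copy of lst using bubble sort."""
    items = list(lst)  # work on a copy, leave the input alone
    n = len(items)
    swapped = True
    while swapped:
        swapped = False
        for i in range(n - 1):
            if items[i] > items[i + 1]:
                # adjacent pair is out of order, so swap it
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items

print(bubble_sort([5, 2, 9, 1, 5]))  # -> [1, 2, 5, 5, 9]
```

Short and simple - but notice the nested loop structure: in the worst case every sweep touches nearly the whole list, which is exactly the inefficiency discussed below.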

    The efficiency of a sorting algorithm is roughly measured by the number of comparisons needed to sort the list relative to the list size, in a best or worst case (some algorithms work better depending on the features of the list, e.g. the number of repeated values). For something like selection sort, each time you add a value to the "sorted" part of the list, it must be compared to every value in the "unsorted" part, giving it a worst-case efficiency of O(n²) - the processing time grows quadratically as n goes up. For very large lists, we have better options - algorithms like quick sort and merge sort break up the list into smaller sublists, such as by comparing values against a reference value in the list. These "divide and conquer" algorithms substantially reduce the growth of sorting time with n, putting the average performance at around O(n log(n)).

    Sorting also pays off when searching: in sorted data, larger values are in the "larger" half and smaller values are in the "smaller" half, so each comparison lets us ignore half of what remains, and binary search runs in O(log₂(n)). It's a bit hard to wrap your head around the details, but the point is, both of those are a lot better than O(n²) when n is big. So when n is big, one should strive to use more efficient algorithms, rather than just simple ones, because even if there are more steps (i.e. sorting a list or tree into a binary search list or tree and then running binary search, rather than attempting to run a search algorithm straight out), the difference in efficiency for large n is substantial.
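Here is a rough merge sort sketch in Python to show the divide-and-conquer idea: split the list in half, recursively sort each half, then merge the two sorted halves. (This is my own minimal illustration, not code from lecture.)

```python
def merge_sort(lst):
    """Return a sorted copy of lst; divide, conquer, then merge."""
    if len(lst) <= 1:
        return list(lst)  # base case: 0 or 1 elements are already sorted
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])    # sort the first half
    right = merge_sort(lst[mid:])   # sort the second half
    # Merge the two sorted halves back into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # leftover values are already in order
    merged.extend(right[j:])
    return merged

print(merge_sort([8, 3, 5, 1, 9, 2]))  # -> [1, 2, 3, 5, 8, 9]
```

The list is halved about log₂(n) times, and each "level" of merging does about n comparisons, which is where the O(n log(n)) figure comes from.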

Sunday 2 March 2014

Recursion

Sometimes in life, we have to do very repetitive tasks. We may have to apply the same operation to a number an unspecified number of times until we get a result we like, or we may have to create a complex data structure comprised of many instances of a simpler data structure. We may have to search through a complex set of data, arbitrarily large, one piece at a time. We may have to write blog posts we don't really want to write. All of the above can be very time-consuming, headache-inducing tasks when tackled with brute force; fortunately, for all but the last, there is a simple and elegant solution, and that solution is recursion.

Recursion is basically the use of a method or operation that is defined in terms of itself, allowing the same method to be repeated indefinitely, carrying along the result of each call until an end condition is (hopefully) met. This allows very complicated data structures to be built and traversed with very few lines of code. For example, large trees and linked lists are really just made of smaller trees and linked lists, and can be traversed by repeating the rule 'search self, then search children' until the end of the data structure is reached. Many mathematical sequences and series can also be defined recursively: the natural numbers start at 0, and every subsequent term is just the previous term + 1. Arithmetic and geometric series can be defined the same way, by adding to or multiplying the previous term to get the next one. Geometric series are very useful IRL for calculating chance and probability, and arithmetic sequences are useful for stepping through orderly sets of data, like a spreadsheet arranged in rows with the results of each trial following the previous trial.
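The 'search self, then search children' rule really is only a few lines of Python. Here is a sketch using a bare-bones Tree class of my own (not the one from lecture):

```python
class Tree:
    """A minimal tree node: a value plus a list of child Trees."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def contains(tree, target):
    """Search self, then search children - recursion does the rest."""
    if tree.value == target:   # base case: found it right here
        return True
    # recursive case: the target is in the tree only if it's in some child
    return any(contains(child, target) for child in tree.children)

t = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
print(contains(t, 4))  # -> True
print(contains(t, 7))  # -> False
```

However deep the tree goes, the same two-line rule handles it, because every subtree is itself just a smaller tree.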

There are of course countless uses for recursion that I cannot discuss or have yet to realize. The true power of recursion lies in its ability to break down a complex problem into many iterations of the same, smaller problem. Understanding recursion will allow us to easily solve the problems that lie before us, opening the door for even more complex problems and solutions. I will remember recursion as a very useful tool going forward, and I look forward to discovering better ways to make use of it.

Sunday 23 February 2014

Week 6

A2 is here, and I'm going to try and get a start on it while I'm still on reading week, and before I get busy again. It's a lot more to digest than A1 at first glance, especially since we have to design a set of classes to represent a data type without the guidance of docstrings from the functions that will use them. Our classes need to model a simple regex language, based on a ternary alphabet in expressions of up to binary complexity. We also need to include __eq__ and __repr__ methods for our regex classes, but it is very difficult to decide how to implement these without knowing how whatever code that uses our classes will be calling and initializing them. My partner and I are both very confused, but we will watch Piazza and go to extra help sessions over the next few weeks if necessary to finish step 1 before it is due.
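To get my head around __eq__ and __repr__, here is a rough sketch of the kind of thing I imagine the regex classes might look like. Every class and method name here is my own guess, not the actual A2 handout: a base class holding a symbol and a list of children, with equality by structure and a repr that looks like the constructor call.

```python
class RegexTree:
    """Hypothetical base class for regex expression trees (my own sketch)."""
    def __init__(self, symbol, children):
        self.symbol = symbol
        self.children = children

    def __eq__(self, other):
        # Two regex trees are equal if they have the same type, symbol,
        # and (recursively) equal children.
        return (type(self) == type(other)
                and self.symbol == other.symbol
                and self.children == other.children)

    def __repr__(self):
        # Mimic the constructor call, e.g. BarTree(Leaf('0'), Leaf('1'))
        args = ', '.join(repr(c) for c in self.children)
        return '{}({})'.format(type(self).__name__, args)

class Leaf(RegexTree):
    """A single symbol from the alphabet, with no children."""
    def __init__(self, symbol):
        RegexTree.__init__(self, symbol, [])

    def __repr__(self):
        return 'Leaf({!r})'.format(self.symbol)

class BarTree(RegexTree):
    """A binary '|' (alternation) node over two subexpressions."""
    def __init__(self, left, right):
        RegexTree.__init__(self, '|', [left, right])

print(BarTree(Leaf('0'), Leaf('1')) == BarTree(Leaf('0'), Leaf('1')))  # -> True
```

Making __repr__ echo the constructor call seems like a safe bet regardless of how the grading code initializes things, since it means eval(repr(x)) would rebuild an equal object.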

Friday 31 January 2014

Weak 4 (intentional)

    It is currently Friday, January 31st, as in the last day of January, as in there are no more days left in January, so it's now February and time for exam season to start and never end until May *cry*. It's a good day to start indenting paragraphs for my SLOG with 4 spaces like a boss. It's also the start of a new Lunar Year, the year of the horse. I guess I should put my hoof down and stop horsing around in school.

    This week's exercise was due yesterday. My program worked fine when I tested it, so I wrote docstrings for both parts and submitted them. And then I just realized that I forgot to indent my docstring for E2b, which breaks the file with a syntax error so it won't even run. I hope I don't end up getting a 0 for that...

    I'm actually having trouble following what goes on in class. Maybe signing up for the evening class right after my anatomy practical wasn't such a good idea, since that always gets my brain fried just in time for lecture. Fortunately, the content isn't super hard yet, and Google, the online Python documentation, and Stack Overflow exist, which together will save my life as I sit down, write code for 5 minutes, then work out syntax error after syntax error for an hour (are there any questions that HAVEN'T been asked on Stack Overflow?). On the bright side, my code is looking a lot cleaner and more organized than the embarrassments to humanity I've tried to pass off as scripts at work (I 'work' for free in a human psychology lab). Hopefully I will have reason to sing further of the joys of Object-Oriented Programming in the coming weeks.

Object-Oriented Programming

Like Java, Python is an object-oriented language, which allows the creation of abstract, user-defined data types that bundle stored information together with useful methods. Being able to define concepts, name them, and define the information stored within them helps a lot with making code easier to read and understand, which in turn helps programmers understand their own code by defining it in their own terms.

This allows the existence of more complicated programs in the same way that going to school makes you smarter. Teachers at school may explain the reasoning behind a complicated theory or mathematical concept in detail, then condense all of that reasoning under a name or formula. Taking all the reasoning behind the theory and condensing it under one name allows you to 'chunk' it into a piece of data that can be operated on - because a name takes much less of your attention span to think about than a full theory, you free up more of your mind to operate on that name in a broader context, which allows you to produce more complicated ideas.

Object-oriented code works much the same way. Instead of having one gigantic block of text splatted onto one text file, you can package smaller data types into single words or names that are then associated with an underlying data type. For example, instead of reading the size, speed, color, fuel capacity, model, etc. of each car laid out in one gigantic amalgamated text about a road, you can look at a road with many 'car' objects racing along, and then look specifically at one 'car' object to find all the data associated with it.
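The car example might look something like this in Python (a hypothetical sketch of my own, not assignment code): each car's data lives inside one named object, and the road is just a list of those objects.

```python
class Car:
    """Bundle one vehicle's data under a single name."""
    def __init__(self, model, color, speed, fuel_capacity):
        self.model = model
        self.color = color
        self.speed = speed              # km/h, say
        self.fuel_capacity = fuel_capacity  # litres, say

    def __repr__(self):
        # Echo the constructor call so a Car prints readably.
        return 'Car({!r}, {!r}, {}, {})'.format(
            self.model, self.color, self.speed, self.fuel_capacity)

# The 'road' is just a collection of Car objects...
road = [Car('hatchback', 'red', 100, 45), Car('sedan', 'blue', 120, 60)]

# ...and we can zoom in on one object to get its data.
print(road[0].color)  # -> red
print(road[1])        # -> Car('sedan', 'blue', 120, 60)
```

All the attributes travel together under the one name, so code that deals with the road never has to untangle which speed belongs to which color.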

It is this packaging of complicated data types under a name, or 'object,' that makes object-oriented programming so powerful.