Saturday 28 March 2015

That Post Where I Revist An Earlier Blog Post, Reflect On How Much I've Grown, And We All Get Sentimental

This is not a SLOG I looked forward to writing. Now, as my massive hoard of readers has probably noticed, I am not a socialite. I just don't like to talk about myself very much. If you skim through my many posts, you'll notice a reoccurring theme- they're mostly concerned with my comprehension of the concepts covered in the class, not so much on my experiences regarding them. And, if I can be even more blunt, I only interact with other SLOG's to get the participation marks. So, this post will, by and large, serve as a sort of way of remedying this.

Anyways, let's just cut to the chase. It should not come as a surprise that I've learned a lot since starting this course. As new concepts stack upon older ones, the big picture of the course keeps getting more and more clear. Sometimes, something I've written in the past seems as though I could've written it now. Other times, it just looks hopelessly outdated. Now, the goal of this blog post is to find just one example of this, and to present it to the world so that the TA's laugh at me behind my back for not getting it right the first time can experience that sentimental feeling a parent gets when they watch a child utter their first word, and than shed a tear in one eye. I love you all too.

Anyways, to be honest, I have a tendency to revisit past blog posts and make changes. Usually every time I write a new one. Sometimes it's just to correct spelling and grammar mistakes. Other times it's improve my style and prose. And sometimes, if I run into a sentence which is syntactically broken beyond repair, I'll just rewrite it from scratch. And of course, whenever I revisit these past blog posts, the temptation to update them with the stuff I know in the present becomes stronger and stronger. And in some cases, the finished product barely resembles the first draft.

So, long story short, I'm going to do something a little bit different. Rather than just recall a blog post from the past of which I agreed or disagreed with in the future, I'm going to do this will a whole series of blog posts. Mainly, my series on Abstract Data Types: which covers Trees, Linked Lists, and Binary Search Trees.

In all of these posts, I tried to provide a short and concise explanation of what these abstract data types are. I tried to remain as self-aware as possible to the subtle distinction between the implementation of them, versus the sort of abstract definition they all have. And, as I pointed out in a later post, the abstract definition of an ADT is often hard to pinpoint, since its focused more on the relationship between the parts of the structure than on any one part of it. However, in all of these posts on abstract data types, I feel as though I focused too much on the abstract part, and not enough on the specific details of implementation. In my defense, most of the SLOG's I read seemed to focus on the implementation of the ADT disproportionately, so it seemed appropriate for me to do something different. And whats wrong with a little differentiation?

However, my approach hasn't really afforded me the ability to discuss recursion, and the role it has played in this course. Now, as Many of my peers have noted, Recursion is a very central element of this course. It's an element of our theoretical toolbox which makes it possible to efficiently solve all sorts of problems we previously assumed were tedious at best. The competence shift I gained when I added this tool to my arsenal was analogous to the jump I made when I first learned how to use iteration in CSC108. Its a little embarrassing, but I remember feeling like I could solve any problem thrown at me. Like that there was nothing left to learn. It was wonderful.

Recursion can be thought of as being even more basic than iteration. Now, iteration can be thought of as sequentially traveling across an array from left to right, correct? Well, using that intuition, we can think of recursion as a way of traveling from any one thing to another. It can simply be the next element of an array, as is the case in a Linked List. There can be two next elements in the array, and a function determining which one you will travel to (as is the case with a binary search tree, if you'll pardon my loose use of the words "element" and "array"). Really, in this sense, there is no real direction- there's just the "current thing" of analysis and the "next thing".

Now, consider a program which relies on recursive processes, like a program which collapses a list of elements containing sub-lists which are arbitrarily deeply nested into a list of depth one. In this program, the method is simple: evaluate every element of the list. If that element itself is a list,  reapply this function to it, which will again cause it to iterate over every element of that list. If it's not a list, than just put the element into some new list which is passed along the function.

Now, although this isn't an instance of iteration over some array like a "for loop", it certainly shares some important similarities to one. We start off with an initial input, and than the function takes it and returns a call to the function itself, except this time, with a different input. You can think of this sequence of inputs and outputs which become new inputs as a sequence in which, if the recursive function is defined properly, will end at some point and return the desired ending output. So although it is not just mere iteration- it certainly is similar since it is a sequential process- which is extremely reminiscent to the sort of sequential processing computers are known and loved for.

So, now it's probably much easier to see what I mean when I say that the majority of the ADT's we work with are contingent on recursion and recursive sub processes. I'm just happy I finally had a chance to revisit those old posts in order to make this point. As many of my peers have pointed out, recursion does seem like a major theme of this course. And I'm happy I got a chance to express my thoughts on it.

Friday 20 March 2015

Inserting and Deleting Nodes in a Binary Search Tree

Something I didn't mention in any of my posts on the non built-in ADT's like Trees, LinkedLists, and BST's was how much of a pain it can be to remove or insert the nodes in these structures. This topic is of interest to me, not because of how my SLOG marks are on the line they are useful for solving homework problems, but because of how they lead to a deeper understanding and appreciation of the ADT in question.

So, lets first look at an ordinary Tree. As I've said before, a tree is just a bunch of nodes- each of which has two attributes: "data", which is the value it stores which can be used for identification, and "children"- which is some sort of collection of nodes which can be accessed directly from said node. So, based on the implementation of a Tree we did all the way back in the week 6 tutorial, to be the child of a node in that tree would be to be an element of the list "children", which is an attribute of that node object. As a result, a parent node can access it's children nodes directly. However, the child node cannot point back to the parent, or a cycle will be formed, which would violate the definition of a tree. This tells us something extremely important about the internal structure of trees. In a profound sense, there is no one place that the relation between a child and its parent node is defined. It is a relation which relies on the way how the  two nodes interact with each other.

Now, when we add or remove a node from a tree, we run the risk of altering delicate relations like these. And in order to avoid doing this, we need to understand what constitutes this relation in the first place, especially in respect to the specific implementation of the ADT we're using. In the case of Trees, we need to make sure that the definition of a tree is preserved (each node has only one parent excluding the root, there's a direct path from the node back to the root, none of the nodes form a cycle, etc). This isn't fun, but its not impossible. For instance, to add a node, we just need to add them as leaves- that way, we don't need to worry about accidentally creating a cycle. We also need to set it's "children" attribute to something which signifies that it is a leaf (such as an empty list). Depending on the implementation, we may also need to update certain attributes dependent on the structure as a whole, such as the branching factor or height. Although linked-lists may appear easier to alter due to their constant branching factor of one, they often contain these types of attributes, such as length.

With BST's, a whole other problem comes up. Due to it's ability to utilize a binary search, the tree must also be "balanced". In a nut shell, the notion of balance is the idea that, for any node in the BST, the height and number of nodes in the sub-tree rooted by its left child is equal to that of the sub-tree rooted in its right child. Balance is very hard to preserve since, for starters, its very hard to even define to begin with. It's a relationship which isn't stored in any one node, or even a large collection of them, but in every single node at some level of analysis. The reason why we want balance is simple- if we are looking for an element in a BST, in the worst case, we want to always have 2 options available to us- the ability to go left or right. If any node in a BST lacks, say, a left sub-tree, than if we are trans-versing the tree, we can only go right (assuming, of course, that the element hasn't already been found and there are still children). In this case, we don't actually reduce our options. Rather than halving the problem space, as we aim to for each node in a BST, we just go through it sequentially. And although this may not seem like a big deal, just keep in mind that some BST's contain thousands of nodes. So if your tree has a lot of nodes with only one path instead of two, and those nodes are traversed, than for those periods, it will be as if you are using a simple linear search. And that could be really inefficient.

So, when inserting and deleting nodes from a BST, we must try and preserve this sense of balance. If we remove a node, we want to find a way to replace it with another node directly below it. And as mentioned earlier, when we insert a new node, we want to attach it as a leaf. That way, it doesn't take over the place of an entire sub-tree, or unnecessarily elongate a given path in it. And of course, all of these make sure the Binary search tree retains it's ability to do binary search

Wednesday 18 March 2015

Abstract Data Type: Binary Search Tree

I said I would cover a new type of Tree in my post about Tree ADT's, didn't I?

Anyways, today I'm focusing on Binary Search Trees, which are also called BST's. Now, to understand BST's, lets first make one thing clear- they are NOT the same thing as Binary Trees. I made this  mistake a lot when I first started working with them, and it made communicating my ideas about these concepts quite annoying. So in case I wasn't clear- let me reiterate this point one more time:
BT   =/=   BST

A Binary Tree- also called a BT- is simply a tree ADT which always has 2 positions for possible children- left and right. This means that, if you take any node in a BT, you can only access a maximum of two other nodes. Although this may seem restrictive at first, it also allows us to implement various types of search algorithms. And this is where BST's come in.

A BST is just a type of a BT with an additional rule; that the value of any node in the tree must be larger than the value of every node in the sub-tree who's root is it's left child. And the same is true for the sub-tree rooted in its right child, except with larger values instead. Confusing? Here's a better way of explaining it.

Remember, the topmost node of a tree (the one with no parent) is the root. Any node in a tree can be considered the root of its own tree- called a sub-tree of the original tree. In order to access a specific node in some tree- let's just say it's value is 7- we need to use a search algorithm, and we must always start at the root of the tree provided. Than, we must find some way to travel (or "transverse") the nodes of the trees until we find the node we want. Unlike a list, we do not simply travel over every node in the tree in a sequence until we find the one with the value we want. In fact, such a searching method is actually considered to be really inefficient. After all, you would have to evaluate every element in the list just to verify if the element isn't in the list.

However, there are more efficient search algorithms, such as binary search. Binary Search takes list already sorted in ascending order, and takes it's middle element. It than splits the list into two sub-lists- one containing every element left of the middle, and one for every element to the right. It than does a comparison. If the element your looking for is bigger, than go to the right sub-list and repeat the algorithm. If its smaller, it will go to the left sub-list and repeat. And if it's not bigger or smaller, than congrats, it can only be equal, and so you've found it!

As you have probably guessed, binary search trees are another way of implementing this search algorithm. However, rather than take a presorted list, it takes a BST. Since a BST, by definition, stores elements in such a manner so that the only elements in the left sub-tree of the node have values smaller than its own (and bigger with right), a similar algorithm can be implemented to find any piece of data. For instance, if the node you are looking for has a value of 5, and the value of the node you are evaluating is 7, you know that the node you are looking for must occur in the sub-tree to the left of the current node (if it's in the tree at all), since that sub-tree contains all the values smaller than 7. So, even in the worst case, this search algorithm will be significantly faster than a simple, linear search.

Thursday 12 March 2015

Abstract Data Type: Richard Linklist-er

In week 8, the topic we are forced required to write about is an ADT called a "linked-list". Which sounds an awful lot like the surname of the director "Richard Linklater". And since I just watched "Dazed and Confused", I wanted to reference this admittedly shallow relation between the two words. Alright, now to move on. In this post, I will try to explain, to the best of my abilities, what a linked list is.

A linked-list is, to be succinct, a collection of individual node objects, each with two attributes: one called "data", which is the actual piece of data being stored in the linked list at that position, and one called "nxt", which stores the next node in the sequence (or None if it's the last node of the linked list). The first node is stored in an overarching data structure called a "LinkedList". In short, the linked-list shell lets you find the first node, and each node (including the first) lets you access the very next one (or last one).

Now, on the surface, linked-lists may seem trivial- especially when compared to tree data structures. After all, the whole premise of a tree is that each node has multiple children nodes. So if you are at any node in the tree, you can directly access a large set of nodes on the tree without having to go through them all sequentially. This is why we often use them over lists when storing comparable types of data (a topic I will cover in a later post). Now, for any node in a linked-list, we can only access the element directly behind it. In this sense, it seems nearly indistinguishable from a tree with a branching factor of one, which kinda defeats the point (as we'll see in a later post). And from my correspondences with my peers, this sentiment isn't uncommon. And speaking of correspondences with my peers, here's a link to all my interactions with other SLOG's. Just throwing that out there, if you just so happen to be a marking TA...

Anyways, how can I convince all of them that linked lists are worth learning about? Well, lets think of them in a different way. In much the same way we learn the fundamentals of programming using a specific language (in this case python), we can also learn the fundamentals of many of our built in data types by understanding analogous, non-built in ones. In this case, I think a case can be made that understanding linked-lists gives us a better appreciation for what is perhaps my favorite built in data type- List.

List is an extremely useful data type for solving problems. You can put various types of objects in, and like magic, they doesn't disappear into the abyss of computer memory. You can also identify objects that you have previously put into the list (called elements), so that you can do things with them later. You can immediately identify the first and last elements of the list. You can also identify the element in the list with a certain "position" called an index. Once you have identified the element you desire, you can use it just like any ordinary object in the shell. For example, if its an int, you can apply the arithmetic operators over it. Perhaps the most important feature of a list is the ability to evaluate every single element in the collection sequentially, from one end to the other. This process, called an iteration over the list, is very commonly used in the implementation of various types of algorithms- such as searching and sorting. It's hard to imagine what programming would be like without this ability.

Interestingly, a linked list is capable of doing these exact things. If you don't believe me, than go back and re-read my definition for a linked-list. Recall how, for any node in a linked list, we can immediately access the node to its right- which is identified by the attribute "nxt". We can also identify the beginning of the linked list immediately, since that piece of data is stored in the linked list object itself. And, although not immediately obvious, we can identify the very last node of the sequence by simply identifying the node which has a "nxt" attribute set to "None". Because we can identify the first and last element or a linked list, as well as the next element from any given position, we care capable of iterating over a linked list, just like a regular list. In fact, with little effort, one can even write an algorithm which identifies elements in a linked-list based on their index number- all you need to do is identify the first node, access it's "nxt" attribute, and than add one to some counter, and repeat the process for the next node until the value of the counter is equal to the value inputted, and than return the value of that node. This process perfectly simulates the concept of indexing in a List (at least for index values greater than or equal to zero).

Rightly or wrongly, I think that linked-list's are far more relevant to computer science than many of my peers appear to. Now, this is not to say that linked lists have no pragmatic value at all, outside of being an educational aid. Perhaps linked-lists are more efficient at storing certain pieces of data, such as the length of the linked-list. The point I wish to make is that they are useful for another reason. Which is because they take a relatively complicated process, and break it down into much more palatable pieces. And that's something a computer science course can never do too much.