Sunday, April 5, 2020

Summary

Binary Trees


We extend the concept of linked data structures to structures containing nodes with more than one self-referential field. A binary tree is made of nodes, where each node contains a "left" reference, a "right" reference, and a data element. The topmost node in the tree is called the root. Every node (excluding the root) in a tree is connected by a directed edge from exactly one other node, called its parent. On the other hand, each node can be connected to an arbitrary number of nodes, called children. Nodes with no children are called leaves, or external nodes. Nodes which are not leaves are called internal nodes. Nodes with the same parent are called siblings.
More tree terminology:
  • The depth of a node is the number of edges from the root to the node.
  • The height of a node is the number of edges from the node to the deepest leaf.
  • The height of a tree is the height of its root.
  • A full binary tree is a binary tree in which each node has exactly zero or two children.
  • A complete binary tree is a binary tree which is completely filled, with the possible exception of the bottom level, which is filled from left to right.

A complete binary tree is a very special tree: it provides the best possible ratio between the number of nodes and the height. The height h of a complete binary tree with N nodes is at most O(log N). We can easily prove this by counting the nodes on each level, starting with the root and assuming that each level has the maximum number of nodes:




n = 1 + 2 + 4 + ... + 2^(h-1) + 2^h = 2^(h+1) - 1
Solving this with respect to h, we obtain



h = O(log n)
where the big-O notation hides some superfluous details.
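The bound can be checked directly with integer bit operations; a small sketch (the class and method names here are ours, not from the notes):

```java
public class TreeHeight {
    // Height (in edges) of a complete binary tree with n >= 1 nodes:
    // h = floor(log2 n), computed without floating point.
    public static int completeTreeHeight(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        // A perfect tree with 3 levels has 1 + 2 + 4 = 7 nodes and height 2.
        System.out.println(completeTreeHeight(7));  // prints 2
    }
}
```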

Advantages of trees

Trees are so useful and frequently used because they have some very serious advantages:
  • Trees reflect structural relationships in the data
  • Trees are used to represent hierarchies
  • Trees provide efficient insertion and searching
  • Trees are very flexible data structures, allowing subtrees to be moved around with minimum effort

Traversals

A traversal is a process that visits all the nodes in the tree. Since a tree is a nonlinear data structure, there is no unique traversal. We will consider several traversal algorithms, which we group into the following two kinds:
  • depth-first traversal
  • breadth-first traversal
There are three different types of depth-first traversals:
  • PreOrder traversal - visit the parent first, then the left and right children;
  • InOrder traversal - visit the left child, then the parent, then the right child;
  • PostOrder traversal - visit the left child, then the right child, and then the parent.
There is only one kind of breadth-first traversal: the level order traversal. This traversal visits nodes level by level, from top to bottom and from left to right.
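All four traversals can be sketched as follows, on a simple int-valued node (the Node shape and method names here are ours; the notes' own BST class appears later):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class Traversals {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // PreOrder: parent first, then left subtree, then right subtree.
    static void preOrder(Node n, StringBuilder out) {
        if (n == null) return;
        out.append(n.data).append(' ');
        preOrder(n.left, out);
        preOrder(n.right, out);
    }

    // InOrder: left subtree, then parent, then right subtree.
    static void inOrder(Node n, StringBuilder out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.append(n.data).append(' ');
        inOrder(n.right, out);
    }

    // PostOrder: left subtree, then right subtree, then parent.
    static void postOrder(Node n, StringBuilder out) {
        if (n == null) return;
        postOrder(n.left, out);
        postOrder(n.right, out);
        out.append(n.data).append(' ');
    }

    // Level order: top to bottom, left to right, driven by a queue.
    static void levelOrder(Node root, StringBuilder out) {
        Queue<Node> q = new ArrayDeque<>();
        if (root != null) q.add(root);
        while (!q.isEmpty()) {
            Node n = q.remove();
            out.append(n.data).append(' ');
            if (n.left != null) q.add(n.left);
            if (n.right != null) q.add(n.right);
        }
    }
}
```

The depth-first methods are recursive, mirroring the recursive structure of the tree; the level order traversal cannot be expressed that way and uses an explicit queue instead.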
As an example consider the following tree and its four traversals:
PreOrder - 8, 5, 9, 7, 1, 12, 2, 4, 11, 3
InOrder - 9, 5, 1, 7, 2, 12, 8, 4, 3, 11
PostOrder - 9, 1, 2, 12, 7, 5, 3, 11, 4, 8
LevelOrder - 8, 5, 4, 9, 7, 11, 1, 12, 3, 2
   
In the next picture we demonstrate the order of node visitation. Number 1 denotes the first node in a particular traversal and 7 denotes the last node.
These common traversals can be represented as a single algorithm by assuming that we visit each node three times. An Euler tour is a walk around the binary tree where each edge is treated as a wall which you cannot cross. In this walk each node will be visited either on the left, or from below, or on the right. The Euler tour in which we visit nodes on the left produces a preorder traversal. When we visit nodes from below, we get an inorder traversal. And when we visit nodes on the right, we get a postorder traversal.

Binary Search Trees

We consider a particular kind of binary tree called a Binary Search Tree (BST). The basic idea behind this data structure is to provide a storage repository that supports efficient sorting, searching, and retrieval of data.
    A BST is a binary tree where nodes are ordered in the following way:
  • each node contains one key (also known as data)
  • the keys in the left subtree are less than the key in its parent node; in short, L < P;
  • the keys in the right subtree are greater than the key in its parent node; in short, P < R;
  • duplicate keys are not allowed.
In the following tree, all nodes in the left subtree of 10 have keys < 10, while all nodes in the right subtree have keys > 10. Because both the left and right subtrees of a BST are again search trees, the above definition applies recursively to every internal node:

Implementation

We implement a binary search tree using a private inner class Node. In order to support the binary search tree property, we require that the data stored in each node is Comparable:




public class BST<AnyType extends Comparable<AnyType>>
{
   private Node root;

   // The inner class deliberately reuses the enclosing class's type
   // parameter; giving it its own would shadow AnyType.
   private class Node
   {
      private AnyType data;
      private Node left, right;

      public Node(AnyType data)
      {
         left = right = null;
         this.data = data;
      }
   }
   ...

}

Insertion

The insertion procedure is quite similar to searching. We start at the root and recursively go down the tree searching for a location in a BST to insert a new node. If the element to be inserted is already in the tree, we are done (we do not insert duplicates). The new node will always replace a NULL reference.
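The procedure can be sketched as follows, on a simplified int-keyed version of the tree (the standalone class and helper name insertRec are ours):

```java
public class BstInsert {
    static class Node {
        int data;
        Node left, right;
        Node(int d) { data = d; }
    }

    Node root;

    // Insert a key; duplicates are silently ignored.
    void insert(int key) {
        root = insertRec(root, key);
    }

    private Node insertRec(Node p, int key) {
        if (p == null) return new Node(key);       // new node replaces a null reference
        if (key < p.data) p.left = insertRec(p.left, key);
        else if (key > p.data) p.right = insertRec(p.right, key);
        // key == p.data: duplicate, do nothing
        return p;
    }
}
```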
Exercise. Given a sequence of numbers:
11, 6, 8, 19, 4, 10, 5, 17, 43, 49, 31
Draw a binary search tree by inserting the above numbers from left to right.

Searching

Searching in a BST always starts at the root. We compare the data stored at the root with the key we are searching for (call it toSearch). If the node does not contain the key, we proceed either to the left or to the right child, depending on the comparison: if the key is smaller, we go to the left child; otherwise, to the right child. The recursive structure of a BST yields a recursive algorithm. Searching in a BST has O(h) worst-case runtime complexity, where h is the height of the tree. Since a binary search tree with n nodes has at least O(log n) levels, it takes at least O(log n) comparisons to find a particular node. Unfortunately, a binary search tree can degenerate into a linked list, increasing the search time to O(n).
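The recursive search can be sketched like this, again on a simplified int-keyed node (the class and method names are ours):

```java
public class BstSearch {
    static class Node {
        int data;
        Node left, right;
        Node(int d) { data = d; }
    }

    // Recursive search: O(h) comparisons in the worst case,
    // where h is the height of the tree.
    static boolean contains(Node p, int toSearch) {
        if (p == null) return false;                      // hit a null reference: not found
        if (toSearch < p.data) return contains(p.left, toSearch);
        if (toSearch > p.data) return contains(p.right, toSearch);
        return true;                                      // keys are equal: found
    }
}
```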

Deletion

Deletion is somewhat more tricky than insertion. There are several cases to consider. The node to be deleted (call it toDelete)
  • is not in a tree;
  • is a leaf;
  • has only one child;
  • has two children.
If toDelete is not in the tree, there is nothing to delete. If the toDelete node has only one child, the procedure of deletion is identical to deleting a node from a linked list: we just bypass the node being deleted.
Deletion of an internal node with two children is less straightforward. If we simply removed such a node, we would split the tree into two subtrees, and some children of the internal node would no longer be accessible. In the picture below we delete 8:
The deletion strategy is the following: replace the node being deleted with the largest node in the left subtree, and then delete that largest node. By symmetry, the node being deleted can instead be swapped with the smallest node in the right subtree.
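The strategy above can be sketched as follows, covering all four cases (again on a simplified int-keyed node; the names are ours):

```java
public class BstDelete {
    static class Node {
        int data;
        Node left, right;
        Node(int d) { data = d; }
    }

    // Delete toDelete from the subtree rooted at p; return the new subtree root.
    static Node delete(Node p, int toDelete) {
        if (p == null) return null;                          // case 1: not in the tree
        if (toDelete < p.data) p.left = delete(p.left, toDelete);
        else if (toDelete > p.data) p.right = delete(p.right, toDelete);
        else if (p.left != null && p.right != null) {
            // case 4: two children - copy the largest key in the left
            // subtree into p, then delete that largest node.
            p.data = max(p.left).data;
            p.left = delete(p.left, p.data);
        } else {
            // cases 2 and 3: leaf or one child - bypass the node.
            p = (p.left != null) ? p.left : p.right;
        }
        return p;
    }

    // Rightmost (largest) node of a non-empty subtree.
    static Node max(Node p) {
        while (p.right != null) p = p.right;
        return p;
    }
}
```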
See BST.java for a complete implementation.
Exercise. Given a sequence of numbers:
11, 6, 8, 19, 4, 10, 5, 17, 43, 49, 31
Draw a binary search tree by inserting the above numbers from left to right and then show the two trees that can be the result after the removal of 11.
 

Non-Recursive Traversals

Depth-first traversals can be easily implemented recursively. A non-recursive implementation is a bit more difficult. In this section we implement a pre-order traversal as a tree iterator:




public Iterator<AnyType> iterator()
{
  return new PreOrderIterator();
}
where the PreOrderIterator class is implemented as a private inner class of the BST class:



private class PreOrderIterator implements Iterator<AnyType>
{
  ...
}
The main difficulty is with the next() method, which requires the implicit recursive stack to be implemented explicitly. We will be using Java's Stack. The algorithm starts with the root and pushes it on a stack. When a user calls the next() method, we check if the top element has a left child. If it has a left child, we push that child on the stack and return the parent node. If there is no left child, we check for a right child. If there is a right child, we push that child on the stack and return the parent node. If there is no right child, we move back up the tree (by popping elements from the stack) until we find a node with a right child. Here is the next() implementation:



public AnyType next()
{
   Node cur = stk.peek();
   if(cur.left != null)
   {
      stk.push(cur.left);
   }
   else
   {
      Node tmp = stk.pop();
      while(tmp.right == null)
      {
         if (stk.isEmpty()) return cur.data;
         tmp = stk.pop();
      }
      stk.push(tmp.right);
   }
   return cur.data;
} 
Hashing and Hash Tables
Hashing
Hashing is a technique used for storing and retrieving keys in a rapid manner. In hashing, a string of characters is transformed into a (usually shorter) value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find an item using the shorter hashed key than to find it using the original value. Hashing can also be described as a way of distributing keys in an array, called a hash table, using a predetermined function called a hash function.
Hash table  
A hash table is a table (array) where we store the original strings. The index of the table is the hashed key, while the value is the original string. The size of a hash table is usually several orders of magnitude lower than the total number of possible strings, so several strings might have the same hashed key. For example, there are 26^7 (8,031,810,176) strings of length 7 consisting of lowercase letters only.
Hash Function  
There are many ways to hash a string into a key. The following are some of the important methods for constructing hash functions:
  • Mid-square
  • Division (most common)
  • Folding
  • Digit Extraction
  • Rotating Hash
Mid-square
Square the key and then use an appropriate number of bits from the middle of the square as the hash key. If the key is a string, it is first converted to a number.
Steps:
   1. Square the value of the key (k^2).
   2. Extract the middle r bits of the result obtained in step 1.
Division
Divide the key using the modulus operator. It is the simplest method of hashing an integer z.
Function: h(z) = z mod M
   z = the key
   M = the divisor, usually a prime number, the table size, or the size of the memory used.
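A minimal sketch of the division method (using Math.floorMod rather than % is our choice, so that negative keys still map to a valid index):

```java
public class DivisionHash {
    // h(z) = z mod M, with M typically a prime near the table size.
    static int hash(int z, int M) {
        return Math.floorMod(z, M);  // floorMod keeps the result in [0, M)
    }
}
```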
Folding
Partition the key into several parts, then add the parts together to obtain the hash key.
Steps:
   1. Divide the key value into a number of parts.
   2. Add the individual parts (usually the last carry is ignored).
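The folding steps can be sketched as follows; splitting the key into fixed-size groups of decimal digits is our interpretation of step 1, and reducing modulo the table size stands in for "ignoring the last carry":

```java
public class FoldingHash {
    // Split the key's decimal digits into groups of partSize digits,
    // add the groups, and drop any overflow past the table size.
    static int hash(long key, int partSize, int tableSize) {
        long divisor = (long) Math.pow(10, partSize);
        long sum = 0;
        while (key > 0) {
            sum += key % divisor;        // take the next group of digits
            key /= divisor;
        }
        return (int) (sum % tableSize);  // discard the carry
    }
}
```

For example, with 2-digit parts the key 123456 folds as 56 + 34 + 12 = 102, which becomes 2 in a table of size 100.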
Digit Extraction
Predefined digits of the given number are used as the hash address.
Example: consider x = 14,568. If we extract the 1st, 3rd, and 5th digits, we get a hash code of 158.
Rotating Hash
Use any hash method (such as the division or mid-square method). After getting the hash code/address from that method, rotate it. Rotation is performed by shifting the digits to get a new hash address.
Example: given the hash address 20021, rotating the last digit to the front gives 12002.
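One right rotation of a decimal hash address can be sketched like this (passing the digit count as a parameter is our addition):

```java
public class RotatingHash {
    // Rotate a decimal hash address right by one digit:
    // the last digit becomes the leading digit (20021 -> 12002).
    static int rotate(int address, int numDigits) {
        int last = address % 10;                              // digit to move
        int rest = address / 10;                              // remaining digits
        return last * (int) Math.pow(10, numDigits - 1) + rest;
    }
}
```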
Collision
What happens if we want to store the following strings using the previous hash function (the first character of each string): define, float, exp, char, atan, ceil, acos, floor? Several strings have the same hash key: float and floor (hash key 5), char and ceil (hash key 2), and atan and acos (hash key 0). This is called a collision. How can we handle it?
There are two general ways to handle collisions:
Linear Probing
Search for the next empty slot and put the string there. This is the hash table for the strings define, float, exp, char, atan, ceil, floor, acos. Note that ceil is stored in h[6], acos is stored in h[1], and floor is stored in h[7]. When we want to store "ceil", "char" is already stored in h[2], so we search for the next empty slot, which is h[6]. Linear probing has bad search complexity if there are many collisions. The "step" table on the right describes how many steps are needed to find each string. Suppose we want to find ceil: we compute the hash key and get 2, but ceil is not there, so we must iterate until we find it.
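Linear probing with the first-character hash from the example can be sketched as follows (the table size of 26, the wraparound, and the method name are our choices):

```java
public class LinearProbing {
    // Hash by the first character ('a' -> 0, 'b' -> 1, ...), probing
    // forward to the next empty slot on collision.
    static String[] table = new String[26];

    // Store s and return the slot it ends up in.
    static int put(String s) {
        int i = s.charAt(0) - 'a';
        while (table[i] != null) i = (i + 1) % table.length;  // probe next slot
        table[i] = s;
        return i;
    }
}
```

Inserting the example strings in order reproduces the table from the notes: ceil lands in h[6], floor in h[7], and acos in h[1].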
Chaining
Put the strings that hash to the same slot into a chained list (linked list). In chaining, we store each string in a chain, so if there is a collision, we only need to iterate over that one chain.
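Chaining with the same first-character hash can be sketched like this (the class and method names are ours):

```java
import java.util.LinkedList;

public class ChainingHash {
    // One linked list ("chain") per slot; collisions share a chain.
    @SuppressWarnings("unchecked")
    static LinkedList<String>[] table = new LinkedList[26];

    static void put(String s) {
        int i = s.charAt(0) - 'a';
        if (table[i] == null) table[i] = new LinkedList<>();
        table[i].add(s);                 // colliding strings append to the chain
    }

    static boolean contains(String s) {
        int i = s.charAt(0) - 'a';
        // Only the one chain at slot i needs to be searched.
        return table[i] != null && table[i].contains(s);
    }
}
```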
 
