In-class notes for 04/16/2014 (CS 121B (CS1), Spring 2014)

In-class notes for 04/16/2014

CS 121B (CS1), Spring 2014

Upcoming

Homework 25
Quiz on Friday. See below for topics.

Submitted questions on readings

WebMapReduce exercises

Spec:

mapper() specs

    # IN format
    #    key is a netflix record   value is empty string
    #    NOTE: netflix record format is   movieID,reviewerID,rating,date
    # OUT format
    #    FILL THIS IN

reducer() specs

    # IN format
    #    FILL THIS IN
    # OUT format
    #    key is a movieID  value is the mean movie rating for that movie

Answer: Intermediate key-value pairs should be
```
    #   key: movieID   value: rating
```
Because
1. movieID and rating are all the info needed to compute mean rating per movie
2. choosing movieID as key will gather all of that movieID's ratings together in one reducer call

Thus, the completed IN/OUT specs are

mapper() specs

    # IN format
    #    key is a netflix record   value is empty string
    #    NOTE: netflix record format is   movieID,reviewerID,rating,date
    # OUT format
    #    key: movieID   value: rating

reducer() specs

    # IN format
    #    key: movieID   value: rating
    # OUT format
    #    key is a movieID  value is the mean movie rating for that movie

Code to satisfy those specs

    def mapper(key, value):
	lis = key.split(',')
	Wmr.emit(lis[0], lis[2])

    def reducer(key, iter):
	count = 0
	sum = 0
	for val in iter:
	    count = count + 1
	    sum = sum + int(val)
	Wmr.emit(key, str(sum/count))

Tracing WebMapReduce

The following multi-part blue box represents a trace of a map-reduce computation using the mapper() and reducer() above.

input
=mapper IN

1,1596531,5,2004-01-23
3,2318004,4,2004-02-05
1,1374216,2,2005-09-23

mapper()

def mapper(key, value):
    lis = key.split(',')
    Wmr.emit(lis[0], lis[2])

For first input line:
`key`	`'1,1596531,5,2004-01-23'`
`value`	`''`
`lis`	`['1', '1596531', '5', '2004-01-23']`

mapper OUT

1	5
3	4
1	2

Shuffle

reducer IN

1	5
1	2
3	4

reducer()

    def reducer(key, iter):
	count = 0
	sum = 0
	for val in iter:
	    count = count + 1
	    sum = sum + int(val)
	Wmr.emit(key, str(sum/count))

For call `reducer('1', ...):`
`key`	`'1'`
`iter`	`'5' '2'`
`val`	`'5' '2'`
`count`	`0 1 2`
`sum`	`0 5 7`

reducer OUT

1	3.5
3	4.0

Exercise

Exercise 1: Draw a trace of the following WMR computation:

Input (key holds all text, value is empty):
```
    The cat
    in the hat
```

Mapper:

    def mapper(key, value):
        lis = key.split()
        for word in lis:
	    Wmr.emit(word.lower(), '1')

Reducer:

    def reducer(key, iter):
        sum = 0
        for val in iter:
	    sum = sum + int(val)
        Wmr.emit(key, str(sum))

Exercise 2:

Compute word frequencies per book and overall frequencies, in a single map-reduce cycle.

Example input:

        cathat	The cat in the hat
        cathat	wore the hat
        cathat	to the cat hat party.
        OwlCat	The owl and the pussy cat went to sea

Final key-value pairs emitted from reducer:

        The	OwlCat 1
        The	cathat 1
        The	2
        and	OwlCat 1
        and	1
        cat	cathat 2
        cat	OwlCat 1
        cat	3
        ...

mapper() specs

# IN format
#    key is a book ID  value is a line of text from that book
# OUT format
#    FILL THIS IN

reducer() specs

# IN format
#    FILL THIS IN
# OUT formats (there are two)
#    1. key is a word  value is a book ID, a space, and freq of key in that book
#    2. key is a word  value is total freq of that word in all books

Hints:

(1) fill out the specs first -- what should key-value pairs be for example? (2) Use a fake "book id" 'zzzz' to count frequencies for all books, and handle that fake ID specially in reducer to get second output format.

To study for quiz

Genomes: combining strings, lists, and dictionaries
- Not a test of Biology -- look for the CS problem, and learn enough of the Biology language to understand the CS problem.
  
  The CS problems:
  - listCodons() (without having to handle the "leftover")
  - oneGene() (given start codon 'AUG', stop codons 'UAA', 'UAG', 'UGA') or something simpler.
  - genes()
  - codonAmino()
  - aminoNames()
  - aminoString()
  - getAminoCategories() or something simpler
- Given lists of strings (representing codons, amino acid codes), use those lists to generate new lists or strings (e.g., genes, amino acid codes or other data, etc.)
- Using given dictionaries to perform lookup operations.
Defining classes and using objects

Drawing memory diagrams for objects
OOP terms: object, class, state variable, method, constructor

< >

Submitted questions on assignments and technology

Upcoming

Submitted questions on readings

WebMapReduce exercises

Tracing WebMapReduce

Exercise

To study for quiz