In-class notes for 04/16/2014
CS 121B (CS1), Spring 2014
______
Submitted questions on assignments and technology
Upcoming
Quiz on Friday. See below for topics.
Submitted questions on readings
WebMapReduce exercises
Exercise 1 Solution
Spec:
mapper()
specs# IN format # key is a netflix record value is empty string # NOTE: netflix record format is movieID,reviewerID,rating,date # OUT format # FILL THIS IN
reducer()
specs# IN format # FILL THIS IN # OUT format # key is a movieID value is the mean movie rating for that movie
Answer: Intermediate key-value pairs should be
# key: movieID value: rating
BecausemovieID and rating are all the info needed to compute mean rating per movie
choosing movieID as key will gather all of that movieID's ratings together in one reducer call
mapper()
specs# IN format # key is a netflix record value is empty string # NOTE: netflix record format is movieID,reviewerID,rating,date # OUT format # key: movieID value: rating
reducer()
specs# IN format # key: movieID value: rating # OUT format # key is a movieID value is the mean movie rating for that movie
- Code to satisfy those specs
def mapper(key, value): lis = key.split(',') Wmr.emit(lis[0], lis[2])
def reducer(key, iter): count = 0 sum = 0 for val in iter: count = count + 1 sum = sum + int(val) Wmr.emit(key, str(sum/count))
Tracing WebMapReduce
The following multi-part blue box represents a trace of a
map-reduce computation using the mapper()
and reducer()
above.
input =mapper IN | 1,1596531,5,2004-01-23 3,2318004,4,2004-02-05 1,1374216,2,2005-09-23 | |||||||||||||
mapper() | def mapper(key, value): lis = key.split(',') Wmr.emit(lis[0], lis[2]) |
| ||||||||||||
mapper OUT | 1 5 3 4 1 2 | |||||||||||||
Shuffle | ||||||||||||||
reducer IN | 1 5 1 2 3 4 | |||||||||||||
reducer() | def reducer(key, iter): count = 0 sum = 0 for val in iter: count = count + 1 sum = sum + int(val) Wmr.emit(key, str(sum/count)) |
| ||||||||||||
reducer OUT | 1 3.5 3 4.0 |
Exercise
Exercise 1: Draw a trace of the following WMR computation:
- Input (key holds all text, value is empty):
The cat in the hat
- Mapper:
def mapper(key, value): lis = key.split() for word in lis: Wmr.emit(word.lower(), '1')
- Reducer:
def reducer(key, iter): sum = 0 for val in iter: sum = sum + int(val) Wmr.emit(key, str(sum))
- Input (key holds all text, value is empty):
Exercise 2:
Compute word frequencies per book and overall frequencies, in a single map-reduce cycle.
Example input:
cathat The cat in the hat cathat wore the hat cathat to the cat hat party. OwlCat The owl and the pussy cat went to sea
Final key-value pairs emitted from reducer:The OwlCat 1 The cathat 1 The 2 and OwlCat 1 and 1 cat cathat 2 cat OwlCat 1 cat 3 ...
mapper()
specs# IN format # key is a book ID value is a line of text from that book # OUT format # FILL THIS IN
reducer()
specs# IN format # FILL THIS IN # OUT formats (there are two) # 1. key is a word value is a book ID, a space, and freq of key in that book # 2. key is a word value is total freq of that word in all books
(1) fill out the specs first -- what should key-value pairs be for example? (2) Use a fake "book id"
'zzzz'
to count frequencies for all books, and handle that fake ID specially in reducer to get second output format.
To study for quiz
Genomes: combining strings, lists, and dictionaries
Not a test of Biology -- look for the CS problem, and learn enough of the Biology language to understand the CS problem.
The CS problems:
listCodons()
(without having to handle the "leftover")oneGene()
(given start codon'AUG'
, stop codons'UAA'
,'UAG'
,'UGA'
) or something simpler.getAminoCategories()
or something simpler
Given lists of strings (representing codons, amino acid codes), use those lists to generate new lists or strings (e.g., genes, amino acid codes or other data, etc.)
Using given dictionaries to perform lookup operations.
Defining classes and using objects
Drawing memory diagrams for objects
OOP terms: object, class, state variable, method, constructor
< >