>>     < >




In-class notes for 04/16/2014

CS 121B (CS1), Spring 2014

Submitted questions on assignments and technology

Upcoming

Submitted questions on readings

WebMapReduce exercises

  • Exercise 1 Solution

    • Spec:

      • mapper() specs

            # IN format
            #    key is a netflix record   value is empty string
            #    NOTE: netflix record format is   movieID,reviewerID,rating,date
            # OUT format
            #    FILL THIS IN
        

      • reducer() specs

            # IN format
            #    FILL THIS IN
            # OUT format
            #    key is a movieID  value is the mean movie rating for that movie
        

    • Answer: Intermediate key-value pairs should be

          #   key: movieID   value: rating
      
      Because
      1. movieID and rating are all the info needed to compute mean rating per movie

      2. choosing movieID as key will gather all of that movieID's ratings together in one reducer call

    Thus, the completed IN/OUT specs are
    • mapper() specs

          # IN format
          #    key is a netflix record   value is empty string
          #    NOTE: netflix record format is   movieID,reviewerID,rating,date
          # OUT format
          #    key: movieID   value: rating
      

    • reducer() specs

          # IN format
          #    key: movieID   value: rating
          # OUT format
          #    key is a movieID  value is the mean movie rating for that movie
      

  • Code to satisfy those specs
    •     def mapper(key, value):
      	lis = key.split(',')
      	Wmr.emit(lis[0], lis[2])
      

    •     def reducer(key, iter):
      	count = 0
      	sum = 0
      	for val in iter:
      	    count = count + 1
      	    sum = sum + int(val)
      	Wmr.emit(key, str(sum/count))
      

Tracing WebMapReduce

The following multi-part blue box represents a trace of a map-reduce computation using the mapper() and reducer() above.

input
=mapper IN
1,1596531,5,2004-01-23
3,2318004,4,2004-02-05
1,1374216,2,2005-09-23
mapper()
def mapper(key, value):
    lis = key.split(',')
    Wmr.emit(lis[0], lis[2])
For first input line:
key'1,1596531,5,2004-01-23'
value''
lis['1', '1596531', '5', '2004-01-23']
mapper OUT
1	5
3	4
1	2
Shuffle
reducer IN
1	5
1	2
3	4
reducer()
    def reducer(key, iter):
	count = 0
	sum = 0
	for val in iter:
	    count = count + 1
	    sum = sum + int(val)
	Wmr.emit(key, str(sum/count))
For call reducer('1', ...):
key'1'
iter'5'
'2'
val '5'  '2'
count 0  1  2
sum 0  5  7
reducer OUT
1	3.5
3	4.0

Exercise

  • Exercise 1: Draw a trace of the following WMR computation:

    • Input (key holds all text, value is empty):
          The cat
          in the hat
      
    • Mapper:
          def mapper(key, value):
              lis = key.split()
              for word in lis:
      	    Wmr.emit(word.lower(), '1')
      
    • Reducer:
          def reducer(key, iter):
              sum = 0
              for val in iter:
      	    sum = sum + int(val)
              Wmr.emit(key, str(sum))
      

  • Exercise 2:

    Compute word frequencies per book and overall frequencies, in a single map-reduce cycle.

    Example input:

            cathat	The cat in the hat
            cathat	wore the hat
            cathat	to the cat hat party.
            OwlCat	The owl and the pussy cat went to sea
    
    Final key-value pairs emitted from reducer:
            The	OwlCat 1
            The	cathat 1
            The	2
            and	OwlCat 1
            and	1
            cat	cathat 2
            cat	OwlCat 1
            cat	3
            ...
    
    • mapper() specs

      # IN format
      #    key is a book ID  value is a line of text from that book
      # OUT format
      #    FILL THIS IN
      

    • reducer() specs

      # IN format
      #    FILL THIS IN
      # OUT formats (there are two)
      #    1. key is a word  value is a book ID, a space, and freq of key in that book
      #    2. key is a word  value is total freq of that word in all books
      

    Hints:

    (1) fill out the specs first -- what should key-value pairs be for example? (2) Use a fake "book id" 'zzzz' to count frequencies for all books, and handle that fake ID specially in reducer to get second output format.

To study for quiz

  • Genomes: combining strings, lists, and dictionaries

    • Not a test of Biology -- look for the CS problem, and learn enough of the Biology language to understand the CS problem.

      The CS problems:

    • Given lists of strings (representing codons, amino acid codes), use those lists to generate new lists or strings (e.g., genes, amino acid codes or other data, etc.)

    • Using given dictionaries to perform lookup operations.

  • Defining classes and using objects

    Drawing memory diagrams for objects

  • OOP terms: object, class, state variable, method, constructor




< >