



In-class notes for 04/07/2014

CS 121B (CS1), Spring 2014

Submitted questions on assignments and technology

Upcoming

  • Homework 21

  • Read online text: Defining Classes
    submit at least one reading question by 9am before the next class meeting

    • Recall that objects are entities that have state variables (for storing data) and methods (for operating on the state variables).

      Examples: Strings (sequence of characters makes up the data; many string methods); cImage objects (pixel values are the data, methods such as setPixel(), getPixel())

    • A class is a type of object: it defines the state variables and methods that every object of that type has

    • This chapter is about defining your own object types, e.g., a LibBook type to hold and manage data about a library book (author, date published, last checkout, etc.)
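As a preview of the reading, here is a minimal sketch of what a LibBook type could look like in Python. The state variables follow the data listed above (author, date published, last checkout); the method names check_out() and get_last_checkout() and the sample book are made up for illustration and are not taken from the online text.

```python
class LibBook:
    """Hypothetical library-book type; names are illustrative only."""

    def __init__(self, author, date_published):
        # State variables: the data each LibBook object stores
        self.author = author
        self.date_published = date_published
        self.last_checkout = None   # no checkout recorded yet

    def check_out(self, date):
        # Method: operates on (updates) one of the state variables
        self.last_checkout = date

    def get_last_checkout(self):
        # Method: reports the current value of a state variable
        return self.last_checkout


# Made-up sample data
book = LibBook("Jane Austen", "1813")
book.check_out("2014-04-07")
print(book.author, book.get_last_checkout())
```

Each LibBook object gets its own copy of the three state variables, while the methods are shared by all objects of the class.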

  • Quiz on Wednesday. See below for topics.

Submitted questions on readings

Review of map-reduce computing

Exercises

  • Exercise 1:

    Use WMR to compute the highest reviewerID in a Netflix data set.

    Hints
    • A sample data set:

      1,1596531,5,2004-01-23
      3,2318004,4,2004-02-05
      6,110641,5,2003-12-15
      8,1447639,4,2005-03-27
      8,2557899,5,2005-04-16
      6,52076,4,2004-10-05
      1,13651,3,2004-06-16
      1,1374216,2,2005-09-23
      

    • Spec for mapper:

      # IN format
      #    key is a netflix record   value is empty string
      #    NOTE: netflix record format is   movieID,reviewerID,rating,date
      # OUT format
      #    key is empty  value is a reviewerID
      

    • Spec for reducer:

      # IN format
      #    key is empty   values are reviewerIDs
      # OUT format
      #    key is maximum reviewer id  value is empty
      

    • Hints: (a) use split() with an argument; (b) use an accumulator in reducer to determine the maximum value (since reviewerIDs are non-negative, you can initialize accumulator at -1); (c) use the Test interface to debug.
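One possible sketch of Exercise 1, following the mapper and reducer specs above. It assumes WMR's Python interface of mapper(key, value), reducer(key, values), and Wmr.emit(key, value). The small Wmr class here is only a local stand-in so the sketch can run outside WMR; in WMR itself you would enter just the two functions.

```python
class Wmr:
    """Local stand-in for WMR's emit facility (assumption, for testing only)."""
    emitted = []

    @staticmethod
    def emit(key, value):
        Wmr.emitted.append((key, value))


def mapper(key, value):
    # key is a record "movieID,reviewerID,rating,date"; value is empty
    fields = key.split(',')
    Wmr.emit('', fields[1])          # empty key, reviewerID as the value


def reducer(key, values):
    # values are reviewerIDs (strings); accumulate the largest one seen
    maximum = -1                     # reviewerIDs are non-negative
    for v in values:
        if int(v) > maximum:
            maximum = int(v)
    Wmr.emit(str(maximum), '')


# Local simulation of one map-reduce cycle on the sample data set
records = ['1,1596531,5,2004-01-23', '3,2318004,4,2004-02-05',
           '6,110641,5,2003-12-15', '8,1447639,4,2005-03-27',
           '8,2557899,5,2005-04-16', '6,52076,4,2004-10-05',
           '1,13651,3,2004-06-16', '1,1374216,2,2005-09-23']
for r in records:
    mapper(r, '')
ids = [v for (k, v) in Wmr.emitted]  # every pair shares the empty key
Wmr.emitted = []
reducer('', ids)
print(Wmr.emitted)                   # [('2557899', '')]
```

Because every mapper output uses the same (empty) key, a single reducer call sees all of the reviewerIDs, which is what lets one accumulator find the overall maximum.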

  • Exercise 2:

    Compute the average rating for each movie.

    • You can use the same data set as above. For that data set, the results should be:

              1	3.3333
              3	4
              6	4.5
              8	4.5
      

    • mapper() specs

      # IN format
      #    key is a netflix record   value is empty string
      #    NOTE: netflix record format is   movieID,reviewerID,rating,date
      # OUT format
      #    FILL THIS IN
      

    • reducer() specs

      # IN format
      #    FILL THIS IN
      # OUT format
      #    key is a movieID  value is the mean movie rating for that movie
      

    • Hints: (a) COMPLETE THE SPECS FIRST -- what intermediate key-value pairs will you need? (b) Use two accumulators in reducer, one for sum and one for count, in order to compute the mean
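Hint (b) can be tried out in plain Python before you write the reducer. The sketch below shows only the two-accumulator pattern; the function name mean_of() is made up for illustration, and the sketch deliberately leaves the mapper and reducer specs for you to complete.

```python
def mean_of(values):
    """Mean of a sequence of numeric strings, using two accumulators."""
    total = 0.0    # accumulator 1: running sum
    count = 0      # accumulator 2: how many values have been seen
    for v in values:
        total += float(v)
        count += 1
    # Assumes at least one value, as is the case for the values a reducer receives
    return total / count


# The three ratings for movie 1 in the sample data set
print(round(mean_of(['5', '3', '2']), 4))   # 3.3333
```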

  • Exercise 3:

    Compute the average rating for each movie and the maximum (latest) date among the ratings for that movie, using a single map-reduce cycle.

    • You can use the same data set as above. For that data set, the results should be:

              1	3.33 2005-09-23
              3	4.00 2004-02-05
              6	4.50 2004-10-05
              8	4.50 2005-04-16
      

    • mapper() specs

      # IN format
      #    key is a netflix record   value is empty string
      #    NOTE: netflix record format is   movieID,reviewerID,rating,date
      # OUT format
      #    FILL THIS IN
      

    • reducer() specs

      # IN format
      #    FILL THIS IN
      # OUT format
      #    key is a movieID  value is the mean movie rating for that movie, 
      #       followed by a space, then the maximum date among ratings for that movie
      

    • Hints: (a) COMPLETE THE SPECS FIRST -- what intermediate key-value pairs will you need? (b) Add a third accumulator for max movie date (compare these using str order); (c) an empty string is smaller than any date in this format; (d) use format() to make columns in each value emitted by the reducer.
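Hints (b) through (d) can be sketched in plain Python as follows. The helper names latest_date() and format_value() are made up for illustration; the sketch shows the string comparison and the format() call, not a complete reducer.

```python
def latest_date(dates):
    """Largest date string; YYYY-MM-DD strings sort in calendar order."""
    latest = ''                # empty string is smaller than any date
    for d in dates:
        if d > latest:         # str comparison, no date parsing needed
            latest = d
    return latest


def format_value(mean, date):
    # Two decimal places for the mean, then a space, then the date
    return '{0:.2f} {1}'.format(mean, date)


# Rating dates for movie 1 in the sample data set
dates = ['2004-01-23', '2004-06-16', '2005-09-23']
print(format_value(10 / 3.0, latest_date(dates)))   # 3.33 2005-09-23
```

Comparing dates as strings works here only because the year, month, and day are zero-padded and appear in that order.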

  • FYI: Large netflix data sets on WMR system (cluster paths):

    /shared/netflix/test -- all ratings on 100 movies
    /shared/netflix/all -- all ratings on all movies (don't use in class!)
    

    Note: Don't use Test computation with large data sets. The Test system will cut off the data after a certain number of characters, and it may lead to an incorrectly formatted record at the point of the cut.
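If you do run into a record that was cut off, one defensive option is to have the mapper skip anything that does not look like a complete record. This is a sketch only: the helper name parse_record() and the particular checks are assumptions, not part of WMR.

```python
def parse_record(record):
    """Return [movieID, reviewerID, rating, date], or None if malformed."""
    fields = record.split(',')
    if len(fields) != 4:
        return None                      # too few or too many fields
    movie_id, reviewer_id, rating, date = fields
    if not (movie_id.isdigit() and reviewer_id.isdigit() and rating.isdigit()):
        return None                      # a numeric field is empty or damaged
    if len(date) != 10:
        return None                      # date was cut short (YYYY-MM-DD expected)
    return fields


print(parse_record('8,2557899,5,2005-04-16'))   # complete record
print(parse_record('8,2557899,5,2005-0'))       # None: truncated date
```

A mapper could call this helper first and simply return without emitting anything when the result is None.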

To study for quiz

  • WMR

    • Write mappers and reducers for specific problems, given IN/OUT specs

    • Features of WMR, including sorting (shuffle phase, alphabetic vs. numeric), Test vs Submit
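The difference between alphabetic and numeric ordering can be seen in plain Python. This small sketch uses made-up keys and Python's own sorted(); it illustrates the two orderings rather than WMR's shuffle itself.

```python
keys = ['8', '10', '3', '100', '1']

# Alphabetic (string) order compares character by character
print(sorted(keys))             # ['1', '10', '100', '3', '8']

# Numeric order compares the values the strings represent
print(sorted(keys, key=int))    # ['1', '3', '8', '10', '100']
```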

  • Dictionaries

    • Basic syntax and concepts

    • Using dictionaries as repositories, e.g., animals

    • Nested dictionaries

    • Using dictionaries to tally results, e.g., word frequencies, netflix (given format)
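As a reminder of the tallying pattern, here is a minimal word-frequency sketch. The function name word_frequencies() and the sample sentence are made up for illustration and are not taken from the homework.

```python
def word_frequencies(text):
    """Tally how many times each word occurs in text."""
    counts = {}
    for word in text.split():
        if word in counts:
            counts[word] += 1    # seen before: add to its tally
        else:
            counts[word] = 1     # first occurrence: start a new tally
    return counts


freqs = word_frequencies('the cat and the hat')
print(freqs['the'], freqs['cat'])   # 2 1
```

The same pattern applies to the netflix data, with a movieID as the dictionary key in place of a word.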

    Note: Here are some answers to selected dictionary homework exercises:

      problem                     solutions
      Dictionary questions, 1a    solution; solution using format() (optional)
      Dictionary questions, 3a    solution; solution using format() (optional)


