In-class notes for 04/07/2014 (CS 121B (CS1), Spring 2014)

In-class notes for 04/07/2014

CS 121B (CS1), Spring 2014

Homework assignment
Quizzes returned
- Recursion problem

Submitted questions on assignments and technology

Piazza

Upcoming

Homework 21
Read online text: Defining Classes
submit at least one reading question by 9am before the next class meeting
- Recall that objects are entities that have state variables (for storing data) and methods (for operating on the state variables).
  
  Examples: Strings (sequence of characters makes up the data; many string methods); cImage objects (pixel values are the data, methods such as setPixel(), getPixel())
- A class() is a type of object
- This chapter is about defining your own object types, e.g., a LibBook type to hold and manage data about a library book (author, date published, last checkout, etc.)
Quiz on Wednesday. See below for topics.

Submitted questions on readings

Review of map-reduce computing

Exercises

Exercise 1:

Use WMR to compute the highest reviewer id in a netflix dataset.

Hints

A sample data set:

1,1596531,5,2004-01-23
3,2318004,4,2004-02-05
6,110641,5,2003-12-15
8,1447639,4,2005-03-27
8,2557899,5,2005-04-16
6,52076,4,2004-10-05
1,13651,3,2004-06-16
1,1374216,2,2005-09-23

Spec for mapper:

# IN format
#    key is a netflix record   value is empty string
#    NOTE: netflix record format is   movieID,reviewerID,rating,date
# OUT format
#    key is empty  value is a reviewerID

Spec for reducer:

# IN format
#    key is empty   value are reviewerID
# OUT format
#    key is maximum reviewer id  value is empty

Hints: (a) use split() with an argument; (b) use an accumulator in reducer to determine the maximum value (since reviewerIDs are non-negative, you can initialize accumulator at -1); (c) use the Test interface to debug.

Exercise 2:

Compute the average ratings for all movies

You can use the same data set as above. For that data set, the results should be:
```
        1	3.3333
        3	4
        6	4.5
        8	4.5
```

mapper() specs

# IN format
#    key is a netflix record   value is empty string
#    NOTE: netflix record format is   movieID,reviewerID,rating,date
# OUT format
#    FILL THIS IN

reducer() specs

# IN format
#    FILL THIS IN
# OUT format
#    key is a movieID  value is the mean movie rating for that movie

Hints: (a) COMPLETE THE SPECS FIRST -- what intermediate key-value pairs will you need? (b) Use two accumulators in reducer, one for sum and one for count, in order to compute the mean

Exercise 3:
Compute the average ratings for each movie and the maximum date of ratings for that movie, using a single map-reduce cycle.
- You can use the same data set as above. For that data set, the results should be:
```
        1	3.33 2005-09-23
        3	4.00 2004-02-09
        6	4.50 2004-10-25
        8	4.50 2005-04-16
```
- mapper() specs
```
# IN format
#    key is a netflix record   value is empty string
#    NOTE: netflix record format is   movieID,reviewerID,rating,date
# OUT format
#    FILL THIS IN
```
- reducer() specs
```
# IN format
#    FILL THIS IN
# OUT format
#    key is a movieID  value is the mean movie rating for that movie, 
#       followed by a space, then the maximum date among ratings for that movie
```
- Hints: (a) COMPLETE THE SPECS FIRST -- what intermediate key-value pairs will you need? (b) Add a third accumulator for max movie date (compare these using str order); (c) an empty string is smaller than any date in this format; (d) use format() to make columns in each value emitted by the reducer.
FYI: Large netflix data sets on WMR system (cluster paths):
```
/shared/netflix/test -- all ratings on 100 movies
/shared/netflix/all -- all ratings on all movies (don't use in class!)
```
Note: Don't use Test computation with large data sets. The Test system will cut off the data after a certain number of characters, and it may lead to an incorrectly formatted record at the point of the cut.

To study for quiz

WMR
- Write mappers and reducers for specific problems, given IN/OUT specs
- Features of WMR, including sorting (shuffle phase, alphabetic vs. numeric), Test vs Submit
Dictionaries
- Basic syntax and concepts
- Using dictionaries as repositories, e.g., animals
- Nested dictionaries
- Using dictionaries to tally results, e.g., word frequencies, netflix (given format)
Note: Here are some answers to selected dictionary homework exercises:

problem solutions

Dictionary questions, 1a solution solution using
format() (optional)

Dictionary questions, 3a solution solution using
format() (optional)

< >