In-class notes for 04/07/2014
CS 121B (CS1), Spring 2014
Quizzes returned
Submitted questions on assignments and technology
Upcoming
Read online text: Defining Classes
submit at least one reading question by 9am before the next class meetingRecall that objects are entities that have state variables (for storing data) and methods (for operating on the state variables).
Examples: Strings (sequence of characters makes up the data; many string methods);
cImage
objects (pixel values are the data, methods such assetPixel()
,getPixel()
)A class() is a type of object
This chapter is about defining your own object types, e.g., a
LibBook
type to hold and manage data about a library book (author, date published, last checkout, etc.)
Quiz on Wednesday. See below for topics.
Submitted questions on readings
Review of map-reduce computing
Exercises
Exercise 1:
Use WMR to compute the highest reviewer id in a netflix dataset.
HintsA sample data set:
1,1596531,5,2004-01-23 3,2318004,4,2004-02-05 6,110641,5,2003-12-15 8,1447639,4,2005-03-27 8,2557899,5,2005-04-16 6,52076,4,2004-10-05 1,13651,3,2004-06-16 1,1374216,2,2005-09-23
Spec for mapper:
# IN format # key is a netflix record value is empty string # NOTE: netflix record format is movieID,reviewerID,rating,date # OUT format # key is empty value is a reviewerID
Spec for reducer:
# IN format # key is empty value are reviewerID # OUT format # key is maximum reviewer id value is empty
Hints: (a) use
split()
with an argument; (b) use an accumulator in reducer to determine the maximum value (since reviewerIDs are non-negative, you can initialize accumulator at -1); (c) use the Test interface to debug.
Exercise 2:
Compute the average ratings for all movies
You can use the same data set as above. For that data set, the results should be:
1 3.3333 3 4 6 4.5 8 4.5
mapper()
specs# IN format # key is a netflix record value is empty string # NOTE: netflix record format is movieID,reviewerID,rating,date # OUT format # FILL THIS IN
reducer()
specs# IN format # FILL THIS IN # OUT format # key is a movieID value is the mean movie rating for that movie
Hints: (a) COMPLETE THE SPECS FIRST -- what intermediate key-value pairs will you need? (b) Use two accumulators in reducer, one for sum and one for count, in order to compute the mean
Exercise 3:
Compute the average ratings for each movie and the maximum date of ratings for that movie, using a single map-reduce cycle.
You can use the same data set as above. For that data set, the results should be:
1 3.33 2005-09-23 3 4.00 2004-02-09 6 4.50 2004-10-25 8 4.50 2005-04-16
mapper()
specs# IN format # key is a netflix record value is empty string # NOTE: netflix record format is movieID,reviewerID,rating,date # OUT format # FILL THIS IN
reducer()
specs# IN format # FILL THIS IN # OUT format # key is a movieID value is the mean movie rating for that movie, # followed by a space, then the maximum date among ratings for that movie
Hints: (a) COMPLETE THE SPECS FIRST -- what intermediate key-value pairs will you need? (b) Add a third accumulator for max movie date (compare these using
str
order); (c) an empty string is smaller than any date in this format; (d) useformat()
to make columns in each value emitted by the reducer.
FYI: Large netflix data sets on WMR system (cluster paths):
/shared/netflix/test -- all ratings on 100 movies /shared/netflix/all -- all ratings on all movies (don't use in class!)
Note: Don't use Test computation with large data sets. The Test system will cut off the data after a certain number of characters, and it may lead to an incorrectly formatted record at the point of the cut.
To study for quiz
WMR
Write mappers and reducers for specific problems, given IN/OUT specs
Features of WMR, including sorting (shuffle phase, alphabetic vs. numeric), Test vs Submit
Dictionaries
Basic syntax and concepts
Using dictionaries as repositories, e.g., animals
Nested dictionaries
Using dictionaries to tally results, e.g., word frequencies, netflix (given format)
problem solutions Dictionary questions, 1a solution solution using
format() (optional)Dictionary questions, 3a solution solution using
format() (optional)
< >