In-class notes for 04/04/2014 (CS 121B (CS1), Spring 2014)

In-class notes for 04/04/2014

CS 121B (CS1), Spring 2014

Homework assignment
Fall CS courses
- CS 241, Hardware Design [HD]. Prereq: CS 121 or 125
  How can an electronic circuit carry out computer instructions for a language like Python3?
  Design of computer hardware; connection of hardware to various programming languages; current and future trends
  This is a unique time of revolutionary change in history of computer hardware, and HD explains why.
- CS 251, Software Design [SD]. Prereq: CS 121 or 125
  Designing and building software using the popular C++ programming language,
  plus a substantial structured team project
  Includes a .25 cr lab, for more "time-on-task" mastering programming skills
  Works equally well as a last course in CS (strategic preparation for creating software relevant to other fields)
  or as a next course (there's a reason why it's a prereq for almost every other CS course)
Interested in advising in CS? Sign up for a time

Next week: Science Symposium, on "Big Data"

Events Thursday evening, Friday afternoon and evening.
Friday evening speaker: Dr. Kathy Yelick, CS prof at University of California/Berkeley and lead researcher at Lawrence-Berkeley National Labs
"More Data, More Science, and... Moore's Law?"
Friday 4/11, 7:30pm, Tomson 280. Extra credit for attending sessions and writing 1-page reactions.

Click here for more information and complete schedule

Submitted questions on assignments and technology

Upcoming

Homework 21

Submitted questions on readings

Intro to WebMapReduce

Map-reduce computing

Basic idea of map-reduce computing:
1. Break data up into labelled pieces (mapper())
2. Gather and combine pieces that have the same label (reducer())
Example: For computing word frequencies, we
1. break up each line of text into words, and emit a labelled piece (key = word, value = '1') for each word found, then
2. gather all the '1's for each word, and add up those '1's as integers to determine the frequency for that word.
Stages of computation:
- mapper stage: start with one line of the original data; produce intermediate key-value pairs.
- shuffle stage: organize key-value pairs according to key; transmit across network to a convenient location for next stage.
- reducer stage: start with all key-value pairs for a given fixed key; produce result key-value pairs of interest.
A map-reduce implementation (such as Hadoop, which we use) takes care of details such as splitting the data and shuffling, so a map-reduce programmer needs only provide mapper and reducer algorithms.
Python3 mapper() and reducer() functions
Entering the job into WMR
Teaching materials

Exercises

Exercise 1:

Use WMR to compute the highest reviewer id in a netflix dataset.

Hints

A sample data set:

1,1596531,5,2004-01-23
3,2318004,4,2004-02-05
6,110641,5,2003-12-15
8,1447639,4,2005-03-27
8,2557899,5,2005-04-16
6,52076,4,2004-10-05
1,13651,3,2004-06-16
1,1374216,2,2005-09-23

Spec for mapper:

# IN format
#    key is a netflix record   value is empty string
#    NOTE: netflix record format is   movieID,reviewerID,rating,date
# OUT format
#    key is empty  value is a reviewerID

Spec for reducer:

# IN format
#    key is empty   value are reviewerID
# OUT format
#    key is maximum reviewer id  value is empty

Hints: (a) use split() with an argument; (b) use an accumulator in reducer to determine the maximum value (since reviewerIDs are non-negative, you can initialize accumulator at -1); (c) use the Test interface to debug.

Exercise 2:

Compute the average ratings for all movies

You can use the same data set as above. For that data set, the results should be:
```
        1	3.3333
        3	4
        6	4.5
        8	4.5
```

mapper() specs

# IN format
#    key is a netflix record   value is empty string
#    NOTE: netflix record format is   movieID,reviewerID,rating,date
# OUT format
#    key is a movieID  value is a rating

reducer() specs

# IN format
#    key is a movieID  values are ratings
# OUT format
#    key is a movieID  value is the mean movie rating for that movie

Hints: (a) Use two accumulators in reducer, one for sum and one for count, in order to compute the mean

< >