AllNotes-Mobile-AndroidTrack
OpenCourseWare
Syllabus
Introduction to the intellectual enterprises of computer science and the art of programming. This course teaches students how to think
algorithmically and solve problems efficiently. Topics include abstraction, algorithms, data structures, encapsulation, resource management,
security, and software engineering. Languages include C, Python, and SQL plus students’ choice of: HTML, CSS, and JavaScript (for web
development); Java or Swift (for mobile app development); or Lua (for game development). Problem sets inspired by the arts, humanities, social
sciences, and sciences. Course culminates in a final project. Designed for concentrators and non-concentrators alike, with or without prior
programming experience. Two thirds of CS50 students have never taken CS before. Among the overarching goals of this course are to inspire
students to explore unfamiliar waters, without fear of failure, create an intensive, shared experience, accessible to all students, and build
community among students.
Expectations
Website
https://cs50.edx.org/
Certificates
CS50x is free to take, and you are welcome to submit the course’s nine problem sets and final project for automated feedback. To be eligible for
a verified certificate (https://www.edx.org/verified-certificate) from edX, however, you must receive a satisfactory score (at least 70%) on each
problem you submit as part of one of the course’s nine problem sets as well as on the course’s final project.
Problems are evaluated along axes of correctness (as determined by a program called check50 ) and style (as determined by a program called
style50 ), with scores ordinarily computed as 3 × correctness + 1 × style.
Books
No books are required or recommended for this course. However, you might find the below books of interest. Realize that free, if not superior,
resources can be found on the course’s website.
C Programming Absolute Beginner’s Guide, Third Edition + Greg Perry, Dean Miller + Pearson Education, 2014 + ISBN 0-789-75198-4
Hacker’s Delight, Second Edition + Henry S. Warren Jr. + Pearson Education, 2013 + ISBN 0-321-84268-5
How Computers Work, Tenth Edition + Ron White + Que Publishing, 2014 + ISBN 0-7897-4984-X
Programming in C, Fourth Edition + Stephen G. Kochan + Pearson Education, 2015 + ISBN 0-321-77641-0
Lectures
Integrated into problem sets are “walkthroughs,” videos that offer direction on where to begin and how to approach problems.
Problem Sets
Problem sets are programming assignments. CS50x does not have deadlines for problem sets. You are welcome to work on and submit them at
your own pace. To be eligible for a verified certificate from edX, however, you must submit (and receive a score of at least 70% on) all problem
sets by 31 December 2020.
Final Project
The climax of this course is its final project. The final project is your opportunity to take your newfound savvy with programming out for a spin
and develop your very own piece of software. So long as your project draws upon this course’s lessons, the nature of your project is entirely up
to you. You may implement your project in any language(s). You are welcome to utilize infrastructure other than the CS50 IDE. All that we ask is
that you build something of interest to you, that you solve an actual problem, that you impact your community, or that you change the world.
Strive to create something that outlives this course.
Inasmuch as software development is rarely a one-person effort, you are allowed an opportunity to collaborate with one or two classmates for
this final project. Needless to say, it is expected that every student in any such group contribute equally to the design and implementation of
that group’s project. Moreover, it is expected that the scope of a two- or three-person group’s project be, respectively, twice or thrice that of a
typical one-person project. A one-person project, mind you, should entail more time and effort than is required by each of the course’s problem
sets. Although no more than three students may design and implement a given project, you are welcome to solicit advice from others, so long
as you respect the course’s policy on academic honesty.
CS50x does not have a deadline for the final project. You are welcome to work on and submit it at your own pace. To be eligible for a verified
certificate from edX, however, you must submit (and receive a score of at least 70% on) it by 31 December 2020.
Academic Honesty
This course’s philosophy on academic honesty is best stated as “be reasonable.” The course recognizes that interactions with classmates and
others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the
work of another. This policy characterizes both sides of that line.
The essence of all work that you submit to this course must be your own. Collaboration on problem sets is not permitted except to the extent
that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking,
when asking for help, you may show your code to others, but you may not view theirs, so long as you and they respect this policy’s other
constraints. Collaboration on the course’s final project is permitted to the extent prescribed by its specification.
Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to
whether some act is reasonable, do not commit it. If the course determines that you have committed an act that is not reasonable, you may be
deemed ineligible for a certificate. If you commit some act that is not reasonable but bring it to the attention of the course’s instructor within
72 hours, the course may reconsider that outcome.
Reasonable
Communicating with classmates about problem sets’ problems in English (or some other spoken language).
Discussing the course’s material with others in order to understand it better.
Helping a classmate identify a bug in his or her code in person or online, as by viewing, compiling, or running his or her code, even on your
own computer.
Incorporating a few lines of code that you find online or elsewhere into your own code, provided that those lines are not themselves
solutions to assigned problems and that you cite the lines’ origins.
Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
Sharing a few lines of your own code online so that others might help you identify and fix a bug.
Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not
for outright solutions to problem sets’ problems or your own final project.
Whiteboarding solutions to problem sets with others using diagrams or pseudocode but not actual code.
Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.
Not Reasonable
Accessing a solution to some problem prior to (re-)submitting your own.
Asking a classmate to see his or her solution to a problem set’s problem before (re-)submitting your own.
Decompiling, deobfuscating, or disassembling the staff’s solutions to problem sets.
Failing to cite (as with comments) the origins of code or techniques that you discover outside of the course’s own lessons and integrate
into your own work, even while respecting this policy’s other constraints.
Giving or showing to a classmate a solution to a problem set’s problem when it is he or she, and not you, who is struggling to solve it.
Paying or offering to pay an individual for work that you may submit as (part of) your own.
Searching for or soliciting outright solutions to problem sets online or elsewhere.
Splitting a problem set’s workload with another individual and combining your work.
Submitting (after possibly modifying) the work of another individual beyond the few lines allowed herein.
Submitting the same or similar work to this course that you have submitted or will submit to another.
Viewing another’s solution to a problem set’s problem and basing your own solution on it.
This is CS50x
OpenCourseWare
Lecture 0
Welcome
What is computer science?
Binary
Representing data
Algorithms
Pseudocode
Scratch
Welcome
When David was a first-year, he was too intimidated to take any computer science courses. By the time he was a sophomore, he found the
courage to take the equivalent of CS50, but only pass/fail.
In fact, two-thirds of CS50 students have never taken a CS course before.
And importantly, too:
what ultimately matters in this course is not so much where you end up relative to your classmates but where you end up relative to
yourself when you began
We need a way to represent inputs, such that we can store and work with information in a standard way.
Binary
A computer, at the lowest level, stores data in binary, a numeral system in which there are just two digits, 0 and 1.
When we first learned to count, we might have used one finger to represent one thing. That system is called unary. When we learned to
write numbers with the digits 0 through 9, we learned to use decimal.
For example, we know the following represents one hundred and twenty-three.
1 2 3
The 3 is in the ones column, the 2 is in the tens column, and the 1 is in the hundreds column.
So 123 is 100×1 + 10×2 + 1×3 = 100 + 20 + 3 = 123.
Each place for a digit represents a power of ten, since there are ten possible digits for each place.
In binary, with just two digits, we have powers of two for each place value:
4 2 1
0 0 0
Now if we change the binary value to, say, 0 1 1 , the decimal value would be 3.
4 2 1
0 1 1
To represent 8, we need a fourth bit, for the eights place:
8 4 2 1
1 0 0 0
And binary makes sense for computers because we power them with electricity, which can be either on or off, so each bit only needs to be
on or off. In a computer, there are millions or billions of switches called transistors that can store electricity and represent a bit by being
“on” or “off”.
With enough bits, or binary digits, computers can count to any number.
8 bits make up one byte.
Representing data
To represent letters, all we need to do is decide how numbers map to letters. Some humans, many years ago, collectively decided on a
standard mapping called ASCII (https://en.wikipedia.org/wiki/ASCII). The letter “A”, for example, is the number 65, and “B” is 66, and so on.
The mapping also includes punctuation and other symbols. Other characters, like letters with accent marks, and emoji, are part of a
standard called Unicode (https://en.wikipedia.org/wiki/Unicode) that uses more bits than ASCII to accommodate all these characters.
When we receive an emoji, our computer is actually just receiving a decimal number like 128514 ( 11111011000000010 in binary, if
you can read that more easily) that it then maps to the image of the emoji.
An image, too, is comprised of many smaller square dots, or pixels, each of which can be represented in binary with a system called RGB,
with values for red, green, and blue light in each pixel. By mixing together different amounts of each color, we can represent millions of
colors:
The red, green, and blue values are combined to get a light yellow color:
And computer programs know, based on the context of their code, whether the binary numbers should be interpreted as numbers, or letters,
or pixels.
And videos are just many, many images displayed one after another, at some number of frames per second. Music, too, can be represented
by the notes being played, their duration, and their volume.
Algorithms
So now we can represent inputs and outputs. The black box earlier will contain algorithms, step-by-step instructions for solving a problem:
Our first solution, one page at a time, is like the red line: our time to solve increases linearly as the size of the problem increases.
The second solution, two pages at a time, is like the yellow line: our slope is less steep, but still linear.
Our final solution is like the green line: logarithmic, since our time to solve rises more and more slowly as the size of the problem
increases. In other words, if the phone book went from 1000 to 2000 pages, we would need one more step to find Mike. If the size
doubled again from 2000 to 4000 pages, we would still only need one more step.
Pseudocode
We can write pseudocode, an informal syntax that is just a more specific version of English (or other human language) that represents our
algorithm:
Some of these lines start with verbs, or actions. We’ll start calling these functions:
We also have branches that lead to different paths, like forks in the road, which we’ll call conditions:
And the questions that decide where we go are called Boolean expressions, which eventually evaluate to a value of true or false:
Finally, we have words that lead to cycles, where we can repeat parts of our program, called loops:
Scratch
We can write programs with the building blocks we just discovered:
functions
conditions
Boolean expressions
loops
We’ll use a graphical programming language called Scratch (https://scratch.mit.edu/), where we’ll drag and drop blocks that contain
instructions.
Later in our course, we’ll move onto textual programming languages like C, and Python, and JavaScript. All of these languages, including
Scratch, have more powerful features like:
variables
the ability to store values and change them
threads
the ability for our program to do multiple things at once
events
the ability to respond to changes in our program or inputs
…
The programming environment for Scratch looks like this:
On the left, we have puzzle pieces that represent functions or variables, or other concepts, that we can drag and drop into our
instruction area in the center.
On the right, we have a stage that will be shown by our program to a human, where we can add or change backgrounds, characters
(called sprites in Scratch), and more.
The “when green flag clicked” block is the start of our program, and below it we’ve snapped in a “say” block and typed in “hello,
world”.
We can also drag in the “ask and wait” block, with a question like “What’s your name?”, and combine it with a “say” block for the answer:
But we didn’t wait after we said “Hello” with the first block, so we can use the “say () for () seconds” block:
We can use the “join” block to combine two phrases so Scratch can say “hello, David”:
The “ask” block, too, takes in an input (the question we want to ask), and produces the output of the “answer” block:
We can then use the “answer” block along with our own text, “hello, “, as two inputs to the join algorithm …
… which we pass as input again to the “say” block:
We can try to make Scratch (the name of the cat) say meow:
But when we click the green flag, we hear the meow sound over and over immediately. Our first bug, or mistake! We can add a block
to wait, so the meows sound more normal.
We can have Scratch point towards the mouse and move towards it:
We’ll look at a sheep that can count:
Here, counter is a variable, the value of which we can set, use, and change.
We can also have Scratch meow if we touch it with the mouse pointer:
Here, we have two different branches, or conditions, that will repeat forever. If the mouse is touching it, Scratch will “roar”, otherwise
it will just meow.
We can make Scratch move back and forth on the screen with a few more blocks we can discover by looking around:
We look at another program, bark, where we can use the space bar to mute a sea lion:
We have a variable, muted , that’s false by default. And our program will constantly check if the space bar is pressed, and set muted
to false if it’s true , or true if not. This way, we can toggle whether the sound plays or not, since our other set of blocks for the
sea lion check the muted variable:
With multiple sprites, or characters, we can have different sets of blocks for each of them:
For one puppet, we have these blocks that say “Marco!”, and then a “broadcast event” block. This “event” is used for our two sprites to
communicate with each other, like sending a secret message. So our other puppet can just wait for this event to say “Polo!”:
Now that we know some basics, we can think about the design, or quality of our programs. For example, we might want to have Scratch
cough three times by repeating some blocks:
The next step is abstracting away some of our code into a function, or making it reusable in different ways. We can make a block called
“cough” and put some blocks inside it:
Now, all of our sprites can use the same “cough” block, in as many places as we’d like.
We can even put a number of times into our cough function, so we only need a single block to cough any number of times:
We look at some examples and discuss how we might implement components of them with different sprites that follow the mouse cursor,
or cause something else to happen on the stage.
Welcome aboard!
Lecture 1
C
hello, world
Compilers
String
Scratch blocks in C
Types, formats, operators
More examples
Screens
Memory, imprecision, and overflow
C
Today we’ll learn a new language, C: a programming language that has all the features of Scratch and more, but perhaps a little less
friendly since it’s purely in text:
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}
Though the words are new, the ideas are exactly the same as the “when green flag clicked” and “say (hello, world)” blocks in Scratch:
Though cryptic, don’t forget that 2/3 of CS50 students have never taken CS before, so don’t be daunted! And though at first, to borrow a
phrase from MIT, trying to absorb all these new concepts may feel like drinking from a fire hose, be assured that by the end of the
semester we’ll be empowered by and experienced at learning and applying these concepts.
We can compare a lot of the constructs in C, to blocks we’ve already seen and used in Scratch. The syntax is far less important than the
principles, which we’ve already been introduced to.
hello, world
The “when green flag clicked” block in Scratch starts the main program; clicking the green flag causes the right set of blocks underneath
to start. In C, the first line for the same is int main(void) , which we’ll learn more about over the coming weeks, followed by an open
curly brace { , and a closed curly brace } , wrapping everything that should be in our program.
int main(void)
{
}
The “say (hello, world)” block is a function, and maps to printf("hello, world"); . In C, the function to print something to the screen is
printf , where f stands for “format”, meaning we can format the printed string in different ways. Then, we use parentheses to pass in
what we want to print. We have to use double quotes to surround our text so it’s understood as text, and finally, we add a semicolon ; to
end this line of code.
To make our program work, we also need another line at the top, a header line #include <stdio.h> that defines the printf function that
we want to use. Somewhere there is a file on our computer, stdio.h , that includes the code that allows us to access the printf function,
and the #include line tells the computer to include that file with our program.
To write our rst program in Scratch, we opened Scratch’s website. Similarly, we’ll use the CS50 Sandbox (https://sandbox.cs50.io/) to start
writing and running code the same way. The CS50 Sandbox is a virtual, cloud-based environment with the libraries and tools already
installed for writing programs in various languages. At the top, there is a simple code editor, where we can type text. Below, we have a
terminal window, into which we can type commands:
We’ll type our code from earlier into the top, after using the + sign to create a new file called hello.c :
We end our program’s file with .c by convention, to indicate that it’s intended as a C program. Notice that our code is colorized, so that
certain things are more visible.
Compilers
Once we save the code that we wrote, which is called source code, we need to convert it to machine code, binary instructions that the
computer understands directly.
We use a program called a compiler to compile our source code into machine code.
To do this, we use the Terminal panel, which has a command prompt. The $ at the left is a prompt, after which we can type commands.
We type clang hello.c (where clang stands for “C languages”, a compiler written by a group of people). But before we press enter,
we click the folder icon on the top left of CS50 Sandbox. We see our file, hello.c . So we press enter in the terminal window, and see
that we have another file now, called a.out (short for “assembler output”). Inside that file is the code for our program, in binary. Now,
we can type ./a.out in the terminal prompt to run the program a.out in our current folder. We just wrote, compiled, and ran our
first program!
String
But after we run our program, we see hello, world$ , with the new prompt on the same line as our output. It turns out that we need to
specify precisely that we need a new line after our program, so we can update our code to include a special newline character, \n :
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}
Now we need to remember to recompile our program with clang hello.c before we can run this new version.
Line 2 of our program is intentionally blank since we want to start a new section of code, much like starting new paragraphs in essays. It’s
not strictly necessary for our program to run correctly, but it helps humans read longer programs more easily.
We can change the name of our program from a.out to something else, too. We can pass command-line arguments, or additional options,
to programs in the terminal, depending on what the program is written to understand. For example, we can type clang -o hello
hello.c , and -o hello is telling the program clang to save the compiled output as just hello . Then, we can just run ./hello .
In our command prompt, we can run other commands, like ls (list), which shows the files in our current folder:
$ ls
a.out* hello* hello.c
The asterisk, * , indicates that those files are executable, or that they can be run by our computer.
We can use the rm (remove) command to delete a file:
$ rm a.out
rm: remove regular file 'a.out'?
We can type y or yes to confirm, and use ls again to see that it’s indeed gone forever.
Now, let’s try to get input from the user, as we did in Scratch when we wanted to say “hello, David”:
First, we need a string, or piece of text (specifically, zero or more characters in a sequence in double quotes, like "" , "ba" , or
“bananas”), that we can ask the user for, with the function get_string . We pass the prompt, or what we want to ask the user, to the
function with "What is your name?\n" inside the parentheses. On the left, we want to create a variable, answer , the value of which
will be what the user enters. (The equals sign = is setting the value from right to left.) Finally, the type of variable that we want is
string , so we specify that to the left of answer .
Next, inside the printf function, we want the value of answer in what we print back out. We use a placeholder for our string
variable, %s , inside the phrase we want to print, like "hello, %s\n" , and then we give printf another argument, or option, to tell
it that we want the variable answer to be substituted.
If we made a mistake, like writing printf("hello, world"\n); with the \n outside of the double quotes for our string, we’ll see an error
from our compiler:
The first line of the error tells us to look at hello.c , line 5, column 26, where the compiler expected a closing parenthesis, instead
of a backslash.
To simplify things (at least for the beginning), we’ll include a library, or set of code, from CS50. The library provides us with the string
variable type, the get_string function, and more. We just have to write a line at the top to include the file cs50.h :
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name?\n");
    printf("hello, %s\n", name);
}
If we leave out the cs50.h line, our program looks like this:
#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name?\n");
    printf("hello, %s\n", name);
}
Now, if we try to compile that code, we get a lot of lines of errors. Sometimes, one mistake means that the compiler then starts
interpreting correct code incorrectly, generating more errors than there actually are. So we start with our first error:
We didn’t mean stdin (“standard in”) instead of string , so that error message wasn’t helpful. In fact, we need to import another file
that defines the type string (actually a training wheel from CS50, as we’ll find out in the coming weeks).
So we can include another file, cs50.h , which also includes the function get_string , among others.
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    string name = get_string("What's your name?\n");
    printf("hello, %s\n", name);
}
Now, when we try to compile our program, we have just one error:
It turns out that we also have to tell our compiler to add our special CS50 library file, with clang -o string string.c -lcs50 , with
-l for “link”.
We can even abstract this away and just type make string . We see that, by default in the CS50 Sandbox, make uses clang to compile
our code from string.c into string , with all the necessary arguments, or flags, passed in.
Scratch blocks in C
The “set [counter] to (0)” block is creating a variable, and in C we would write int counter = 0; , where int specifies that the type of our
variable is an integer:
“change [counter] by (1)” is counter = counter + 1; in C. (In C, the = isn’t like an equals sign in an equation, where we are saying
counter is the same as counter + 1 . Instead, = is an assignment operator that means, “copy the value on the right, into the value on
the left”.) And notice we don’t need to say int anymore, since we presume that we already specified previously that counter is an int ,
with some existing value. We can also say counter += 1; or counter++; both of which are “syntactic sugar”, or shortcuts that have the
same effect with fewer characters to type.
Notice that in C, we use { and } (as well as indentation) to indicate how lines of code should be nested.
We can also have if-else conditions:
if (x < y)
{
    printf("x is less than y\n");
}
else
{
    printf("x is not less than y\n");
}
Notice that lines of code that themselves are not some action ( if... , and the braces) don’t end in a semicolon.
And even else if :
if (x < y)
{
    printf("x is less than y\n");
}
else if (x > y)
{
    printf("x is greater than y\n");
}
else if (x == y)
{
    printf("x is equal to y\n");
}
To repeat something forever, we can use a while loop:
while (true)
{
    printf("hello, world\n");
}
The while keyword also requires a condition, so we use true as the Boolean expression to ensure that our loop will run forever. Our
program will check whether the expression evaluates to true (which it always will in this case), and then run the lines inside the
curly braces. Then it will repeat that until the expression isn’t true anymore (which won’t change in this case).
We could do something a certain number of times with while :
int i = 0;
while (i < 50)
{
    printf("hello, world\n");
    i++;
}
We create a variable, i , and set it to 0. Then, while i < 50 , we run some lines of code, and we add 1 to i after each run.
The curly braces around the two lines inside the while loop indicate that those lines will repeat, and we can add additional lines to
our program after if we wanted to.
To do the same repetition, more commonly we can use the for keyword:
Again, rst we create a variable named i and set it to 0. Then, we check that i < 50 every time we reach the top of the loop, before
we run any of the code inside. If that expression is true, then we run the code inside. Finally, after we run the code inside, we use i++
to add one to i , and the loop repeats.
More examples
For each of these examples, you can click on the sandbox links to run and edit your own copies of them.
In int.c , we get and print an integer:
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    int age = get_int("What's your age?\n");
    int days = age * 365;
    printf("You are at least %i days old.\n", days);
}
Though, once a line is too long or complicated, it may be better to keep two or even three lines for readability.
In float.c , we can get decimal numbers (called floating-point values in computers, because the decimal point can “float” between the
digits, depending on the number):
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    float price = get_float("What's the price?\n");
    printf("Your total is %f.\n", price * 1.0625);
}
Now, if we compile and run our program, we’ll see a price printed out with tax.
We can specify the number of digits printed after the decimal with a placeholder like %.2f for two digits after the decimal point.
With parity.c , we can check if a number is even or odd:
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    int n = get_int("n: ");
    if (n % 2 == 0)
    {
        printf("even\n");
    }
    else
    {
        printf("odd\n");
    }
}
With the % (modulo) operator, we can get the remainder of n after it’s divided by 2. If the remainder is 0, we know that n is even.
Otherwise, we know n is odd.
And functions like get_int from the CS50 library do error-checking, where only inputs from the user that match the type we want
are accepted.
In conditions.c , we turn the condition snippets from before into a program:
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user for x
    int x = get_int("x: ");

    // Prompt user for y
    int y = get_int("y: ");

    // Compare x and y
    if (x < y)
    {
        printf("x is less than y\n");
    }
    else if (x > y)
    {
        printf("x is greater than y\n");
    }
    else
    {
        printf("x is equal to y\n");
    }
}
Lines that start with // are comments, or notes for humans that the compiler will ignore.
For David to compile and run this program in his sandbox, he first needed to run cd src1 in the terminal. This changes the directory,
or folder, to the one in which he saved all of the lecture’s source files. Then, he could run make conditions and ./conditions . With
pwd , he can see that he’s in a src1 folder (inside other folders). And cd by itself, with no arguments, will take us back to our
default folder in the sandbox.
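// Logical operators
#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user to agree
    char c = get_char("Do you agree?\n");

    // Check whether agreed (the rest of this listing is cut off in the source;
    // this branch is an illustrative completion)
    if (c == 'Y' || c == 'y')
    {
        printf("Agreed.\n");
    }
}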
We use two vertical bars, || , to indicate a logical “or”, where either expression can be true for the condition to be followed.
And if none of the expressions are true, nothing will happen since our program doesn’t have a loop.
Let’s implement the coughing program from week 0:
#include <stdio.h>

int main(void)
{
    printf("cough\n");
    printf("cough\n");
    printf("cough\n");
}
We could use a for loop instead of repeating the same line:
#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 3; i++)
    {
        printf("cough\n");
    }
}
By convention, programmers tend to start counting at 0, and so i will have the values of 0 , 1 , and 2 before stopping, for a total
of three iterations. We could also write for (int i = 1; i <= 3; i++) for the same final effect.
We can move the printf line to its own function:
#include <stdio.h>

void cough(void);

int main(void)
{
    for (int i = 0; i < 3; i++)
    {
        cough();
    }
}

void cough(void)
{
    printf("cough\n");
}
We declared a new function with void cough(void); , before our main function calls it. The C compiler reads our code from top to
bottom, so we need to tell it that the cough function exists, before we use it. Then, after our main function, we can implement the
cough function. This way, the compiler knows the function exists, and we can keep our main function close to the top.
And our cough function doesn’t take any inputs, so we have cough(void) .
#include <stdio.h>
void cough(int n);
int main(void)
{
cough(3);
}
void cough(int n)
{
for (int i = 0; i < n; i++)
{
printf("cough\n");
}
}
Now, when we want to print “cough” any number of times, we can just call the same function. Notice that, with void cough(int n) ,
we indicate that the cough function takes as input an int , which we refer to as n . And inside cough , we use n in our for loop
to print “cough” the right number of times.
Let’s look at positive.c :
#include <cs50.h>
#include <stdio.h>
int get_positive_int(void);
int main(void)
{
int i = get_positive_int();
printf("%i\n", i);
}
The CS50 library doesn’t have a get_positive_int function, but we can write one ourselves. Our function int
get_positive_int(void) will prompt the user for an int and return that int , which our main function stores as i . In
get_positive_int , we initialize a variable, int n , without assigning a value to it yet. Then, we have a new construct, do ...
while , which does something first, then checks a condition, and repeats until the condition is no longer true.
Once the loop ends because we have an n that is not < 1 , we can return it with the return keyword. And back in our main
function, we can set int i to that value.
Screens
We might want a program that prints part of a screen from a video game like Super Mario Bros. In mario0.c , we have:
#include <stdio.h>
int main(void)
{
printf("????\n");
}
We can ask the user for a number of question marks, and then print them, with mario2.c :
#include <cs50.h>
#include <stdio.h>
int main(void)
{
int n;
do
{
n = get_int("Width: ");
}
while (n < 1);
for (int i = 0; i < n; i++)
{
printf("?");
}
printf("\n");
}
#include <cs50.h>
#include <stdio.h>
int main(void)
{
int n;
do
{
n = get_int("Size: ");
}
while (n < 1);
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
printf("#");
}
printf("\n");
}
}
Notice we have two nested loops, where the outer loop uses i to do everything inside n times, and the inner loop uses j , a
different variable, to do something n times for each of those times. In other words, the outer loop prints n “rows”, or lines, and the
inner loop prints n “columns”, or # characters, in each line.
Other examples not covered in lecture are available under “Source Code” for Week 1.
#include <cs50.h>
#include <stdio.h>
int main(void)
{
// Prompt user for x
float x = get_float("x: ");
// Prompt user for y
float y = get_float("y: ");
// Perform division
printf("x / y = %.50f\n", x / y);
}
x: 1
y: 10
x / y = 0.10000000149011611938476562500000000000000000000000
It turns out that this is called floating-point imprecision, where we don’t have enough bits to store all possible values, so the
computer has to store the closest value it can to 1 divided by 10.
We can see a similar problem in overflow.c :
#include <stdio.h>
#include <unistd.h>
int main(void)
{
for (int i = 1; ; i *= 2)
{
printf("%i\n", i);
sleep(1);
}
}
In our for loop, we set i to 1 , and double it with *= 2 . (And we’ll keep doing this forever, so there’s no condition we check.)
We also use the sleep function from unistd.h to let our program pause each time.
Now, when we run this program, we see the number getting bigger and bigger, until:
1073741824
overflow.c:6:25: runtime error: signed integer overflow: 1073741824 * 2 cannot be represented in type 'int'
-2147483648
0
0
...
It turns out, our program recognized that a signed integer (an integer with a positive or negative sign) couldn’t store that next value,
and printed an error. Then, since it tried to double it anyways, i became a negative number, and then 0.
This problem is called integer overflow, where an integer can only be so big before it runs out of bits and “rolls over”. We can picture
adding 1 to 999 in decimal. The last digit becomes 0, we carry the 1 so the next digit becomes 0, and we get 1000. But if we only had
three digits, we would end up with 000 since there’s no place to put the final 1!
The Y2K problem arose because many programs stored the calendar year with just two digits, like 98 for 1998, and 99 for 1999. But when
the year 2000 approached, the programs would have stored 00, leading to confusion between the years 1900 and 2000.
A Boeing 787 airplane also had a bug where a counter in the generator overflows after a certain number of days of continuous operation,
since the number of seconds it has been running could no longer be stored in that counter.
So, we’ve seen a few problems that can happen, but now understand why, and how to prevent them.
With this week’s problem set, we’ll use the CS50 Lab, built on top of the CS50 Sandbox, to write some programs with walkthroughs to
guide us.
This is CS50x
OpenCourseWare
Lecture 2
Compiling
Debugging
help50 and printf
debug50
check50 and style50
Data Types
Memory
Arrays
Strings
Command-line arguments
Readability
Encryption
Compiling
Last time, we learned to write our first program in C. We learned the syntax for the main function in our program, the printf function
for printing to the terminal, how to create strings with double quotes, and how to include stdio.h for the printf function.
Then, we compiled it with clang hello.c to be able to run ./a.out (the default name), and then clang -o hello hello.c (passing in a
command-line argument for the output’s name) to be able to run ./hello .
If we wanted to use CS50’s library, via #include <cs50.h> , for strings and the get_string function, we also have to add a flag: clang -o
hello hello.c -lcs50 . The -l flag links the cs50 file, which is already installed in the CS50 Sandbox, and includes prototypes, or
definitions of strings and get_string (among more) that our program can then refer to and use.
We write our source code in C, but need to compile it to machine code, in binary, before our computers can run it.
clang is the compiler, and make is a utility that helps us run clang without having to indicate all the options manually.
“Compiling” source code into machine code is actually made up of smaller steps:
preprocessing
compiling
assembling
linking
Preprocessing involves looking at lines that start with a # , like #include , before everything else. For example, #include <cs50.h> will
tell clang to look for that header file first, since it contains content that we want to include in our program. Then, clang will essentially
copy the contents of those header files into our program, in place of the #include lines.
For example …
#include <cs50.h>
#include <stdio.h>
int main(void)
{
string name = get_string("Name: ");
printf("hello, %s\n", name);
}
int main(void)
{
string name = get_string("Name: ");
printf("hello, %s\n", name);
}
Compiling takes our source code, in C, and converts it to assembly code, which looks like this:
...
main: # @main
.cfi_startproc
# BB#0:
pushq %rbp
.Ltmp0:
.cfi_def_cfa_offset 16
.Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
xorl %eax, %eax
movl %eax, %edi
movabsq $.L.str, %rsi
movb $0, %al
callq get_string
movabsq $.L.str.1, %rdi
movq %rax, -8(%rbp)
movq -8(%rbp), %rsi
movb $0, %al
callq printf
...
These instructions are lower-level and are closer to the binary instructions that a computer’s CPU can directly understand. They
generally operate on bytes themselves, as opposed to abstractions like variable names.
The next step is to take the assembly code and translate it to instructions in binary by assembling it. The instructions in binary are called
machine code, which a computer’s CPU can run directly.
The last step is linking, where the contents of previously compiled libraries that we want to link, like cs50.c , are actually combined with
the binary of our program. So we end up with one binary file, a.out or hello , that is the compiled version of hello.c , cs50.c , and
printf.c .
Debugging
Bugs are mistakes in programs that we didn’t intend to make. And debugging is the process of finding and fixing bugs.
int main(void)
{
printf("hello, world\n");
}
We see an error (in red), when we try to make this program, that we are implicitly declaring library function 'printf' . We don’t
really understand this, so we can run help50 make buggy0 , which will tell us, at the end, that we might have forgotten to write
#include <stdio.h> , which contains printf .
We can try this again with buggy1.c :
#include <stdio.h>
int main(void)
{
string name = get_string("What's your name?\n");
printf("hello, %s\n", name);
}
We see a lot of errors, and even the first one doesn’t seem to make much sense. So we can again run help50 make buggy1 , which will
hint to us that we need cs50.h since string isn’t defined.
To clear the terminal window (so that we can see just the output of whatever we want to run next), we can press control + L , or type in
clear as a command to the terminal window.
Let’s look at buggy2.c :
#include <stdio.h>
int main(void)
{
for (int i = 0; i <= 10; i++)
{
printf("#\n");
}
}
Hmm, we intended to only see 10 # s, but there are 11. If we didn’t know what the problem is (since our program is compiling
without any errors, and we now have a logical error), we could add another print line to help us:
#include <stdio.h>
int main(void)
{
for (int i = 0; i <= 10; i++)
{
printf("i is now %i: ", i);
printf("#\n");
}
}
Now, we see that i started at 0 and continued until it was 10, but we should have it stop once it’s at 10, with i < 10 instead of i
<= 10 .
debug50
Today we’ll also take a look at CS50 IDE, which is like the CS50 Sandbox, but with more features. It is an online development environment,
with a code editor and a terminal window, but also tools for debugging and collaborating:
In the CS50 IDE, we’ll have another tool, debug50 , to help us debug programs.
We’ll open buggy2.c and try to make buggy2 . But we saved buggy2.c into a folder called src2 , so we need to run cd src2 to change
our directory to the right one. And CS50 IDE’s terminal will remind us what directory we’re in, with a prompt like ~/src/ $ . (The ~
indicates the default, or home directory.)
Instead of using printf , we can also debug our program interactively. We can add a breakpoint, or an indicator for a line of code where
the debugger should pause our program. For example, we can click to the left of line 5 of our code, and a red circle will appear:
Now, if we run debug50 ./buggy2 , we’ll see the debugger panel open on the right:
We see that the variable we made, i , is under the Local Variables section, and see that there’s a value of 0 .
Our breakpoint has paused our program after line 5, to just before line 7, since it’s the first line of code that can run. To continue, we have a
few controls in the debugger panel. The blue triangle will continue our program until we reach another breakpoint or the end of our
program. The curved arrow to its right will “step over” the line, running it and pausing our program again immediately after.
So, we’ll use the curved arrow to run the next line, and see what changes after. We’re at the printf line, and pressing the curved arrow
again, we see a single # printed to our terminal window. With another click of the arrow, we see the value of i on the right change to
1 . And we can keep clicking the arrow to watch our program run, one line at a time.
To exit the debugger, we can press control + C to stop the program.
We can save lots of time in the future by investing a little bit now to learn how to use debug50 !
Data Types
In C, we have different types of variables we can use for storing data:
bool 1 byte
char 1 byte
int 4 bytes
float 4 bytes
long 8 bytes
double 8 bytes
string ? bytes
Each of these types takes up a certain number of bytes per variable we create, and the sizes above are what the sandbox, IDE, and most
likely your computer use for each type in C.
Memory
Inside our computers, we have chips called RAM, random-access memory, that stores data for short-term use. We might save a program or
file to our hard drive (or SSD) for long-term storage, but when we open it, it gets copied to RAM first. Though RAM is much smaller, and
temporary (until the power is turned off), it is much faster.
We can think of bytes, stored in RAM, as though they were in a grid:
Arrays
Let’s say we wanted to store three variables:
#include <stdio.h>
int main(void)
{
char c1 = 'H';
char c2 = 'I';
char c3 = '!';
printf("%c %c %c\n", c1, c2, c3);
}
Notice that we use single quotes to indicate a literal character, and double quotes for multiple characters together in a string.
We can compile and run this, to see H I ! .
And we know characters are just numbers, so if we change our string formatting to be printf("%i %i %i\n", c1, c2, c3); , we can see
the numeric values of each char printed: 72 73 33 .
We can explicitly convert, or cast, each character to an int before we use it, with (int) c1 , but our compiler can implicitly do that for
us.
And in memory, we might have three boxes, labeled c1 , c2 , and c3 somehow, each of which represents a byte of binary with the
value of that variable.
Let’s look at scores0.c :
#include <cs50.h>
#include <stdio.h>
int main(void)
{
// Scores
int score1 = 72;
int score2 = 73;
int score3 = 33;
// Print average
printf("Average: %i\n", (score1 + score2 + score3) / 3);
}
We can print the average of three numbers, but now we need to make one variable for every score we want to include, and we can’t
easily use them later.
It turns out, in memory, we can store variables one after another, back-to-back. And in C, a list of variables stored, one after another in a
contiguous chunk of memory, is called an array.
For example, we can use int scores[3]; to declare an array of 3 integers.
And we can assign and use variables in an array with:
#include <cs50.h>
#include <stdio.h>
int main(void)
{
// Scores
int scores[3];
scores[0] = 72;
scores[1] = 73;
scores[2] = 33;
// Print average
printf("Average: %i\n", (scores[0] + scores[1] + scores[2]) / 3);
}
Notice that arrays are zero-indexed, meaning that the first element, or value, has index 0.
And we repeated the value 3, representing the length of our array, in two different places. So we can use a constant, or fixed value, to
indicate it should always be the same in both places:
#include <cs50.h>
#include <stdio.h>
const int N = 3;
int main(void)
{
// Scores
int scores[N];
scores[0] = 72;
scores[1] = 73;
scores[2] = 33;
// Print average
printf("Average: %i\n", (scores[0] + scores[1] + scores[2]) / N);
}
We can use the const keyword to tell the compiler that the value of N should never be changed by our program. And by convention,
we’ll place our declaration of the variable outside of the main function and capitalize its name, which isn’t necessary for the compiler
but shows other humans that this variable is a constant and makes it easy to see from the start.
With an array, we can collect our scores in a loop, and access them later in a loop, too:
#include <cs50.h>
#include <stdio.h>
float average(int length, int array[]);
int main(void)
{
// Get number of scores
int n = get_int("Scores: ");
// Get scores
int scores[n];
for (int i = 0; i < n; i++)
{
scores[i] = get_int("Score %i: ", i + 1);
}
// Print average
printf("Average: %.1f\n", average(n, scores));
}
First, we’ll ask the user for the number of scores they have, create an array with enough int s for the number of scores they have, and
use a loop to collect all the scores.
Then we’ll write a helper function, average , to return a float , or a decimal value. We’ll pass in the length and an array of int s
(which could be any size), and use another loop inside our helper function to add up the values into a sum. We use (float) to cast
both sum and length into floats, so the result we get from dividing the two is also a float.
Finally, when we print the result we get, we use %.1f to show just one place after the decimal.
In memory, our array is now stored like this, where each value takes up not one but four bytes:
Strings
Strings are actually just arrays of characters. If we had a string s , each character can be accessed with s[0] , s[1] , and so on.
And it turns out that a string ends with a special character, ‘\0’, or a byte with all bits set to 0. This character is called the null character, or
null terminating character. So we actually need four bytes to store our string “HI!”:
Now let’s see what four strings in an array might look like:
string names[4];
names[0] = "EMMA";
names[1] = "RODRIGO";
names[2] = "BRIAN";
names[3] = "DAVID";
printf("%s\n", names[0]);
printf("%c%c%c%c\n", names[0][0], names[0][1], names[0][2], names[0][3]);
We can print the first value in names as a string, or we can get the first string, and get each individual character in that string by
using [] again. (We can think of it as (names[0])[0] , though we don’t need the parentheses.)
And though we know that the first name had four characters, printf probably used a loop to look at each character in the string,
printing them one at a time until it reached the null character that marks the end of the string. And in fact, we can print names[0][4]
as an int with %i , and see a 0 being printed.
We can visualize each character with its own label in memory:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
string s = get_string("Input: ");
printf("Output: ");
for (int i = 0; i < strlen(s); i++)
{
printf("%c", s[i]);
}
printf("\n");
}
We can use the condition s[i] != '\0' , where we can check the current character and only print it if it’s not the null character.
We can also use the length of the string, but first, we need a new library, string.h , for strlen , which tells us the length of a string.
We can improve the design of our program. string0 was a bit inefficient, since we check the length of the string, after each character is
printed, in our condition. But since the length of the string doesn’t change, we can check the length of the string once:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
string s = get_string("Input: ");
printf("Output:\n");
for (int i = 0, n = strlen(s); i < n; i++)
{
printf("%c\n", s[i]);
}
}
Now, at the start of our loop, we initialize both an i and n variable, and remember the length of our string in n . Then, we can
check the values each time, without having to actually calculate the length of the string.
And we did need to use a little more memory for n , but this saves us some time with not having to check the length of the string
each time.
We can now combine what we’ve seen, to write a program that can capitalize letters:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
string s = get_string("Before: ");
printf("After: ");
for (int i = 0, n = strlen(s); i < n; i++)
{
if (s[i] >= 'a' && s[i] <= 'z')
{
printf("%c", s[i] - 32);
}
else
{
printf("%c", s[i]);
}
}
printf("\n");
}
First, we get a string s . Then, for each character in the string, if it’s lowercase (its value is between that of a and z ), we convert it
to uppercase. Otherwise, we just print it.
We can convert a lowercase letter to its uppercase equivalent, by subtracting the difference between their ASCII values. (We know that
lowercase letters have a higher ASCII value than uppercase letters, and the difference is conveniently the same between the same
letters, so we can subtract that difference to get an uppercase letter from a lowercase letter.)
We can use the man pages (https://man.cs50.io/), or programmer’s manual, to find library functions that we can use to accomplish the
same thing:
#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
string s = get_string("Before: ");
printf("After: ");
for (int i = 0, n = strlen(s); i < n; i++)
{
printf("%c", toupper(s[i]));
}
printf("\n");
}
From searching the man pages, we see toupper() is a function, among others, from a library called ctype , that we can use.
Command-line arguments
We’ve used programs like make and clang , which take in extra words after their name in the command line. It turns out that programs of
our own can also take in command-line arguments.
In argv.c , we change what our main function looks like:
#include <cs50.h>
#include <stdio.h>
argc and argv are two variables that our main function will now get, when our program is run from the command line. argc is
the argument count, or number of arguments, and argv is an array of strings that are the arguments. And the first argument,
argv[0] , is the name of our program (the first word typed, like ./hello ). In this example, we check if we have two arguments, and
print out the second one if so.
For example, if we run ./argv David , we’ll get hello, David printed, since we typed in David as the second word in our command.
It turns out that we can indicate errors in our program by returning a value from our main function (as implied by the int before our
main function). By default, our main function returns 0 to indicate nothing went wrong, but we can write a program to return a
different value:
#include <cs50.h>
#include <stdio.h>
Readability
Now that we know how to work with strings in our programs, we can analyze paragraphs of text for their level of readability, based on
factors like how long and complicated the words and sentences are.
Encryption
If we wanted to send a message to someone, we might want to encrypt, or somehow scramble that message so that it would be hard for
others to read. The original message, or input to our algorithm, is called plaintext, and the encrypted message, or output, is called
ciphertext.
A message like HI! could be converted to ASCII, 72 73 33 . But anyone would be able to convert that back to letters.
An encryption algorithm generally requires another input, in addition to the plaintext. A key is needed, and sometimes it is simply a
number, that is kept secret. With the key, plaintext can be converted, via some algorithm, to ciphertext, and vice versa.
For example, if we wanted to send a message like I L O V E Y O U , we can first convert it to ASCII: 73 76 79 86 69 89 79 85 . Then,
we can encrypt it with a key of just 1 and a simple algorithm, where we just add the key to each value: 74 77 80 87 70 90 80 86 .
Then, someone converting that ASCII back to text will see J M P W F Z P V . To decrypt this, someone will need to know the key.
We’ll apply these concepts in our problem set!
This is CS50x
OpenCourseWare
Lecture 3
Searching
Big O
Linear search
Structs
Sorting
Selection sort
Recursion
Merge sort
Searching
Last time, we talked about memory in a computer, or RAM, and how our data can be stored as individual variables or as arrays of many
items, or elements.
We can think of an array with a number of items as a row of lockers, where a computer can only open one locker to look at an item, one at
a time.
For example, if we want to check whether a number is in an array, with an algorithm that takes in an array as input and produces a boolean
as a result, we might:
look in each locker, or at each element, one at a time, from the beginning to the end.
This is called linear search, where we move in a line, since our array isn’t sorted.
start in the middle and move left or right depending on what we’re looking for, if our array of items is sorted.
This is called binary search, since we can divide our problem in two with each step, like what David did with the phone book in
week 0.
We might write pseudocode for linear search with:
We can label each of n lockers from 0 to n–1 , and check each of them in order.
For binary search, our algorithm might look like:
If no items
Return false
If middle item is 50
Return true
Else if 50 < middle item
Search left half
Else if 50 > middle item
Search right half
Eventually, we won’t have any parts of the array left (if the item we want wasn’t in it), so we can return false .
Otherwise, we can search each half depending on the value of the middle item.
Big O
In week 0, we saw different types of algorithms and their running times:
The more formal way to describe this is with big O notation, which we can think of as “on the order of”. For example, if our algorithm is
linear search, it will take approximately O(n) steps, “on the order of n”. In fact, even an algorithm that looks at two items at a time and
takes n/2 steps has O(n). This is because, as n gets bigger and bigger, only the largest term, n, matters.
Similarly, a logarithmic running time is O(log n), no matter what the base is, since this is just an approximation of what happens when n is
very large.
There are some common running times:
O(n^2)
O(n log n)
O(n) (linear search)
O(log n) (binary search)
O(1)
Computer scientists might also use big Ω, big Omega notation, which is the lower bound of number of steps for our algorithm. (Big O is the
upper bound of number of steps, or the worst case, and typically what we care about more.) With linear search, for example, the worst case
is n steps, but the best case is 1 step since our item might happen to be the first item we check. The best case for binary search, too, is 1
since our item might be in the middle of the array.
And we have a similar set of the most common big Ω running times:
Ω(n^2)
Ω(n log n)
Ω(n) (counting the number of items)
Ω(log n)
Ω(1) (linear search, binary search)
Linear search
Let’s take a look at numbers.c :
#include <cs50.h>
#include <stdio.h>
int main(void)
{
// An array of numbers
int numbers[] = {4, 8, 15, 16, 23, 42};
// Search for 50
for (int i = 0; i < 6; i++)
{
if (numbers[i] == 50)
{
printf("Found\n");
return 0;
}
}
printf("Not found\n");
return 1;
}
Here we initialize an array with some values, and we check the items in the array one at a time, in order.
And in each case, depending on whether the value was found or not, we can return an exit code of either 0 (for success) or 1 (for
failure).
We can do the same for names:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
// An array of names
string names[] = {"EMMA", "RODRIGO", "BRIAN", "DAVID"};
We can’t compare strings directly, since they’re not a simple data type but rather an array of many characters, and we need to compare
them differently. Luckily, the string library has a strcmp function which compares strings for us and returns 0 if they’re the same,
so we can use that.
Let’s try to implement a phone book with the same ideas:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
string names[] = {"EMMA", "RODRIGO", "BRIAN", "DAVID"};
string numbers[] = {"617-555-0100", "617-555-0101", "617-555-0102", "617-555-0103"};
We’ll use strings for phone numbers, since they might include formatting or be too long for a number.
Now, if the name at a certain index in the names array matches who we’re looking for, we’ll return the phone number in the numbers
array, at the same index. But that means we need to be particularly careful to make sure that each number corresponds to the name at
each index, especially if we add or remove names and numbers.
Structs
It turns out that we can make our own custom data types called structs:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
typedef struct
{
string name;
string number;
}
person;
int main(void)
{
person people[4];
people[0].name = "EMMA";
people[0].number = "617-555-0100";
people[1].name = "RODRIGO";
people[1].number = "617-555-0101";
people[2].name = "BRIAN";
people[2].number = "617-555-0102";
people[3].name = "DAVID";
people[3].number = "617-555-0103";
We can think of structs as containers, inside of which are multiple other data types.
Here, we create our own type with a struct called person , which will have a string called name and a string called number .
Then, we can create an array of these struct types and initialize the values inside each of them, using a new syntax, . , to access the
properties of each person .
In our loop, we can now be more certain that the number corresponds to the name since they are from the same person element.
Sorting
If our input is an unsorted list of numbers, there are many algorithms we could use to produce an output of a sorted list.
With eight volunteers on the stage with the following numbers, we might consider swapping pairs of numbers next to each other as a first
step.
Our volunteers start in the following random order:
6 3 8 5 2 7 4 1
We look at the first two numbers, and swap them so they are in order:
6 3 8 5 2 7 4 1
– –
3 6 8 5 2 7 4 1
The next pair, 6 and 8 , are in order, so we don’t need to swap them.
The next pair, 8 and 5 , need to be swapped:
3 6 8 5 2 7 4 1
– –
3 6 5 8 2 7 4 1
3 6 5 2 8 7 4 1
– –
3 6 5 2 7 8 4 1
– –
3 6 5 2 7 4 8 1
– –
3 6 5 2 7 4 1 8
Our list isn’t sorted yet, but we’re slightly closer to the solution because the biggest value, 8 , has been shifted all the way to the right.
We repeat this with another pass through the list:
3 6 5 2 7 4 1 8
– –
3 6 5 2 7 4 1 8
– –
3 5 6 2 7 4 1 8
– –
3 5 2 6 7 4 1 8
– –
3 5 2 6 7 4 1 8
– –
3 5 2 6 4 7 1 8
– –
3 5 2 6 4 1 7 8
Since we are comparing the i'th and i+1'th element, we only need to go up to n – 2 for i . Then, we swap the two elements if
they’re out of order.
And we can stop after we’ve made n – 1 passes, since we know the largest n–1 elements will have bubbled to the right.
We have n – 2 steps for the inner loop, and n – 1 loops, so we get n^2 – 3n + 2 steps total. But the largest factor, or dominant term, is n^2, as
n gets larger and larger, so we can say that bubble sort is O(n^2).
We’ve seen running times like the following, and so even though binary search is much faster than linear search, it might not be worth the
one-time cost of sorting the list first, unless we do lots of searches over time:
O(n^2) (bubble sort)
O(n log n)
O(n) (linear search)
O(log n) (binary search)
O(1)
And Ω for bubble sort is still n^2, since we still check each pair of elements for n – 1 passes.
Selection sort
We can take another approach with the same set of numbers:
6 3 8 5 2 7 4 1
First, we’ll look at each number, and remember the smallest one we’ve seen. Then, we can swap it with the first number in our list, since
we know it’s the smallest:
6 3 8 5 2 7 4 1
– –
1 3 8 5 2 7 4 6
Now we know at least the first element of our list is in the right place, so we can look for the smallest element among the rest, and swap
it with the next unsorted element (now the second element):
1 3 8 5 2 7 4 6
– –
1 2 8 5 3 7 4 6
We can repeat this over and over, until we have a sorted list.
This algorithm is called selection sort, and we might write pseudocode like this:
With big O notation, we still have running time of O(n^2), since we were looking at roughly all n elements to find the smallest, and making n
passes to sort all the elements.
More formally, we can use some formulas to show that the biggest factor is indeed n^2:
n + (n – 1) + (n – 2) + ... + 1
n(n + 1)/2
(n^2 + n)/2
n^2/2 + n/2
O(n^2)
So it turns out that selection sort is fundamentally about the same as bubble sort in running time:
O(n^2) (bubble sort, selection sort)
O(n log n)
O(n) (linear search)
O(log n) (binary search)
O(1)
The best case, Ω, is also n^2.
We can go back to bubble sort and change its algorithm to be something like this, which will allow us to stop early if all the elements are
sorted:
Now, we only need to look at each element once, so the best case is now Ω(n):
Ω(n^2) (selection sort)
Ω(n log n)
Ω(n) (bubble sort)
Ω(log n)
Ω(1) (linear search, binary search)
We look at a visualization online comparing sorting algorithms (https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html) with
animations for how the elements move within arrays for both bubble sort and selection sort.
Recursion
Recall that in week 0, we had pseudocode for finding a name in a phone book, where we had lines telling us to “go back” and repeat some
steps:
We could instead just repeat our entire algorithm on the half of the book we have left:
This seems like a cyclical process that will never end, but we’re actually dividing the problem in half each time, and stopping once
there’s no more book left.
Recursion occurs when a function or algorithm refers to itself, as in the new pseudocode above.
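To make the idea concrete, here is a small C sketch of the same "search the half that's left" approach, using a sorted array of integers in place of a phone book. The function name search and its parameters are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: returns true if target is among the n sorted values
// starting at a.
bool search(const int a[], size_t n, int target)
{
    // Base case: there's no more "book" left to search
    if (n == 0)
    {
        return false;
    }

    size_t middle = n / 2;
    if (a[middle] == target)
    {
        return true;
    }
    else if (target < a[middle])
    {
        // Repeat the entire algorithm on the left half
        return search(a, middle, target);
    }
    else
    {
        // Repeat the entire algorithm on the right half
        return search(a + middle + 1, n - middle - 1, target);
    }
}
```

Each call hands a strictly smaller array to the next one, so the recursion always reaches the base case.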
In week 1, too, we implemented a “pyramid” of blocks in the following shape:
#
##
###
####
#include <cs50.h>
#include <stdio.h>

void draw(int h);

int main(void)
{
    // Get height of pyramid
    int height = get_int("Height: ");

    // Draw pyramid
    draw(height);
}

void draw(int h)
{
    // Draw pyramid of height h
    for (int i = 1; i <= h; i++)
    {
        for (int j = 1; j <= i; j++)
        {
            printf("#");
        }
        printf("\n");
    }
}
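We can rewrite draw to use recursion instead:
#include <cs50.h>
#include <stdio.h>

void draw(int h);

int main(void)
{
    // Get height of pyramid
    int height = get_int("Height: ");

    // Draw pyramid
    draw(height);
}

void draw(int h)
{
    // If nothing to draw
    if (h == 0)
    {
        return;
    }

    // Draw pyramid of height h - 1
    draw(h - 1);

    // Draw one more row of width h
    for (int i = 0; i < h; i++)
    {
        printf("#");
    }
    printf("\n");
}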
Now, our draw function first calls itself recursively, drawing a pyramid of height h - 1 . But even before that, we need to stop if h
is 0, since there won’t be anything left to draw.
After, we draw the next row, or a row of width h .
Merge sort
We can take the idea of recursion to sorting, with another algorithm called merge sort. The pseudocode might look like:
7 4 5 2 6 3 8 1
First, we’ll sort the left half (the first four elements):
7 4 5 2 | 6 3 8 1
– – – –
Well, to sort that, we need to sort the left half of the left half first:
7 4 | 5 2 | 6 3 8 1
– –
Now, we have just one item, 7 , in the left half, and one item, 4 , in the right half. So we’ll merge that together, by taking the smallest
item from each list first:
– – | 5 2 | 6 3 8 1
4 7
And now we go back to the right half of the left half, and sort it:
– – | – – | 6 3 8 1
4 7 | 2 5
Now, both halves of the left half are sorted, so we can merge the two of them together. We look at the start of each list, and take 2 since
it’s smaller than 4 . Then, we take 4 , since it’s now the smallest item at the front of both lists. Then, we take 5 , and finally, 7 , to get:
– – – – | 6 3 8 1
– – – –
2 4 5 7
We now sort the right half the same way. First, the left half of the right half:
– – – – | – – | 8 1
– – – – | 3 6 |
2 4 5 7
– – – – | – – | – –
– – – – | 3 6 | 1 8
2 4 5 7
– – – – | – – – –
– – – – | – – – –
2 4 5 7 | 1 3 6 8
And finally, we can merge both halves of the whole list, following the same steps as before. Notice that we don’t need to check all the
elements of each half to find the smallest, since we know that each half is already sorted. Instead, we just take the smallest element of the
two at the start of each half:
– – – – | – – – –
– – – – | – – – –
2 4 5 7 | – 3 6 8
1
– – – – | – – – –
– – – – | – – – –
– 4 5 7 | – 3 6 8
1 2
– – – – | – – – –
– – – – | – – – –
– 4 5 7 | – – 6 8
1 2 3
– – – – | – – – –
– – – – | – – – –
– – 5 7 | – – 6 8
1 2 3 4
– – – – | – – – –
– – – – | – – – –
– – – 7 | – – 6 8
1 2 3 4 5
– – – – | – – – –
– – – – | – – – –
– – – 7 | – – – 8
1 2 3 4 5 6
– – – – | – – – –
– – – – | – – – –
– – – – | – – – 8
1 2 3 4 5 6 7
– – – – | – – – –
– – – – | – – – –
– – – – | – – – –
1 2 3 4 5 6 7 8
It took a lot of steps, but it actually took fewer steps than the other algorithms we’ve seen so far. We broke our list in half each time, until
we were “sorting” eight lists with one element each:
7 | 4 | 5 | 2 | 6 | 3 | 8 | 1
4 7 | 2 5 | 3 6 | 1 8
2 4 5 7 | 1 3 6 8
1 2 3 4 5 6 7 8
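The walkthrough above might be sketched in C as follows. This is one possible arrangement, not the lecture's own code: the names merge, sort_range, and merge_sort are assumptions, and the sketch uses one scratch array of the same size as the list to hold merged values.

```c
#include <stdlib.h>
#include <string.h>

// Merge the sorted runs a[lo..mid-1] and a[mid..hi-1], using tmp as scratch.
static void merge(int a[], int tmp[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    // Repeatedly take the smaller of the two items at the front of each run
    while (i < mid && j < hi)
    {
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    }

    // One run is empty now, so copy whatever is left of the other
    while (i < mid)
    {
        tmp[k++] = a[i++];
    }
    while (j < hi)
    {
        tmp[k++] = a[j++];
    }
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
}

// Sort a[lo..hi-1]: sort the left half, sort the right half, merge them.
static void sort_range(int a[], int tmp[], size_t lo, size_t hi)
{
    // A run of zero or one items is already sorted
    if (hi - lo < 2)
    {
        return;
    }
    size_t mid = lo + (hi - lo) / 2;
    sort_range(a, tmp, lo, mid);
    sort_range(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

// Hypothetical entry point: returns 0 on success, 1 if malloc failed.
int merge_sort(int a[], size_t n)
{
    if (n < 2)
    {
        return 0;
    }
    int *tmp = malloc(n * sizeof(int));
    if (tmp == NULL)
    {
        return 1;
    }
    sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

The scratch array is the price of merging: we trade some extra memory for fewer steps.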
Since our algorithm divided the problem in half each time, there are about log n levels of halving. And at each level, after we sorted each half (or
half of a half), we needed to merge together all the elements, with n steps since we had to look at each element once.
So our total running time is n steps for each of log n levels, or O(n log n):
O(n^2)
bubble sort, selection sort
O(n log n)
merge sort
O(n)
linear search
O(log n)
binary search
O(1)
Since log n is greater than 1 but less than n, n log n is in between n (times 1) and n^2.
The best case, Ω, is still n log n, since we still sort each half first and then merge them together:
Ω(n^2)
selection sort
Ω(n log n)
merge sort
Ω(n)
bubble sort
Ω(log n)
Ω(1)
linear search, binary search
Finally, there is another notation, Θ, Theta, which we use to describe running times of algorithms if the upper bound and lower bound are
the same. For example, merge sort has Θ(n log n) since the best and worst case both require the same number of steps. And selection sort
has Θ(n^2).
We look at a final visualization (https://www.youtube.com/watch?v=ZZuD6iUe3Pc) of sorting algorithms with a larger number of inputs,
running at the same time.
This is CS50x
OpenCourseWare
Lecture 4
Hexadecimal
Pointers
string
Compare and copy
valgrind
Swap
Memory layout
get_int
Files
JPEG
Hexadecimal
In week 0, we learned binary, a counting system with 0s and 1s.
In week 2, we talked about memory and how each byte has an address, or identifier, so we can refer to where our variables are actually
stored.
It turns out that, by convention, the addresses for memory use the counting system hexadecimal, where there are 16 digits, 0-9 and A-F.
Recall that, in binary, each digit stood for a power of 2:
128 64 32 16 8 4 2 1
1 1 1 1 1 1 1 1
16^1 16^0
F F
Here, the F is a value of 15 in decimal, and each place is a power of 16, so the first F is 16^1 * 15 = 240, plus the second F with
the value of 16^0 * 15 = 15, for a total of 255.
And 0A is the same as 10 in decimal, and 0F the same as 15. 10 in hexadecimal would be 16, and we would say it as “one zero in
hexadecimal” instead of “ten”, if we wanted to avoid confusion.
The RGB color system also conventionally uses hexadecimal to describe the amount of each color. For example, 000000 in hexadecimal
means 0 of each red, green, and blue, for a color of black. And FF0000 would be 255, or the highest possible, amount of red. With
different values for each color, we can represent millions of different colors.
In writing, we can also indicate a value is in hexadecimal by prefixing it with 0x , as in 0x10 , where the value is equal to 16 in decimal,
as opposed to 10.
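The place-value arithmetic above can be checked with a short C sketch. The helpers hex_digit_value and hex_to_int are hypothetical names for this example, and the sketch assumes short inputs without a 0x prefix, so it doesn't guard against overflow.

```c
#include <ctype.h>

// Hypothetical helper: value of one hexadecimal digit, or -1 if c isn't one.
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    c = (char) toupper((unsigned char) c);
    if (c >= 'A' && c <= 'F')
    {
        return c - 'A' + 10;
    }
    return -1;
}

// Hypothetical helper: converts a string of hex digits to its decimal value,
// or returns -1 if any character isn't a hex digit.
int hex_to_int(const char *s)
{
    int total = 0;
    for (int i = 0; s[i] != '\0'; i++)
    {
        int digit = hex_digit_value(s[i]);
        if (digit < 0)
        {
            return -1;
        }

        // Moving one place to the left multiplies everything so far by 16
        total = total * 16 + digit;
    }
    return total;
}
```

With these, hex_to_int("FF") gives 255 and hex_to_int("10") gives 16, matching the values worked out above. C also understands the 0x prefix directly, so the literal 0x10 in source code is the integer 16.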
Pointers
We might create a value n , and print it out:
#include <stdio.h>

int main(void)
{
    int n = 50;
    printf("%i\n", n);
}
In our computer’s memory, there are now 4 bytes somewhere that have the binary value of 50, labeled n :
It turns out that, with the billions of bytes in memory, those bytes for the variable n start at some unique address that might look like
0x12345678 .
In C, we can actually see the address with the & operator, which means “get the address of this variable”:
#include <stdio.h>
int main(void)
{
int n = 50;
printf("%p\n", &n);
}
And in the CS50 IDE, we might see an address like 0x7ffe00b3adbc , where this is a specific location in the server’s memory.
The address of a variable is called a pointer, which we can think of as a value that “points” to a location in memory. The * operator lets us
“go to” the location that a pointer is pointing to.
For example, we can print *&n , where we “go to” the address of n , and that will print out the value of n , 50 , since that’s the value at
the address of n :
#include <stdio.h>
int main(void)
{
int n = 50;
printf("%i\n", *&n);
}
We also have to use the * operator (in an unfortunately confusing way) to declare a variable that we want to be a pointer:
#include <stdio.h>
int main(void)
{
int n = 50;
int *p = &n;
printf("%p\n", p);
}
Here, we use int *p to declare a variable, p , that has the type of * , a pointer, to a value of type int , an integer. Then, we can
print its value (something like 0x12345678 ), or print the value at its location with printf("%i\n", *p); .
In our computer’s memory, the variables might look like this:
Let’s say we have a mailbox labeled “123”, with the number “50” inside it. The mailbox would be int n , since it stores an integer. We
might have another mailbox with the address “456”, inside of which is the value “123”, which is the address of our other mailbox. This
would be int *p , since it’s a pointer to an integer.
With the ability to use pointers, we can create different data structures, or different ways to organize data in memory that we’ll see next
week.
Many modern computer systems are “64-bit”, meaning that they use 64 bits to address memory, so a pointer will be 8 bytes, twice as big as
an integer of 4 bytes.
string
We might have a variable string s for a name like EMMA , and be able to access each character with s[0] and so on:
But it turns out that each character is stored in memory at a byte with some address, and s is actually just a pointer with the address
of the first character:
And since s is just a pointer to the beginning, only the \0 indicates the end of the string.
In fact, the CS50 Library defines a string with typedef char *string , which just says that we want to name a new type, string , as a
char * , or a pointer to a character.
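Because a string is just the address of its first character, we can find its end ourselves by walking forward until we reach the \0 . The function below is a hypothetical stand-in for the standard strlen , written out only to show the idea.

```c
#include <stddef.h>

// Hypothetical helper: counts the characters before the terminating \0.
size_t length(const char *s)
{
    size_t n = 0;

    // *(s + n) means "go to the address n bytes past s", the same as s[n]
    while (*(s + n) != '\0')
    {
        n++;
    }
    return n;
}
```

For a string like "EMMA" this stops after four characters, at the \0 stored in the fifth byte.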
Let’s print out a string:
#include <cs50.h>
#include <stdio.h>
int main(void)
{
string s = "EMMA";
printf("%s\n", s);
}
#include <stdio.h>
int main(void)
{
char *s = "EMMA";
printf("%s\n", s);
}
Compare and copy
#include <cs50.h>
#include <stdio.h>
int main(void)
{
// Get two integers
int i = get_int("i: ");
int j = get_int("j: ");
// Compare integers
if (i == j)
{
printf("Same\n");
}
else
{
printf("Different\n");
}
}
We can compile and run this, and our program works as we’d expect, with the same values of the two integers giving us “Same” and
different values “Different”.
In compare1 , we see that the same string values are causing our program to print “Different”:
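#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Get two strings
    string s = get_string("s: ");
    string t = get_string("t: ");

    // Compare the strings' addresses, not their characters
    printf("%s\n", s == t ? "Same" : "Different");
}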
Given what we now know about strings, this makes sense because each “string” variable is pointing to a different location in memory,
where the first character of each string is stored. So even if the values of the strings are the same, this will always print “Different”.
For example, our first string might be at address 0x123, our second might be at 0x456, and s will be 0x123 and t will be 0x456 ,
so those values will be different.
And get_string , this whole time, has been returning just a char * , or a pointer to the first character of a string from the user.
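To compare the characters rather than the addresses, we can use strcmp from the C standard library, which returns 0 when two strings hold the same characters. The two helper names below are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true only if s and t point to the very same byte.
bool same_address(const char *s, const char *t)
{
    return s == t;
}

// Hypothetical helper: true if s and t contain the same characters,
// no matter where each string is stored.
bool same_text(const char *s, const char *t)
{
    return strcmp(s, t) == 0;
}
```

Two separate arrays that both hold "EMMA" have different addresses but the same text, so same_address reports false while same_text reports true.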
Now let’s try to copy a string:
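#include <cs50.h>
#include <ctype.h>
#include <stdio.h>

int main(void)
{
    string s = get_string("s: ");

    // Copy the address stored in s, not the characters it points to
    string t = s;
    t[0] = toupper(t[0]);

    // Print both strings
    printf("s: %s\n", s);
    printf("t: %s\n", t);
}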
We get a string s , and copy the value of s into t . Then, we capitalize the first letter in t .
But when we run our program, we see that both s and t are now capitalized.
Since we set s and t to the same values, they’re actually pointers to the same character, and so we capitalized the same character!
To actually make a copy of a string, we have to do a little more work:
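#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *s = get_string("s: ");
    char *t = malloc(strlen(s) + 1);
    for (int i = 0, n = strlen(s); i < n + 1; i++)
    {
        t[i] = s[i];
    }
    t[0] = toupper(t[0]);
    printf("s: %s\nt: %s\n", s, t);
}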
We create a new variable, t , of the type char * , with char *t . Now, we want to point it to a new chunk of memory that’s large
enough to store the copy of the string. With malloc , we can allocate some number of bytes in memory (that aren’t already used to
store other values), and we pass in the number of bytes we’d like. We already know the length of s , so we add 1 to that for the
terminating null character. So, our final line of code is char *t = malloc(strlen(s) + 1); .
Then, we copy each character, one at a time, and now we can capitalize just the first letter of t . And we use i < n + 1 , since we
actually want to go up to n , to ensure we copy the terminating character in the string.
We can actually also use the strcpy library function with strcpy(t, s) instead of our loop, to copy the string s into t . To be
clear, the concept of a “string” is from the C language and well-supported; the only training wheels from CS50 are the type string
instead of char * , and the get_string function.
If we didn’t copy the null terminating character, \0 , and tried to print out our string t , printf will continue and print out the unknown,
or garbage, values that we have in memory, until it happens to reach a \0 , or crashes entirely, since our program might end up trying to
read memory that doesn’t belong to it!
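Putting those pieces together, a copy routine might look like the sketch below. The name copy_string is hypothetical, and the sketch adds one step not discussed above: checking whether malloc returned NULL before using the memory.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s, or NULL if
// malloc could not find enough memory. The caller should free the result.
char *copy_string(const char *s)
{
    // Room for every character of s, plus 1 for the terminating \0
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // strcpy copies the characters of s into t, including the \0
    strcpy(t, s);
    return t;
}
```

Since the copy lives in its own chunk of memory, capitalizing its first letter leaves the original string untouched.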
valgrind
It turns out that, after we’re done with memory that we’ve allocated with malloc , we should call free (as in free(t) ), which tells our
computer that those bytes are no longer useful to our program, so those bytes in memory can be reused again.
If we kept running our program and allocating memory with malloc , but never freed the memory after we were done using it, we would
have a memory leak, which will slow down our computer and use up more and more memory until our computer runs out.
valgrind is a command-line tool that we can use to run our program and see if it has any memory leaks. We can run valgrind on our
program above with help50 valgrind ./copy and see, from the error message, that on line 10, we allocated memory that we never freed (or
“lost”).
So at the end, we can add a line free(t) , which won’t change how our program runs, but valgrind will no longer report any errors.
Let’s take a look at memory.c :
// http://valgrind.org/docs/manual/quick-start.html#quick-start.prepare
#include <stdlib.h>
void f(void)
{
int *x = malloc(10 * sizeof(int));
x[10] = 0;
}
int main(void)
{
f();
return 0;
}
This is an example from valgrind’s documentation (valgrind is a real tool, while help50 was written specifically to help us in this
course).
The function f allocates enough memory for 10 integers, and stores the address in a pointer called x . Then we try to set the 11th
value of x with x[10] to 0 , which goes past the array of memory we’ve allocated for our program. This is called buffer overflow,
where we go past the boundaries of our buffer, or array, and into unknown memory.
valgrind will also tell us there’s an “Invalid write of size 4” for line 8, where we are indeed trying to change the value of an integer (of size
4 bytes).
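For comparison, a version of f without either problem might look like the following sketch. It is an assumption about how one could repair the example, not part of valgrind's documentation: it writes to the last valid index, x[9] , and frees the memory before returning.

```c
#include <stdlib.h>

// Hypothetical repaired version of f: returns 0 on success, 1 if malloc failed.
int f(void)
{
    int *x = malloc(10 * sizeof(int));
    if (x == NULL)
    {
        return 1;
    }

    // Valid indices for 10 integers are 0 through 9
    x[9] = 0;

    // Give the memory back so nothing is leaked
    free(x);
    return 0;
}
```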
And this whole time, the CS50 Library has been freeing memory it’s allocated in get_string , when our program finishes!
Swap
We have two colored drinks, purple and green, each of which is in a cup. We want to swap the drinks between the two cups, but we can’t
do that without a third cup to pour one of the drinks into first.
Now, let’s say we wanted to swap the values of two integers.
With a third variable to use as temporary storage space, we can do this pretty easily, by putting a into tmp , and then b to a , and
finally the original value of a , now in tmp , into b .
But, if we tried to use that function in a program, we don’t see any changes:
#include <stdio.h>
int main(void)
{
int x = 1;
int y = 2;
It turns out that the swap function gets its own variables, a and b , when they are passed in, that are copies of x and y , and so
changing those values doesn’t change x and y in the main function.
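A short sketch makes the problem visible. The function below is named swap_copies here (a hypothetical name) to stress that it only ever touches its own copies.

```c
// Hypothetical helper: swaps its own copies a and b. The caller's variables
// are never touched, because only their values were passed in.
void swap_copies(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}
```

If main has x equal to 1 and y equal to 2 and calls swap_copies(x, y) , x is still 1 and y is still 2 afterwards.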
Memory layout
Within our computer’s memory, the different types of data that need to be stored for our program are organized into different sections:
The machine code section is our compiled program’s binary code. When we run our program, that code is loaded into the “top” of
memory.
Globals are global variables we declare in our program or other shared variables that our entire program can access.
The heap section is an empty area where malloc can get free memory from, for our program to use.
The stack section is used by functions in our program as they are called. For example, our main function is at the very bottom of the
stack, and has the local variables x and y . The swap function, when it’s called, has its own frame, or slice, of memory that’s on top
of main ’s, with the local variables a , b , and tmp :
Once the function swap returns, the memory it was using is freed for the next function call, and we lose anything we did, other
than the return values, and our program goes back to the function that called swap .
So by passing in the addresses of x and y from main to swap , we can actually change the values of x and y :
By passing in the address of x and y , our swap function can actually work:
#include <stdio.h>
int main(void)
{
int x = 1;
int y = 2;
The addresses of x and y are passed in from main to swap , and we use the int *a syntax to declare that our swap function
takes in pointers. We save the value of x to tmp by following the pointer a , and then take the value of y by following the pointer
b , and store that to the location a is pointing to ( x ). Finally, we store the value of tmp to the location pointed to by b ( y ), and
we’re done.
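The steps just described might be written as the following function, a minimal sketch of a swap that takes pointers:

```c
// Swaps the two integers that a and b point to.
void swap(int *a, int *b)
{
    // Follow a to save the first value
    int tmp = *a;

    // Follow b to get the second value, and store it where a points
    *a = *b;

    // Store the saved value where b points
    *b = tmp;
}
```

From main , we would call it with swap(&x, &y) , passing the addresses of x and y rather than their values.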
If we call malloc too many times, we will have a heap overflow, where we end up going past our heap. Or, if we have too many functions
being called, we will have a stack overflow, where our stack has too many frames of memory allocated as well. And these two types of
overflow are generally known as buffer overflows, after which our program (or entire computer) might crash.
get_int
We can implement get_int ourselves with a C library function, scanf :
#include <stdio.h>
int main(void)
{
int x;
printf("x: ");
scanf("%i", &x);
printf("x: %i\n", x);
}
scanf takes a format, %i , so the input is “scanned” for that format, and the address in memory where we want that input to go. But
scanf doesn’t have much error checking, so we might not get an integer.
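One bit of error checking we can add ourselves is to look at the return value, which is the number of items successfully scanned. The sketch below uses sscanf , which scans a string instead of the keyboard, only so the example is easy to try with fixed input; the name parse_int is an assumption.

```c
#include <stdio.h>

// Hypothetical helper: returns 1 and stores the number in *out if text
// begins with an integer; returns 0 (leaving *out alone) otherwise.
int parse_int(const char *text, int *out)
{
    // sscanf reports how many values it managed to fill in
    return sscanf(text, "%i", out) == 1;
}
```

With this, input like "42" succeeds, while input like "cat" is reported as a failure instead of leaving us with a value we never received.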
We can try to get a string the same way:
#include <stdio.h>
int main(void)
{
char *s = NULL;
printf("s: ");
scanf("%s", s);
printf("s: %s\n", s);
}
But we haven’t actually allocated any memory for s ( s is NULL , or not pointing to anything), so we might want to declare char s[5]
to allocate an array of 5 characters for our string. Then, s will be treated as a pointer in scanf and printf .
Now, if the user types in a string of length 4 or less, our program will work safely. But if the user types in a longer string, scanf
might be trying to write past the end of our array into unknown memory, causing our program to crash.
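One way to guard against that is to give the format a maximum width, as in %4s , so that at most 4 characters are stored, followed by the terminating \0 . As before, this sketch uses sscanf on a fixed string so it can be tried without typing input, and read_word is a hypothetical name.

```c
#include <stdio.h>

// Hypothetical helper: copies at most 4 non-space characters from input into
// out, which must have room for 5 bytes. Returns 1 if a word was read.
int read_word(const char *input, char out[5])
{
    // The width of 4 leaves one byte of out for the terminating \0
    return sscanf(input, "%4s", out) == 1;
}
```

A longer input is simply cut off after 4 characters rather than written past the end of the array.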
Files
With the ability to use pointers, we can also open files:
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
// Open file
FILE *file = fopen("phonebook.csv", "a");
// Close file
fclose(file);
}
fopen is a new function we can use to open a file. It will return a pointer to a new type, FILE , that we can read from and write to.
The first argument is the name of the file, and the second argument is the mode we want to open the file in ( r for read, w for write,
and a for append, or adding to).
After we get some strings, we can use fprintf to print to a file.
Finally, we close the file with fclose .
Now we can create our own CSV files, files of comma-separated values (like a mini-spreadsheet), programmatically.
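The open, print, and close steps might be gathered into one function, as in the sketch below. The function name append_entry and its parameters are assumptions for illustration, and the sketch also checks that fopen did not return NULL .

```c
#include <stdio.h>

// Hypothetical helper: appends one "name,number" row to the CSV file at path.
// Returns 0 on success, 1 if the file could not be opened.
int append_entry(const char *path, const char *name, const char *number)
{
    // "a" opens the file for appending, creating it if it doesn't exist
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // fprintf works like printf, but prints to the file
    fprintf(file, "%s,%s\n", name, number);

    fclose(file);
    return 0;
}
```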
JPEG
We can also write a program that opens a file and tells us if it’s a JPEG (image) file:
#include <stdio.h>

int main(int argc, char *argv[])
{
    // Check usage
    if (argc != 2)
    {
        return 1;
    }

    // Open file
    FILE *file = fopen(argv[1], "r");
    if (!file)
    {
        return 1;
    }

    // Close file
    fclose(file);
}
Now, if we run this program with ./jpeg brian.jpg , our program will try to open the file we specify (checking that we indeed get a
non-NULL file back), and read the first three bytes from the file with fread .
We can compare the first three bytes (in hexadecimal) to the three bytes required to begin a JPEG file. If they’re the same, then our file
is likely to be a JPEG file (though, other types of files may still begin with those bytes). But if they’re not the same, we know it’s
definitely not a JPEG file.
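The read-and-compare step might look like the sketch below. A JPEG file begins with the bytes 0xff , 0xd8 , 0xff ; the function name maybe_jpeg and its return values are assumptions made for this example.

```c
#include <stdio.h>

// Hypothetical helper: returns 1 if the file at path starts with the three
// bytes that begin a JPEG, 0 if it doesn't, and -1 if it can't be opened.
int maybe_jpeg(const char *path)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return -1;
    }

    // Read the first three bytes, one byte at a time
    unsigned char bytes[3];
    size_t count = fread(bytes, 1, 3, file);
    fclose(file);

    // A file shorter than three bytes can't be a JPEG
    if (count != 3)
    {
        return 0;
    }
    return bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff;
}
```

We use unsigned char so that each byte is treated as a value from 0 to 255, which is what the comparison with 0xff expects.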
We can use these abilities to read and write files, in particular images, and modify them by changing the bytes in them, in this week’s
problem set!
Lecture 5
Pointers
Resizing arrays
Data structures
Linked Lists
More data structures
Pointers
Last time, we learned about pointers, malloc , and other useful tools for working with memory.
Let’s review this snippet of code:
int main(void)
{
int *x;
int *y;
x = malloc(sizeof(int));
*x = 42;
*y = 13;
}
Here, the first two lines of code in our main function are declaring two pointers, x and y . Then, we allocate enough memory for an
int with malloc , and store the address returned by malloc into x .
With *x = 42; , we go to the address pointed to by x , and store the value 42 into that location.
The final line, though, is buggy since we don’t know what the value of y is, since we never set a value for it. Instead, we can write:
y = x;
*y = 13;
And this will set y to point to the same location as x does, and then set that value to 13 .
We take a look at a short clip, Pointer Fun with Binky (https://www.youtube.com/watch?v=3uLKjb973HU), which also explains this snippet
in an animated way!
Resizing arrays
In week 2, we learned about arrays, where we could store the same kind of value in a list, side-by-side. But we need to declare the size of
arrays when we create them, and when we want to increase the size of the array, the memory surrounding it might be taken up by some
other data.
One solution might be to allocate more memory in a larger area that’s free, and move our array there, where it has more space. But we’ll
need to copy our array, which becomes an operation with running time of O(n), since we need to copy each of n elements in an array.
We might write a program like the following, to do this in code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Here, we allocate enough memory to fit three integers, and our variable
    // list will point to the first integer.
    int *list = malloc(3 * sizeof(int));
    // We should check that we allocated memory correctly, since malloc might
    // fail to get us enough free memory.
    if (list == NULL)
    {
        return 1;
    }

    // With this syntax, the compiler will do pointer arithmetic for us, and
    // calculate the byte in memory that list[0], list[1], and list[2] maps to,
    // since integers are 4 bytes large.
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // Now, if we want to resize our array to fit 4 integers, we'll try to allocate
    // enough memory for them, and temporarily use tmp to point to the first:
    int *tmp = malloc(4 * sizeof(int));
    if (tmp == NULL)
    {
        // Free the original array before giving up, so it isn't leaked
        free(list);
        return 1;
    }

    // Now, we copy integers from the old array into the new array ...
    for (int i = 0; i < 3; i++)
    {
        tmp[i] = list[i];
    }

    // We should free the original memory for list, which is why we need a
    // temporary variable to point to the new array ...
    free(list);

    // ... and now we can set our list variable to point to the new array that
    // tmp points to:
    list = tmp;

    // Now, we can use the fourth element, and free the array when we're done:
    list[3] = 4;
    free(list);
}
It turns out that there’s actually a helpful function, realloc , which will reallocate some memory:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *list = malloc(3 * sizeof(int));
if (list == NULL)
{
return 1;
}
list[0] = 1;
list[1] = 2;
list[2] = 3;
// Here, we give realloc our original array that list points to, and it will
// return a new address for a new array, with the old data copied over:
int *tmp = realloc(list, 4 * sizeof(int));
if (tmp == NULL)
{
return 1;
}
// Now, all we need to do is remember the location of the new array:
list = tmp;
list[3] = 4;
free(list);
}
Data structures
Data structures are programming constructs that allow us to store information in different layouts in our computer’s memory.
To build a data structure, we’ll need some tools we’ve seen:
struct to create custom data types
. to access properties in a structure
* to go to an address in memory pointed to by a pointer
Linked Lists
With a linked list, we can store a list of values that can easily be grown by storing values in different parts of memory:
This is different than an array since our values are no longer next to one another in memory.
We can link our list together by allocating, for each element, enough memory for both the value we want to store, and the address of the
next element:
By the way, NUL refers to \0 , a character that ends a string, and NULL refers to an address of all zeros, or a null pointer that we can
think of as pointing nowhere.
Unlike with arrays, we can no longer randomly access elements in a linked list. For example, we can no longer access the 5th element
of the list by calculating where it is, in constant time. (Since we know arrays store elements back-to-back, we can add 1, or 4, or the size of
our element, to calculate addresses.) Instead, we have to follow each element’s pointer, one at a time. And we need to allocate twice as
much memory as we needed before for each element.
In code, we might create our own struct called node (like a node from a graph in mathematics), and we need to store both an int and a
pointer to the next node called next :
We start this struct with typedef struct node so that we can refer to a node inside our struct.
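A sketch of such a struct, along with a small helper for creating one node, might look like this. The struct matches the description above; the helper name make_node is an assumption for this example.

```c
#include <stdlib.h>

// Represents a node: one value, plus the address of the next node
typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: allocates a node holding number, not yet linked to
// anything. Returns NULL if malloc failed; the caller should free the node.
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        n->number = number;
        n->next = NULL;
    }
    return n;
}
```

Inside the braces we have to write struct node , since the shorter name node doesn't exist until the typedef is finished.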
We can build a linked list in code starting with our struct. First, we’ll want to remember an empty list, so we can use the null pointer: node
*list = NULL; .
To add an element, first we’ll need to allocate some memory for a node, and set its values:
node *n = malloc(sizeof(node));
// We want to make sure malloc succeeded in getting memory for us:
if (n != NULL)
{
// This is equivalent to (*n).number, where we first go to the node pointed
// to by n, and then set the number property. In C, we can also use this
// arrow notation:
n->number = 2;
// Then we need to store a pointer to the next node in our list, but the
// new node won't point to anything (for now):
n->next = NULL;
}
To add to the list, we’ll create a new node the same way, perhaps with the value 4. But now we need to update the pointer in our first node
to point to it.
Since our list pointer points only to the first node (and we can’t be sure that the list only has one node), we need to “follow the
breadcrumbs” and follow each node’s next pointer:
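Following the breadcrumbs might look like the sketch below. The function name append is hypothetical, and it takes the address of our list pointer (a node ** ) so that it can also update list itself when the list is still empty.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: adds number to the end of the list that *list points
// to. Returns false if malloc failed.
bool append(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    // An empty list just becomes the new node
    if (*list == NULL)
    {
        *list = n;
        return true;
    }

    // Otherwise, follow each next pointer until we reach the last node
    node *tmp = *list;
    while (tmp->next != NULL)
    {
        tmp = tmp->next;
    }
    tmp->next = n;
    return true;
}
```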
If we want to insert a node to the front of our linked list, we would need to carefully update our node to point to the one following it,
before updating list. Otherwise, we’ll lose the rest of our list:
// Here, we're inserting a node into the front of the list, so we want its
// next pointer to point to the original list, before pointing the list to
// n:
n->next = list;
list = n;
And to insert a node in the middle of our list, we can go through the list, following each element one at a time, comparing its values, and
changing the next pointers carefully as well.
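Inserting into a sorted list might be sketched as follows. The name insert_sorted is an assumption, and the order of the two pointer updates at the end is the careful part: the new node must point at the rest of the list before anything points at the new node.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: inserts number into the list that *list points to,
// keeping the list in ascending order. Returns false if malloc failed.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;

    // The new node belongs at the front of the list
    if (*list == NULL || number < (*list)->number)
    {
        n->next = *list;
        *list = n;
        return true;
    }

    // Stop at the last node whose successor is not smaller than number
    node *tmp = *list;
    while (tmp->next != NULL && tmp->next->number < number)
    {
        tmp = tmp->next;
    }

    // Point the new node at the rest of the list first, so we don't lose it
    n->next = tmp->next;
    tmp->next = n;
    return true;
}
```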
With some volunteers on the stage, we simulate a list, with each volunteer acting as the list variable or a node. As we insert nodes into
the list, we need a temporary pointer to follow the list, and make sure we don’t lose any parts of our list. Our linked list only points to the
first node in our list, so we can only look at one node at a time, but we can dynamically allocate more memory as we need to grow our list.
Now, even if our linked list is sorted, the running time of searching it will be O(n), since we have to follow each node to check their values,
and we don’t know where the middle of our list will be.
We can combine all of our snippets of code into a complete program:
#include <stdio.h>
#include <stdlib.h>
// Represents a node
typedef struct node
{
int number;
struct node *next;
}
node;
int main(void)
{
// List of size 0, initially not pointing to anything
node *list = NULL;
// Print list
// Here we can iterate over all the nodes in our list with a temporary
// variable. First, we have a temporary pointer, tmp, that points to the
// list. Then, our condition for continuing is that tmp is not NULL, and
// finally, we update tmp to the next pointer of itself.
for (node *tmp = list; tmp != NULL; tmp = tmp->next)
{
// Within the node, we'll just print the number stored:
printf("%i\n", tmp->number);
}
// Free list
// Since we're freeing each node as we go along, we'll use a while loop
// and follow each node's next pointer before freeing it, but we'll see
// this in more detail in Problem Set 5.
while (list != NULL)
{
node *tmp = list->next;
free(list);
list = tmp;
}
}
More data structures
Notice that there are now two dimensions to this data structure, where some nodes are on different “levels” than others. And we can
imagine implementing this with a more complex version of a node in a linked list, where each node has not one but two pointers, one
to the value in the “middle of the left half” and one to the value in the “middle of the right half”. And all elements to the left of a node
are smaller, and all elements to the right are greater.
This is called a binary search tree because each node has at most two children, or nodes it is pointing to, and a search tree because
it’s sorted in a way that allows us to search correctly.
And like a linked list, we’ll want to keep a pointer to just the beginning of the list, but in this case we want to point to the root, or top
center node of the tree (the 4).
Now, we can easily do binary search, and since each node is pointing to another, we can also insert nodes into the tree without moving all
of them around as we would have to in an array. Recursively searching this tree would look something like:
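A sketch of that recursive search in C, assuming a node with a number and two pointers named left and right (names chosen for this example):

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical tree node: a value plus pointers to two children
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Returns true if number is somewhere in the tree rooted at tree.
bool search(const node *tree, int number)
{
    // Base case: an empty tree can't contain the number
    if (tree == NULL)
    {
        return false;
    }
    else if (number < tree->number)
    {
        // Smaller values can only be in the left subtree
        return search(tree->left, number);
    }
    else if (number > tree->number)
    {
        // Greater values can only be in the right subtree
        return search(tree->right, number);
    }
    else
    {
        return true;
    }
}
```

Each call discards one whole subtree, just as binary search on an array discards half of the remaining elements.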
The running time of searching a tree is O(log n), and inserting nodes while keeping the tree balanced is also O(log n). By spending a bit
more memory and time to maintain the tree, we’ve now gained faster searching compared to a plain linked list.
A data structure with almost a constant time search is a hash table, which is a combination of an array and a linked list. We have an array
of linked lists, and each linked list in the array has elements of a certain category. For example, in the real world we might have lots of
nametags, and we might sort them into 26 buckets, one labeled with each letter of the alphabet, so we can find nametags by looking in
just one bucket.
We can implement this in a hash table with an array of 26 pointers, each of which points to a linked list for a letter of the alphabet:
Since we have random access with arrays, we can add elements quickly, and also index quickly into a bucket.
A bucket might have multiple matching values, so we’ll use a linked list to store all of them horizontally. (We call this a collision, when two
values match in some way.)
This is called a hash table because we use a hash function, which takes some input and maps it to a bucket it should go in. In our example,
the hash function is just looking at the first letter of the name, so it might return 0 for “Albus” and 25 for “Zacharias”.
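Such a hash function might be sketched as below. Sending names that don't start with a letter to bucket 0 is an assumption made here so the function always returns a valid index; the text above doesn't say what should happen to them.

```c
#include <ctype.h>

// Number of buckets: one per letter of the alphabet
#define BUCKETS 26

// Maps a name to a bucket from 0 to 25 based on its first letter, ignoring
// case. Names that don't start with a letter go to bucket 0 (an assumption).
unsigned int hash(const char *name)
{
    if (!isalpha((unsigned char) name[0]))
    {
        return 0;
    }
    return (unsigned int) (toupper((unsigned char) name[0]) - 'A');
}
```

The result can be used directly as an index into an array of BUCKETS linked lists.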
But in the worst case, all the names might start with the same letter, so we might end up with the equivalent of a single linked list again.
We might look at the first two letters, and allocate enough buckets for 26*26 possible hashed values, or even the first three letters, and
now we’ll need 26*26*26 buckets. But we could still have a worst case where all our values start with the same three characters, so the
running time for search is O(n). In practice, though, we can get closer to O(1) if we have about as many buckets as possible values,
especially if we have an ideal hash function, where we can sort our inputs into unique buckets.
We can use another data structure called a trie (pronounced like “try”, and is short for “retrieval”):
Imagine we want to store a dictionary of words efficiently, and be able to access each one in constant time. A trie is like a tree, but
each node is an array. Each array will have each letter, A-Z, stored. For each word, the first letter will point to an array, where the next
valid letter will point to another array, and so on, until we reach something indicating the end of a valid word. If our word isn’t in the
trie, then one of the arrays won’t have a pointer or terminating character for our word. Now, even if our data structure has lots of
words, the lookup time will be just the length of the word we’re looking for, and this might be a fixed maximum so we have O(1) for
searching and insertion. The cost for this, though, is 26 times as much memory as we need for each character.
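A trie along these lines might be sketched as follows. All of the names here ( trie_node , trie_new , trie_insert , trie_contains , trie_free ) are assumptions for this example, and the sketch only accepts the letters A-Z, ignoring case.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define LETTERS 26

// Hypothetical trie node: an array with one pointer per letter, plus a flag
// marking whether a valid word ends here
typedef struct trie_node
{
    bool is_word;
    struct trie_node *children[LETTERS];
}
trie_node;

// Allocates an empty node, or returns NULL if malloc failed.
trie_node *trie_new(void)
{
    trie_node *n = malloc(sizeof(trie_node));
    if (n != NULL)
    {
        n->is_word = false;
        for (int i = 0; i < LETTERS; i++)
        {
            n->children[i] = NULL;
        }
    }
    return n;
}

// Adds word to the trie. Returns false on a non-letter or if malloc failed.
bool trie_insert(trie_node *root, const char *word)
{
    trie_node *cursor = root;
    for (int i = 0; word[i] != '\0'; i++)
    {
        if (!isalpha((unsigned char) word[i]))
        {
            return false;
        }
        int index = toupper((unsigned char) word[i]) - 'A';

        // Create the next array the first time this letter is needed
        if (cursor->children[index] == NULL)
        {
            cursor->children[index] = trie_new();
            if (cursor->children[index] == NULL)
            {
                return false;
            }
        }
        cursor = cursor->children[index];
    }
    cursor->is_word = true;
    return true;
}

// Returns true if word was inserted earlier; takes one step per letter.
bool trie_contains(const trie_node *root, const char *word)
{
    const trie_node *cursor = root;
    for (int i = 0; word[i] != '\0'; i++)
    {
        if (!isalpha((unsigned char) word[i]))
        {
            return false;
        }
        cursor = cursor->children[toupper((unsigned char) word[i]) - 'A'];
        if (cursor == NULL)
        {
            return false;
        }
    }
    return cursor->is_word;
}

// Frees a node and everything below it.
void trie_free(trie_node *n)
{
    if (n == NULL)
    {
        return;
    }
    for (int i = 0; i < LETTERS; i++)
    {
        trie_free(n->children[i]);
    }
    free(n);
}
```

Notice that the number of steps in trie_contains depends only on the length of the word, not on how many words are stored.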
There are even higher-level constructs, abstract data structures, where we use our building blocks of arrays, linked lists, hash tables, and
tries to implement a solution to some problem.
For example, one abstract data structure is a queue, where we want to be able to add values and remove values in a first-in-first-out (FIFO)
way. To add a value we might enqueue it, and to remove a value we would dequeue it. And we can implement this with an array that we
resize as we add items, or a linked list where we append values to the end.
An “opposite” data structure would be a stack, where items most recently added (pushed) are removed (popped) rst, in a last-in- rst-out
(LIFO) way. Our email inbox is a stack, where our most recent emails are at the top.
Another example is a dictionary, where we can map keys to values, or strings to values, and we can implement one with a hash table
where a word comes with some other information (like its de nition or meaning).
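As a rough illustration of these three abstract data structures, here is a Python sketch. It uses the standard library's deque for the queue, a list for the stack, and a dict for the dictionary; the values stored are made up for the example.

```python
from collections import deque

# Queue (FIFO): enqueue at the back, dequeue from the front.
queue = deque()
queue.append("first")
queue.append("second")
queue.append("third")
dequeued = queue.popleft()

# Stack (LIFO): push and pop at the same end.
stack = []
stack.append("first")
stack.append("second")
stack.append("third")
popped = stack.pop()

# Dictionary: keys mapped to values.
definitions = {"queue": "first in, first out", "stack": "last in, first out"}

print(dequeued, popped, definitions["stack"])
```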
We take a look at “Jack Learns the Facts About Queues and Stacks” (https://www.youtube.com/watch?v=2wM6_PuBIxY), an animation about
these data structures.
Lecture 6
Python Basics
Examples
More features
Files
New features
Python Basics
Today we’ll learn a new programming language called Python, and remember that one of the overall goals of the course is not learning
any particular languages, but how to program in general.
Source code in Python looks a lot simpler than C, but is capable of solving problems in fields like data science. In fact, to print “hello,
world”, all we need to write is:
print("hello, world")
Notice that, unlike in C, we don’t need to import a standard library, declare a main function, specify a newline in the print function,
or use semicolons.
Python is an interpreted language, which means that we actually run another program (an interpreter) that reads our source code and runs
it top to bottom. For example, we can save the above as hello.py , and run the command python hello.py to run our code, without
having to compile it.
We can get strings from a user:
We create a variable called answer , without specifying the type (the interpreter determines that from context for us), and we can
easily combine two strings with the + operator before we pass it into print .
We can also pass in multiple arguments to print , with print("hello,", answer) , and it will automatically join them with spaces
for us too.
print also accepts format strings like f"hello, {answer}" , which substitutes variables inside curly braces into a string.
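The three ways of printing a greeting described above could look like this sketch. To keep it runnable without keyboard input, answer is set to a fixed, made-up value; in an interactive program it would come from the user instead.

```python
# Stand-in for a string typed by the user (assumption for this sketch).
answer = "David"

# 1. Join two strings with + before passing the result to print.
greeting = "hello, " + answer
print(greeting)

# 2. Pass multiple arguments; print joins them with a space.
print("hello,", answer)

# 3. Use a format string, which substitutes the variable inside the braces.
print(f"hello, {answer}")
```

All three lines print the same text.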
We can create variables with just counter = 0 . To increment a variable, we can use counter = counter + 1 or counter += 1 .
Conditions look like:
if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
else:
    print("x is equal to y")
Unlike in C and JavaScript (whereby braces { } are used to indicate blocks of code), the exact indentation of each line is what
determines the level of nesting in Python.
And instead of else if , we just say elif .
Boolean expressions are slightly different, too:
while True:
    print("hello, world")
We can write a loop with a variable:
i = 3
while i > 0:
    print("cough")
    i -= 1
We can also use a for loop, where we can do something for each element in a list:
Lists in Python are like arrays in C, but they can grow and shrink easily with the interpreter managing the implementation and
memory for us.
This for loop will set the variable i to the first element, 0 , run, then to the second element, 1 , run, and so on.
And we can use a special function, range , to get some number of values, as in for i in range(3) . This will give us 0 , 1 , and 2 ,
for a total of three values.
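A minimal sketch of both forms of the for loop described above, first over an explicit list and then over range:

```python
# Loop over each element of a list: i is 0, then 1, then 2.
for i in [0, 1, 2]:
    print("cough")

# range(3) produces the same three values without writing the list out.
values = []
for i in range(3):
    values.append(i)
print(values)
```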
In Python, there are many data types:
bool , True or False
float , real numbers
int , integers
str , strings
range , sequence of numbers
list , sequence of mutable values, that we can change or add or remove
tuple , sequence of immutable values, that we can’t change
dict , collection of key/value pairs, like a hash table
set , collection of unique values
docs.python.org (https://docs.python.org) is the official source of documentation, but Google and Stack Overflow will also have helpful
resources when we need to figure out how to do something in Python. In fact, programmers in the real world rarely know everything in the
documentation, but rather how to find what they need when they need it.
Examples
We can blur an image with:
from PIL import Image, ImageFilter

before = Image.open("bridge.bmp")
after = before.filter(ImageFilter.BLUR)
after.save("out.bmp")
In Python, we include other libraries with import , and here we’ll import the Image and ImageFilter names from the PIL library.
It turns out, if we look for documentation for the PIL library, we can use the next three lines of code to open an image called
bridge.bmp , run a blur filter on it, and save it to a file called out.bmp .
And we can run this with python blur.py after saving to a file called blur.py .
We can implement a dictionary with:
words = set()

def check(word):
    if word.lower() in words:
        return True
    else:
        return False

def load(dictionary):
    file = open(dictionary, "r")
    for line in file:
        words.add(line.rstrip("\n"))
    file.close()
    return True

def size():
    return len(words)

def unload():
    return True
First, we create a new set called words . Then, for check , we can just ask if word.lower() in words . For load , we open the file
and use words.add to add each line to our set. For size , we can use len to count the number of elements in our set, and finally,
for unload , we don’t have to do anything!
It turns out, even though implementing a program in Python is simpler for us, the running time of our program in Python is slower than
our program in C since our interpreter has to do more work for us. So, depending on our goals, we’ll also have to consider the tradeoff of
human time of writing a program that’s more efficient, versus the running time of the program.
In Python, we can include the CS50 library, too, but our syntax will be:
from cs50 import get_int, get_string

# Compare two integers from the user
x = get_int("x: ")
y = get_int("y: ")
if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
else:
    print("x is equal to y")

# Check a yes/no answer (the prompt text here is illustrative)
s = get_string("s: ")
if s == "Y" or s == "y":
    print("Agreed.")
elif s == "N" or s == "n":
    print("Not agreed.")

# Cough three times
print("cough")
print("cough")
print("cough")
We don’t need to declare a main function, so we just write the same line of code three times.
But we can do better:
for i in range(3):
    cough()

def cough():
    print("cough")
Notice that we don’t need to specify the return type of a new function, which we can define with def .
But this causes an error when we try to run it: NameError: name 'cough' is not defined . It turns out that we need to define our
function before we use it, so we can either move our definition of cough to the top, or create a main function:
def main():
    for i in range(3):
        cough()

def cough():
    print("cough")

main()
Now, by the time we actually call our main function, the cough function will have been read by our interpreter.
Our functions can take inputs, too:
def main():
    cough(3)

def cough(n):
    for i in range(n):
        print("cough")

main()
We can also write a function that keeps asking until it gets a positive integer:
from cs50 import get_int

def main():
    i = get_positive_int()
    print(i)

def get_positive_int():
    while True:
        n = get_int("Positive Integer: ")
        if n > 0:
            break
    return n

main()
Since there is no do-while loop in Python as there is in C, we have a while loop that will go on infinitely, but we use break to end
the loop as soon as n > 0 . Then, our function will just return n .
Notice that variables in Python have function scope by default, meaning that n can be initialized within a loop, but still be accessible
later in the function.
We can print out a row of question marks on the screen:
for i in range(4):
    print("?", end="")
print()
When we print each block, we don’t want the automatic new line, so we can pass a parameter, or named argument, to the print
function. Here, we say end="" to specify that nothing should be printed at the end of our string. Then, after we print our row, we can
call print to get a new line.
We can also “multiply” a string and print that directly with: print("?" * 4) .
We can print a column with a loop:
for i in range(3):
    print("#")
And we can nest loops to print both rows and columns:
for i in range(3):
    for j in range(3):
        print("#", end="")
    print()
We don’t need to use the get_string function from the CS50 library, since we can use the input function built into Python to get a
string from the user. But if we want another type of data, like an integer, from the user, we’ll need to cast it with int() .
But our program will crash if the string isn’t convertible to an integer, so we can use get_int which will just ask again.
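To see why the conversion can crash and how re-asking helps, here is a small sketch. It is not the CS50 library's implementation: to_int and first_valid_int are hypothetical helpers, and a list of strings stands in for successive answers typed by a user.

```python
def to_int(text):
    """Convert text to an int, or return None instead of crashing."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings like "cat" or "".
        return None

def first_valid_int(answers):
    """Mimic re-prompting: skip answers until one converts to an integer."""
    for text in answers:
        number = to_int(text)
        if number is not None:
            return number
    return None

print(to_int("50"))
print(to_int("cat"))
print(first_valid_int(["cat", "", "7"]))
```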
In Python, trying to get an integer overflow actually won’t work:
from time import sleep

i = 1
while True:
    print(i)
    sleep(1)
    i *= 2
We call the sleep function to pause our program for a second between each iteration.
This will continue until the integer can no longer fit in your computer’s memory.
Floating-point imprecision, too, can be prevented by libraries that can represent decimal numbers with as many bits as are needed.
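As one illustration, Python's built-in integers grow as needed, and the standard library's decimal module is one such library for decimal numbers. Note that decimal works to a configurable precision rather than an unlimited one, so this is a sketch of the idea rather than a complete cure.

```python
from decimal import Decimal

# Integers simply get bigger instead of overflowing.
big = 2 ** 100

# Binary floating point cannot represent 0.1 or 0.2 exactly...
float_sum = 0.1 + 0.2

# ...but Decimal stores the decimal digits themselves.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(big)
print(float_sum == 0.3, decimal_sum == Decimal("0.3"))
```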
We can make a list:
scores = []
scores.append(72)
scores.append(73)
scores.append(33)
With append , we can add items to our list, using it like a linked list.
We can also declare a list with some values like scores = [72, 73, 33] .
We can iterate over each character in a string:
s = get_string("Input: ")
print("Output: ", end="")
for c in s:
    print(c, end="")
print()
More features
We can take command-line arguments with:
from sys import argv

for i in range(len(argv)):
    print(argv[i])
Since argv is a list of strings, we can use len() to get its length, and range() for a range of values that we can use as an index for
each element in the list.
But we can also let Python iterate over the list for us:
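A minimal sketch of that simpler loop, assuming the variable name arg:

```python
from sys import argv

# No index or len() is needed: the loop hands us each string in turn.
for arg in argv:
    print(arg)
```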
from sys import argv, exit

if len(argv) != 2:
    print("missing command-line argument")
    exit(1)
print(f"hello, {argv[1]}")
exit(0)
We import the exit function, and call it with the code we want our program to exit with.
We can implement linear search by just checking each element in a list:
import sys

names = ["EMMA", "RODRIGO", "BRIAN", "DAVID"]

if "EMMA" in names:
    print("Found")
    sys.exit(0)
print("Not found")
sys.exit(1)
If we have a dictionary, a set of key:value pairs, we can also check each key:
import sys

people = {
    "EMMA": "617-555-0100",
    "RODRIGO": "617-555-0101",
    "BRIAN": "617-555-0102",
    "DAVID": "617-555-0103"
}

if "EMMA" in people:
    print(f"Found {people['EMMA']}")
    sys.exit(0)
print("Not found")
sys.exit(1)
Notice that we can get the value of a particular key in a dictionary with people['EMMA'] . Here, we use single quotes (both single
and double quotes are allowed, as long as they match for a string) to differentiate the inner string from the outer string.
And we declare dictionaries with curly braces, {} , and lists with brackets [] .
In Python, we can compare strings directly with just == :
s = get_string("s: ")
t = get_string("t: ")
if s == t:
    print("Same")
else:
    print("Different")
Copying strings, too, works without any extra work from us:
s = get_string("s: ")
t = s
t = t.capitalize()
print(f"s: {s}")
print(f"t: {t}")
Swapping two variables can also be done by assigning both values at the same time:
x = 1
y = 2
x, y = y, x
print(f"x is {x}, y is {y}")
Files
Let’s open a CSV file:
import csv
from cs50 import get_string

# Open the file for appending
file = open("phonebook.csv", "a")

# Prompt text here is illustrative
name = get_string("Name: ")
number = get_string("Number: ")

writer = csv.writer(file)
writer.writerow((name, number))
file.close()
It turns out that Python also has a csv package (library) that helps us work with CSV files, so after we open the file for appending, we
can call csv.writer to create a writer from the file and then writer.writerow to write a row. With the inner parentheses, we’re
creating a tuple with the values we want to write, so we’re actually passing in a single argument that has all the values for our row.
We can use the with keyword, which will helpfully close the file for us:
...
with open("phonebook.csv", "a") as file:
    writer = csv.writer(file)
    writer.writerow((name, number))
New features
A feature of Python that C does not have is regular expressions, or patterns against which we can match strings. For example, its syntax
includes:
. , for any character
.* , for 0 or more characters
.+ , for 1 or more characters
? , for something optional
^ , for start of input
$ , for end of input
For example, we can match strings with:
import re
from cs50 import get_string

# Prompt text here is illustrative
s = get_string("s: ")

if re.search("^y(es)?$", s, re.IGNORECASE):
    print("Agreed.")
elif re.search("^no?$", s, re.IGNORECASE):
    print("Not agreed.")
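To see which inputs these two patterns accept, we can wrap them in a function and try a few strings. The function name respond and the sample answers are made up for this sketch.

```python
import re

def respond(s):
    """Return the reply for an answer, or None if neither pattern matches."""
    # ^ and $ anchor the whole input; (es)? makes "es" optional.
    if re.search("^y(es)?$", s, re.IGNORECASE):
        return "Agreed."
    # o? makes the "o" optional, so both "n" and "no" match.
    elif re.search("^no?$", s, re.IGNORECASE):
        return "Not agreed."
    return None

for answer in ["y", "YES", "yess", "n", "No", "maybe"]:
    print(answer, "->", respond(answer))
```

Because of the anchors, an answer like yess matches neither pattern, even though it starts with yes.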
import speech_recognition

recognizer = speech_recognition.Recognizer()
with speech_recognition.Microphone() as source:
    print("Say something!")
    audio = recognizer.listen(source)
It turns out that there’s another library we can download, called speech_recognition , that can listen to audio and convert it to a
string.
And now, we can match on the audio to print something else:
...
words = recognizer.recognize_google(audio)

# Respond to speech
if "hello" in words:
    print("Hello to you too!")
elif "how are you" in words:
    print("I am well, thanks!")
elif "goodbye" in words:
    print("Goodbye to you too!")
else:
    print("Huh?")
...
words = recognizer.recognize_google(audio)
Here, we can get all the characters after my name is with .* , and print it out.
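One way to capture the characters after my name is uses parentheses around .* to form a group. This sketch works on plain strings rather than recognized audio, and the function name greet and its replies are assumptions for illustration.

```python
import re

def greet(words):
    """Reply using whatever text follows the phrase 'my name is '."""
    # The parentheses capture everything that .* matches.
    matches = re.search("my name is (.*)", words)
    if matches:
        return f"Hey, {matches.group(1)}."
    return "Hey, you."

print(greet("hello, my name is David"))
print(greet("hello"))
```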
We run detect.py and faces.py (https://cdn.cs50.net/2019/fall/lectures/6/src6/6/faces/), which finds each face (or even a specific face) in a
photo.
qr.py (https://cdn.cs50.net/2019/fall/lectures/6/src6/6/qr/) will also generate a QR code to a particular URL.
Lecture 7
Spreadsheets
SQL
IMDb
Multiple tables
Problems
Spreadsheets
Most of us are familiar with spreadsheets: rows of data, with each column in a row holding a different piece of data, all related to each
other somehow.
A database is an application that can store data, and we can think of Google Sheets as one such application.
For example, we created a Google Form to ask students their favorite TV show and genre of it. We look through the responses, and see
that the spreadsheet has three columns: “Timestamp”, “title”, and “genres”:
We can download a CSV file from the spreadsheet with “File > Download”, upload it to our IDE, and see that it’s a text file with comma-
separated values matching the spreadsheet’s data.
We’ll write favorites.py :
import csv

with open("CS50 2019 - Lecture 7 - Favorite TV Shows (Responses) - Form Responses 1.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row["title"])
We’re just going to open the file and make sure we can get the title of each row.
Now we can use a dictionary to count the number of times we’ve seen each title, with the keys being the titles and the values for each key
an integer, tracking how many times we’ve seen that title:
import csv

counts = {}

with open("CS50 2019 - Lecture 7 - Favorite TV Shows (Responses) - Form Responses 1.csv", "r") as file:
    reader = csv.DictReader(file)
    ...

def f(item):
    return item[1]
We define a function, f , which just returns the value from the item in the dictionary with item[1] . The sorted function, in turn,
can use that as the key to sort the dictionary’s items. And we’ll also pass in reverse=True to sort from largest to smallest, instead of
smallest to largest.
We can actually define our function in the same line, with this syntax:
We pass in a lambda, or anonymous function, as the key, which takes in the item and returns item[1] .
Finally, we can make all the titles lowercase with title = row["title"].lower() , so our counts can be a little more accurate even if the
names weren’t typed in the exact same way.
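Putting these steps together, a sketch of the counting program might look like this. To keep it self-contained, a few made-up rows in an in-memory string stand in for the downloaded CSV file.

```python
import csv
import io

# Hypothetical stand-in for the downloaded CSV file.
data = io.StringIO(
    "Timestamp,title,genres\n"
    "t1,The Office,Comedy\n"
    "t2,the office,Comedy\n"
    "t3,Friends,Comedy\n"
    "t4,Breaking Bad,Drama\n"
    "t5,friends,Comedy\n"
    "t6,The Office,Comedy\n"
)

# Count how many times each (lowercased) title appears.
counts = {}
for row in csv.DictReader(data):
    title = row["title"].lower()
    if title in counts:
        counts[title] += 1
    else:
        counts[title] = 1

# Sort by count using a named function as the key...
def f(item):
    return item[1]

by_function = sorted(counts.items(), key=f, reverse=True)

# ...or the equivalent lambda defined on the same line.
by_lambda = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for title, count in by_lambda:
    print(title, count, sep=" | ")
```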
SQL
We’ll look at a new program in our terminal window, sqlite3 , a command-line program that lets us use another language, SQL
(pronounced like “sequel”).
We’ll run some commands to create a new database called favorites.db and import our CSV file into a table called “favorites”:
~/ $ sqlite3 favorites.db
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> .mode csv
sqlite> .import "CS50 2019 - Lecture 7 - Favorite TV Shows (Responses) - Form Responses 1.csv" favorites
We see a favorites.db in our IDE after we run this, and now we can use SQL to interact with our data:
We can even set the count of each title to a new variable, n , and order our results by that, in descending order. Then we can see the top
10 results with LIMIT 10 :
sqlite> SELECT title, COUNT(title) AS n FROM favorites GROUP BY title ORDER BY n DESC LIMIT 10;
title | n
The Office | 30
Friends | 20
Game of Thrones | 20
Breaking Bad | 14
Black Mirror | 9
Rick and Morty | 9
Brooklyn Nine-Nine | 5
Game of thrones | 5
No | 5
Prison Break | 5
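The same query can be tried from Python with the standard library's sqlite3 module. This sketch uses an in-memory database and a handful of made-up rows rather than the real form responses.

```python
import sqlite3

# In-memory database so nothing is written to disk.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE favorites ("Timestamp" TEXT, "title" TEXT, "genres" TEXT)')

# Hypothetical responses standing in for the imported CSV.
rows = [
    ("t1", "The Office", "Comedy"),
    ("t2", "The Office", "Comedy"),
    ("t3", "The Office", "Comedy"),
    ("t4", "Friends", "Comedy"),
    ("t5", "Friends", "Comedy"),
    ("t6", "Game of Thrones", "Drama"),
    ("t7", "Game of thrones", "Drama"),
]
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", rows)

# Group identical titles, count each group as n, and sort by n.
top = db.execute(
    "SELECT title, COUNT(title) AS n FROM favorites "
    "GROUP BY title ORDER BY n DESC LIMIT 10"
).fetchall()

for title, n in top:
    print(title, n, sep=" | ")
```

As in the output above, differently capitalized titles are counted as separate groups.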
SQL is a language that lets us work with a relational database, an application that lets us store data and work with it more quickly than
with a CSV.
With .schema , we can see how the format for the table for our data is created:
sqlite> .schema
CREATE TABLE favorites(
"Timestamp" TEXT,
"title" TEXT,
"genres" TEXT
);
It turns out that, when working with data, we only need four operations:
CREATE
READ
UPDATE
DELETE
In SQL, the commands to perform each of these operations are:
INSERT
SELECT
UPDATE
DELETE
First, we’ll need to create a table with the CREATE TABLE table (column type, ...); command.
SQL, too, has its own data types to optimize the amount of space used for storing data:
BLOB , for “binary large object”, raw binary data that might represent files
INTEGER
    smallint
    integer
    bigint
NUMERIC
    boolean
    date
    datetime
    numeric(precision,scale) , which solves floating-point imprecision by using as many bits as needed, for each digit before and
    after the decimal point
    time
    timestamp
REAL
    real , for floating-point values
    double precision , with more bits
TEXT
    char(n) , for an exact number of characters
    varchar(n) , for a variable number of characters, up to a certain limit
    text
SQLite is one database application that supports SQL, and there are many companies with server applications that support SQL, including
Oracle Database, MySQL, PostgreSQL, MariaDB, and Microsoft Access.
After inserting values, we can use functions to perform calculations, too:
AVG
COUNT
DISTINCT , for getting distinct values without duplicates
MAX
MIN
…
There are also other operations we can combine as needed:
WHERE , matching on some strict condition
LIKE , matching on substrings for text
LIMIT
GROUP BY
ORDER BY
JOIN , combining data from multiple tables
We can update data with UPDATE table SET column=value WHERE condition; , which could include 0, 1, or more rows depending on our
condition. For example, we might say UPDATE favorites SET title = "The Office" WHERE title LIKE "%office"; , and that will set all the
rows with a title ending in “office” (in any capitalization, since % matches any run of characters and LIKE ignores case in SQLite) to be “The Office” so we can make them consistent.
And we can remove matching rows with DELETE FROM table WHERE condition; , as in DELETE FROM favorites WHERE title = "Friends"; .
We can even delete an entire table altogether with another command, DROP .
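A short sketch of UPDATE and DELETE, run through Python's sqlite3 module against a small made-up table. The rowcount attribute reports how many rows each statement affected.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (title TEXT)")

# Hypothetical titles, typed inconsistently.
db.executemany(
    "INSERT INTO favorites VALUES (?)",
    [("The Office",), ("the office",), ("Office",), ("Friends",), ("The Office (US)",)],
)

# %office matches any title that ends in "office", ignoring case.
updated = db.execute(
    "UPDATE favorites SET title = ? WHERE title LIKE ?", ("The Office", "%office")
).rowcount

# Remove every row whose title is exactly "Friends".
deleted = db.execute("DELETE FROM favorites WHERE title = ?", ("Friends",)).rowcount

titles = [row[0] for row in db.execute("SELECT title FROM favorites ORDER BY rowid")]
print(updated, deleted, titles)
```

The title with a trailing "(US)" is left alone, because it does not end in "office".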
IMDb
IMDb, or “Internet Movie Database”, has datasets available to download (https://www.imdb.com/interfaces/) as TSV, or tab-separated values,
files.
For example, we can download title.basics.tsv.gz , which will contain basic data about titles:
tconst , a unique identifier for each title, like tt4786824
titleType , the type of the title, like tvSeries
primaryTitle , the main title used, like The Crown
startYear , the year a title was released, like 2016
genres , a comma-separated list of genres, like Drama,History
We take a look at title.basics.tsv after we’ve unzipped it, and we see that the first rows are indeed the headers we expected and each
row has values separated by tabs. But the file has more than 6 million rows, so even searching for one value takes a moment.
We’ll download the file into our IDE with wget , and then gunzip to unzip it. But our IDE doesn’t have enough space, so we’ll use our
Mac’s terminal instead.
We’ll write import.py to read the file in:
import csv

# Open TSV file for reading
with open("title.basics.tsv", "r") as titles:

    # Since the file is a TSV file, we can use the CSV reader and change
    # the separator to a tab.
    reader = csv.DictReader(titles, delimiter="\t")

    # Open new CSV file for writing
    with open("shows0.csv", "w") as shows:

        # Create writer
        writer = csv.writer(shows)

        # Iterate over TSV file
        for row in reader:

            # If non-adult TV show
            if row["titleType"] == "tvSeries" and row["isAdult"] == "0":

                # Write row
                writer.writerow([row["tconst"], row["primaryTitle"], row["startYear"], row["genres"]])
Now, we can open shows0.csv and see a smaller set of data. But it turns out, for some of the rows, startYear has a value of \N , and
that’s a special value from IMDb when they want to represent values that are missing. So we can filter out those values and convert the
startYear to an integer to filter for shows after 1970:
...
# If year not missing (We need to escape the backslash too)
if row["startYear"] != "\\N":

    # If since 1970
    if int(row["startYear"]) >= 1970:

        # Write row
        writer.writerow([row["tconst"], row["primaryTitle"], row["startYear"], row["genres"]])
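The whole filter can be tried on a few rows without downloading anything. In this sketch, in-memory strings stand in for title.basics.tsv and the output file, the rows are made up (apart from the example values listed earlier), and only the columns used here are included.

```python
import csv
import io

# Hypothetical stand-in for title.basics.tsv, with only the columns we use.
titles = io.StringIO(
    "tconst\ttitleType\tprimaryTitle\tisAdult\tstartYear\tgenres\n"
    "tt4786824\ttvSeries\tThe Crown\t0\t2016\tDrama,History\n"
    "tt0000001\ttvSeries\tOld Show\t0\t1950\tDrama\n"
    "tt0000002\ttvSeries\tUnknown Year\t0\t\\N\tComedy\n"
    "tt0000003\tmovie\tSome Movie\t0\t2019\tAction\n"
)
shows = io.StringIO()

reader = csv.DictReader(titles, delimiter="\t")
writer = csv.writer(shows)

kept = []
for row in reader:
    # Keep only non-adult TV shows...
    if row["titleType"] == "tvSeries" and row["isAdult"] == "0":
        # ...whose year is present (IMDb writes \N for missing values)...
        if row["startYear"] != "\\N":
            # ...and is 1970 or later.
            if int(row["startYear"]) >= 1970:
                writer.writerow([row["tconst"], row["primaryTitle"], row["startYear"], row["genres"]])
                kept.append(row["primaryTitle"])

print(kept)
print(shows.getvalue())
```

Checking for \N before calling int matters, since int("\\N") would raise a ValueError.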
import csv
# Create DictReader
reader = csv.DictReader(input)
We can run this program and see our results, but we can see how SQL can do a better job.
In Python, we can connect to a SQL database and read our file into it once, so we can make lots of queries without writing new programs
and without having to read the entire file each time.
Let’s do this more easily with the CS50 library:
import cs50
import csv
# Create DictReader
reader = csv.DictReader(titles, delimiter="\t")
# If non-adult TV show
if row["titleType"] == "tvSeries" and row["isAdult"] == "0":
# If since 1970
startYear = int(row["startYear"])
if startYear >= 1970:
Now we can run sqlite3 shows3.db and run commands like before, such as SELECT * FROM shows LIMIT 10; .
With SELECT COUNT(*) FROM shows; we can see that there are more than 150,000 shows in our table, and with SELECT COUNT(*) FROM
shows WHERE startYear = 2019; , we see that there were more than 6000 this year.
Multiple tables
But each of the rows will only have one column for genres, and the values are multiple genres put together. So we can go back to our
import program, and add another table:
import cs50
import csv

# Create database
open("shows4.db", "w").close()
db = cs50.SQL("sqlite:///shows4.db")

# Create tables
db.execute("CREATE TABLE shows (id INT, title TEXT, year NUMERIC, PRIMARY KEY(id))")

# The `genres` table will have a column called `show_id` that references
# the `shows` table above
db.execute("CREATE TABLE genres (show_id INT, genre TEXT, FOREIGN KEY(show_id) REFERENCES shows(id))")

# Open TSV file
with open("title.basics.tsv", "r") as titles:

    # Create DictReader
    reader = csv.DictReader(titles, delimiter="\t")

    # Iterate over TSV file
    for row in reader:

        # If non-adult TV show
        if row["titleType"] == "tvSeries" and row["isAdult"] == "0":

            # If year not missing
            if row["startYear"] != "\\N":

                # If since 1970
                startYear = int(row["startYear"])
                if startYear >= 1970:

                    # Trim the "tt" prefix from tconst to get a numeric id
                    # (assumed step; the notes use id without showing how it is set)
                    id = int(row["tconst"][2:])

                    # Insert show
                    db.execute("INSERT INTO shows (id, title, year) VALUES(?, ?, ?)", id, row["primaryTitle"], startYear)

                    # Insert genres
                    if row["genres"] != "\\N":
                        for genre in row["genres"].split(","):
                            db.execute("INSERT INTO genres (show_id, genre) VALUES(?, ?)", id, genre)
So now our shows table no longer has a genres column, but instead we have a genres table with each row representing a show
and an associated genre. Now, a particular show can have multiple genres we can search for, and we can get other data about the
show from the shows table given its ID.
In fact, we can combine both tables with SELECT * FROM shows WHERE id IN (SELECT show_id FROM genres WHERE genre = "Comedy") AND
year = 2019; . We’re filtering our shows table by IDs where the ID in the genres table has a value of “Comedy” for the genre column,
and has the value of 2019 for the year column.
Our tables look like this:
Since the ID in the genres table comes from the shows table, we call it show_id . And the arrow indicates that a single show ID might
have many matching rows in the genres table.
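The two-table design and the nested query can be tried on a tiny scale with Python's sqlite3 module. The table definitions follow the ones above; the three shows and their genres are made up for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INT, title TEXT, year NUMERIC, PRIMARY KEY(id))")
db.execute("CREATE TABLE genres (show_id INT, genre TEXT, FOREIGN KEY(show_id) REFERENCES shows(id))")

# Hypothetical data: one show can have several rows in genres.
db.executemany(
    "INSERT INTO shows (id, title, year) VALUES (?, ?, ?)",
    [(1, "Show A", 2019), (2, "Show B", 2019), (3, "Show C", 2016)],
)
db.executemany(
    "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
    [(1, "Comedy"), (1, "Drama"), (2, "Drama"), (3, "Comedy")],
)

# The inner query finds matching show IDs; the outer query filters by year.
comedies = db.execute(
    "SELECT title FROM shows "
    "WHERE id IN (SELECT show_id FROM genres WHERE genre = ?) AND year = ?",
    ("Comedy", 2019),
).fetchall()
print(comedies)
```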
We see that some datasets from IMDb, like title.principals.tsv , have only IDs for certain columns that we’ll have to look up in other
tables.
By reading the descriptions for each table, we can see that all of the data can be used to construct these tables:
Notice that, for example, a person’s name could also be copied to the stars or writers tables, but instead only the person_id is
used to link to the data in the people table. This way, we only need to update the name in one place if we need to make a change.
We’ll open a database, shows.db , with these tables to look at some more examples.
We’ll download a program called DB Browser for SQLite (https://sqlitebrowser.org/dl/), which will have a graphical user interface to browse
our tables and data. We can use the “Execute SQL” tab to run SQL directly in the program, too.
We can run SELECT * FROM shows JOIN genres ON shows.id = genres.show_id; to join two tables by matching IDs in columns we specify.
Then we’ll get back a wider table, with columns from each of those two tables.
We can take a person’s ID and find them in shows with SELECT * FROM stars WHERE person_id = 1122; , but we can do a query inside our
query with SELECT show_id FROM stars WHERE person_id = (SELECT id FROM people WHERE name = "Ellen DeGeneres"); .
This gives us back the show_id , so to get the show data we can run: SELECT * FROM shows WHERE id IN (...); with ... being the
query above.
We can get the same results with:
We join the people table with the stars table, and then with the shows table by specifying columns that should match between
the tables, and then selecting just the title with a filter on the name.
But now we can select other fields from our combined tables, too.
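A sketch comparing the nested queries with a three-way join, using Python's sqlite3 module. The tables are simplified and every row is made up for the example, including which ID belongs to which person.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
INSERT INTO people VALUES (1122, 'Ellen DeGeneres'), (2, 'Someone Else');
INSERT INTO shows VALUES (10, 'Show A'), (20, 'Show B'), (30, 'Show C');
INSERT INTO stars VALUES (10, 1122), (20, 2), (30, 1122);
""")

name = ("Ellen DeGeneres",)

# Nested queries: name -> person ID -> show IDs -> titles.
nested = db.execute(
    "SELECT title FROM shows WHERE id IN "
    "(SELECT show_id FROM stars WHERE person_id = "
    "(SELECT id FROM people WHERE name = ?)) ORDER BY title",
    name,
).fetchall()

# Joins: line up people with stars, then stars with shows.
joined = db.execute(
    "SELECT title FROM people "
    "JOIN stars ON people.id = stars.person_id "
    "JOIN shows ON stars.show_id = shows.id "
    "WHERE name = ? ORDER BY title",
    name,
).fetchall()

print(nested == joined, joined)
```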
It turns out that we can specify columns of our tables to be special types, such as:
PRIMARY KEY , used as the primary identifier for a row
FOREIGN KEY , which points to a row in another table
UNIQUE , which means it has to be unique in this table
INDEX , which asks our database to create an index to more quickly query based on this column. An index is a data structure like a tree,
which helps us search for values.
We can create an index with CREATE INDEX person_index ON stars (person_id); . Then the person_id column will have an index called
person_index . With the right indexes, our join query is several hundred times faster.
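We can ask SQLite how it plans to run a query before and after creating the index. This sketch uses Python's sqlite3 module with a made-up stars table; the exact wording of the plan varies between SQLite versions, so the code only checks whether the index name appears in it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stars (show_id INTEGER, person_id INTEGER)")

# Hypothetical rows: 1000 roles spread over 100 people.
db.executemany("INSERT INTO stars VALUES (?, ?)", [(i, i % 100) for i in range(1000)])

def plan(query):
    """Return SQLite's description of how it would run the query."""
    rows = db.execute("EXPLAIN QUERY PLAN " + query)
    # The last column of each row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)

query = "SELECT * FROM stars WHERE person_id = 42"

before = plan(query)
db.execute("CREATE INDEX person_index ON stars (person_id)")
after = plan(query)

print(before)
print(after)
```

Without the index, the database has to scan every row; with it, the plan mentions person_index.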
Problems
One problem with databases is race conditions, where the timing of two actions or events causes unexpected behavior.
For example, consider two roommates and a shared fridge in their dorm. The first roommate comes home, and sees that there is no milk in
the fridge. So the first roommate leaves for the store to buy milk, and while they are at the store, the second roommate comes home, sees
that there is no milk, and leaves for another store to get milk. Later, there will be two jugs of milk in the fridge. By leaving a note, we can
solve this problem. We can even lock the fridge so that our roommate can’t check whether there is milk, until we’ve gotten back.
This can happen in our database if we have something like this:
First, we’re getting the number of likes on a post with a given ID. Then, we set the number of likes to that number plus one.
But now if we have two different web servers both trying to add a like, they might both set it to the same value instead of actually
adding one each time. For example, if there are 2 likes, both servers will check the number of likes, see that there are 2, and set the
value to 3. One of the likes will then be lost.
To solve this, we can use transactions, where a set of actions is guaranteed to happen together.
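The lost like can be reproduced in a single process by interleaving the reads and writes by hand. This is a simplified sketch using Python's sqlite3 module, not the lecture's code: the posts table is made up, the two "servers" are just two variables, and the fix shown combines a transaction with a single UPDATE that reads and writes in one step.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (1, 2)")

def read_likes(post_id):
    return db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()[0]

# Race: both "servers" read the count before either one writes.
seen_by_a = read_likes(1)
seen_by_b = read_likes(1)
db.execute("UPDATE posts SET likes = ? WHERE id = ?", (seen_by_a + 1, 1))
db.execute("UPDATE posts SET likes = ? WHERE id = ?", (seen_by_b + 1, 1))
lost_update = read_likes(1)

# Reset, then let each like run as its own transaction in which the
# database itself reads and increments the value together.
db.execute("UPDATE posts SET likes = 2 WHERE id = 1")
for _ in range(2):
    with db:  # commits on success, rolls back on error
        db.execute("UPDATE posts SET likes = likes + 1 WHERE id = ?", (1,))
safe = read_likes(1)

print(lost_update, safe)
```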
Another problem in SQL is called a SQL injection attack, where an adversary can execute their own commands on our database.
For example, someone might try typing in [email protected]'-- as their email. If we have a SQL query that’s a formatted string (without
escaping, or substituting dangerous characters from, the input), such as f"SELECT * FROM users WHERE username = '{username}' AND
password = '{password}'" , then the query will end up being f"SELECT * FROM users WHERE username = '[email protected]'--' AND
password = '{password}'" , which will actually select the row where username = '[email protected]' and turn the rest of the line into a
comment. To prevent this, we should use ? placeholders for our SQL library to automatically escape inputs from the user.
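The difference between a formatted string and placeholders can be seen with Python's sqlite3 module. Everything in this sketch is made up for illustration: the users table, the address alice@example.com, and both passwords.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES (?, ?)", ("alice@example.com", "correct horse"))

# The adversary ends the string early and comments out the rest.
username = "alice@example.com'--"
password = "wrong guess"

# Unsafe: the input becomes part of the SQL itself.
unsafe_query = f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'"
unsafe_rows = db.execute(unsafe_query).fetchall()

# Safe: placeholders keep the input as data, quote and dashes included.
safe_rows = db.execute(
    "SELECT * FROM users WHERE username = ? AND password = ?",
    (username, password),
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The formatted query returns the user's row without the right password, while the query with placeholders finds no user with that literal username.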
Lecture 8
A Look Back
Privacy
A Look Back
Just a few weeks ago, two-thirds of us had never taken a CS course before. We started with making programs in Scratch, struggled through
using C to write loops and eventually implementing more applicable algorithms, and finally took advantage of higher-level languages like
Python and its packages, and SQL, to solve even more interesting problems.
In week 0, we said:
what ultimately matters in this course is not so much where you end up relative to your classmates but where you end up relative to
yourself when you began
And now we can look back to see how far we’ve come.
Indeed, David’s own notes from when he took CS50 in 1996 include concepts like algorithms, functions, and arguments.
To start solving problems with algorithms, we need to represent inputs and outputs. So we can use binary to represent data, whether that’s
numbers, letters, or pixels in images.
We demonstrate binary search in a phone book by dividing the book in half each time.
Precision and correctness are both critical in programming, since computers can’t infer “what we mean”. We demonstrate this with a
volunteer giving the audience instructions on how to draw an image. We see that abstractions (“draw a stick figure”) can be useful, but we
lose some precision when we use them.
Privacy
Computer science, in essence, is about the processing and storage of information. But we need to also consider not just what we can do,
but whether we should do it.
For example, we use passwords to protect many of our accounts and data, but the top 10 passwords are just:
1. 123456
2. 123456789
3. qwerty
4. password
5. 111111
6. 12345678
7. abc123
8. 1234567
9. password1
10. 12345
But unfortunately, even a more complex password can be quickly guessed by modern computers. We can write a program, in just a few
minutes, that will generate all possible PINs and check them. We can even open a dictionary file that has all English words, and iterate
over each of them.
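Generating every PIN really does take only a few lines. This sketch assumes four-digit PINs and uses the standard library's itertools.product; the function name all_pins is made up.

```python
from itertools import product
from string import digits

def all_pins(length=4):
    """Yield every PIN of the given length, from 0000 up to 9999."""
    for combination in product(digits, repeat=length):
        yield "".join(combination)

pins = list(all_pins())
print(len(pins), pins[0], pins[-1])
```

With only 10,000 possibilities, checking each one takes a computer a fraction of a second.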
Cookies are small pieces of data that websites store on our computers when we visit them, useful for identifying us such that we don’t
have to log in on every visit, but can also be used for advertising and tracking purposes.
In Chrome, we can use View > Developer > Developer Tools to see the cookies that a particular site leaves under the “Network” tab:
And on other websites, where Google’s ads might be embedded, Google can track us there, too, with the same cookie.
And the request that our web browser sends to each site also includes a string called “user-agent”, which describes the version of the
browser we have.
On the internet, too, we have unique IP addresses that identify us so that we can receive responses from servers.
We also explored how we might recover “deleted” photos in a problem set, and services like Snapchat that promise to delete photos after
some time, may not actually remove the data.
In fact, a “soft delete” might set a value of “deleted” to be “true” to hide it from us, but the rest of the data is still stored.
Photos of ourselves on social media, too, can help someone else track us, what we do, and who we’re with.
In Chrome’s Developer Tools again, we can run some code in a website that prompts us to share our location and then puts it on the
screen:
We’ll now have the opportunity to explore one of four tracks: web programming, mobile app development for either iOS or Android, and
game development with Lua.
With these new skills, we’ll be working on a final project of our own design, solving a problem in the real world that we’re interested in.
We’ll have an overnight hackathon, focused on collaborating with classmates and staff on our final projects.
Finally, we’ll have the CS50 Fair, where we’ll present and celebrate our final projects with friends and visitors.
We give a big thanks to our staff, without whom this course would not be possible!
Android
What to Do
When to Do It
How to Do It
Introduction
We’ll learn to write mobile apps for Android with a new language, Java, and build three apps: one that loads data and displays it; one that
applies filters to images; one that lets you take notes and save them.
Lesson 1
We’ll use Android Studio, an IDE provided by Google to help us write Android apps. We’ll download and open it, and start a new project.
We’ll select the Empty Activity template for our app, and use JavaExample for our app name. A convention for the package name is the
domain name in reverse, plus the app name, like edu.harvard.cs50.javaexample . We’ll use Java and support Android 5.0 or above, so
most devices can use our app.
[2:20] Inside Android Studio we’ll see a lot of files that have been generated for us. We’ll want to first create an AVD, or Android Virtual
Device, so we can run our app on our laptop instead of a separate device.
[4:00] We’ll take a look at the syntax for Java, which is similar to C and has familiar data types. We can initialize and change variables, and use
conditions, arrays, for loops, and Lists (which are like dynamically sized arrays).
We can create a List object with List<String> values = new ArrayList<>() , specifying that the type of data it will hold is String .
Java supports many different types of Lists, but we’ll just use ArrayList .
We can also iterate over the elements in our List with for (String value : values) { .
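The List syntax above can be put together in a short sketch. The class name ListExample, the helper joinAll, and the sample values are made up for illustration and are not from the lesson:

```java
import java.util.ArrayList;
import java.util.List;

class ListExample {
    // Joins every element of the list with spaces, using a for-each loop
    static String joinAll(List<String> values) {
        StringBuilder joined = new StringBuilder();
        for (String value : values) {
            joined.append(value).append(" ");
        }
        return joined.toString().trim();
    }

    public static void main(String[] args) {
        // The List grows as we add elements, unlike a fixed-size array
        List<String> values = new ArrayList<>();
        values.add("one");
        values.add("two");
        values.add("three");
        System.out.println(values.size());
        System.out.println(joinAll(values));
    }
}
```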
[9:40] Java has the concept of generics, a way for a type like List to know the type of the elements inside it, in this case String. Java also has maps, which are
called dictionaries in Python or objects in JavaScript, and which store key-value pairs.
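As a rough sketch of a map with generic types, here is one that counts words. The class and method names are hypothetical, and HashMap is just one of several Map classes we could pick:

```java
import java.util.HashMap;
import java.util.Map;

class MapExample {
    // Counts how many times each word appears, with String keys and Integer values
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // getOrDefault gives us 0 the first time we see a word
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[] {"web", "mobile", "web"});
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```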
[12:45] Lists and Maps are examples of classes in Java, which we can think of as structs with functions attached to them. And we call
functions inside classes methods.
[15:25] We can add a method to our Person class, and use a variable in our method that the constructor in the class already saved.
[17:10] We can also have static methods, or methods we can call without an instance (one that we constructed) of our class.
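A class with a constructor, an instance method, and a static method might look like the sketch below. The lesson's Person class isn't reproduced in these notes, so the field and method names here are assumptions:

```java
class Person {
    // Field that the constructor saves
    private final String name;

    Person(String name) {
        this.name = name;
    }

    // Instance method: uses the name the constructor already saved
    String greeting() {
        return "Hello, " + name;
    }

    // Static method: called on the class itself, with no instance needed
    static String species() {
        return "human";
    }

    public static void main(String[] args) {
        Person person = new Person("Alice");
        System.out.println(person.greeting());
        System.out.println(Person.species());
    }
}
```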
[17:50] Another feature of classes is inheritance, where we can inherit fields and methods from a parent class, and optionally modify some
of them.
[20:40] Interfaces are like a list of methods that any class implementing them has to have. If a method is missing, the compiler will tell us.
It turns out that declaring a list with List<String> strings = new ArrayList<>() is actually using an interface, where List is the
interface, and ArrayList is the class that will implement the behaviors of a list. With new ArrayList<>() , we’re creating an instance of
the class.
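Inheritance and interfaces can be seen together in a small sketch. The Shape, Rectangle, and Square names are invented for this example and don't come from the lesson:

```java
import java.util.ArrayList;
import java.util.List;

// An interface lists methods that any implementing class must provide
interface Shape {
    double area();
}

// Rectangle implements Shape, so the compiler requires an area() method
class Rectangle implements Shape {
    protected final double width;
    protected final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }

    String describe() {
        return "rectangle";
    }
}

// Square inherits fields and methods from Rectangle and modifies one of them
class Square extends Rectangle {
    Square(double side) {
        super(side, side);
    }

    @Override
    String describe() {
        return "square";
    }
}

class ShapeExample {
    public static void main(String[] args) {
        // The element type is the interface; the objects are instances of classes
        List<Shape> shapes = new ArrayList<>();
        shapes.add(new Rectangle(2, 3));
        shapes.add(new Square(4));
        for (Shape shape : shapes) {
            System.out.println(shape.area());
        }
    }
}
```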
[24:20] Java also has packages, which help us organize and namespace our files. Like in Python, we also need to import certain packages
we want to use.
[26:10] We’ll come back now to Android Studio and look at the project we created. Inside the java folder, there’s our package with a
MainActivity.java file that has one class and a method, onCreate , that calls super.onCreate first (which is the parent class’
implementation), and then calls setContentView , which we won’t worry about now.
[29:20] We can start by creating a new Java Class by right-clicking on our package folder on the left. We’ll name it Track, to represent the
tracks in our course, and now we can add fields and a constructor that saves its arguments.
[32:00] Back in our onCreate function, we’ll create a new List of Track s and add some tracks. We’ll create a list of strings with a static
method, Arrays.asList , to represent student names.
[35:40] Now we’ll use a map, of strings to tracks, to represent track assignments for students. We’ll iterate over every string in our list of
strings and use the Random class to get a random track for each of them.
[39:00] To print everything, we can iterate over the map, and use the Log class in Android Studio to print out a log. We’ll press the play
button on the top right, and the device shows the default “Hello, world”. But in Android Studio, on the bottom right we can click Logcat,
inside which we can see our logs.
[43:00] We’ll add getter methods to our Track class to return its fields, so they can be private to the class and so other code can’t change
them directly. Our getter might also have other logic.
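The steps from this lesson can be approximated in plain Java, outside of Android. This is only a sketch: the Track fields (name and instructor), the sample values, and the TrackAssignment class are assumptions, and System.out.println stands in for Android's Log class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class Track {
    // Private fields: other code can only read them through the getters
    private final String name;
    private final String instructor;

    Track(String name, String instructor) {
        this.name = name;
        this.instructor = instructor;
    }

    String getName() { return name; }
    String getInstructor() { return instructor; }
}

class TrackAssignment {
    // Maps each student's name to a randomly chosen track
    static Map<String, Track> assign(List<String> students, List<Track> tracks, Random random) {
        Map<String, Track> assignments = new HashMap<>();
        for (String student : students) {
            int index = random.nextInt(tracks.size());
            assignments.put(student, tracks.get(index));
        }
        return assignments;
    }

    public static void main(String[] args) {
        List<Track> tracks = new ArrayList<>();
        tracks.add(new Track("Mobile", "Instructor A"));
        tracks.add(new Track("Web", "Instructor B"));
        tracks.add(new Track("Games", "Instructor C"));

        // Arrays.asList is a static method, so we call it on the class itself
        List<String> students = Arrays.asList("Student 1", "Student 2", "Student 3");

        Map<String, Track> assignments = assign(students, tracks, new Random());
        for (Map.Entry<String, Track> entry : assignments.entrySet()) {
            // In the Android app, this would be a call to the Log class instead
            System.out.println(entry.getKey() + " got " + entry.getValue().getName());
        }
    }
}
```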
Lesson 2
Now we’ll add UI to our Android app. The build system is called Gradle, which helps us by downloading libraries and compiling our code.
MVC, Model-View-Controller, is a general design pattern in which we separate our concerns, or types of code, into three categories. Models,
like the Track class we created in lesson 1, store our data. The view takes care of displaying the data when it gets it. And finally, the
controller is the bridge between the model and the view, with logic deciding what data to pass to the view and when.
An Activity in Android is like a base class for each of our screens in the app, representing a single thing we’re trying to do. For example, in
a contacts app the first activity might be the list of contacts we see, and the second activity is the view of a single contact.
Our app will also have Resources, non-Java code such as Layouts that describe how a view should look. Layouts are in a language called
XML, which looks similar to HTML, with tags and attributes. For example, we can specify a LinearLayout with a TextView .
[6:15] Another concept we’ll see is Intents, which let us move from one activity to another. We’ll also have Recycler Views, which
display a list of items that we can scroll through like a feed.
[8:15] We’ll create a new project again, an Empty Activity, to display a list of Pokemon and details about each of them. We’ll set up our
project and take a look at the generated files:
AndroidManifest.xml contains some configuration for our application, like an icon and the activities in our app.
In the java folder, we’ll have our MainActivity file, but also packages for test files.
In the res folder, we’ll see resources. In particular, the layout folder will have a view for each activity, and if we open
activity_main.xml , we can see a UI for dragging and dropping components. We can also click the Text tab at the bottom to see the source
XML. Important attributes include layout_width and layout_height , so we can choose to fill the entire screen or only some fraction
of the parent view.
In the values folder, we can also add constants like strings for translation.
[17:15] We’ll look at some Gradle scripts, which specify flags and dependencies, much like a configuration file for the Java compiler. In the
dependencies section, we can add more libraries as we use them.
[19:15] Now we’ll come back to MainActivity and add a RecyclerView . Android’s developer documentation has a lot of details and
examples that we can learn from. We’ll add the library in the Gradle file, and click Sync in Android Studio to automatically download the
package. Then we’ll change our activity_main.xml layout file to use a RecyclerView instead of the TextView . We’ll add an identifier to
the view so we can reference it from our controller with android:id="@+id/..." .
[23:55] We also need to define what each row looks like, so we’ll need to create a new Layout resource file in the same folder, and create a
new LinearLayout . We’ll add a TextView inside, and give both IDs.
[26:20] Our view is ready, so we’ll create some classes for our models. First, we’ll create a Pokemon class with properties and a constructor
to save them.
[28:35] It turns out that a RecyclerView uses another class, an adapter, to control what data will be displayed, so we’ll create a new class
PokedexAdapter that extends the Adapter class. We’ll also need what’s called a view holder so we can modify the view and layout as
needed. In our adapter, we’ll define the PokedexViewHolder , which will have the generic row view, but also the LinearLayout and
TextView we added earlier inside each row. We’ll need a class R.id to get the unique ID for each of those views. Then, with our view holder
class defined, we’ll override onCreateViewHolder to create the view that represents a row. We’ll also need onBindViewHolder to set
the values of each row, given a position of the row.
[40:35] In our MainActivity file, we’ll add fields for our view and adapter, and also a LayoutManager . Now we can get the view in our
main layout and connect our adapter to it. We can run our project now, and see that each row takes up the whole screen, so we’ll change
the layout’s width and height to be wrap_content .
[44:00] We’ll create another activity, so we can display each Pokemon when they’re selected. We’ll create a new Activity > Empty Activity,
and generate a layout file. We’ll change the layout to a simpler LinearLayout , and add some TextView s inside, setting the text size and
padding.
[47:45] In our PokemonActivity view, we’ll set the values we get from an Intent onto those views. We can call
getIntent().getStringExtra() , built into AppCompatActivity , to get the variables passed into our view. We’ll need to pass along data in
our adapter in the onBindViewHolder method, with setTag on the view to set the current Pokemon object to our view holder. And we’ll
add an event listener, setOnClickListener , to get our Pokemon back from the tag and create an Intent we can pass to the next view with
startActivity.
[55:25] We can make our view nicer in the layout XML file with some more attributes like padding, built-in animations, and centered text.
And like printf in C, String.format in Java can take in a format string to give our number a certain number of digits.
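For instance, a format string of %03d pads a number with leading zeros to at least three digits. The helper below is a minimal sketch with made-up names, not the code from the lesson:

```java
class FormatExample {
    // Pads a number with leading zeros to at least three digits
    static String threeDigits(int number) {
        return String.format("%03d", number);
    }

    public static void main(String[] args) {
        System.out.println(threeDigits(7));
        System.out.println(threeDigits(25));
        System.out.println(threeDigits(150));
    }
}
```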
Lesson 3
Now we’ll load data from the internet for our app.
We can use an API, application programming interface, to load data from the internet in our app. An API is like a set of code that someone
else has written, designed for you to use too.
In this case, we’ll be making requests to a website and getting data back in a format called JSON, JavaScript Object Notation.
An object in JSON might look like a dictionary of key-value pairs:
{
    "course": "cs50",
    "tracks": ["mobile", "web", "games"],
    "year": 2019
}
The values can be a string, an array, or a number as we see here, or some other data types.
[2:10] We’ll check out PokeAPI at pokeapi.co, and see that a URL we put in will return a lot of data in the format of JSON. The
documentation for the website has information about how we can get a list of data, so we try that.
[5:25] Android has a library called Volley for making requests, and we’ll include it in our build.gradle file so we can use it.
[6:35] We’ll also need to use a new pattern called try, catch, where an exception thrown by code that might fail can be “caught”, or
recovered from. When we catch an exception, we get an object of the Exception type, and we’ll be able to print out details of exactly
what happened.
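A minimal sketch of the try, catch pattern, using number parsing instead of a network request so that it runs anywhere. The class and method names are invented for this example:

```java
class TryCatchExample {
    // Tries to parse text as an integer, and recovers with a fallback if that fails
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // The exception object carries details about what went wrong
            System.out.println("Could not parse: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", 0));
        System.out.println(parseOrDefault("forty-two", -1));
    }
}
```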
[8:30] In our adapter, we’ll load our pokemon list with a new method, loadPokemon , and use the Volley library to make a
JsonObjectRequest . We’ll have an anonymous method that will be called once we get a response from the API, with a JSONObject that
we can parse into an array of pokemon . Since the response might not be what we expect, we’ll need to catch any exception we might get.
[14:40] Each result in the JSONArray will be a JSONObject , and similarly we can try to get the name and url from each of them and put
them in our Pokemon class.
[17:50] Once we’ve defined our request, we also need a RequestQueue that has the Context of the app, so the Volley library can make
requests on behalf of our app properly. We’ll add our request to the queue, so we can actually load data.
[20:25] After we load our data, we need to refresh our RecyclerView with notifyDataSetChanged() from our adapter, and we also need to
add the permission to use the internet for our app in AndroidManifest.xml . We’ll fix the capitalization of the name.
[24:40] We’ll use the url on the Pokemon data we got back to make another request when we want details about a particular Pokemon.
We’ll look at the view, and add two more TextView s to display the types of the Pokemon. In the Activity class for the Pokemon details
view, we’ll make another request, and parse the object in the response for the types array. Then, we can set the values of all the
TextView s based on the data.
[33:30] The PokemonActivity class will need the url from our adapter, so our click listener can pass that in with the Intent object. And
our activity also needs to call our new load function, and actually make the request.
Lesson 4
We’ll build another app now, one that allows us to apply filters to images. We’ll create a new project with an Empty Activity in Android
Studio, and start by adding more views in our activity_main.xml layout. Since we’ll need to scroll, we’ll change the parent layout to a
ScrollView , and inside have a LinearLayout for our ImageView and a Button .
[4:40] In our activity, we’ll add the functionality for our button to load an image and display it. We’ll write a choosePhoto method, which
will call the built-in Android image gallery for selecting a photo. We’ll create an Intent and set the action and type. We’ll add a
requestCode , so we know how to handle the selected image in our app when the other activity finishes. Finally, we’ll have our Button
call the method when it’s clicked.
[10:15] In our activity, we’ll need to override the onActivityResult method to actually handle the data (image file) we get back. We’ll
make sure that the resultCode is okay, and that we got data back. Then, we’ll have some steps to get an image from the data object, by
getting the URI (like a URL), trying to open the file from it, and loading the file as a bitmap image.
[15:40] We’ll use the BitmapFactory to create our image object by decoding the file, and then close the file. Finally, we can set the image
on our ImageView to show what we’ve picked.
[18:05] We’ll use some third-party libraries, like glide-transformations , by adding them to our Gradle file by following their
documentation. And we need another library for some of the filters we want, so we’ll add that too.
[20:40] We’ll add a button to our view for applying a filter, and create methods in our activity by following the documentation. We’ll use
the example of loading an image into the Glide library, applying a transformation, and loading it into the ImageView .
[25:05] We’ll add two more in the same way, and factor out the common code, and just pass in different transformations depending on
which filter we want to apply. We also have to be careful with importing the right classes from the right packages. Now we can apply
different filters to our images.
Lesson 5
We’ll build a note-taking app that can save data to the device.
We’ll use SQLite, a simple database that saves data to a file but supports SQL queries.
We’ll need queries like:
CREATE TABLE
INSERT INTO
SELECT ... FROM
UPDATE ... SET
[4:25] We’ll open a new project again, with Empty Activity, and start by creating two views, one with a list of notes, and one for editing an
individual note. We’ll make a RecyclerView as before, and create an adapter to provide data for the view. We’ll also need another view for
each row, note_row , similar to our Pokemon app. Inside our adapter we’ll create a view holder to be able to set data on the views.
[11:05] We’ll look at the documentation for Android’s persistence (data storage) library, called Room. We’ll need to add the dependencies to
our Gradle file, including an annotationProcessor for our compiler to generate the library’s code.
[12:50] We’ll make a new model class, Note , with an id and contents . We’ll also add some annotations, like PrimaryKey and
ColumnInfo to specify how these fields will be stored in our database by the Room library. This is essentially the definition of the table.
[14:55] In our adapter, we’ll override onBindViewHolder to get the contents of our note from the notes list, and getItemCount for the
total size of the list. In our main activity, we’ll connect our view, layout, and adapter with each other.
[18:15] Now that our view is ready, we’ll write a new class, DAO, for data access object, with the Room library, so we can actually load and
save note objects to our database. We’ll make a new class, NoteDao , which will actually be an interface that we annotate, and the Room
library will generate the actual code implementing these queries. For example, we’ll add an annotation, @Query , to the create() method
in our interface, without actually writing any code for it. Instead, we’ll write our SQL query in the annotation. Similarly, we can write a
method getAllNotes to return a list of notes. And in our save method’s query, we can easily use :contents and :id to safely
substitute variables into our query, avoiding SQL injection attacks. This class is how we’ll interact with our table.
[23:50] To use our DAO, we need a database class, and we’ll call it NoteDatabase . This will specify the database that our DAO can use. It
turns out that our database class is an abstract class, which means it has some methods that are implemented, and some methods that are
not, or abstract. Again, the Room library will generate the code that implements our abstract class.
[28:55] We can use all of this code in our main activity by first connecting to this database and storing the connection as a public static
variable so all of our activities can share it. In our adapter, we can write a reload method to access the database and use the noteDao’s
getAllNotes method.
[32:35] And in our activity, after our view loads, we’ll run this reload method. We’ll also add a button to our layout with Google’s Android
Material UI package, and specify some attributes of it so it looks the way we want.
[37:00] We’ll add an onClickListener in our main activity to the button, which should create a new note, and reload the recycler view to
show it.
[38:35] Now we can create a NoteActivity to allow us to edit a specific note, and we’ll use an EditText component in the layout to hold
the contents of our note. In our activity, we’ll want to load the contents of the note into the text editing container, and we’ll also save the
contents to the database when we go back to the recycler view in the app. We can load the note from the intent, and override onPause to
save the note when we leave the activity.
[42:45] In our recycler view’s adapter, we’ll set the onClickListener for each row’s container to create an Intent with the row’s note, and
pass it to our NoteActivity . Finally, when we come back to this view, we’ll also want to reload the notes in onResume .
[46:30] When we build and run our app, we see a crash, and the log tells us that our current note is null in the adapter, and it turns out
that we have to check for it after the view has been loaded, in the click event handler, not in the constructor.
[47:55] Finally, we’ll clean up the layout by adding padding and other aesthetics.
Conclusion
The Android documentation has lots of topics, so do use it to build even more interesting apps!
Pokédex
Distribution Code
To open the distribution code, extract the ZIP, open Android Studio, select “Import project”, and select the folder you extracted from the ZIP.
What To Do
Searching
Catching
Saving State
Sprites
Description
Searching
Let’s add some new functionality to our Pokédex app! First, let’s give users the ability to search the Pokédex for their favorite Pokémon.
To start, we’re going to use a built-in feature of Adapter called Filterable . This interface allows us to apply a filter to the data stored in our
Adapter , which is exactly what we need! We’ll filter out any Pokémon whose names don’t match the search text.
First, make sure that the adapter variable in MainActivity is declared with the type PokedexAdapter .
We’ll be calling methods that are specific to our PokedexAdapter that don’t exist on the base Adapter class, so we need to use the
PokedexAdapter type.
Next, open up the PokedexAdapter class. We can specify that our PokedexAdapter implements Filterable by adding implements Filterable
to the end of the class declaration.
Recall that an interface is just a list of methods that any class can implement. Now that we’ve implemented Filterable , we can add a new
method called getFilter to the PokedexAdapter .
@Override
public Filter getFilter() {
    return new PokemonFilter();
}
Of course, we don’t have a class called PokemonFilter yet, so let’s create one! We can create this class inside of PokedexAdapter , just as we
did with PokedexViewHolder , like this:
private class PokemonFilter extends Filter {
    @Override
    protected FilterResults performFiltering(CharSequence constraint) {
        // implement your search here!
    }

    @Override
    protected void publishResults(CharSequence constraint, FilterResults results) {
    }
}
You can implement your search inside performFiltering . The argument to this method, constraint , will be whatever text the user has typed
into the search bar, which you can use for your filter. The performFiltering method should return an instance of FilterResults . Here’s an
example:
@Override
protected FilterResults performFiltering(CharSequence constraint) {
    // implement your search here!
    FilterResults results = new FilterResults();
    results.values = filteredPokemon; // you need to create this variable!
    results.count = filteredPokemon.size();
    return results;
}
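One way to build a list like filteredPokemon is sketched below in plain Java. It is only an illustration: it filters a list of names rather than Pokemon objects, the NameFilter class and filterNames method are hypothetical, and a case-insensitive "contains" match is just one possible rule for deciding whether a name matches the search text:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NameFilter {
    // Returns the names that contain the search text, ignoring case
    static List<String> filterNames(List<String> names, CharSequence constraint) {
        String query = constraint.toString().toLowerCase();
        List<String> filtered = new ArrayList<>();
        for (String name : names) {
            if (name.toLowerCase().contains(query)) {
                filtered.add(name);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Bulbasaur", "Ivysaur", "Charmander");
        System.out.println(filterNames(names, "saur"));
    }
}
```

Since every string contains the empty string, an empty search leaves the whole list in place with this rule.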
The instance of FilterResults that you return from performFiltering will then be passed to publishResults . Inside of publishResults ,
you probably want to store the results of the search in another class variable, so you don’t lose your copy of the list containing all Pokémon
(i.e., the pokemon variable). Assuming you call this variable List<Pokemon> filtered , then your implementation of publishResults might
look like this:
@Override
protected void publishResults(CharSequence constraint, FilterResults results) {
    filtered = (List<Pokemon>) results.values;
    notifyDataSetChanged();
}
Then, rather than using the pokemon variable inside of methods like onBindViewHolder and getItemCount , use your new filtered variable.
Now that the filtering logic is done, let’s add a search bar above our RecyclerView . On the left-hand side of Android Studio, expand the app
folder, and you should see a folder called res . Recall that this is where the XML files for our layouts are stored. Right click on res , then
select New > Android Resource Directory. Enter menu for both Directory name and Resource type , then press OK . You should now see a
new directory called menu underneath res .
Next, right click on that menu directory and select New > Menu resource file. Call this file main_menu.xml and then click OK . This new XML
file will contain the layout for our menu. Paste the below into that file:
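<?xml version="1.0" encoding="utf-8"?>
<menu xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto">
    <item android:id="@+id/action_search"
        android:title="Search"
        app:actionViewClass="androidx.appcompat.widget.SearchView"
        app:showAsAction="always" />
</menu>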
As you can see, we’re creating a new menu element with one item child. The item represents a search icon, that when pressed, will open up
a SearchView .
Now, we can wire up that SearchView to our MainActivity . First, we need to make MainActivity implement an interface called
SearchView.OnQueryTextListener . To tell Android that our main activity class implements SearchView.OnQueryTextListener , add
implements SearchView.OnQueryTextListener to the end of the class declaration.
Next, to use the layout file we just created, we need to implement a method on our MainActivity called onCreateOptionsMenu .
@Override
public boolean onCreateOptionsMenu(Menu menu) {
    getMenuInflater().inflate(R.menu.main_menu, menu);
    MenuItem searchItem = menu.findItem(R.id.action_search);
    SearchView searchView = (SearchView) searchItem.getActionView();
    searchView.setOnQueryTextListener(this);
    return true;
}
As you’d guess, this method is called when an activity is creating a menu. Let’s walk through this code line-by-line. First, we’re specifying that
this activity should use R.menu.main_menu , which is the name of the XML file we created. Then, we’re grabbing a reference to the item inside
our menu using its ID, action_search . Finally, we’re calling setOnQueryTextListener on the SearchView in order to specify that our search
code will be specified in our MainActivity class (which is what this references).
Now, our SearchView will automatically call methods on MainActivity when the user types text into the SearchView . Specifically, a method
called onQueryTextChange will be called, and the argument passed to that method will be a String representing the current text of the
SearchView . We then want to pass that along to the PokemonFilter we created earlier, like this, so our UI will update:
@Override
public boolean onQueryTextChange(String newText) {
    adapter.getFilter().filter(newText);
    return false;
}
Along the same lines, a method called onQueryTextSubmit will be called when the user presses the “submit” button on the keyboard, which you
can handle in the same way:
@Override
public boolean onQueryTextSubmit(String newText) {
    adapter.getFilter().filter(newText);
    return false;
}
At this point, everything should be wired up, so you can test out your new search functionality!
Catching
Any good Pokédex keeps track of which Pokémon have been caught and which haven’t. Let’s add that functionality to our Pokédex as well.
First, let’s add a new Button to the PokemonActivity . Open up the layout XML file, and then add a new <Button> element. You can set the
text of this button to whatever you’d like, but we’ll go with Catch for simplicity.
To handle taps on the Button , we can use the attribute android:onClick="toggleCatch" . Add that to your Button , and then a method called
public void toggleCatch(View view) will automatically be called whenever the user presses the button.
Naturally, you’ll want to add a method with that signature to your PokemonActivity .
Now, we can implement catching. To start, add a new boolean class variable that keeps track of whether or not the Pokémon is caught. If a
Pokémon is caught, change the text of the button to something like Release , and vice-versa when it’s released. The Button method
setText(String text) will come in handy.
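The toggling logic on its own might look like the sketch below, kept separate from any Android classes so it can run anywhere. The CatchState class is hypothetical; in the app, the boolean would live in PokemonActivity , and the returned text would be passed to the button's setText :

```java
class CatchState {
    // Tracks whether the Pokémon is currently caught
    private boolean caught = false;

    // Flips the state and returns the text the button should show next
    String toggle() {
        caught = !caught;
        return caught ? "Release" : "Catch";
    }

    boolean isCaught() {
        return caught;
    }

    public static void main(String[] args) {
        CatchState state = new CatchState();
        System.out.println(state.toggle());
        System.out.println(state.toggle());
    }
}
```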
Saving State
You’ll notice that if you stop running your app and then run it again, your Pokédex will forget which Pokémon are caught and which aren’t! Let’s
fix that by saving that state to disk.
As your last task, use the SharedPreferences class to save which Pokémon are caught. With this class, you can store state that will be
remembered each time your app launches, which is just what you need. How you store this state is up to you—you might consider storing a list
of all Pokémon that are caught, or you might consider using a map from Pokémon to boolean values.
Here’s an example:
getPreferences(Context.MODE_PRIVATE).edit().putString("course", "cs50").commit();
String course = getPreferences(Context.MODE_PRIVATE).getString("course", "cs50");
// course is equal to "cs50"
To test saving state, you should be able to catch a Pokémon, stop the simulator, start the simulator again, and still see that Pokémon as caught.
Sprites
Every Pokémon aficionado has noticed by now that our Pokédex doesn’t yet have arguably its most important feature: the ability to display
what each Pokémon looks like! Luckily for us, the API we chose contains links to images for each Pokémon.
Let’s add that functionality to our app. First, add a new ImageView to the layout for PokemonActivity . Give it a unique ID, and then create an
ImageView class variable inside of PokemonActivity , and use findViewById to map that variable to your layout.
Next, when parsing the response from the API call, take a look at the key called sprites . You’ll notice that it’s a dictionary, and the key
front_default contains a URL pointing to an image of a Pokémon. Use the value of that key to load in an image to your ImageView . You’ll
want to follow a similar pattern as before—use methods like getJSONObject and getString to parse the JSON strings into Java objects.
Once you have the URL of the image, you’ll need to download it from the Internet. To do so, we’ll use an Android built-in called AsyncTask . An
AsyncTask executes some code in the background, so your app doesn’t lock up as the image is downloading. To use an AsyncTask , create a
new class that looks like this:
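// DownloadSpriteTask is just an example name for this class
private class DownloadSpriteTask extends AsyncTask<String, Void, Bitmap> {
    @Override
    protected Bitmap doInBackground(String... strings) {
        // download the image at the URL in strings[0] and return it as a Bitmap!
    }
    @Override
    protected void onPostExecute(Bitmap bitmap) {
        // load the bitmap into the ImageView!
    }
}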
Let’s walk through this. On the first line, we’re specifying that our AsyncTask takes a String as input, and will return a Bitmap . That makes
sense, since we’ll be passing in a URL as a String , and we expect a Bitmap object, which represents an image, in exchange. The
doInBackground method is where we’ll put the logic to actually download an image. You’ll notice that this method actually takes an array of
strings, but we only need to download one, so we’re just taking the first element in that array with strings[0] .
After doInBackground completes, the method called onPostExecute will be called. The Bitmap argument that’s passed in represents a loaded
image, so load that into your ImageView using the method setImageBitmap .
Finally, you can use this new class to trigger a download of a string URL by creating an instance of it and calling its execute method with the URL.
You can test your code by selecting Pokémon from the list, and you should see images in the ImageView !
Description
Let’s add one last feature to our Pokédex: a description of each Pokémon. From the API documentation
(https://pokeapi.co/docs/v2.html#pokemon-species), we can see that we can use /api/v2/pokemon-species/{id} to retrieve a description
for a given Pokémon. For instance, the URL https://pokeapi.co/api/v2/pokemon-species/133/ will give you
the description text for everyone’s favorite Pokémon.
Specifically, what we’re looking for can be found in the key called flavor_text_entries . This key happens to contain entries for several
different languages, but we’re just concerned with English for now. You might need a few additional classes to model the data for these new
keys.
After a user selects a Pokémon from the list, make a separate API call to this second endpoint to retrieve the description of the selected
Pokémon. Filter for just the first English description, and then display it somewhere on the screen. (Some Pokémon have more than one English
description, and it suffices to just display the first one.) You’ll probably want to wire up a new TextView to display this final piece of data.
You should see a few sentences about each Pokémon after selecting it from the list!
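Picking out the first English entry might look like the sketch below. It assumes the entries in flavor_text_entries have already been parsed into simple objects; the FlavorText and DescriptionPicker classes are hypothetical, and the language code "en" is an assumption to check against the actual API response:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for one parsed entry of flavor_text_entries
class FlavorText {
    final String language;
    final String text;

    FlavorText(String language, String text) {
        this.language = language;
        this.text = text;
    }
}

class DescriptionPicker {
    // Returns the text of the first entry marked as English, or null if there is none
    static String firstEnglish(List<FlavorText> entries) {
        for (FlavorText entry : entries) {
            if ("en".equals(entry.language)) {
                return entry.text;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<FlavorText> entries = Arrays.asList(
            new FlavorText("fr", "Première description"),
            new FlavorText("en", "First description"),
            new FlavorText("en", "Second description"));
        System.out.println(firstEnglish(entries));
    }
}
```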
How to Submit
To submit your code with submit50 , you may either: (1) upload your code to CS50 IDE and run submit50 from inside of your IDE, or (2) install
submit50 on your own computer by running pip3 install submit50 (assuming you have Python 3 (https://www.python.org/downloads/)
installed).
Execute the below, logging in with your GitHub username and password when prompted. For security, you’ll see asterisks ( * ) instead of the
actual characters in your password.
submit50 cs50/problems/2020/x/tracks/android/pokedex
Fiftygram
Distribution Code
To open the distribution code, extract the ZIP, open Android Studio, select “Import project”, and select the folder you extracted from the ZIP.
What To Do
More Filters
Saving Photos
More Filters
We’ve added a few different filters together, but now try experimenting with your own! Add at least one new filter of your choosing to the app.
Be creative!
Saving Photos
Our app can apply filters to photos, but it would be nice if we could save those photos so we could post them elsewhere!
First, some bookkeeping. Android has a pretty strict permissions model, so your app will need to request permission to store a photo to the
user’s device. Different versions of Android handle these permissions differently, so for simplicity’s sake, make sure your app has a minimum
SDK version of 23. To set the minimum SDK version, open up build.gradle , and make sure you have:
minSdkVersion 23
If you don’t, just change the number next to minSdkVersion , and then click Sync now !
Next, open AndroidManifest.xml and add the below as a child of the <manifest> element. Because the element uses a tools: attribute, the
<manifest> element must also declare xmlns:tools="http://schemas.android.com/tools" .
<uses-permission
android:name="android.permission.WRITE_EXTERNAL_STORAGE"
tools:remove="android:maxSdkVersion" />
This element tells Android that our app will need permission to write to external storage.
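Finally, we need to actually request permission from the app. For this, we’ll implement an interface called
ActivityCompat.OnRequestPermissionsResultCallback by adding it to the implements clause of the MainActivity class declaration.
Then, we can request permissions when the app loads by calling requestPermissions from onCreate , passing an array that contains
Manifest.permission.WRITE_EXTERNAL_STORAGE along with an integer request code of your choosing.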
This should pop up a dialog that lets the user allow or deny the permission. You can check the result of that dialog by adding the below
method:
@Override
public void onRequestPermissionsResult(int requestCode, String[] permissions, int[] grantResults) {
super.onRequestPermissionsResult(requestCode, permissions, grantResults);
}
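As written, that method only forwards to the superclass. One way to interpret its arguments is sketched below in plain Java so it runs without the Android SDK. The constants mirror PackageManager.PERMISSION_GRANTED (0) and PackageManager.PERMISSION_DENIED (-1); the class name, the isStorageGranted helper, and the request code value are hypothetical, and in your app you would use the real PackageManager constants.

```java
class PermissionCheck {
    // Mirror android.content.pm.PackageManager.PERMISSION_GRANTED / PERMISSION_DENIED
    // so this sketch compiles without Android; use the real constants in an app.
    static final int PERMISSION_GRANTED = 0;
    static final int PERMISSION_DENIED = -1;

    // Hypothetical request code; it must match the one passed when requesting.
    static final int REQUEST_WRITE_STORAGE = 1;

    // Returns true only if this result belongs to our request and was granted.
    static boolean isStorageGranted(int requestCode, int[] grantResults) {
        // An empty array means the dialog was interrupted; treat that as a denial.
        return requestCode == REQUEST_WRITE_STORAGE
            && grantResults.length > 0
            && grantResults[0] == PERMISSION_GRANTED;
    }

    public static void main(String[] args) {
        System.out.println(isStorageGranted(1, new int[] {PERMISSION_GRANTED})); // true
        System.out.println(isStorageGranted(1, new int[] {PERMISSION_DENIED}));  // false
        System.out.println(isStorageGranted(1, new int[] {}));                   // false
    }
}
```

Inside onRequestPermissionsResult , a check like this could decide whether to enable the save button or explain why saving is unavailable.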
That’s it for bookkeeping, so let’s implement our save functionality now! Add a new Button to the layout, and use android:onClick to wire it
up to a method in your MainActivity . Inside of that method, you’ll want to get a Bitmap of the modified image, and then use
MediaStore.Images.Media.insertImage to save the file.
To test, you can open up the Photos app in the emulator, and you should see filtered photos saved there.
How to Submit
To submit your code with submit50 , you may either: (1) upload your code to CS50 IDE and run submit50 from inside of your IDE, or (2) install
submit50 on your own computer by running pip3 install submit50 (assuming you have Python 3 (https://www.python.org/downloads/)
installed).
Execute the below, logging in with your GitHub username and password when prompted. For security, you’ll see asterisks ( * ) instead of the
actual characters in your password.
submit50 cs50/problems/2020/x/tracks/android/fiftygram
Notes
Distribution Code
To open the distribution code, extract the ZIP, open Android Studio, select “Import project”, and select the folder you extracted from the ZIP.
What To Do
Deleting Notes
Deleting Notes
So far, our Notes app can add and edit notes. Let’s add the ability for a user to delete a note when they no longer need it.
First, add a new method called delete to the NoteDao interface. You’ll probably want this method to take an id of the note to delete, and
use a DELETE query.
Next, add a button to your layout for deleting notes. Exactly what the UI looks like is up to you! (If you’re feeling ambitious, you can try
implementing a UI that allows a user to swipe on a note from the list to delete it, much like many email apps on Android.)
Finally, wire up that UI to a method that calls your new delete method on the NoteDao . Depending on your UI, you might find the finish
method helpful—this method will dismiss the current activity and go back to the previous one.
To test your app, try creating a few notes and then deleting them, to make sure the right things get deleted!
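The behavior to verify is that deleting by id removes exactly one note and leaves the rest alone. In the app, Room generates the DAO implementation, and the delete method would typically carry a @Query annotation holding a DELETE statement. The sketch below instead uses an in-memory stand-in so it runs as plain Java. InMemoryNoteDao , its create and getAll methods, and the table name in the comment are all hypothetical and not part of the distribution code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeleteNoteSketch {
    // Hypothetical slice of the DAO; the real NoteDao is a Room-annotated interface.
    interface NoteDao {
        void delete(int id);
        List<String> getAll();
    }

    // In-memory stand-in used only to illustrate delete-by-id semantics.
    static class InMemoryNoteDao implements NoteDao {
        private final Map<Integer, String> notes = new LinkedHashMap<>();
        private int nextId = 1;

        // Stores a note and returns its generated id.
        int create(String contents) {
            int id = nextId++;
            notes.put(id, contents);
            return id;
        }

        @Override
        public void delete(int id) {
            // Same effect as a query like: DELETE FROM notes WHERE id = :id
            // (the table and column names are assumptions).
            notes.remove(id);
        }

        @Override
        public List<String> getAll() {
            return new ArrayList<>(notes.values());
        }
    }

    public static void main(String[] args) {
        InMemoryNoteDao dao = new InMemoryNoteDao();
        dao.create("Buy milk");
        int second = dao.create("Finish problem set");
        dao.create("Call Alice");

        dao.delete(second);
        System.out.println(dao.getAll()); // prints [Buy milk, Call Alice]
    }
}
```

Deleting an id that does not exist is a no-op here, which matches how a DELETE query behaves when no row matches its WHERE clause.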
How to Submit
To submit your code with submit50 , you may either: (1) upload your code to CS50 IDE and run submit50 from inside of your IDE, or (2) install
submit50 on your own computer by running pip3 install submit50 (assuming you have Python 3 (https://www.python.org/downloads/)
installed).
Execute the below, logging in with your GitHub username and password when prompted. For security, you’ll see asterisks ( * ) instead of the
actual characters in your password.
submit50 cs50/problems/2020/x/tracks/android/notes
Final Project
The climax of this course is its final project. The final project is your opportunity to take your newfound savvy with programming out for a spin
and develop your very own piece of software. So long as your project draws upon this course’s lessons, the nature of your project is entirely up
to you. You may implement your project in any language(s). You are welcome to utilize infrastructure other than the CS50 IDE. All that we ask is
that you build something of interest to you, that you solve an actual problem, that you impact your community, or that you change the world.
Strive to create something that outlives this course.
Inasmuch as software development is rarely a one-person effort, you are allowed an opportunity to collaborate with one or two classmates for
this final project. Needless to say, it is expected that every student in any such group contribute equally to the design and implementation of
that group’s project. Moreover, it is expected that the scope of a two- or three-person group’s project be, respectively, twice or thrice that of a
typical one-person project. A one-person project, mind you, should entail more time and effort than is required by each of the course’s problem
sets.
Ideas
a web-based application using JavaScript, Python, and SQL, based in part on the web track’s distribution code
an iOS app using Swift
a game using Lua with LÖVE
an Android app using Java
a Chrome extension using JavaScript
a command-line program using C
a hardware-based application for which you program some device
…
How to Submit
Step 1 of 2
Create a README.md text file that explains your project and save it in a new folder called project in your ~/ directory. Note that your project
source code itself does not need to be submitted, but this README.md file must.
Execute the below from within your ~/project directory, logging in with your GitHub username and password when prompted. For security,
you’ll see asterisks instead of the actual characters in your password.
submit50 cs50/problems/2020/x/project
Step 2 of 2
Submit a short video (that’s no more than 2 minutes in length) in which you present your project to the world, as with slides, screenshots,
voiceover, and/or live action. Your video should somehow include your project’s title, your name, your city and country, and any other details
that you’d like to convey to viewers. See https://www.howtogeek.com/205742/how-to-record-your-windows-mac-linux-android-or-ios-screen/
for tips on how to make a “screencast,” though you’re welcome to use an actual camera. Upload your video to YouTube (or, if blocked in your
country, a similar site) and take note of its URL; it’s fine to flag it as “unlisted,” but don’t flag it as “private.”
That’s it! Your project should be graded within a few minutes. If you don’t see any results in your gradebook, best to resubmit (running the
above submit50 command) with only your README.md file this time. No need to resubmit your form.