|
1 | 1 | { |
2 | 2 | "cells": [ |
3 | 3 | { |
4 | | - "attachments": {}, |
5 | 4 | "cell_type": "markdown", |
6 | 5 | "metadata": {}, |
7 | 6 | "source": [ |
|
17 | 16 | ] |
18 | 17 | }, |
19 | 18 | { |
20 | | - "attachments": {}, |
21 | 19 | "cell_type": "markdown", |
22 | 20 | "metadata": {}, |
23 | 21 | "source": [ |
24 | 22 | "***" |
25 | 23 | ] |
26 | 24 | }, |
27 | 25 | { |
28 | | - "attachments": {}, |
29 | 26 | "cell_type": "markdown", |
30 | 27 | "metadata": {}, |
31 | 28 | "source": [ |
|
34 | 31 | }, |
35 | 32 | { |
36 | 33 | "cell_type": "code", |
37 | | - "execution_count": 5, |
| 34 | + "execution_count": 1, |
38 | 35 | "metadata": {}, |
39 | 36 | "outputs": [ |
40 | 37 | { |
|
67 | 64 | ] |
68 | 65 | }, |
69 | 66 | { |
70 | | - "attachments": {}, |
71 | 67 | "cell_type": "markdown", |
72 | 68 | "metadata": {}, |
73 | 69 | "source": [ |
74 | 70 | "***" |
75 | 71 | ] |
76 | 72 | }, |
77 | 73 | { |
78 | | - "attachments": {}, |
79 | 74 | "cell_type": "markdown", |
80 | 75 | "metadata": {}, |
81 | 76 | "source": [ |
82 | | - "You will solve the following exercises using Pure Python.\n", |
83 | | - "1. Count words in a text\n", |
| 77 | + "#### You will solve the following exercises using Pure Python.\n", |
| 78 | + "1. Count words in a text \n", |
84 | 79 | "2. Sort a list of words in various ways \n", |
85 | | - " • ascii order \n", |
86 | | - " • \"rhyming\" order \n", |
87 | | - "3. Extract useful info from a dictionary\n", |
88 | | - "4. Compute ngram statistics\n", |
89 | | - "5. Make a Concordance" |
| 80 | + " • ascii order \n", |
| 81 | + " • \"rhyming\" order \n", |
| 82 | + "3. Extract useful info for a dictionary \n", |
| 83 | + "4. Compute ngram statistics \n", |
| 84 | + "5. Make a Concordance " |
| 85 | + ] |
| 86 | + }, |
| 87 | + { |
| 88 | + "cell_type": "markdown", |
| 89 | + "metadata": {}, |
| 90 | + "source": [ |
| 91 | + "***" |
90 | 92 | ] |
91 | 93 | }, |
92 | 94 | { |
93 | | - "attachments": {}, |
94 | 95 | "cell_type": "markdown", |
95 | 96 | "metadata": {}, |
96 | 97 | "source": [ |
|
105 | 106 | }, |
106 | 107 | { |
107 | 108 | "cell_type": "code", |
108 | | - "execution_count": null, |
| 109 | + "execution_count": 2, |
109 | 110 | "metadata": {}, |
110 | 111 | "outputs": [], |
111 | 112 | "source": [ |
|
114 | 115 | }, |
115 | 116 | { |
116 | 117 | "cell_type": "code", |
117 | | - "execution_count": null, |
| 118 | + "execution_count": 3, |
118 | 119 | "metadata": {}, |
119 | 120 | "outputs": [], |
120 | 121 | "source": [ |
|
123 | 124 | }, |
124 | 125 | { |
125 | 126 | "cell_type": "code", |
126 | | - "execution_count": null, |
| 127 | + "execution_count": 4, |
127 | 128 | "metadata": {}, |
128 | 129 | "outputs": [], |
129 | 130 | "source": [ |
|
132 | 133 | }, |
133 | 134 | { |
134 | 135 | "cell_type": "code", |
135 | | - "execution_count": null, |
| 136 | + "execution_count": 5, |
136 | 137 | "metadata": {}, |
137 | 138 | "outputs": [], |
138 | 139 | "source": [ |
139 | 140 | "# d)" |
140 | 141 | ] |
141 | 142 | }, |
142 | 143 | { |
143 | | - "attachments": {}, |
144 | 144 | "cell_type": "markdown", |
145 | 145 | "metadata": {}, |
146 | 146 | "source": [ |
147 | 147 | "##### 2. Sorting and reversing lines of text\n", |
148 | 148 | "\n", |
149 | | - "a. Sort each line ignoring case\n", |
150 | | - "• sort –n Numeric order\n", |
151 | | - "• sort –r Reverse sort\n", |
152 | | - "• sort –nr Reverse numeric sort" |
| 149 | + "a. Sort each line alphabetically ignoring case \n", |
| 150 | + "b. Sort in numeric ([ascii](https://python-reference.readthedocs.io/en/latest/docs/str/ASCII.html)) order \n", |
| 151 | + "c. Alphabetically reverse sort (ignoring case) \n", |
| 152 | + "d. Reverse numeric ([ascii](https://python-reference.readthedocs.io/en/latest/docs/str/ASCII.html)) sort " |
| 153 | + ] |
| 154 | + }, |
| 155 | + { |
| 156 | + "cell_type": "code", |
| 157 | + "execution_count": 6, |
| 158 | + "metadata": {}, |
| 159 | + "outputs": [], |
| 160 | + "source": [ |
| 161 | + "# a)" |
| 162 | + ] |
| 163 | + }, |
| 164 | + { |
| 165 | + "cell_type": "code", |
| 166 | + "execution_count": 7, |
| 167 | + "metadata": {}, |
| 168 | + "outputs": [], |
| 169 | + "source": [ |
| 170 | + "# b)" |
| 171 | + ] |
| 172 | + }, |
| 173 | + { |
| 174 | + "cell_type": "code", |
| 175 | + "execution_count": 8, |
| 176 | + "metadata": {}, |
| 177 | + "outputs": [], |
| 178 | + "source": [ |
| 179 | + "# c)" |
| 180 | + ] |
| 181 | + }, |
| 182 | + { |
| 183 | + "cell_type": "code", |
| 184 | + "execution_count": 9, |
| 185 | + "metadata": {}, |
| 186 | + "outputs": [], |
| 187 | + "source": [ |
| 188 | + "# d)" |
| 189 | + ] |
| 190 | + }, |
| 191 | + { |
| 192 | + "cell_type": "markdown", |
| 193 | + "metadata": {}, |
| 194 | + "source": [ |
| 195 | + "##### 3. Extract useful info for a dictionary\n", |
| 196 | + "\n", |
| 197 | + "a. Find the 50 most common words \n", |
| 198 | + "b. Find the words in the NYT that end in \"zz\" " |
| 199 | + ] |
| 200 | + }, |
| 201 | + { |
| 202 | + "cell_type": "code", |
| 203 | + "execution_count": 10, |
| 204 | + "metadata": {}, |
| 205 | + "outputs": [], |
| 206 | + "source": [ |
| 207 | + "# a)" |
| 208 | + ] |
| 209 | + }, |
| 210 | + { |
| 211 | + "cell_type": "code", |
| 212 | + "execution_count": 11, |
| 213 | + "metadata": {}, |
| 214 | + "outputs": [], |
| 215 | + "source": [ |
| 216 | + "# b)" |
| 217 | + ] |
| 218 | + }, |
| 219 | + { |
| 220 | + "cell_type": "markdown", |
| 221 | + "metadata": {}, |
| 222 | + "source": [ |
| 223 | + "##### 4. Compute ngrams and other statistics\n", |
| 224 | + "\n", |
| 225 | + "a. Find the 10 most common bigrams \n", |
| 226 | + "b. Find the 10 most common trigrams \n", |
| 227 | + "c. Count the lines, the words, and the characters  \n", |
| 228 | + "d. How many all-uppercase words are there in this NYT file?  \n", |
| 229 | + "e. How many 4-letter words are there?  \n", |
| 230 | + "f. How many different words are there with no vowels?  \n", |
| 231 | + "g. What subtypes do they belong to?  \n", |
| 232 | + "h. How many “1 syllable” words are there?" |
| 233 | + ] |
| 234 | + }, |
| 235 | + { |
| 236 | + "cell_type": "code", |
| 237 | + "execution_count": 12, |
| 238 | + "metadata": {}, |
| 239 | + "outputs": [], |
| 240 | + "source": [ |
| 241 | + "# a)" |
| 242 | + ] |
| 243 | + }, |
| 244 | + { |
| 245 | + "cell_type": "code", |
| 246 | + "execution_count": 13, |
| 247 | + "metadata": {}, |
| 248 | + "outputs": [], |
| 249 | + "source": [ |
| 250 | + "# b)" |
| 251 | + ] |
| 252 | + }, |
| 253 | + { |
| 254 | + "cell_type": "code", |
| 255 | + "execution_count": 14, |
| 256 | + "metadata": {}, |
| 257 | + "outputs": [], |
| 258 | + "source": [ |
| 259 | + "# c)" |
| 260 | + ] |
| 261 | + }, |
| 262 | + { |
| 263 | + "cell_type": "code", |
| 264 | + "execution_count": 15, |
| 265 | + "metadata": {}, |
| 266 | + "outputs": [], |
| 267 | + "source": [ |
| 268 | + "# d)" |
| 269 | + ] |
| 270 | + }, |
| 271 | + { |
| 272 | + "cell_type": "code", |
| 273 | + "execution_count": 16, |
| 274 | + "metadata": {}, |
| 275 | + "outputs": [], |
| 276 | + "source": [ |
| 277 | + "# e)" |
| 278 | + ] |
| 279 | + }, |
| 280 | + { |
| 281 | + "cell_type": "code", |
| 282 | + "execution_count": 17, |
| 283 | + "metadata": {}, |
| 284 | + "outputs": [], |
| 285 | + "source": [ |
| 286 | + "# f)" |
| 287 | + ] |
| 288 | + }, |
| 289 | + { |
| 290 | + "cell_type": "code", |
| 291 | + "execution_count": 18, |
| 292 | + "metadata": {}, |
| 293 | + "outputs": [], |
| 294 | + "source": [ |
| 295 | + "# g)" |
| 296 | + ] |
| 297 | + }, |
| 298 | + { |
| 299 | + "cell_type": "code", |
| 300 | + "execution_count": 19, |
| 301 | + "metadata": {}, |
| 302 | + "outputs": [], |
| 303 | + "source": [ |
| 304 | + "# h)" |
| 305 | + ] |
| 306 | + }, |
| 307 | + { |
| 308 | + "cell_type": "markdown", |
| 309 | + "metadata": {}, |
| 310 | + "source": [ |
| 311 | + "##### 5. Make a Concordance\n", |
| 312 | + "\n", |
| 313 | + "a. Create a concordance display for an arbitrary word. See the example below \n", |
| 314 | + "\n", |
| 315 | + "" |
| 316 | + ] |
| 317 | + }, |
| 318 | + { |
| 319 | + "cell_type": "code", |
| 320 | + "execution_count": null, |
| 321 | + "metadata": {}, |
| 322 | + "outputs": [], |
| 323 | + "source": [ |
| 324 | + "# a)" |
| 325 | + ] |
| 326 | + }, |
| 327 | + { |
| 328 | + "cell_type": "markdown", |
| 329 | + "metadata": {}, |
| 330 | + "source": [ |
| 331 | + "***" |
| 332 | + ] |
| 333 | + }, |
| 334 | + { |
| 335 | + "cell_type": "markdown", |
| 336 | + "metadata": {}, |
| 337 | + "source": [ |
| 338 | + "##### Extra Credit – Secret Message\n", |
| 339 | + "+ The answers to the extra credit exercises will reveal a secret message. \n", |
| 340 | + "+ We will be working with the following text file for these exercises: \n", |
| 341 | + "[Link to Text](https://web.stanford.edu/class/cs124/lec/secret_ec.txt) " |
| 342 | + ] |
| 343 | + }, |
| 344 | + { |
| 345 | + "cell_type": "markdown", |
| 346 | + "metadata": {}, |
| 347 | + "source": [ |
| 348 | + "##### Extra Credit Exercise 1\n", |
| 349 | + "• Find the 2 most common words in secret_ec.txt containing the letter e. \n", |
| 350 | + "• Your answer will correspond to the first two words of the secret message. " |
| 351 | + ] |
| 352 | + }, |
| 353 | + { |
| 354 | + "cell_type": "code", |
| 355 | + "execution_count": null, |
| 356 | + "metadata": {}, |
| 357 | + "outputs": [], |
| 358 | + "source": [] |
| 359 | + }, |
| 360 | + { |
| 361 | + "cell_type": "markdown", |
| 362 | + "metadata": {}, |
| 363 | + "source": [ |
| 364 | + "##### Extra Credit Exercise 2\n", |
| 365 | + "• Find the 2 most common bigrams in secret_ec.txt where the second word in the bigram ends with a consonant. \n", |
| 366 | + "• Your answer will correspond to the next four words of the secret message. " |
| 367 | + ] |
| 368 | + }, |
| 369 | + { |
| 370 | + "cell_type": "code", |
| 371 | + "execution_count": null, |
| 372 | + "metadata": {}, |
| 373 | + "outputs": [], |
| 374 | + "source": [] |
| 375 | + }, |
| 376 | + { |
| 377 | + "cell_type": "markdown", |
| 378 | + "metadata": {}, |
| 379 | + "source": [ |
| 380 | + "##### Extra Credit Exercise 3\n", |
| 381 | + "• Find all 5-letter-long words that only appear once in secret_ec.txt. \n", |
| 382 | + "• Concatenate your result. This will be the final word of the secret message. \n", |
| 383 | + "\n", |
| 384 | + "What is the secret message? " |
153 | 385 | ] |
| 386 | + }, |
| 387 | + { |
| 388 | + "cell_type": "code", |
| 389 | + "execution_count": null, |
| 390 | + "metadata": {}, |
| 391 | + "outputs": [], |
| 392 | + "source": [] |
154 | 393 | } |
155 | 394 | ], |
156 | 395 | "metadata": { |
157 | 396 | "kernelspec": { |
158 | | - "display_name": "Python 3", |
| 397 | + "display_name": "Python 3 (ipykernel)", |
159 | 398 | "language": "python", |
160 | 399 | "name": "python3" |
161 | 400 | }, |
|
169 | 408 | "name": "python", |
170 | 409 | "nbconvert_exporter": "python", |
171 | 410 | "pygments_lexer": "ipython3", |
172 | | - "version": "3.10.10" |
173 | | - }, |
174 | | - "orig_nbformat": 4 |
| 411 | + "version": "3.10.6" |
| 412 | + } |
175 | 413 | }, |
176 | 414 | "nbformat": 4, |
177 | | - "nbformat_minor": 2 |
| 415 | + "nbformat_minor": 4 |
178 | 416 | } |