EECS 484 W11 Project 1 - Database Design For Social Network Data
Due: January 26, 2011 by 10:30 AM
Overview
In Project 1, you will design a relational database for storing information about your Facebook social network. You will begin with a detailed description of the content. Then, you will need to systematically go through the conceptual and logical database design process you learned about in class.
Part 1. ER Design
As a starting point, we have done the initial requirements analysis for you. The following is a brief description of the data that you will store in your database. (In real life, you would probably begin with much fuzzier information.)

User Information
There can be an unlimited number of users. Each user has the following information:
1.1 Profile information
    This includes the following attributes: user ID, first name, last name, year of birth, month of birth, date of birth, gender.
1.2 Hometown Location
    A user's hometown includes the following attributes: city, state, country.
1.3 Current Location
    Exactly the same attributes as hometown location.
1.4 Education History
    Education history is for college programs and above. A user could have participated in multiple educational programs, and each one will have the following attributes: name of the institution (e.g., University of Michigan), year of graduation, concentration (e.g., CS, EE, etc.), and degree (e.g., BS, MS, PhD, etc.).
1.5 Friendship information
    Each user can have any number of friends. Each friend must also be a Facebook user.

Photos
Photos is an important Facebook application. Each photo has the following associated information:
2.1 Album information
    Each photo MUST belong to an album. An album has the following attributes: album_ID, owner_ID (this refers to the owner's Facebook ID), album_name, cover_photo_ID (this refers to a photo ID), album_created_time, album_modified_time, album_link, and album_visibility.
2.2 Other information
    Each photo has the following attributes: photo_ID, photo_caption, photo_created_time, photo_modified_time, and photo_link.

Photo Tags
A photo tag identifies a Facebook user in a photo. It has the following associated attributes:
3.1 Tag subject
    tag_subject_id (this refers to a Facebook user ID)
3.2 Tag coordinates
    tag_x_coordinate and tag_y_coordinate
3.3 Time created
    tag_created_time
Note that there can be multiple tags at exactly the same (x, y) location. However, there can be only ONE tag for each subject in the photo; Facebook doesn't allow multiple tags for the same subject in a single photo. For example, you cannot tag Lady Gaga twice in a photo, even if she appears at two different locations (whatever that means).

Events
Events is another useful Facebook feature.
4.1 Basic event information
    event_ID, event_creator_id (Facebook user who created the event), event_name, event_tagline, event_description, event_host, event_type, event_subtype, event_location, event_city, event_state, event_country, event_start_time, and event_end_time
4.2 Event participants
    Participants in an event must be Facebook users. Each participant must have a confirmation status value (attending, declined, unsure, or not replied).
Task for Part 1
Your task in Part 1 is to perform Conceptual Database Design using ER Diagrams. (There are many ER variants, but for this project, we expect you to use the conventions from the textbook and lecture.)
Hints for Part 1
For this part, you need to identify the entity sets and relationship sets in a reasonable way. We expect there to be multiple correct solutions. (Remember that ER design is subjective!) Your goal should be to
Part 2. ER Diagram to Relational Schema (Logical Database Design)
For the second part of the project, your task is to convert your ER diagrams into relational tables. You are required to write SQL DDL statements for this part. You should turn in two files:
1. createTables.sql
2. dropTables.sql
Hints for Part 2
You should capture as many constraints from your ER diagrams as possible in your createTables.sql file. In your dropTables.sql, you should write the DROP TABLE statements necessary to destroy the tables you have created.
Using Oracle SQL*Plus, you can run your .sql files with the following commands:
    sqlplus <accountName>/<password> @dropTables.sql
    sqlplus <accountName>/<password> @createTables.sql
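As an illustration of capturing constraints in DDL, the following is a minimal sketch only, not a recommended design. The table names USERS, PHOTOS, and TAGS, the reduced column lists, and all column types are assumptions made for this example. It shows one way the "only ONE tag for each subject in the photo" rule from Part 1 could be expressed as a key constraint, and why the DROP TABLE statements may need to run in the reverse order of creation.

```sql
-- createTables.sql (sketch; names and types are hypothetical)
CREATE TABLE USERS (
    USER_ID     NUMBER        PRIMARY KEY,
    FIRST_NAME  VARCHAR2(100) NOT NULL,
    LAST_NAME   VARCHAR2(100) NOT NULL
);

CREATE TABLE PHOTOS (
    PHOTO_ID      NUMBER         PRIMARY KEY,
    PHOTO_CAPTION VARCHAR2(2000)
    -- album reference and remaining photo attributes omitted in this sketch
);

CREATE TABLE TAGS (
    PHOTO_ID         NUMBER NOT NULL REFERENCES PHOTOS (PHOTO_ID),
    TAG_SUBJECT_ID   NUMBER NOT NULL REFERENCES USERS (USER_ID),
    TAG_X_COORDINATE NUMBER,
    TAG_Y_COORDINATE NUMBER,
    TAG_CREATED_TIME TIMESTAMP,
    -- A subject can be tagged at most once per photo;
    -- several tags may still share the same (x, y) coordinates.
    PRIMARY KEY (PHOTO_ID, TAG_SUBJECT_ID)
);

-- dropTables.sql (sketch): drop referencing tables before referenced ones
DROP TABLE TAGS;
DROP TABLE PHOTOS;
DROP TABLE USERS;
```

In this sketch the composite primary key does double duty: it identifies a tag and enforces the one-tag-per-subject rule, so no separate UNIQUE constraint is needed.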
Part 3. Getting Facebook Data
The next (and potentially most fun!) part of the project is actually populating your database. For this part, we will provide you with some fake data. We will also provide you with the means to download data from your own Facebook account into an Oracle database. The following describes how to log into your Oracle account, how to access the fake data, and finally how to download your own data.

Logging into your Oracle Account
First, connect to login.engin.umich.edu using SSH with your Umich account (Umich uniqname and Kerberos password). Then execute:
    source /usr/caen/oracle/local/muscle
    sqlplus
And enter the username and password for your Oracle account to log in. To disconnect from Oracle you can execute:
EXIT
Try this early! If you have trouble accessing your Oracle account, please speak to the GSI.

Real data from your Facebook account (optional)
You can get your real Facebook data using the following link:
    http://apps.facebook.com/eecs_facebookdata/
You will be prompted to log into Facebook. Once you are logged in, you need to first enter the user name and password of your Oracle account, which is used later to populate the data into your account.
If the message "First login Facebook. If you are logged in but have not authorized the application, please follow this link" shows up at this point, you probably need to grant permission to the application to access your data. Follow the link in the message. Then click on the Facebook icon to grant the application the required permissions.
If this is your first time running the application, the download process will start automatically. If you have run the application before, you will see the result. You can use the Rerun button at the end of the page to remove the existing data and download the data again.
The data will be stored into your personal Oracle space. You can log into SQL*Plus (as described in Part 3.1) to browse the data.
*NOTE: We have noticed some problems running the data download application using the Chrome browser. If you are using Chrome and having problems, please try another browser (Firefox or IE) until we get this problem fixed.

Here are some basic commands to browse your data.
View all the existing tables:
    SELECT TABLE_NAME FROM USER_TABLES;
View the schema of a table:
    DESC TABLE_NAME;
Browse all the data in a table:
    SELECT * FROM TABLE_NAME;
Browse the first N rows in a table:
    SELECT * FROM TABLE_NAME WHERE ROWNUM <= N;
Facebook data raw schema
For your convenience, we will provide you with Facebook data (real and fake) in a set of Oracle database tables. These tables actually give you some hints on the previous parts of the assignment. However, these tables are highly denormalized (poorly designed), and without any table constraints.
The table names are:
    USER_INFORMATION
    ARE_FRIENDS
    PHOTO_INFORMATION
    TAG_INFORMATION
    EVENT_INFORMATION
The fields of those tables are as follows:

USER_INFORMATION table:
1. USER_ID
   This is the Facebook unique ID for users.
2. FIRST_NAME
   Every user MUST have a first name on file.
3. LAST_NAME
   Every user MUST have a last name on file.
4. YEAR_OF_BIRTH
   Some users may not provide this information.
5. MONTH_OF_BIRTH
   Some users may not provide this information.
6. DAY_OF_BIRTH
   Some users may not provide this information.
7. GENDER
   Some users may not provide this information.
8. HOMETOWN_CITY
   Some users may not provide this information.
9. HOMETOWN_STATE
   Some users may not provide this information.
10. HOMETOWN_COUNTRY
   Some users may not provide this information.
11. CURRENT_CITY
   Some users may not provide this information.
12. CURRENT_STATE
   Some users may not provide this information.
13. CURRENT_COUNTRY
   Some users may not provide this information.
14. INSTITUTION_NAME
   Some users may not provide this information. A single person may have studied in multiple institutions (college and above).
15. PROGRAM_YEAR
   Some users may not provide this information. A single person may have enrolled in multiple programs.
16. PROGRAM_CONCENTRATION
   Some users may not provide this information. This is like a short description of the program.
17. PROGRAM_DEGREE
   Some users may not provide this information.

ARE_FRIENDS table:
1. USER1_ID
2. USER2_ID
Both USER1_ID and USER2_ID refer to the values in the USER_ID field of the USER_INFORMATION table. If two users appear in the same row (a relation), it means they are friends; otherwise they are not friends.

PHOTO_INFORMATION table:
1. ALBUM_ID
   ALBUM_ID is the Facebook unique ID for albums.
2. OWNER_ID
   User ID of the album owner.
3. COVER_PHOTO_ID
   Each album MUST have a cover photo. The values are the Facebook unique IDs for photos.
4. ALBUM_NAME
5. ALBUM_CREATED_TIME
6. ALBUM_MODIFIED_TIME
7. ALBUM_LINK
   The URL directly to the album.
8. ALBUM_VISIBILITY
   One of the following values: EVERYONE, FRIENDS_OF_FRIENDS, FRIENDS, ONLY_ME, CUSTOM.
9. PHOTO_ID
   This is the Facebook unique ID for photos.
10. PHOTO_CAPTION
   An arbitrary string describing the photo.
11. PHOTO_CREATED_TIME
12. PHOTO_MODIFIED_TIME
13. PHOTO_LINK
   The URL directly to the photo.

TAG_INFORMATION table:
1. PHOTO_ID
   Unique ID of the corresponding photo.
2. TAG_SUBJECT_ID
   Unique ID of the corresponding user.
3. TAG_CREATED_TIME
4. TAG_X_COORDINATE
5. TAG_Y_COORDINATE

EVENT_INFORMATION table:
1. EVENT_ID
   This is the Facebook unique ID for events.
2. EVENT_CREATOR_ID
   Unique ID of the user who created this event.
3. EVENT_NAME
4. EVENT_TAGLINE
5. EVENT_DESCRIPTION
6. EVENT_HOST
7. EVENT_TYPE
   Facebook has a fixed set of event types to choose from in a dropdown menu.
8. EVENT_SUBTYPE
   Facebook has a fixed set of event subtypes to choose from in a dropdown menu.
9. EVENT_LOCATION
   User-entered arbitrary string. For example, "my backyard".
10. EVENT_CITY
11. EVENT_STATE
12. EVENT_COUNTRY
13. EVENT_START_TIME
14. EVENT_END_TIME

Fake data (guaranteed)
Whether you have a Facebook account or not, everyone will have access to a fake data set. The fake data also includes five tables with exactly the same schema as those used to store your real Facebook data (see 3.3). However, these tables are stored in the GSI's account (heedokim), and have a prefix in the table names:
    PUBLIC_USER_INFORMATION
    PUBLIC_ARE_FRIENDS
    PUBLIC_PHOTO_INFORMATION
    PUBLIC_TAG_INFORMATION
    PUBLIC_EVENT_INFORMATION
You can access the public tables for the fake data using the GSI's account name (HEEDOKIM). For example, to access the PUBLIC_USER_INFORMATION table, you need to refer to the table name as HEEDOKIM.PUBLIC_USER_INFORMATION. You can copy the data into your own account with the following command:
    CREATE TABLE NEW_TABLE_NAME AS (SELECT * FROM HEEDOKIM.TABLE_NAME);
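For a concrete instance of the copy command, the sketch below copies the fake user table into your own account and then looks at how the denormalization shows up in the data. MY_USER_INFORMATION is a hypothetical name chosen for this example. The second query assumes, as the field descriptions suggest but do not state outright, that a user with several education programs appears in several rows.

```sql
-- Copy the GSI's fake user table into your own account
-- (MY_USER_INFORMATION is a hypothetical table name)
CREATE TABLE MY_USER_INFORMATION AS
    (SELECT * FROM HEEDOKIM.PUBLIC_USER_INFORMATION);

-- List users that occur in more than one row of the raw table.
-- If any are returned, profile attributes such as FIRST_NAME are
-- repeated across rows and need de-duplicating when you load your schema.
SELECT USER_ID, COUNT(*) AS NUM_ROWS
FROM MY_USER_INFORMATION
GROUP BY USER_ID
HAVING COUNT(*) > 1;
```

Working on a private copy also lets you experiment freely without depending on the shared tables.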
Part 4. Populate Your Database
For the final part of the project, you will populate your database with the Facebook data we just described. You should turn in the set of SQL statements (DML) to load data from the public tables (e.g., PUBLIC_USER_INFORMATION, etc.) into your tables. You should put all the statements into a file called loadData.sql.
Hints for Part 4
There will be some variations depending on the schema that you choose. In most cases, however, you can load the data into your schema using very simple SQL commands.
As an example, suppose (whether or not it is a good design) that you created a table LOCATION, which contains the attributes LOC_ID, CITY, STATE, and COUNTRY. Suppose that you want this table to contain a listing of all the different locations, without duplicates. You might load data into the table using the following command (UNION eliminates duplicates):
    INSERT INTO LOCATION (CITY, STATE, COUNTRY)
    SELECT DISTINCT HOMETOWN_CITY, HOMETOWN_STATE, HOMETOWN_COUNTRY FROM PUBLIC_USER_INFORMATION
    UNION
    SELECT DISTINCT CURRENT_CITY, CURRENT_STATE, CURRENT_COUNTRY FROM PUBLIC_USER_INFORMATION
    UNION
    SELECT DISTINCT EVENT_CITY, EVENT_STATE, EVENT_COUNTRY FROM PUBLIC_EVENT_INFORMATION;
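    -- Assumed: the sequence used by the trigger (not shown in the original)
    CREATE SEQUENCE loc_sequence;

    -- Assign LOC_ID from the sequence on each insert into LOCATION
    CREATE TRIGGER loc_trigger
    BEFORE INSERT ON LOCATION
    FOR EACH ROW
    BEGIN
        SELECT loc_sequence.nextval INTO :new.LOC_ID FROM dual;
    END;
    .
    RUN;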
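Other tables can often be loaded in the same INSERT ... SELECT style. The following is a hedged sketch, assuming hypothetical tables USERS(USER_ID, FIRST_NAME, LAST_NAME), PHOTOS(PHOTO_ID, PHOTO_CAPTION), and TAGS(PHOTO_ID, TAG_SUBJECT_ID, TAG_X_COORDINATE, TAG_Y_COORDINATE, TAG_CREATED_TIME), where TAGS has foreign keys to the other two. Your own schema will likely differ, so treat the names and column lists as placeholders.

```sql
-- loadData.sql (sketch; target tables and columns are hypothetical)

-- The raw user table may repeat a user across rows,
-- so DISTINCT keeps one row per user.
INSERT INTO USERS (USER_ID, FIRST_NAME, LAST_NAME)
SELECT DISTINCT USER_ID, FIRST_NAME, LAST_NAME
FROM HEEDOKIM.PUBLIC_USER_INFORMATION;

-- Load photos before tags so the foreign key from TAGS can be satisfied.
INSERT INTO PHOTOS (PHOTO_ID, PHOTO_CAPTION)
SELECT DISTINCT PHOTO_ID, PHOTO_CAPTION
FROM HEEDOKIM.PUBLIC_PHOTO_INFORMATION;

-- Tags reference both users and photos, so they are loaded last.
INSERT INTO TAGS (PHOTO_ID, TAG_SUBJECT_ID, TAG_X_COORDINATE,
                  TAG_Y_COORDINATE, TAG_CREATED_TIME)
SELECT PHOTO_ID, TAG_SUBJECT_ID, TAG_X_COORDINATE,
       TAG_Y_COORDINATE, TAG_CREATED_TIME
FROM HEEDOKIM.PUBLIC_TAG_INFORMATION;
```

The order of the statements matters once foreign keys are declared: referenced tables are filled first, referencing tables afterwards.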
Project 1 Submission Checklist
You need to turn in the following files via CTools Assignments. (Please put all your files in a single zip or tar file and submit a single file.)
1. A Word (doc or docx) or PDF document that contains your ER Diagram from Part 1. (If you like, you may draw your ER diagram by hand, and submit an electronic version by scanning the drawing.)
2. 3 SQL files:
   a. createTables.sql (Part 2)
   b. dropTables.sql (Part 2)
   c. loadData.sql (Part 4)