Date: 2024-03-11



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-grams (the first field is nevertheless named "bigram" below). Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram	year	match_count	volume_count
An example for 1-grams would be:
circumvallate	1978	335	91
circumvallate	1979	261	95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, in 91 (95) distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 over the master node of your Hadoop cluster :
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is (335 + 261) / 2 = 298 instead of (335 + 261) / 100 = 5.96.
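The averaging rule in the note can be checked on a tiny sample before writing the Pig script. The following is a minimal Python sketch, not the required Pig solution. The function name `average_per_year` and the in-memory sample are assumptions made for illustration, and the input is assumed to be the four TAB-separated fields described above.

```python
from collections import defaultdict


def average_per_year(lines):
    """Return {word: average match_count over the years in which it appears}."""
    totals = defaultdict(int)  # word -> summed match_count
    years = defaultdict(set)   # word -> distinct years in which the word appears
    for line in lines:
        word, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        totals[word] += int(match_count)
        years[word].add(year)
    # The denominator is the number of years with at least one record,
    # not the total number of years covered by the data set.
    return {word: totals[word] / len(years[word]) for word in totals}


sample = [
    "circumvallate\t1978\t335\t91",
    "circumvallate\t1979\t261\t95",
]
print(average_per_year(sample))  # {'circumvallate': 298.0}
```

The sketch collects distinct years per word, so the same logic still holds after the two input files are combined into one table.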
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year along with their corresponding average values sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
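To sanity-check the Pig output on a small subset, the top-N selection can be reproduced locally. This is a hedged Python sketch only. The helper name `top_n` and the sample averages are made up for illustration and are not taken from the dataset.

```python
import heapq


def top_n(averages, n=20):
    """Return the n (word, average) pairs with the largest averages, in descending order."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])


# Hypothetical averages, used only to show the call.
averages = {"the": 5000.0, "of": 3200.5, "circumvallate": 298.0}
for word, avg in top_n(averages, n=2):
    print(f"{word}\t{avg}")
```

Ties are resolved in whatever order the input happens to provide, which is acceptable here because the question allows ties to be broken arbitrarily.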
Hints:
● This problem is very similar to the word counting example shown in the lecture notes of Pig. You can use the code there and just make some minor changes to perform this task.
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 over the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in the HDFS. Rerun the Pig script in this cluster and compare the performance between Pig and Hive in terms of overall run-time and explain your observation.
Hints:
● Hive will store its tables on HDFS and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine learning field. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar scores to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \tag{**}$$
where |S| means the cardinality of set S.
(Note: if |M(A) ∪ M(B)| = 0, we set the similarity to be 0.)
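The definition in (**) can be illustrated with Python sets. This is a minimal sketch under stated assumptions: the function name `similarity` and the sample movie IDs are made up. The size of the union is obtained from the identity |M(A) ∪ M(B)| = |M(A)| + |M(B)| - |M(A) ∩ M(B)|, so only the intersection has to be counted.

```python
def similarity(movies_a, movies_b):
    """Similarity as in (**): |intersection| / |union|, or 0 when the union is empty."""
    both = len(movies_a & movies_b)
    # Size of the union, computed from the two set sizes and the overlap.
    either = len(movies_a) + len(movies_b) - both
    return both / either if either else 0.0


# Hypothetical movie IDs for two users.
print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
print(similarity(set(), set()))          # 0.0
```

In the first call the users share 2 movies out of 4 watched by either, giving 0.5.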
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in the dataset [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
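One way to think about part (a) is to group users by movie and then count every pair of users within each group. The following Python sketch illustrates that idea on a toy input. It is not the required Pig script, and the name `common_movie_counts` and the sample `ratings` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def common_movie_counts(ratings):
    """ratings: iterable of (userID, movieID) pairs.

    Returns {(userA, userB): number of movies both watched}, with userA < userB.
    """
    viewers = defaultdict(set)  # movieID -> set of users who watched it
    for user, movie in ratings:
        viewers[movie].add(user)
    counts = defaultdict(int)
    for users in viewers.values():
        # Every pair of viewers of this movie shares one more movie.
        for a, b in combinations(sorted(users), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical toy data: users 1 and 2 both watched movies 10 and 20.
ratings = [(1, 10), (2, 10), (1, 20), (2, 20), (3, 20)]
print(common_movie_counts(ratings))  # {(1, 2): 2, (1, 3): 1, (2, 3): 1}
```

Pairs with no movie in common never appear in the result, which keeps the output much smaller than the full set of user pairs.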
(b) [20 marks] By modifying/extending part of your code from part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
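For checking results on a small subset, the Top-K selection can be sketched in Python as below. This is an illustration under stated assumptions and not the Pig solution: the name `top_k_similar` and the toy `ratings` are made up, and the all-pairs loop is quadratic in the number of users, so it is only suitable for small samples.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def top_k_similar(ratings, k=3):
    """Return {user: [(other_user, similarity), ...]} with up to k most similar users."""
    movies = defaultdict(set)  # userID -> set of movieIDs watched
    for user, movie in ratings:
        movies[user].add(movie)
    candidates = defaultdict(list)  # userID -> list of (similarity, other_user)
    for a, b in combinations(sorted(movies), 2):
        both = len(movies[a] & movies[b])
        either = len(movies[a]) + len(movies[b]) - both
        sim = both / either if either else 0.0
        candidates[a].append((sim, b))
        candidates[b].append((sim, a))
    # Ties on similarity are broken arbitrarily (here: by larger user ID).
    return {
        user: [(other, sim) for sim, other in heapq.nlargest(k, cand)]
        for user, cand in candidates.items()
    }


# Hypothetical toy data: users 1 and 2 watched exactly the same movies.
ratings = [(1, 10), (1, 20), (2, 10), (2, 20), (3, 20), (3, 30)]
result = top_k_similar(ratings, k=1)
print(result[1])  # [(2, 1.0)]
```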
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e.