This README is still not complete.
Homework 9 — TikTok Trends
In this assignment you will develop a program, which we will call New York Trends, that displays a trends page like TikTok's. Please read the entire handout before starting to code the assignment.
Learning Objectives
- Practice using std::priority_queue.
- Practice using std::unordered_map, std::unordered_set.
- Practice using C++ exceptions.
Background
TikTok Discover
According to TikTok support: Discover is a page on TikTok that allows you to search and explore the wide variety of content in the TikTok community. In this feed you'll find trending videos, hashtags, creators, and sponsored content.
To access the Discover page via the mobile app, users just tap Discover, located at the bottom of the phone screen.
To access the Discover page via your web browser, just go to https://www.tiktok.com/discover.
As can be seen from the above screenshot (taken on November 19th, 2023), the Discover page displays two lists of videos: trending hashtags (on the left) and trending sounds (on the right). Displaying these two lists of videos is the main task of this assignment.
Special Requirements
To be added.
Supported Commands
Your program will be run like this:
nytrends.exe input.json output.txt hashtag
nytrends.exe input.json output.txt sound
Here:
- nytrends.exe is the executable file name.
- input.json contains data collected from TikTok. In this README we will refer to this file as the json file.
- output.txt is where to print your output to. In this README we will refer to this file as the output file.
- the last argument is either hashtag or sound. When it is hashtag, your program should print the top 10 trending hashtags to the output file. When it is sound, your program should print the top 10 trending sounds to the output file.
To summarize what your program does: it reads data from the json file, analyzes the data to find the top 10 trending hashtags or the top 10 trending sounds, and prints them to the output file.
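The command line described above can be validated before any file is opened. The following is only a minimal sketch: the Options struct and the parseArguments function are hypothetical names, not part of any provided code, and reporting bad usage by throwing std::invalid_argument is one possible design among several.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper type: the three arguments described above, bundled together.
struct Options {
    std::string inputFile;   // e.g. input.json
    std::string outputFile;  // e.g. output.txt
    bool byHashtag;          // true for "hashtag", false for "sound"
};

// Checks the argument count and the mode; throws std::invalid_argument on bad usage.
Options parseArguments(int argc, const char* const argv[]) {
    if (argc != 4) {
        throw std::invalid_argument(
            "usage: nytrends.exe input.json output.txt hashtag|sound");
    }
    const std::string mode = argv[3];
    if (mode != "hashtag" && mode != "sound") {
        throw std::invalid_argument(
            "the last argument must be hashtag or sound, got: " + mode);
    }
    return Options{argv[1], argv[2], mode == "hashtag"};
}
```

In main, the call can be wrapped in a try block, and the catch block can print e.what() to std::cerr and return a non-zero exit code.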
Format of input.json
input.json represents the json file. It stores posts we collected from TikTok. Each line of the json file represents one post, and each line is supposed to have the same format. And below is an example, which describes a post by Taylor Swift. (You can view her post here.)
{"id": "7301080543981096234", "text": "Never beating the sorcery allegations ✨🛬✨", "createTime": 1699915303, "createTimeISO": "2023-11-13T22:41:43.000Z", "authorMeta": {"id": "6881290705605477381", "name": "taylorswift", "nickName": "Taylor Swift", "verified": true, "signature": "This is pretty much just a cat account", "bioLink": "taylorswift.com", "avatar": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/13f2a0d585f3cd8578da0d18c36a18c4~c5_720x720.jpeg?x-expires=1700456400&x-signature=jkLwlnqFUpLwoYe6TvlGXZs%2FhP8%3D", "privateAccount": false, "region": "US", "following": 0, "fans": 22900000, "heart": 200400000, "video": 61, "digg": 2161}, "musicMeta": {"musicName": "original sound", "musicAuthor": "Taylor Swift", "musicOriginal": false, "playUrl": "https://v16-webapp-prime.us.tiktok.com/video/tos/useast5/tos-useast5-v-27dcd7-tx/o8fSJqV9lISAU8D0pBUFsRYEMSDGWxCKpgfSii/?a=1988&ch=0&cr=0&dr=0&er=0&lr=default&cd=0%7C0%7C0%7C0&br=250&bt=125&bti=ODszNWYuMDE6&ft=tlc-I-Inz7TfiVYZiyq8Z&mime_type=audio_mpeg&qs=6&rc=OTM0NTc4N2Y8NTxmZWZoOkBpank3bnQ5cmRkbzMzZzU8NEAzMzEzNl82XzExYTQxNTU0YSNeXjYyMmRjYDZgLS1kMS9zcw%3D%3D&btag=e00008000&expire=1700307894&l=202311180544290984F2C815B65729734D&ply_type=3&policy=3&signature=86fdf07638903cf00e885b900b5fe456&tk=0", "coverMediumUrl": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/13f2a0d585f3cd8578da0d18c36a18c4~c5_720x720.jpeg?x-expires=1700456400&x-signature=jkLwlnqFUpLwoYe6TvlGXZs%2FhP8%3D", "musicId": "7301080633693735726"}, "webVideoUrl": "https://www.tiktok.com/@taylorswift/video/7301080543981096234", "videoMeta": {"height": 576, "width": 1024, "duration": 24, "coverUrl": "https://p16-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/06fe558eb09e460b8dd87c852dab1d64_1699915304?x-expires=1700456400&x-signature=e%2BxReps37YechC%2FN3YDMa5MW4Bs%3D", "definition": "540p", "format": "mp4", "downloadAddr": 
"https://v16-webapp-prime.us.tiktok.com/video/tos/useast5/tos-useast5-pve-0068-tx/o4ISEQDQRpSUArDlMF5QfSPe8WrE0EDgSwqjBk/?a=1988&ch=0&cr=3&dr=0&lr=tiktok_m&cd=0%7C0%7C1%7C3&cv=1&br=2176&bt=1088&bti=ODszNWYuMDE6&cs=0&ds=3&ft=_rKBMBnZq8Zmoc_CKQ_vjFy.VAhLrus&mime_type=video_mp4&qs=0&rc=OTM6Z2k8NDZpO2hlNWg6OUBpM2xlOm85cmdkbzMzZzczNEBeYDQwMi5fNV8xNDU0NDMuYSNyLWZnMmQ0XzZgLS1kMS9zcw%3D%3D&btag=e00008000&expire=1700307894&l=202311180544290984F2C815B65729734D&ply_type=2&policy=2&signature=13889ecbdab6dd7518b441cb427600c9&tk=tt_chain_token"}, "diggCount": 2400000, "shareCount": 19900, "playCount": 9700000, "commentCount": 22900, "mentions": [], "hashtags": []}
The line is enclosed with a pair of curly braces. And every line is supposed to have these same fields:
- id: TikTok assigns each post an id.
- text: each post has its text content and its video/audio content. The text content is stored here. Keep in mind that on TikTok, a post can't just include text information, it must contain a video. Therefore, in the remainder of this section, when we say the video or this video, we mean the video which comes with this post.
- createTime: a timestamp indicating when this post was created. This is the timestamp in Unix epoch format. It represents the number of seconds that have passed since January 1, 1970 (the Unix epoch) until the specified date and time.
- createTimeISO: still a timestamp indicating when this post was created. This is the same timestamp but presented in the ISO 8601 date and time format, which is more human friendly. Here, "T" is a separator indicating the beginning of the time portion; and "Z" indicates that the time is in Coordinated Universal Time (UTC).
- authorMeta: the author's information, which includes multiple items.
- musicMeta: information of the music used in the video. This also includes multiple items.
- webVideoUrl: the URL of this post.
- videoMeta: information of the video. This also includes multiple items.
- diggCount: how many likes this video has received.
- shareCount: how many times this video has been shared.
- playCount: how many times this video has been viewed.
- commentCount: how many comments users have made as a reaction to this video.
- mentions: whom the author of this post has mentioned in the post. This could include multiple items - if multiple users are mentioned.
- hashtags: the hashtags used in the text content of the post are also stored here separately. This could include multiple items - if multiple hashtags are used.
Each field is a key-value pair. As mentioned above, there are five fields which could include multiple items, and these five fields are: authorMeta, musicMeta, videoMeta, mentions, hashtags. We will describe each of these five fields next.
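The count fields (diggCount, shareCount, playCount, commentCount) are stored as unquoted numbers, as in the sample line above. A minimal sketch of reading one of them, and of using C++ exceptions to report a missing or malformed value, could look like this. The function name extractNumberField is hypothetical, and the sketch assumes each key is followed by a colon and a single space, as in the sample.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the integer value of an unquoted numeric field such as "playCount": 9700000.
// Throws std::runtime_error if the key is missing or the value is not a number.
long long extractNumberField(const std::string& line, const std::string& key) {
    const std::string pattern = "\"" + key + "\": ";
    const std::size_t start = line.find(pattern);
    if (start == std::string::npos) {
        throw std::runtime_error("missing field: " + key);
    }
    try {
        // std::stoll stops at the first character that is not part of the number,
        // such as ',' or '}'.
        return std::stoll(line.substr(start + pattern.size()));
    } catch (const std::logic_error&) {
        // std::invalid_argument and std::out_of_range both derive from std::logic_error.
        throw std::runtime_error("malformed number for field: " + key);
    }
}
```

Because the search starts at the beginning of the line, this returns the first occurrence of the key. That is fine for keys that appear only once per line, but a key such as id appears in several places and needs more care.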
Author Meta
The word meta means meta data. Let's extract the authorMeta field from this same Taylor Swift post and take a closer look.
"authorMeta": {"id": "6881290705605477381", "name": "taylorswift", "nickName": "Taylor Swift", "verified": true, "signature": "This is pretty much just a cat account", "bioLink": "taylorswift.com", "avatar": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/13f2a0d585f3cd8578da0d18c36a18c4~c5_720x720.jpeg?x-expires=1700456400&x-signature=jkLwlnqFUpLwoYe6TvlGXZs%2FhP8%3D", "privateAccount": false, "region": "US", "following": 0, "fans": 22900000, "heart": 200400000, "video": 61, "digg": 2161},
TikTok uses the following sub-fields to describe each author (i.e., user):
- id: TikTok assigns each author an id.
- name: the user name. Not necessarily the real name; but of course celebrities would use their real name for their official account.
- nickName: each user can also have a nickname.
- verified: official accounts are usually verified.
- signature: users can put a few words introducing this account.
- bioLink: users can put a link in their bio section.
- avatar: link to the account's profile picture.
- privateAccount: is this a private account? Private accounts are only visible to users who have the permission from the account owner.
- region: where this user is located.
- following: how many accounts this user is following. Taylor Swift does not follow anyone. Hence her following is 0.
- fans: how many followers this account has.
- heart: how many likes (in total) this account received.
- video: how many videos this account has posted.
- digg: how many likes this user has pressed.
Some of these sub-fields (such as name, nickName, verified, signature, bioLink, avatar, following, fans, heart) are directly visible on Taylor Swift's TikTok profile page, as shown in the following screenshot, taken on November 19th, 2023.
Music Meta
Let's extract the musicMeta field from this same Taylor Swift post and take a closer look.
"musicMeta": {"musicName": "original sound", "musicAuthor": "Taylor Swift", "musicOriginal": false, "playUrl": "https://v16-webapp-prime.us.tiktok.com/video/tos/useast5/tos-useast5-v-27dcd7-tx/o8fSJqV9lISAU8D0pBUFsRYEMSDGWxCKpgfSii/?a=1988&ch=0&cr=0&dr=0&er=0&lr=default&cd=0%7C0%7C0%7C0&br=250&bt=125&bti=ODszNWYuMDE6&ft=tlc-I-Inz7TfiVYZiyq8Z&mime_type=audio_mpeg&qs=6&rc=OTM0NTc4N2Y8NTxmZWZoOkBpank3bnQ5cmRkbzMzZzU8NEAzMzEzNl82XzExYTQxNTU0YSNeXjYyMmRjYDZgLS1kMS9zcw%3D%3D&btag=e00008000&expire=1700307894&l=202311180544290984F2C815B65729734D&ply_type=3&policy=3&signature=86fdf07638903cf00e885b900b5fe456&tk=0", "coverMediumUrl": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/13f2a0d585f3cd8578da0d18c36a18c4~c5_720x720.jpeg?x-expires=1700456400&x-signature=jkLwlnqFUpLwoYe6TvlGXZs%2FhP8%3D", "musicId": "7301080633693735726"},
TikTok uses the following sub-fields to describe each music:
- musicName: the name of this music.
- musicAuthor: the author of this music.
- musicOriginal: is this original music?
- playUrl: this url takes you to audio content of this music.
- coverMediumUrl: this url takes you to the cover page of this music.
- musicId: TikTok assigns each music an id.
Video Meta
"videoMeta": {"height": 576, "width": 1024, "duration": 24, "coverUrl": "https://p16-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/06fe558eb09e460b8dd87c852dab1d64_1699915304?x-expires=1700456400&x-signature=e%2BxReps37YechC%2FN3YDMa5MW4Bs%3D", "definition": "540p", "format": "mp4", "downloadAddr": "https://v16-webapp-prime.us.tiktok.com/video/tos/useast5/tos-useast5-pve-0068-tx/o4ISEQDQRpSUArDlMF5QfSPe8WrE0EDgSwqjBk/?a=1988&ch=0&cr=3&dr=0&lr=tiktok_m&cd=0%7C0%7C1%7C3&cv=1&br=2176&bt=1088&bti=ODszNWYuMDE6&cs=0&ds=3&ft=_rKBMBnZq8Zmoc_CKQ_vjFy.VAhLrus&mime_type=video_mp4&qs=0&rc=OTM6Z2k8NDZpO2hlNWg6OUBpM2xlOm85cmdkbzMzZzczNEBeYDQwMi5fNV8xNDU0NDMuYSNyLWZnMmQ0XzZgLS1kMS9zcw%3D%3D&btag=e00008000&expire=1700307894&l=202311180544290984F2C815B65729734D&ply_type=2&policy=2&signature=13889ecbdab6dd7518b441cb427600c9&tk=tt_chain_token"},
TikTok uses the following sub-fields to describe each video:
- height: how this video will be displayed - the height.
- width: how this video will be displayed - the width.
- duration: the duration of this video - how many seconds.
- coverUrl: this url takes you to the thumbnail view image of this video.
- definition: the definition of this video.
- format: the format of this video.
- downloadAddr: the url where you can download this video.
Mentions
Unlike authorMeta, musicMeta, and videoMeta, which each include multiple sub-fields, mentions and hashtags are more like arrays which store objects of the same type. If multiple users are mentioned, then these users will appear in the mentions array; if no account is mentioned, as is the case in this Taylor Swift post, then the mentions field will be stored as an empty array like this:
"mentions": []
Hashtags
If no hashtags are used in the text content of the post, this field will be stored like this - which is just an empty array.
"hashtags": []
If hashtags are used in the text content of the post, they will be stored in this hashtags array in this format:
"hashtags": [{"id": "1640230938585093", "name": "cleantok", "title": "Whether you're a daily cleaner or a once a month deep clean type, spring is here and it's the perfect time to get stuck into those therapeutic cleaning tasks. From scrubbing your sink, to descaling the dishwasher, deep cleaning your rugs, to refreshing the fridge - share your tips and show off how your spring clean is done.", "cover": "https://p16-amd-va.tiktokcdn.com/obj/musically-maliva-obj/9342f13cf27fe417b49e65a6f4cadcbe.png"}, {"id": "1655304719036422", "name": "cleaningtiktok", "title": "Start cleaning with #CleaningTikTok.", "cover": ""}, {"id": "170127", "name": "springcleaning", "title": "Whether it's minimizing or taking out the trash, get ready for some #SpringCleaning.", "cover": "https://p16-amd-va.tiktokcdn.com/obj/musically-maliva-obj/1629633553410053.PNG"}, {"id": "75424303", "name": "cleaninghacks", "title": "Show us how you keep things neat and tidy!", "cover": "https://p16-amd-va.tiktokcdn.com/obj/musically-maliva-obj/3320a6a94d0ad4bae1a025c0b3239481"}, {"id": "1614083057293334", "name": "cleaningasmr", "title": "", "cover": ""}, {"id": "15898164", "name": "cleaningproducts", "title": "", "cover": ""}]
This hashtags array stores multiple hashtags, and they are:
- cleantok
- cleaningtiktok
- springcleaning
- cleaninghacks
- cleaningasmr
- cleaningproducts
For each hashtag, TikTok maintains four sub-fields:
- id: TikTok assigns each hashtag an id.
- name: the name of the hashtag.
- title: some hashtags have a title. Popular or trending hashtags are initially created by TikTok itself. TikTok's content moderation and curation teams may introduce new hashtags, along with associated titles, to highlight specific themes, challenges, or trends.
- cover: the url which takes you to the cover image of this hashtag page.
TikTok also maintains a web page for each hashtag. For example, the page for the hashtag cleantok is at https://www.tiktok.com/tag/cleantok.
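To collect the hashtag names of one post, only the name sub-field of each element in the hashtags array is needed. The sketch below is one possible way to do this with std::string::find() and std::string::substr(). The function name extractHashtagNames is hypothetical, and the sketch assumes that hashtags is the last field on the line (as in the sample post), so every "name" key found after the start of the array belongs to a hashtag.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the name of every hashtag stored in the "hashtags" array of one json line.
// Assumption: "hashtags" is the last field on the line, as in the sample post.
std::vector<std::string> extractHashtagNames(const std::string& line) {
    std::vector<std::string> names;
    const std::string arrayKey = "\"hashtags\": [";
    const std::string nameKey = "\"name\": \"";

    std::size_t pos = line.find(arrayKey);
    if (pos == std::string::npos) {
        return names;  // no hashtags field on this line
    }
    pos += arrayKey.size();

    // Each array element contributes exactly one "name": "..." pair.
    while ((pos = line.find(nameKey, pos)) != std::string::npos) {
        pos += nameKey.size();
        const std::size_t end = line.find('"', pos);
        if (end == std::string::npos) {
            break;  // malformed line: no closing quote
        }
        names.push_back(line.substr(pos, end - pos));
        pos = end + 1;
    }
    return names;
}
```

For the cleantok example above, this would return the six names listed, in the order they appear in the array. For a post with "hashtags": [] it returns an empty vector.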
Output File Format
All expected output files are provided. Among all the five operations mentioned above, only the display a comment operation would trigger a write to the output file.
When displaying the comments, we need to consider the displaying order of the comments. The rules are:
- existing comments: comments which are included in the json file are existing comments. When displaying existing comments, a parent comment should be displayed (i.e., printed to the output file) before its child comments. Two child comments which have the same parent should stay in the same order as they are in the json file. For example, if A and B are both existing comments, comment A appears in line 1 of the json file, and comment B appears in line 4 of the json file, then comment A should be displayed before comment B. Also, two comments which are both responses to the original video should stay in the same order as they appear in the json file.
- newly added comments: for newly added comments, a parent comment should be displayed (i.e., printed to the output file) before its child comments. Two child comments which have the same parent should stay in the same order as they are in the second input file.
- if a newly added comment is a reply to an existing comment, then it should be displayed right below that existing comment.
- if a newly added comment is a response to the original video, then this newly added comment should be displayed at the very bottom; in other words, it should be displayed after all existing comments are displayed.
- if two newly added comments, let's say A and B, both are responses to the original video, then both A and B should be displayed at the very bottom; but the order between A and B themselves, should stay the same as they appear in the second input file.
To summarize the rules: in this homework, no sorting is needed, but you need to make sure that a newly added comment is always displayed below all its existing siblings.
Useful Code
getline
- Unlike previous assignments where the input files only contain fields separated by spaces, in this assignment, fields are not separated by spaces, and therefore you may need a different way to read the input files. And the function getline will now come into play. To read the json file and store the whole json file into a std::string, you can use the following lines of code:
// assume inputFile is a std::string, containing the file name of the input file.
std::ifstream jsonFile(inputFile);
if (!jsonFile.is_open()) {
    std::cerr << "Failed to open the JSON file." << std::endl;
    exit(1);
}
std::string json_content;
std::string line;
while (std::getline(jsonFile, line)) {
    json_content += line;
}
// don't need this json file anymore, as the content is read into json_content.
jsonFile.close();
After these lines, the whole content of the json file will be stored as a string in the std::string variable json_content. You can then parse it to get each individual post. To parse json_content, you will once again find std::string functions such as std::string::find() and std::string::substr() very useful.
- The second input file contains comments, which may have spaces, and that makes it hard for you to use the >> operator to read the content of the file. Once again, the getline function can come into play. Let's say you want to read a line like this:
reply_to_comment UgxCAk2MEXaUMS8E5dx4AaABAg UgxCAk2MEXaUMS8E5dx4AaABAg.0 @user3 "I love this song!"
You can use the following lines of code:
// assuming opsFile is an std::ifstream object, which you use to open the second input file.
// assuming command, parent_id, id, user, comment are all std::string objects.
// read the command, the parent comment id, the child comment id, the user name.
opsFile >> command;
opsFile >> parent_id;
opsFile >> id;
opsFile >> user;
// skip any whitespace to get to the next non-whitespace character
opsFile >> std::ws;
// now, read the comment
if (opsFile.peek() == '"') {
    // if the field starts with a double quote, read it as a whole string
    opsFile.get(); // consume the opening double quote
    // read until the closing double quote; std::getline discards the delimiter,
    // so comment does not include the closing quote
    std::getline(opsFile, comment, '"');
}
After executing the above lines, your command will be "reply_to_comment", your parent_id will be "UgxCAk2MEXaUMS8E5dx4AaABAg", your id will be "UgxCAk2MEXaUMS8E5dx4AaABAg.0", your user will be "@user3", your comment will be "I love this song!".
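The std::string::find() and std::string::substr() approach mentioned above for parsing the json content can be sketched as follows. This is only an illustration: the function name extractStringField is hypothetical, and the sketch assumes that each key is followed by a colon, one space, and a quoted value, and that the value itself contains no escaped double quotes.

```cpp
#include <cstddef>
#include <string>

// Returns the value of the first "key": "value" pair found at or after position
// `from` in `text`, or an empty string if the key is not found.
// Assumption: the value does not contain escaped double quotes (\").
std::string extractStringField(const std::string& text,
                               const std::string& key,
                               std::size_t from = 0) {
    const std::string pattern = "\"" + key + "\": \"";
    std::size_t start = text.find(pattern, from);
    if (start == std::string::npos) {
        return "";
    }
    start += pattern.size();  // move past the opening quote of the value
    const std::size_t end = text.find('"', start);
    if (end == std::string::npos) {
        return "";
    }
    return text.substr(start, end - start);
}
```

Since the post id is the first field of each line, calling extractStringField(line, "id") on one post returns the post id. To read a sub-field such as musicId, first find the position of "musicMeta" and pass that position as the from argument.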
Program Requirements & Submission Details
In this assignment, you are required to use std::priority_queue. You can also use any other data structures we have already learned, such as std::string, std::vector, std::list, std::map, std::set, std::pair, std::unordered_map, std::unordered_set, std::stack, std::queue.
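As an illustration of how std::priority_queue and std::unordered_map can work together, the sketch below picks the top k entries from a table of counts. The ranking rule used here (higher count first, ties broken alphabetically by name) is an assumption made for the example only; it is not the official ranking rule of this assignment. The names Entry, LessTrending, and topK are hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// (name, number of posts using it)
using Entry = std::pair<std::string, int>;

// Comparator for std::priority_queue: returns true when `a` ranks below `b`,
// so the highest-ranked entry ends up at top().
// Assumed rule: larger count first, then alphabetical order of the name.
struct LessTrending {
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.second != b.second) {
            return a.second < b.second;
        }
        return a.first > b.first;
    }
};

// Returns at most k entries from `counts`, best first.
std::vector<Entry> topK(const std::unordered_map<std::string, int>& counts,
                        std::size_t k) {
    // Build the heap from the whole map in one step.
    std::priority_queue<Entry, std::vector<Entry>, LessTrending> pq(
        counts.begin(), counts.end());

    std::vector<Entry> result;
    while (!pq.empty() && result.size() < k) {
        result.push_back(pq.top());
        pq.pop();
    }
    return result;
}
```

The counts table itself can be filled while reading the json file, for example with ++counts[name] for each hashtag name found in a post. Calling topK(counts, 10) then yields the entries to print.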
Use good coding style when you design and implement your program. Organize your program into functions: don’t put all the code in main! Be sure to read the Homework Policies as you put the finishing touches on your solution. Be sure to make up new test cases to fully debug your program and don’t forget to comment your code! Use the provided template README.txt file for notes you want the grader to read. You must do this assignment on your own, as described in the Collaboration Policy & Academic Integrity page. If you did discuss the problem or error messages, etc. with anyone, please list their names in your README.txt file.
Due Date: 11/30/2023, Thursday, 11:59 pm.
Instructor's Code
Rubric
To be added.


