CALIFORNIA STATE UNIVERSITY BAKERSFIELD
Computer and Electrical Engineering and Computer Science


Senior Project: 2019-2020

Under the direction of Dr. Chengwei Lei


Web Crawling and Community Review to Prevent Misleading Links

Adam Arreguin, Adrian Gutierrez, Joel Staggs, Kenny Taylor

Project Description

As the World Wide Web has grown, the amount of misleading and commercialized content has exploded. One form of this is commonly known as 'clickbait': web links with an inflammatory or controversial title intended to manipulate users into clicking through. This project is intended to give users a view into the reputation of websites and to warn them of misleading and malicious links before they visit a site.

The project consists of a browser plugin, website, backend database, JSON-based web API, and backend analyzer scripts. Once the browser plugin is installed, hovering over a link shows the user the target site's title and score. If a user right-clicks a link and selects 'Explore Link', a sidebar opens in the browser that shows recent user comments on the target page and lets the user add comments. When the user hovers over a link, the browser plugin submits the target URL to the server API. The API checks whether a recent score is available for that link. If so, that score is immediately returned to the plugin. If not, the backend analyzer code is called: it retrieves a copy of the page on the server, scores it, caches the score in the database, and then returns the score to the plugin.
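
The cache-then-analyze flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name get_score, the in-memory dictionary standing in for the database, the analyze callback, and the 24-hour freshness window are all assumptions (the description only says a "recent" score is reused).

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical freshness window; the project only says "recent".
MAX_AGE_SECONDS = 24 * 60 * 60

# Stand-in for the backend database: url -> (score, time it was cached).
ScoreCache = Dict[str, Tuple[float, float]]


def get_score(
    url: str,
    cache: ScoreCache,
    analyze: Callable[[str], float],
    now: Optional[float] = None,
) -> float:
    """Return a cached score if it is recent enough, else analyze and cache."""
    now = time.time() if now is None else now

    cached = cache.get(url)
    if cached is not None:
        score, cached_at = cached
        if now - cached_at <= MAX_AGE_SECONDS:
            # Recent score available: return it without re-fetching the page.
            return score

    # No usable score: run the (hypothetical) analyzer, then cache the result.
    score = analyze(url)
    cache[url] = (score, now)
    return score
```

Passing the analyzer in as a callback keeps the sketch independent of how pages are actually fetched and scored, which the description does not specify.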

The website www.datadogsanalytics.com is provided as an additional user interface and adds a social aspect to the platform. Logged-in users can follow other users' comments and subscribe to all comments on a specific domain. When a user logs in, they are presented with a feed of the 20 most recent comments from the users and domains to which they subscribe. The website also provides site crawling functionality. When a user submits a URL to the crawler, the server scores that page and all links from that page to a predefined depth (currently two levels). Crawler results are displayed to the user in real time via a WebSockets implementation and cached in the database for future use.
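
A depth-limited crawl of the kind described above might look like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the names crawl, get_links, and score_page are hypothetical, link extraction and scoring are passed in as callbacks, and "two levels" is read here as following links up to two hops from the submitted page.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    score_page: Callable[[str], float],
    max_depth: int = 2,
) -> Dict[str, float]:
    """Breadth-first crawl that scores every page within max_depth hops."""
    scores: Dict[str, float] = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, hops from the start page)

    while queue:
        url, depth = queue.popleft()
        scores[url] = score_page(url)

        # Do not follow links from pages already at the depth limit.
        if depth >= max_depth:
            continue

        for link in get_links(url):
            if link not in seen:  # avoid scoring the same page twice
                seen.add(link)
                queue.append((link, depth + 1))

    return scores
```

In a real-time setup, each score could be pushed to the client as soon as it is computed instead of being returned at the end; the sketch returns a dictionary only to stay self-contained.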

PowerPoint Presentation Video 4/29/2020

Project Screenshots

project_img
Browser Extension Popup and Sidebar

project_img
User Feed on Website

project_img
User Feed on Website

project_img
Cached Score Statistics

Expo Poster

Download PDF here
Poster Presentation Video 4/29/2020

Project Website

https://www.datadogsanalytics.com

Test user: username 'testuser', password 'datadogs2020' (has followers and feed)
Register a new account here

Firefox Plugin Download

Download plugin here

Installation Instructions:
1) Save the datadogs.zip file using the link above
2) Rename the file from datadogs.zip to datadogs.xpi
3) Open Firefox and type 'about:debugging' into the address bar (without quotes)
4) Select 'This Firefox' on the left side
5) Select 'Load Temporary Add-on' and browse to find datadogs.xpi

Demonstration Videos

PowerPoint Presentation and Demo Videos