Create a Newsletter Sourcing Data using MongoDB
Last Updated :
03 Apr, 2023
There are many news websites, such as ndtv.com. In this article, we scrape ndtv.com, that is, extract content from its pages, and store the results in MongoDB. MongoDB is a NoSQL document-model database.
Using Mongoose, Node.js, and Cheerio, the NDTV news website is scraped and the data is loaded into a MongoDB database. This is a full-stack JavaScript app built using MongoDB, Mongoose, Node.js, Express.js, Handlebars.js, HTML, and CSS. It scrapes the [NDTV](https://ndtv.com/) homepage and stores article titles and links.
Module Installation: Install the required modules using the following command.
npm install body-parser
npm install cheerio
npm install express
npm install express-handlebars
npm install mongoose
npm install request
Project Structure: It will look like this.

Implementation:
Filename: server.js: This is the main file that starts the app. It requests the NDTV site, scrapes the data, and stores it in the MongoDB database.
JavaScript
// First specifying the required dependencies
// Express is a minimal and flexible Node.js
// web application framework that provides a
// robust set of features for web and mobile
// applications
const express = require("express");
// To communicate with mongodb, we require "mongoose"
const mongoose = require("mongoose");
// As we need to call ndtv website and access
// the urls, we require "request"
const request = require("request");
// Cheerio parses markup and provides an
// API for traversing/manipulating the
// resulting data structure
const cheerio = require("cheerio");
// Node.js body parsing middleware.
// Parse incoming request bodies in a
// middleware before your handlers,
// available under the req.body property.
const bodyParser = require("body-parser");
const exphbs = require("express-handlebars");
// Use the port given in the PORT environment
// variable, or 3000 by default. No other process
// may already be listening on that port
const PORT = process.env.PORT || 3000;
// Initialize Express
const app = express();
// Use body-parser for handling form submissions
app.use(bodyParser.urlencoded({
    extended: false
}));
// Also parse request bodies sent as
// application/json
app.use(bodyParser.json({
    type: "application/json"
}));
// Serve the public directory
app.use(express.static("public"));
// Connect to MongoDB. The database is
// named ndtvnews.
// With mongoose.Promise set to the native
// Promise, Mongoose async operations, like
// .save() and queries, return promises.
mongoose.Promise = Promise;
const MONGODB_URI = process.env.MONGODB_URI
    || "mongodb://localhost/ndtvnews";
mongoose.connect(MONGODB_URI);
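// Use handlebars
// express-handlebars 6 and later export the
// view engine as exphbs.engine; older versions
// were called directly as exphbs({...})
app.engine("handlebars", exphbs.engine({
    defaultLayout: "main"
}));
app.set("view engine", "handlebars");
// Load the Mongoose models (Article and Note)
// into the db variable
const db = require("./models");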
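// Show the articles in the database that
// have not been saved yet.
// This route is called when the home page is opened
app.get("/", function (req, res) {
    // Mongoose 7 and later no longer accept callbacks,
    // so the query is consumed as a promise.
    // lean() returns plain objects that Handlebars can render
    db.Article.find({
        saved: false
    })
        .lean()
        .then(function (dbArticle) {
            // We are passing the contents
            // to index.handlebars
            res.render("index", {
                articles: dbArticle
            });
        })
        .catch(function (error) {
            console.log(error);
            res.status(500).json(error);
        });
});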
// Use cheerio to scrape stories from NDTV
// and store them
// This is meant to be called about once a day
app.get("/scrape", function (req, res) {
    request("https://ndtv.com/", function (error, response, html) {
        // Stop early if the page could not be fetched
        if (error) {
            console.log(error);
            return res.sendStatus(500);
        }
        // Load the html body from request into cheerio
        const $ = cheerio.load(html);
        // Inspect the web page (or view its source) to find
        // the selector for the titles, i.e. the news headlines.
        // The selector differs for each web page
        $("h2").each(function (i, element) {
            // The trim() removes whitespace because the
            // items return \n and \t before and after the text
            const title = $(element).find("a").text().trim();
            console.log("title", title);
            const link = $(element).find("a").attr("href");
            console.log("link", link);
            // If these are present in the scraped data,
            // create an article in the database collection
            if (title && link) {
                db.Article.create({
                    title: title,
                    link: link
                })
                    .then(function (inserted) {
                        // Log the inserted data
                        console.log(inserted);
                    })
                    .catch(function (err) {
                        // Log the error if one is
                        // encountered during the insert
                        console.log(err);
                    });
            }
        });
        // Respond exactly once, after every headline
        // has been queued for insertion
        res.sendStatus(200);
    });
});
// Route for retrieving all the saved articles.
// The user has the option to save an article.
// Once it is saved, the "saved" field of the
// document is set to true.
// This route finds the articles that are saved
app.get("/saved", function (req, res) {
    db.Article.find({
        saved: true
    })
        // lean() returns plain objects that Handlebars can render
        .lean()
        .then(function (dbArticle) {
            // If successful, render the
            // saved.handlebars page
            res.render("saved", {
                articles: dbArticle
            })
        })
        .catch(function (err) {
            // If an error occurs, send the
            // error back to the client
            res.json(err);
        })
});
// Route for setting an article to saved.
// The _id field is unique within the collection,
// so it identifies the article to update
app.put("/saved/:id", function (req, res) {
    db.Article.findByIdAndUpdate(
        req.params.id, {
            $set: req.body
        }, {
            new: true
        })
        .then(function (dbArticle) {
            // This time saved.handlebars is
            // called and that page is rendered
            res.render("saved", {
                articles: dbArticle
            })
        })
        .catch(function (err) {
            res.json(err);
        });
});
// Route for saving a new note to the db and
// associating it with an article
app.post("/submit/:id", function (req, res) {
    db.Note.create(req.body)
        .then(function (dbNote) {
            // Mongoose casts the id string to an ObjectId
            return db.Article.findByIdAndUpdate(
                req.params.id, {
                    $push: {
                        notes: dbNote._id
                    }
                })
                .then(function () {
                    // dbNote is still in scope here, so the
                    // new note can be sent back to the client
                    res.json(dbNote);
                });
        })
        .catch(function (err) {
            // If an error occurs, send it
            // back to the client
            res.json(err);
        });
});
// Route to fetch an article by ID together
// with its notes
app.get("/notes/article/:id", function (req, res) {
    db.Article.findOne({ "_id": req.params.id })
        .populate("notes")
        .then(function (data) {
            res.json(data);
        })
        .catch(function (error) {
            console.log(error);
            res.status(500).json(error);
        });
});
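// Route to delete a note by ID.
// findOneAndDelete replaces findOneAndRemove,
// which newer Mongoose versions no longer provide
app.get("/notes/:id", function (req, res) {
    db.Note.findOneAndDelete({ _id: req.params.id })
        .then(function (data) {
            res.json(data);
        })
        .catch(function (error) {
            console.log(error);
            res.status(500).json(error);
        });
});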
// Listen for the routes
app.listen(PORT, function () {
    console.log("App is running");
});
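The scrape route stores each title and href as they appear in the page. Scraped hrefs can be relative, and the same story can be linked from more than one headline, so a cleaning step before the insert may help. The following is only a sketch under those assumptions: cleanScrapedItems and BASE_URL are hypothetical names that are not part of the original server.js, and the sketch uses only the URL class built into Node.js.

```javascript
// Hypothetical helper, not part of the original server.js.
// Assumed base URL used to resolve relative hrefs.
const BASE_URL = "https://ndtv.com/";

// Takes [{ title, link }, ...] as collected from the page and returns
// a new array with trimmed titles, absolute links, and no duplicate links.
function cleanScrapedItems(items, baseUrl = BASE_URL) {
    const seen = new Set();
    const cleaned = [];
    for (const item of items) {
        const title = (item.title || "").trim();
        // Skip entries that lack a title or a link
        if (!title || !item.link) continue;
        let url;
        try {
            // Resolves relative hrefs against the base URL
            url = new URL(item.link, baseUrl);
        } catch (err) {
            // Skip hrefs that cannot be parsed as a URL
            continue;
        }
        // Keep only ordinary web links
        if (url.protocol !== "http:" && url.protocol !== "https:") continue;
        // Skip links that were already collected
        if (seen.has(url.href)) continue;
        seen.add(url.href);
        cleaned.push({ title: title, link: url.href });
    }
    return cleaned;
}
```

In the scrape route, the titles and links could be pushed into an array inside the each() loop and passed through this helper before the documents are created.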
Steps to run the application: Run the server.js file using the following command.
node server.js
Output: We will see the following output on the terminal screen.
App is running
Now open any browser and go to http://localhost:3000/. A page similar to the one below is displayed.

To get the news from ndtv.com, click on Get New Articles. This internally calls our /scrape path. Once the call completes, the collection named articles in the ndtvnews database in MongoDB is filled with data as shown below:
articles collection
Here, the saved attribute is initially false. The _id field is created automatically by MongoDB and uniquely identifies a document within a collection. The app relies on this attribute to view a document, save a document, and so on.
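The shape of a freshly scraped document can be sketched in plain JavaScript. This is a stand-in for illustration only: makeArticle is a hypothetical function, and the project's real schema lives in ./models, which this article does not show. The title, link, saved, and notes fields are the ones that server.js reads and writes; MongoDB adds the _id field itself when the document is inserted.

```javascript
// Hypothetical stand-in for the defaults of an article document.
// The real definition is the Mongoose model in ./models (not shown here).
function makeArticle(title, link) {
    if (!title || !link) {
        throw new TypeError("title and link are required");
    }
    return {
        title: title.trim(),
        link: link,
        // Every article starts out unsaved
        saved: false,
        // Filled later with the _id values of attached notes
        notes: []
    };
}
```

For example, makeArticle("Some headline", "https://ndtv.com/some-story") yields an object whose saved attribute is false, so the document would be listed on the home page and not on the saved page.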
Extracted articles are displayed in this format
Clicking View article on NDTV opens the respective article. Each entry on the page corresponds to one document in the articles collection, identified by its _id, and the hyperlink points to the link stored in that document. When Save article is clicked, the _id value identifies which article to save.
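The front-end code that reacts to the Save article button is not shown in this article, so the following is only a sketch of the request it might send. buildSaveRequest is a hypothetical helper. It assumes the page knows the _id of the article as a 24-character hexadecimal string and targets the PUT /saved/:id route from server.js, which applies the request body with $set.

```javascript
// Hypothetical client-side helper; the real front-end code is not shown.
// Builds the URL and options for a request to the PUT /saved/:id route.
function buildSaveRequest(articleId) {
    // A MongoDB ObjectId is written as 24 hexadecimal characters
    if (!/^[0-9a-fA-F]{24}$/.test(articleId)) {
        throw new TypeError("articleId must be a 24-character hex string");
    }
    return {
        url: "/saved/" + articleId,
        options: {
            method: "PUT",
            headers: { "Content-Type": "application/json" },
            // The server applies this body with $set
            body: JSON.stringify({ saved: true })
        }
    };
}
```

In a browser, the result could be passed to fetch as fetch(request.url, request.options), where request is the object returned by buildSaveRequest.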
Working: Entire working model of the project is explained in the video:
Conclusion: With this approach, it is straightforward to scrape a news website, display each article's title along with a link to the full story, save articles of interest, and review the saved articles later.
Reference: https://github.com/raj123raj/NdtvNewsScraperUsingMongoDB