AI Assignment
Crawling Process:
After configuring Nutch, I navigated to the nutch-0.9/bin directory in the Terminal to initiate the crawl
with this command:
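./nutch crawl urls -dir Crawled_Data -depth 3 -topN 10
Here, urls is the directory that holds the seed URL list, and Crawled_Data is the folder where the
crawl output is stored. The -depth option sets how many link levels from the seed pages are followed,
and -topN caps the number of pages fetched at each level.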
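The crawl step can be sketched as a small script that prepares the seed list and then runs the same command. The seed URL (https://example.com/) and the file name seed.txt are placeholders that do not come from this report; the directory names and option values match the command above. The script only prints the command when the nutch launcher is not in the current directory.

```shell
#!/bin/sh
# Sketch only: SEED_URL and seed.txt are hypothetical placeholders.
SEED_URL="https://example.com/"   # replace with the site that should be crawled
SEED_DIR="urls"                   # directory passed as the first argument to "crawl"
CRAWL_DIR="Crawled_Data"          # where Nutch writes the crawl output
DEPTH=3                           # link levels to follow from the seed pages
TOPN=10                           # cap on pages fetched at each level

# Create the seed directory with one URL per line in the seed file.
mkdir -p "$SEED_DIR"
printf '%s\n' "$SEED_URL" > "$SEED_DIR/seed.txt"

# Run the crawl if the launcher script is present (i.e. we are in nutch-0.9/bin).
if [ -x ./nutch ]; then
    ./nutch crawl "$SEED_DIR" -dir "$CRAWL_DIR" -depth "$DEPTH" -topN "$TOPN"
else
    echo "nutch not found here; would run: ./nutch crawl $SEED_DIR -dir $CRAWL_DIR -depth $DEPTH -topN $TOPN"
fi
```

Keeping the depth and topN values in variables makes it easy to repeat the crawl with a wider or deeper setting.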
Deployment on Apache Tomcat:
With the crawled data prepared, I copied the nutch-0.9.war file to the Tomcat webapps directory
(/Users/priyanshu/Downloads/apache-tomcat-9.0.82/webapps). Then, I set the searcher.dir
property in nutch-site.xml to point to Crawled_Data so that the search web application could read
the indexed data.
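The two deployment steps could look roughly like the sketch below. The Tomcat path is the one given above; the location of the WAR file, the absolute path of Crawled_Data, and the output file name searcher-dir-property.xml are assumptions for illustration. In Nutch the property that tells the search application where the crawl lives is named searcher.dir, and in an unpacked deployment nutch-site.xml is normally found under the web application's WEB-INF/classes directory (an assumption about this particular setup).

```shell
#!/bin/sh
# Sketch only: WAR_FILE location and CRAWL_DIR are assumed, not taken from the report.
TOMCAT_WEBAPPS="/Users/priyanshu/Downloads/apache-tomcat-9.0.82/webapps"
WAR_FILE="nutch-0.9.war"
CRAWL_DIR="$PWD/Crawled_Data"     # absolute path to the crawl output

# Step 1: copy the WAR into Tomcat's webapps directory (skipped if either is missing).
if [ -f "$WAR_FILE" ] && [ -d "$TOMCAT_WEBAPPS" ]; then
    cp "$WAR_FILE" "$TOMCAT_WEBAPPS/"
fi

# Step 2: generate the property block to paste inside <configuration> in nutch-site.xml.
cat > searcher-dir-property.xml <<EOF
<property>
  <name>searcher.dir</name>
  <value>$CRAWL_DIR</value>
</property>
EOF

cat searcher-dir-property.xml
```

Using an absolute path for the value avoids depending on the directory from which Tomcat is started.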
After starting the Apache Tomcat server, I accessed the search engine at
http://localhost:8080/nutch-0.9/. Entering the search query “b.tech” returned 9 results from the
crawled data, showing that the search engine was working.
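Starting the server and checking the search page can also be done from the Terminal, as in the sketch below. The Tomcat home directory is inferred from the webapps path given earlier, and the search.jsp?query= form of the URL is an assumption about how the Nutch 0.9 search form submits its query. The check reports "not reachable" if Tomcat is still starting up, so it may need to be repeated after a few seconds.

```shell
#!/bin/sh
# Sketch only: TOMCAT_HOME and the query URL format are assumptions.
TOMCAT_HOME="/Users/priyanshu/Downloads/apache-tomcat-9.0.82"
QUERY="b.tech"
SEARCH_URL="http://localhost:8080/nutch-0.9/search.jsp?query=$QUERY"

# Start Tomcat with its bundled startup script, if it exists at this path.
if [ -x "$TOMCAT_HOME/bin/startup.sh" ]; then
    "$TOMCAT_HOME/bin/startup.sh"
fi

# -s: silent, -f: treat HTTP errors as failure, --max-time: do not hang.
if curl -sf --max-time 5 -o /dev/null "$SEARCH_URL"; then
    echo "search page reachable: $SEARCH_URL"
else
    echo "search page not reachable: $SEARCH_URL"
fi
```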
Challenges and Solutions:
During the setup, I encountered an error in Search.jsp at line 151. Adding an escape sequence to the
statement that includes the header.html file resolved the issue. I then restarted the Apache Tomcat
server so that the change took effect.
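The kind of change involved can be illustrated on a sample line. The line below is an assumed reconstruction of the include statement, not a copy of line 151 from the deployed file, and the file names sample_line.jsp and fixed_line.jsp are hypothetical. The sketch escapes the inner double quotes around the header.html path, which is one common way to satisfy the stricter attribute quoting rules in newer Tomcat versions.

```shell
#!/bin/sh
# Sketch only: the sample line is an assumed form of the include statement.
cat > sample_line.jsp <<'EOF'
<jsp:include page="<%= language + "/include/header.html"%>"/>
EOF

# Put a backslash in front of each inner double quote around the path.
sed 's|"/include/header.html"|\\"/include/header.html\\"|' sample_line.jsp > fixed_line.jsp

cat fixed_line.jsp
```

After editing the real file in the same way, Tomcat can be restarted with the shutdown.sh and startup.sh scripts in its bin directory.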
Result:
After completing the configuration and the adjustments described above, the Apache Nutch search engine
displayed its homepage with the logo and a search bar. Entering search queries returned
results from the indexed web content. For instance, the query “practice” returned 42
results, confirming that the crawl, the index, and the search interface worked together.