Webcrawl is a web crawling tool built with TypeScript, Next.js, React Hooks, Node.js, Express, D3, Puppeteer, and SASS. Enter the URL of a website you'd like to crawl, and Webcrawl visualizes the discovered links as both JSON and an interactive D3 tree.
- Web Crawling: Enter the URL of the website you want to crawl, and Webcrawl will fetch and organize its links.
- Visualization: Switch between a structured JSON view of the crawled links and an interactive D3 tree that gives a graphical representation of the website's structure.
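A crawler's output is only useful if discovered links are normalized and deduplicated before they are organized. The sketch below illustrates that step; the helper names are assumptions for illustration, not Webcrawl's actual API:

```typescript
// Illustrative sketch: normalize discovered URLs so the crawler treats
// "https://a.com/x#top" and "https://a.com/x" as the same page.
// These helpers are hypothetical; Webcrawl's implementation may differ.
function normalizeLink(href: string, base: string): string | null {
  try {
    const url = new URL(href, base); // resolve relative hrefs against the page URL
    if (url.protocol !== "http:" && url.protocol !== "https:") return null;
    url.hash = ""; // a fragment points into the same document
    return url.toString();
  } catch {
    return null; // malformed href
  }
}

function dedupeLinks(hrefs: string[], base: string): string[] {
  const seen = new Set<string>();
  for (const href of hrefs) {
    const normalized = normalizeLink(href, base);
    if (normalized !== null) seen.add(normalized);
  }
  return [...seen];
}
```

With this, `["/a", "/a#x", "https://example.com/a"]` crawled from `https://example.com` collapses to a single link.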
Frontend:
- React with Next.js
- D3 for data visualization
- TypeScript for enhanced code maintainability

Backend:
- Node.js with Express
- Puppeteer for web scraping

Styling:
- SASS for styling the user interface
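At its core, a crawler like this is a breadth-first traversal over same-origin links. Below is a minimal sketch of that traversal with the page-loading step abstracted behind a callback, which is where Puppeteer would plug in; all names here are illustrative assumptions, not Webcrawl's actual code:

```typescript
// Illustrative BFS crawl over same-origin links. The fetchLinks callback
// stands in for the Puppeteer step that loads a page and extracts <a href>s.
function crawl(
  start: string,
  fetchLinks: (url: string) => string[],
  maxPages = 50
): string[] {
  const origin = new URL(start).origin;
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const href of fetchLinks(current)) {
      if (visited.size >= maxPages) return [...visited]; // cap the crawl
      let link: URL;
      try {
        link = new URL(href, current); // resolve relative hrefs
      } catch {
        continue; // skip malformed hrefs
      }
      link.hash = ""; // fragments are the same page
      const normalized = link.toString();
      // stay on the same site and skip pages already queued
      if (link.origin === origin && !visited.has(normalized)) {
        visited.add(normalized);
        queue.push(normalized);
      }
    }
  }
  return [...visited];
}
```

Keeping the fetch step behind a callback makes the traversal testable without a browser; in production the callback would use Puppeteer's `page.goto` and evaluate the page's anchors.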
Installation:
- Clone the repository:
  git clone https://github.com/GarimaB06/Webcrawl.git
- Install dependencies:
  npm install
Run the Application:
- Start the backend server:
  npm run server-start
  The server runs at http://localhost:3001.
- Start the frontend application:
  npm run dev
Access the Application:
- Open your browser and navigate to http://localhost:3000.
- Input the URL of the website you want to crawl.
Visualize Results:
- Explore the crawled links in the provided JSON format or the D3 tree visualization.
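D3's tree layout reads nested `{ name, children }` nodes, which is the default shape `d3.hierarchy()` accepts. A sketch of turning flat parent→child link pairs into that shape (the helper names and the edges-arrive-in-crawl-order assumption are illustrative, not Webcrawl's actual code):

```typescript
// Illustrative conversion of flat (parent, child) link pairs into the
// nested { name, children } shape that d3.hierarchy() reads by default.
// Assumes edges arrive in crawl order, so a parent is seen before its children.
interface TreeNode {
  name: string;
  children: TreeNode[];
}

function buildTree(root: string, edges: [string, string][]): TreeNode {
  const nodes = new Map<string, TreeNode>([[root, { name: root, children: [] }]]);
  for (const [parent, child] of edges) {
    const parentNode = nodes.get(parent);
    // each URL appears once in the tree: keep its first-discovered position
    if (!parentNode || nodes.has(child)) continue;
    const childNode: TreeNode = { name: child, children: [] };
    nodes.set(child, childNode);
    parentNode.children.push(childNode);
  }
  return nodes.get(root)!;
}
```

Dropping repeated child URLs turns the crawled link graph (which may contain cycles) into a proper tree that the D3 layout can render.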
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.
This project is licensed under the MIT License.