salmonJS

A web crawler in Node.js that dynamically spiders whole websites.

IMPORTANT: This is a DEVELOPMENT tool, therefore it SHOULD NOT be used against a website you DO NOT OWN!

It helps you map and process entire websites by spidering them and parsing each page in a smart way. It follows all the links and tests the form objects several times, which makes it possible to check the whole website effectively.

What's this for?

This project was born with the aim of improving legacy code, but it is not restricted to that.

salmonJS crawls every page from an entry-point URL, retrieving all the links in the page and firing all the events bound to any DOM element in the page, in order to process all the possible combinations automatically. The only "limitation" of an automatic robot is user input; for those cases, test case files are provided, in which you can define custom input values (e.g. POST variables for forms, input values for JavaScript prompts, etc.).
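The crawl described above can be pictured as a breadth-first traversal from the entry point. The following is only a minimal sketch of that idea, not salmonJS's actual implementation: the `site` map and the `crawl` and `getLinks` names are hypothetical, and an in-memory object stands in for fetching and rendering real pages (event firing is left out entirely).

```javascript
// Hypothetical in-memory "site": each URL maps to the links found on that page.
// A real crawler would fetch and render each page instead of reading this map.
const site = {
  '/': ['/about', '/contact'],
  '/about': ['/', '/team'],
  '/contact': ['/'],
  '/team': [],
};

// Breadth-first crawl from an entry point, visiting every reachable page once.
function crawl(entryPoint, getLinks) {
  const visited = new Set([entryPoint]);
  const queue = [entryPoint];
  const order = [];

  while (queue.length > 0) {
    const url = queue.shift();
    order.push(url);
    for (const link of getLinks(url)) {
      // Skip pages that are already queued or processed to avoid loops.
      if (!visited.has(link)) {
        visited.add(link);
        queue.push(link);
      }
    }
  }
  return order;
}

const pages = crawl('/', (url) => site[url] || []);
console.log(pages);
```

The `visited` set is what keeps the traversal finite on sites whose pages link back to each other.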

With this in mind, the usage of salmonJS can vary with your own needs, such as checking legacy code for dead code or profiling web app performance.

Here are a few suggestions about its usage:

Features

Dependencies

Here is the list of the main dependencies:

Installation

You can install it directly from npm:

[user@hostname ~]$ npm install salmonjs -g

or you can download the source code from GitHub and run these commands:

[user@hostname ~/salmonjs]$ npm install

Configuration

Change the file src/config.js according to your needs.

Test Cases

Here is an example of a test case file:

; Test Case File
; generated by salmonJS v0.4.0 (http://fabiocicerchia.github.io/salmonjs) at Sat, 01 Jan 1970 00:00:00 GMT
; url = http://www.example.com
; id = http___www_example_com

[GET]
variable1=value1

[POST]
variable1=value1
variable2=value2
variable3=@/path/to/file.ext ; use @ in front to use the upload feature (the file MUST exist)

[COOKIE]
name=value

[HTTP_HEADERS]
header=value

[CONFIRM]
Message=true ; true = OK, false = Cancel

[PROMPT]
Question="Answer"
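To make the layout of the file concrete, here is a rough sketch of how such a file could be read into sections and key/value pairs. It is an illustration only, not the parser salmonJS uses: `parseTestCase` is a hypothetical name, and the sketch assumes that `;` starts a comment when it opens a line or follows whitespace, and that a value may be wrapped in one pair of double quotes.

```javascript
// Parse the INI-like text of a test case file into { SECTION: { key: value } }.
// Assumption: ";" begins a comment at line start or after whitespace.
function parseTestCase(text) {
  const result = {};
  let section = null;

  for (const rawLine of text.split(/\r?\n/)) {
    // Drop whole-line comments ("; ...") and trailing comments (" ; ...").
    const line = rawLine.replace(/(^|\s);.*$/, '').trim();
    if (line === '') continue;

    // A "[NAME]" line opens a new section.
    const header = line.match(/^\[(.+)\]$/);
    if (header) {
      section = header[1];
      result[section] = {};
      continue;
    }

    const eq = line.indexOf('=');
    if (eq === -1 || section === null) continue; // ignore malformed lines

    const key = line.slice(0, eq).trim();
    // Strip one pair of surrounding double quotes, as in Question="Answer".
    const value = line.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1');
    result[section][key] = value;
  }
  return result;
}

const sample = [
  '; Test Case File',
  '[POST]',
  'variable1=value1',
  'variable3=@/path/to/file.ext ; upload',
  '[CONFIRM]',
  'Message=true ; true = OK, false = Cancel',
  '[PROMPT]',
  'Question="Answer"',
].join('\n');

const testCase = parseTestCase(sample);
console.log(testCase);
```

All values come back as strings, so a consumer would still need to interpret `true`/`false` in `[CONFIRM]` and the leading `@` in `[POST]` itself. A quoted value that contains ` ;` would be cut short by this simple comment rule.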

Usage

              __                         _____ _______
.-----.---.-.|  |.--------.-----.-----._|     |     __|
|__ --|  _  ||  ||        |  _  |     |       |__     |
|_____|___._||__||__|__|__|_____|__|__|_______|_______|

salmonJS v0.4.0

Copyright (C) 2013 Fabio Cicerchia <info@fabiocicerchia.it>

Web Crawler in Node.js to spider dynamically whole websites.
Usage: ./bin/salmonjs

Options:
  --uri              The URI to be crawled                                                       [required]
  -c, --credentials  Username and password for HTTP authentication (format "username:password")
  -d, --details      Store details for each page                                                 [default: false]
  -f, --follow       Follows redirects                                                           [default: false]
  -p, --proxy        Proxy settings (format: "ip:port" or "username:password@ip:port")
  --disable-stats    Disable anonymous report usage stats                                        [default: false]
  --help             Show the help
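The two proxy formats listed above ("ip:port" and "username:password@ip:port") can be told apart by the presence of an `@`. The sketch below shows one way to split such a string; it is an assumption-laden illustration, not how salmonJS handles the option internally, and `parseProxy` and the shape of the returned object are hypothetical.

```javascript
// Split a proxy setting of the form "ip:port" or "username:password@ip:port".
function parseProxy(setting) {
  // Everything before the last "@" (if any) is treated as the credentials.
  const at = setting.lastIndexOf('@');
  const auth = at === -1 ? null : setting.slice(0, at);
  const hostPort = at === -1 ? setting : setting.slice(at + 1);

  const colon = hostPort.lastIndexOf(':');
  if (colon === -1) {
    throw new Error('Expected "ip:port" in: ' + setting);
  }

  const port = Number(hostPort.slice(colon + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error('Invalid port in: ' + setting);
  }

  const proxy = {
    host: hostPort.slice(0, colon),
    port: port,
    username: null,
    password: null,
  };

  if (auth !== null) {
    // The first ":" separates the username from the password.
    const sep = auth.indexOf(':');
    if (sep === -1) {
      throw new Error('Expected "username:password" in: ' + setting);
    }
    proxy.username = auth.slice(0, sep);
    proxy.password = auth.slice(sep + 1);
  }
  return proxy;
}

console.log(parseProxy('127.0.0.1:8080'));
console.log(parseProxy('user:secret@10.0.0.1:3128'));
```

Splitting on the last `@` and the first `:` of the credentials lets a password contain either character, at the cost of not supporting usernames that contain `:`.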

Examples

[user@hostname ~]$ salmonjs --uri "http://www.google.com"
[user@hostname ~]$ salmonjs --uri "www.google.com"
[user@hostname ~]$ salmonjs --uri "/tmp/file.html"
[user@hostname ~]$ salmonjs --uri "file.html"

Tests

[user@hostname ~/salmonjs]$ npm test

How it works

Bugs

For a list of bugs please go to the GitHub Issue Page.

Licence

Copyright (C) 2013 Fabio Cicerchia <info@fabiocicerchia.it>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.