Yacy – P2P decentralized and open source search engine

I’ve recently introduced Searx, a privacy-respecting meta search engine that you can install on your own server.

Although Searx is a great service for protecting your privacy, the main issue with meta search engines is that they rely on the indexes of big search engines like Google, Yahoo, Bing, etc. They don’t build their own database of websites. This means they depend heavily on sources of information you cannot control: if Google or any other engine decides to remove a website from its index, that website disappears for you too.

And this is where Yacy steps in.

Yacy is an open source, fully decentralized, peer-to-peer (P2P) search engine.

Every node can crawl the web to index web pages and share its index with the other nodes through P2P; together, the nodes can index billions of pages.

The network does not store user search requests, and it is impossible for anyone to censor the content of the shared index.

You can try it on their official demo page.

It seems they have indexed around 1.4 billion documents (and growing), and more than 600 peer operators contribute every month. About 130,000 search queries are performed on this network each day.

This is obviously very far from what Google, Bing and others have.

So let’s see how to install Yacy on your own server to help index the World Wide Web!

Yacy is very simple to install, but it requires Java.

Also, the more memory, bandwidth and disk space you allocate to Yacy, the better (but you can set whatever amounts you want).

 

Installation

Lucky you if you have an Ubuntu/Debian-type system: they maintain a repository for you.

**1) Add their repository to your sources.list**

As root (for example in a shell opened with sudo -i), run:

echo 'deb http://debian.yacy.net ./' > /etc/apt/sources.list.d/yacy.list

then add the repository key:

wget http://debian.yacy.net/yacy_orbiter_key.asc -O- | apt-key add -
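
If you work from a normal account with sudo rather than a root shell, the first command needs a small change: in `sudo echo ... > file` the redirection is performed by your own unprivileged shell, so it fails with "Permission denied". A common workaround is to pipe into `sudo tee`, which opens the file itself. Here is a minimal bash sketch of the two steps above; `as_root`, `add_yacy_repo` and `add_yacy_key` are hypothetical helper names, not part of Yacy:

```shell
#!/usr/bin/env bash
# Repository setup that also works from a non-root account (sketch).
# as_root, add_yacy_repo and add_yacy_key are hypothetical helper names.

# Run a command directly when we are root, otherwise through sudo.
as_root() {
  if [ "$(id -u)" -eq 0 ]; then
    "$@"
  else
    sudo "$@"
  fi
}

# Write the APT source line. tee (not the calling shell) opens the file,
# so it gets root privileges. Overwrites the file, like '>' above.
add_yacy_repo() {
  local list_file="${1:-/etc/apt/sources.list.d/yacy.list}"
  echo 'deb http://debian.yacy.net ./' | as_root tee "$list_file" > /dev/null
}

# Download the repository key and pass it to apt-key (which also needs root).
add_yacy_key() {
  wget -q -O- http://debian.yacy.net/yacy_orbiter_key.asc | as_root apt-key add -
}
```

Usage: `add_yacy_repo && add_yacy_key`.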

**2) Update your sources and install OpenJDK (Java) and Yacy**

Still as root:

apt-get update
apt-get install openjdk-7-jre-headless 
apt-get install yacy

You will be asked to set the name of your node, an admin password and a network type (freeworld for the public network).


Then you will be able to set the initial Java memory (default 180 MB; you can give it more or less depending on your configuration). In my case, since I have quite a lot of memory, I set 512 MB as the initial and 1500 MB as the maximum Java memory. You can change these values later in the web interface.
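
Before picking these values, it helps to know how much RAM the machine actually has. A minimal bash sketch, assuming a Linux system that provides /proc/meminfo; `total_mem_mb` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print total RAM in MB, read from /proc/meminfo (or from another file in
# the same format, e.g. for testing). total_mem_mb is a hypothetical name.
total_mem_mb() {
  local meminfo="${1:-/proc/meminfo}"
  # The relevant line looks like "MemTotal:  2048000 kB"; kB / 1024 = MB,
  # truncated to a whole number.
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024; exit }' "$meminfo"
}
```

Running `total_mem_mb` prints the same total that `free -m` reports, which you can use to decide how much to hand over to Java while leaving room for the rest of the system.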

When this is done, Yacy should already be running.

Simply go to http://YourIP:8090 to access your own search engine!
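
Once the peer is up, you can also query it from the command line. A minimal bash sketch, assuming the peer exposes Yacy’s JSON search endpoint `/yacysearch.json` with a `query` parameter on the default port 8090 (check the API documentation of your version); `urlencode` and `yacy_search_url` are hypothetical helper names, and the encoder is only meant for ASCII queries:

```shell
#!/usr/bin/env bash
# Build a search URL for a Yacy peer (sketch).
# Assumption: the peer answers /yacysearch.json?query=... on port 8090.

# Percent-encode a string byte by byte; RFC 3986 unreserved characters
# (letters, digits, . ~ _ -) are kept as-is. Intended for ASCII input.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# yacy_search_url HOST QUERY [PORT]
yacy_search_url() {
  local host="${1:?host required}" query="${2:?query required}" port="${3:-8090}"
  printf 'http://%s:%s/yacysearch.json?query=%s\n' "$host" "$port" "$(urlencode "$query")"
}
```

Usage: `curl -s "$(yacy_search_url localhost 'open source')"`.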

You can click on the administrator button and log in with “admin” as the username and the password you created earlier.

 

Yacy admin interface

You will be able to change the language of the interface, set a name, etc…

Then simply go to Load Web Pages > Crawler to launch your first website crawl.

 


If you want to improve Yacy’s performance, they have a dedicated page explaining all the settings that can be tweaked.

One possible next step could be to set up a subdomain like search.domain.tld to access your Yacy instance. You can easily do this with an Apache virtualhost.

 

Virtualhost configuration

What you will need:

1) Create an A record in your DNS server/registrar pointing a name like search.domain.tld to your IP

2) Have an SSL certificate ready. If you don’t have one, you can read this tutorial. (Optional)

3) Install mod_proxy_html and enable the proxy_http and ssl modules (the HTTPS virtualhost below is skipped if mod_ssl is not enabled)

As root, run:

apt-get install libapache2-mod-proxy-html
a2enmod proxy_http
a2enmod ssl

4) Create your virtualhost as follows:

In /etc/apache2/sites-enabled/, create a file called yacy.conf (as root). Apache 2.4 only loads files ending in .conf from this directory:

nano /etc/apache2/sites-enabled/yacy.conf

and paste the following content:

<VirtualHost *:80>
        ServerAdmin webmaster@domain.tld
        ServerName search.domain.tld
        Redirect / https://search.domain.tld/
</VirtualHost>

<IfModule mod_ssl.c>
<VirtualHost *:443>
        SSLEngine on
        SSLCertificateFile /etc/ssl/certs/myblog.pem
        SSLCertificateKeyFile /etc/ssl/private/myblog.key

        ServerAdmin webmaster@domain.tld
        ServerName search.domain.tld

        ProxyRequests Off
        # Apache 2.4 syntax; access rules must sit inside a <Proxy> block
        <Proxy *>
                Require all granted
        </Proxy>

        ProxyPass / http://localhost:8090/
        ProxyPassReverse / http://localhost:8090/

</VirtualHost>
</IfModule>

Adapt the content (ServerName, webmaster email, SSL certificate paths, …) to your setup.

When done, save the file (Ctrl+X, then Y and Enter in nano), check the configuration and reload Apache (as root):

apache2ctl configtest
/etc/init.d/apache2 reload

From now on, all connections to search.domain.tld will first be redirected to https://search.domain.tld, which Apache then forwards (reverse-proxies) to localhost:8090, your Yacy instance.
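
To check the redirect from the command line, you can look at the Location header returned for the plain-HTTP address. A minimal bash sketch, assuming curl is installed; `location_header` is a hypothetical helper name and search.domain.tld stands for your own subdomain:

```shell
#!/usr/bin/env bash
# Print the value of the first Location header found in raw HTTP
# response headers read from stdin (location_header is a hypothetical name).
location_header() {
  # Drop the CR of CRLF line endings, then match the header name
  # case-insensitively (HTTP/2 sends it in lowercase).
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}
```

Usage: `curl -sI http://search.domain.tld/ | location_header` should print an https://search.domain.tld address; an empty result means no redirect was sent.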

And voila! Ready to freely crawl the web!
