I recently introduced Searx, a meta search engine that you install on your own server and that respects your privacy.
Although Searx is a great service for protecting your privacy, the main issue with meta search engines is that they rely on the indexes of big search engines like Google, Yahoo, Bing, etc. They don't build their own database of websites. This means they depend heavily on sources of information they cannot control: if Google or any of the others decides to remove a website from its index, that website disappears for you too.
And this is where Yacy steps in.
Yacy is an open source search engine, fully decentralized and built on peer-to-peer technology.
The network does not store user search requests, and it is impossible for anyone to censor the content of the shared index.
You can try it on their official demo page.
It seems they have indexed around 1.4 billion documents (and growing), and more than 600 peer operators contribute every month. About 130,000 search queries are performed on this network each day.
This is obviously very far from what Google, Bing and others have.
So let’s see how to install Yacy on your own server to help index the World Wide Web!
Yacy is very simple to install, but it requires Java.
Also, the more memory, bandwidth and disk space you allocate to Yacy, the better (but you can set whatever limits you want).
Lucky you if you run an Ubuntu/Debian type of system: the Yacy team maintains a repository for you.
**1) Add their repository to your sources.list**
As root, run (a plain sudo prefix is not enough here, because the > redirection would still run as your normal user):
echo 'deb http://debian.yacy.net ./' > /etc/apt/sources.list.d/yacy.list
Then add the repository key:
wget http://debian.yacy.net/yacy_orbiter_key.asc -O- | apt-key add -
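On recent Debian and Ubuntu releases, apt-key is deprecated, so the command above may print a warning or be unavailable. A minimal sketch of the alternative, assuming your apt version accepts an ASCII-armored key file through the signed-by option: the keyring path and the function names below are my own choices, only the repository and key URLs come from the steps above.

```shell
#!/bin/sh
# URLs taken from the steps above; the keyring location is an assumption.
YACY_REPO_URL="http://debian.yacy.net"
YACY_KEY_URL="http://debian.yacy.net/yacy_orbiter_key.asc"
YACY_KEYRING="/usr/share/keyrings/yacy.asc"

# Print the APT source line that ties the repository to one key file.
yacy_source_line() {
    printf 'deb [signed-by=%s] %s ./\n' "$1" "$YACY_REPO_URL"
}

# Download the key and write the source list (hypothetical helper, run as root).
yacy_add_repo() {
    wget -q "$YACY_KEY_URL" -O "$YACY_KEYRING" &&
    yacy_source_line "$YACY_KEYRING" > /etc/apt/sources.list.d/yacy.list
}

# Preview the line without touching the system:
yacy_source_line "$YACY_KEYRING"
```

Calling yacy_add_repo as root would replace both commands of this step; the preview line lets you check the result before writing anything.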
**2) Update your sources and install OpenJDK (Java) and Yacy**
Still as root:
apt-get update
apt-get install openjdk-7-jre-headless
apt-get install yacy
You will need to set the name of your node, an admin password and a network type (freeworld for the public network).
Then you will be able to set the initial Java memory (default 180 MB; you can set more or less depending on your configuration). In my case, since I have quite a lot of memory, I set 512 MB as initial and 1500 MB as maximum Java memory. You can change these values later in the web interface.
When done, Yacy will already be running.
Simply go to http://YourIP:8090 to access your own search engine!
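If the page does not load, it can help to check from the command line whether anything answers on that port. A small sketch, assuming curl is installed; the function names are hypothetical and 8090 is simply the port used in this post.

```shell
#!/bin/sh
# Build the base URL of a Yacy instance (defaults: localhost, port 8090).
yacy_url() {
    printf 'http://%s:%s/\n' "${1:-localhost}" "${2:-8090}"
}

# Report whether the web interface answers within 5 seconds.
yacy_check() {
    url=$(yacy_url "$@")
    if curl -s -o /dev/null --max-time 5 "$url"; then
        echo "Yacy answers at $url"
    else
        echo "No answer at $url" >&2
        return 1
    fi
}

# Usage: yacy_check            (on the server itself)
#        yacy_check YourIP     (from another machine)
```

If the local check works but the remote one does not, a firewall rule on port 8090 is a likely suspect.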
You can click on the administrator button and log in with “admin” as username and your previously created password.
You will be able to change the language of the interface, set a name, etc…
Then simply go to Load Web Pages, Crawler to launch your first website crawl.
If you want to improve Yacy's performance, they have a dedicated page explaining all the settings that can be tweaked.
One possible next step could be to set up a subdomain like search.domain.tld to access your Yacy instance. You can easily do it with an Apache virtual host.
What you will need:
1) Create an A record in your DNS server/registrar pointing something like search.domain.tld to your IP
2) Have an SSL certificate ready (optional). If you don't have one, you can read this tutorial.
3) Install Mod Proxy HTML and enable the HTTP proxy module
As root, run:
apt-get install libapache2-mod-proxy-html
a2enmod proxy_http
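The virtual host in the next step also depends on mod_ssl: its HTTPS part is wrapped in an IfModule mod_ssl.c block, so it is silently skipped when that module is not loaded. A quick sketch to list what is still missing, assuming apache2ctl -M prints one "name_module (shared)" line per loaded module; missing_modules is a hypothetical helper name.

```shell
#!/bin/sh
# Print each required module that "apache2ctl -M" does not list as loaded.
missing_modules() {
    loaded=$(apache2ctl -M 2>/dev/null)
    for mod in "$@"; do
        case "$loaded" in
            *" ${mod}_module "*) ;;      # already loaded, nothing to print
            *) echo "$mod" ;;
        esac
    done
}

# Usage (as root): missing_modules proxy proxy_http ssl
# then enable whatever is printed, for example: a2enmod ssl
```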
4) Create your virtual host as follows:
As root, create a file called yacy in /etc/apache2/sites-enabled/ (for example with nano) and paste the following content:
<VirtualHost *:80>
    ServerAdmin email@example.com
    ServerName search.domain.tld
    Redirect / https://search.domain.tld/
</VirtualHost>
<IfModule mod_ssl.c>
<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/myblog.pem
    SSLCertificateKeyFile /etc/ssl/private/myblog.key
    ServerAdmin firstname.lastname@example.org
    ServerName search.domain.tld
    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>
    ProxyPass / http://localhost:8090/
    ProxyPassReverse / http://localhost:8090/
</VirtualHost>
</IfModule>
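Adapt the content (ServerName, webmaster email, SSL certificate paths, directories, …). Note that the proxy target is plain http://localhost:8090/, since Yacy itself listens over HTTP on that port; Apache handles the SSL part.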
When done, save the file (CTRL+X, then Yes) and reload Apache as root.
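On a Debian/Ubuntu system the reload is typically done with service apache2 reload. A cautious sketch, assuming the stock apache2ctl and service commands are available, checks the configuration first so that a typo in the new file does not take the web server down; reload_apache is a hypothetical helper name.

```shell
#!/bin/sh
# Reload Apache only if the configuration parses cleanly (run as root).
reload_apache() {
    apache2ctl configtest && service apache2 reload
}

# Usage (as root): reload_apache
```

If configtest reports a syntax error, nothing is reloaded and the running configuration stays in place.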
From now on, all connections to search.domain.tld will first be redirected to https://search.domain.tld, which Apache then proxies to localhost:8090, your Yacy instance.
And voila! Ready to freely crawl the web!