
Yacy – P2P decentralized and open source search engine

I’ve recently introduced Searx, a privacy-respecting meta search engine you can install on your own server.

Although Searx is a great service to protect your privacy, the main issue with meta search engines is that they are based on the indexes of big search engines like Google, Yahoo, Bing, etc. They don’t build their own database of websites, which means they rely heavily on sources of information they cannot control: if Google or any of the others decides to remove a website from its index… well, that website disappears for you too.

And this is where Yacy steps in.

Yacy is an open-source search engine, fully decentralized and based on peer-to-peer.

(Illustration: Yacy, a decentralized peer-to-peer search engine)

Every node can crawl the web to index billions of web pages and share its index with other nodes through P2P.

The network does not store user search requests, and it is impossible for anyone to censor the content of the shared index.

You can have a try on their official demo page.

It seems they have indexed around 1.4 billion documents (and growing), more than 600 peer operators contribute every month, and about 130,000 search queries are performed on the network each day.

This is obviously very far from what Google, Bing and others have.

So let’s see how to install Yacy on your own server to help indexing the world wide web!

Yacy is very simple to install, but it requires Java.

Also, the more memory, bandwidth and space you allocate to Yacy, the better (though you can set whatever you want).



Lucky you if you are on an Ubuntu/Debian type of system: the Yacy project maintains a repository for it.

1) Add their repository to your sources.list

In root (or with sudo in front), run:
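A sketch of the command, assuming the repository address documented by the Yacy project (debian.yacy.net):

```shell
# Append the Yacy Debian repository to a dedicated sources list
# (debian.yacy.net is the address documented by the Yacy project)
echo 'deb http://debian.yacy.net ./' > /etc/apt/sources.list.d/yacy.list
```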

then add the repository key:
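Something like the following, assuming the key URL published by the Yacy project:

```shell
# Import the repository signing key so apt trusts the Yacy packages
wget http://debian.yacy.net/yacy_orbiter_key.asc -O - | apt-key add -
```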

2) Update your source and install OpenJDK (Java) and Yacy

Still in root
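A sketch of the install, assuming standard Debian package names (default-jre pulls in OpenJDK):

```shell
# Refresh the package lists, then install Java and Yacy
apt-get update
apt-get install default-jre yacy
```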

You will need to set the name of your node, an admin password and a network type (freeworld for the public network).

(Screenshot: Yacy installation)

Then you will be able to set the initial Java memory (180 MB by default; you can add more or less depending on your configuration). In my case, since I have quite a lot of memory, I set 512 MB as initial and 1500 MB as maximum Java memory. You can modify these values later on the web interface anyway.

When the setup is done, Yacy will already have started.

Simply go to http://YourIP:8090 to access your own search engine!

You can click on the administrator button and log in with “admin” as username and your previously created password.


(Screenshot: Yacy admin interface)

You will be able to change the language of the interface, set a name, etc…

Then simply go to Load Web Pages > Crawler to launch your first website crawl.


(Screenshot: Yacy crawling)

If you want to improve Yacy’s performance, they have a dedicated page explaining all the possible settings that can be tweaked.

One possible next step for you could be to set up a subdomain like search.domain.tld to access your Yacy instance. You can easily do it with a virtualhost.


Virtualhost configuration

What you will need:

1) Create an A record in your DNS server/registrar pointing something like search.domain.tld to your IP

2) Have an SSL certificate ready. If not, you can read this tutorial. (Optional)

3) Install Mod Proxy HTML and activate proxy HTTP

In root, run:
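A sketch, assuming standard Debian/Apache package and module names:

```shell
# Install the proxy_html module package and enable the proxy modules
apt-get install libapache2-mod-proxy-html
a2enmod proxy proxy_http proxy_html
```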

4) Create your virtualhost as following:

In /etc/apache2/sites-enabled/, create a file called yacy (In root):

and paste the following content:
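A minimal sketch of such a virtualhost; all names and certificate paths below are placeholders to adapt to your setup:

```apache
<VirtualHost *:80>
    ServerName search.domain.tld
    # Redirect plain HTTP to HTTPS
    Redirect permanent / https://search.domain.tld/
</VirtualHost>

<VirtualHost *:443>
    ServerName search.domain.tld
    ServerAdmin webmaster@domain.tld

    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/search.domain.tld.crt
    SSLCertificateKeyFile /etc/ssl/private/search.domain.tld.key

    # Forward everything to the local Yacy instance on port 8090
    ProxyPreserveHost On
    ProxyPass / http://localhost:8090/
    ProxyPassReverse / http://localhost:8090/
</VirtualHost>
```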

Adapt the content (Servername, webmaster email, SSL certificate, directory,…).

When done, save the file (CTRL+X then Yes) and reload apache: (In root)
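On a systemd-based system:

```shell
# Pick up the new virtualhost without a full restart
systemctl reload apache2
```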

From now on, all connections to search.domain.tld will first be redirected to https://search.domain.tld, which will then be proxied to localhost:8090, your Yacy instance.

And voila! Ready to freely crawl the web!


Searx – Meta search engine respecting privacy, for your server

Privacy is very important to me; not that I have something special to hide, but I want to be sure I control what I do, without being put in a box (profiling).

The search engine is probably the tool I use the most on a daily basis, and I long used Google Search. But on top of not respecting your privacy (that’s their business, of course), it also builds a profile of you and changes the ranking of the search results based on it. You quickly end up browsing the same websites every day, with little new content.

But there is actually plenty of alternatives. The most popular is probably DuckDuckGo, a meta search engine respecting your privacy (Does not track you,…)

Note that a meta search engine differs from a search engine. In short, a search engine crawls the web and indexes it in a database, and you then search inside this database. A meta search engine does not crawl and index; instead, it aggregates the results from other search engines.

Running a search engine actually needs significant resources (space, IO, etc.) and requires more development than a meta search engine. That’s why there are few alternatives as search engines but many as meta search engines.

And you can also run your own meta search engine on your own server. This is what Searx is all about.

(Screenshot: searx meta search engine)

Supporting different languages, searx can also be easily customized, for instance the selection of search engines or categories. You can also get the results as an RSS feed, CSV or even JSON.

You can directly give it a try on their official instance.

(Screenshot: searx installation steps)

And here is how to install it on Ubuntu/Debian based system.



You will first need to have a LAMP server, if you don’t know what it is, or don’t have it, please see this tutorial.

1) Install the system dependencies

In root (or with sudo), run:
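A sketch, with package names assumed from the searx installation documentation of the time:

```shell
# Build tools, Python headers and libraries searx's dependencies compile against,
# plus uwsgi to serve the application
apt-get install git build-essential libxslt-dev python-dev python-virtualenv zlib1g-dev libffi-dev libssl-dev uwsgi uwsgi-plugin-python
```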

2) Clone their latest version into /var/www and change the ownership of the folder to the searx user

Still in root,
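Something like the following; the repository URL is assumed (searx’s upstream GitHub at the time):

```shell
cd /var/www
# Fetch the latest source
git clone https://github.com/asciimoo/searx.git
# Create a dedicated user and give it the checkout
useradd searx --home-dir /var/www/searx
chown -R searx:searx /var/www/searx
```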


3) Install searx dependencies in a virtualenv

Still in root,
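A sketch of the steps, run as the searx user created above:

```shell
# Switch to the searx user, then build the virtualenv inside the checkout
sudo -u searx -i
cd /var/www/searx
virtualenv venv
. ./venv/bin/activate
pip install -r requirements.txt
```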


4) Generate a secret key to set in the settings.yml file
If you want to make some additional modifications like the port number, etc., simply edit settings.yml manually.
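A sketch of the key generation; “ultrasecretkey” is assumed to be the placeholder value shipped in searx’s settings.yml:

```shell
# Generate a 32-character random hex key
secret_key=$(openssl rand -hex 16)
echo "$secret_key"
# ...then substitute it for the shipped placeholder (run from /var/www/searx):
#   sed -i "s|ultrasecretkey|$secret_key|g" searx/settings.yml
```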

At this point searx is not daemonized; uwsgi takes care of this.


5) Configure and activate uwsgi

Create the configuration file /etc/uwsgi/apps-available/searx.ini

and copy the following content:
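A sketch of the ini file, with paths assuming the /var/www/searx checkout from the previous steps:

```ini
[uwsgi]
# Run under the dedicated searx user
uid = searx
gid = searx

# Number of worker processes
workers = 4
master = true

# Socket Apache will talk to
socket = /run/uwsgi/app/searx/socket
chmod-socket = 666

# Python plumbing: the app entry point and its virtualenv
plugin = python
module = searx.webapp
virtualenv = /var/www/searx/venv/
pythonpath = /var/www/searx/
chdir = /var/www/searx/searx/
```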

And finally activate the uwsgi app:
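Following the usual Debian apps-available/apps-enabled convention:

```shell
# Enable the app by symlinking it, then restart uwsgi to pick it up
ln -s /etc/uwsgi/apps-available/searx.ini /etc/uwsgi/apps-enabled/
systemctl restart uwsgi
```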

6) Create a dedicated virtualhost with uwsgi configured

Here is my virtualhost as an example:
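A minimal sketch; it assumes the Apache uwsgi handler module is installed (e.g. libapache2-mod-uwsgi), and the server name is a placeholder:

```apache
<VirtualHost *:80>
    ServerName search.domain.tld
    ServerAdmin webmaster@domain.tld
    DocumentRoot /var/www/searx

    # Hand every request to the searx uwsgi app via its socket
    <Location />
        Options FollowSymLinks Indexes
        SetHandler uwsgi-handler
        uWSGISocket /run/uwsgi/app/searx/socket
    </Location>
</VirtualHost>
```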

Restart apache
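On a systemd-based system:

```shell
systemctl restart apache2
```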

And you should be good to go!

If you face any issue during the installation, feel free to ask questions in the comment section or directly on their issues tracker.


List of alternatives to Google and co for your own server

Google Search has probably more than 70% market share, followed by Bing and Yahoo, so Google tends to be unavoidable… but quite a lot of robust alternatives exist, and I wanted to share some of them here.


Obviously DuckDuckGo has become quite popular after all the revelations about PRISM, NSA surveillance, etc. After testing it for several weeks, I’m quite happy with it, although it was tough to change my habits, and I even felt the relevancy of the answers was not as good, since I was used to the “selected” (call it restricted) content from Google (based on my previous history). But actually I’m now very happy to see the web with fresh eyes!

If you’re not a big fan of DuckDuckGo, I recommend you give Ixquick or StartPage a try. (Many more exist, though.)

But hey, this is a blog on self hosting or managing your own server…DuckDuckGo is great, but how about having my own search engine?

Basically, you can have 2 types of search engine:

1) The meta search engine, using the indexes of other search engines (they crawl the web, and the meta search engine uses their databases to deliver you the content).

It’s usually a light application with good accuracy (a large number of indexed websites, usually from Google, Yahoo or Bing). Hence you get most of the web in a single click, with some added features compared to Google and co, such as privacy or even collaboration. However, you fully rely on the third-party databases: if Google removes a website from its database, you cannot see it either.

2) The “real” search engine, like Google and co, meaning you will need to crawl the web and index it before searching. The benefits are total freedom from censorship, independence and privacy; however, as you can expect, crawling the web is a long job and you won’t be able to compete with Google’s billions of indexed pages and millions of servers.

Actually, except for an intranet, or if you want your own search engine restricted to a few websites that you can crawl by yourself, I don’t see many reasonable alternatives except the great Yacy (a peer-to-peer search engine) that I will detail (and write a tutorial on) later.

Here is a list of meta/search engines that are worth a try:

1) Meta Search Engine

Searx, an open-source meta search engine protecting your privacy, with parallel queries for faster responses, DuckDuckGo-like !bang functionality with engine shortcuts, and many more features.

Seeks, an open-source collaborative distributed (P2P) search engine that ranks results by social consensus (filtering).

MySearch, a simple open-source meta search engine with a minimalist design that respects your privacy.

2) Search Engine

Yacy, decentralized and censorship-free: Yacy allows you to crawl your part of the internet and share your index over P2P. The more people running a node, the faster and more complete it becomes. (My favourite search engine!)


Well, I have to say I didn’t find many still-maintained search engines so far, but I’ll continue to look. If you know any other good ones, please share them in the comment section!

