Towards better documentation search #6307

squidfunk · 2023-11-07T14:07:05Z

Background

As you may have read in one of my recent comments, we're currently revising our search implementation. The current search is based on Lunr.js, which is also the search engine that MkDocs has been using since Material for MkDocs started in 2016. In the beginning, we felt that this was a good fit, as Lunr.js allows searching in the browser without the need for an external service. This makes deploying documentation much simpler, since search is and should always be a central component of each and every good documentation site.

In the past years, we've invested hundreds of hours into making search better. With the help of our awesome sponsors, we were able to ship rich search previews, support for more sophisticated tokenizers, support for Chinese, as well as better highlighting. Additionally, we made search almost twice as fast. However, in order to progress and solve the many open issues that are related to search, we decided to throw out Lunr.js. There are several reasons for that, the most important of which is that it has been unmaintained since 2020. Additionally, Lunr.js only allows ranking with BM25, which is a good basis, but almost all issues that are related to weird rankings are caused by the fact that BM25 is not ideal for stable typeahead search. It was meant for full-word retrieval and is almost impossible to tame for the many different use cases that we've seen in the wild. Again, we've invested a lot of time to improve the situation, but we've reached a point where this doesn't make sense anymore.

This is the reason why we're currently releasing so few new features: we're putting our entire energy into finishing the new search implementation. We're already almost on par with Lunr.js' functionality, but now have an entirely modular architecture, which will allow us to swap out everything. Yes, I mean everything: the ranking algorithm, wildcard matching, the inverted index implementation, yada, yada, yada. Solving the documentation search problem is a personal affair for me. I really hate that there's not yet a solution that works reliably, can run anywhere, and is modular so it can be easily customized.

This is what we're building.

As you may already suspect, this is a pretty big project, which is why it is taking so long. We feel it is the perfect moment to venture into this problem, because we have gathered a lot of use cases that we can now balance and optimize for. However, please understand that this takes time, so I kindly ask you to be a little more patient. Development on this project is, after all, 99% done by me, @squidfunk, and we're rewriting something that millions of users are using each and every day. That needs care.

Where we're currently at

First of all: search will be a separate, new project! This means you will be able to use the same engine in your other projects as well. Additionally, here's a non-exhaustive list of things we're planning to ship in the first version:

- Modular engines: Search should not only allow searching for text in an inverted index, but also support new use cases like nearest neighbors on vector embeddings. We designed the new search so that multiple engines can be configured for the same set of documents, e.g. store text and title in an inverted index, and store embeddings in a vector store, all from the same document. They should then be searched and ranked together. Additionally, document fields can be tokenized differently, and the tokenizing algorithm can be based on a regular expression or a function, allowing for maximum flexibility.
  - Search: Return Results for URLs #5936
- Powerful plugin system: Plugins are first-class citizens! The new search is completely modularized. For example, the inverted index itself does not compute scores; scoring is implemented as a plugin. This means alternative ranking plugins can be implemented. The plugin architecture is dead simple, but insanely powerful. As far as I currently know, there is nothing that could not be implemented as a plugin.
  - Search: customize behavior with hooks #4980
- Document metadata: Authors should be able to configure which parts of document metadata should be included with the documents, so that documents can be indexed with custom metadata. Currently, only text, title, location and tags are included. The new search should allow configuring which fields are indexed and how, i.e., how they should render in search results, if they should render at all (think keywords or aliases), etc. This would also allow slicing the search into different sections, e.g. for the blog, API reference, etc., by allowing the author to render those as tabs in the search bar.
- Better accuracy: The current implementation uses Lunr.js, which uses OR to combine terms. This is not ideal for document search, as users have repeatedly reported that they expect the number of search results to narrow as they enter more terms. The new search will make it easy to switch to AND as the default combinator.
- Detect misspellings: If a typo is entered, e.g. instlal, the engine should detect the typo and correct it to install. Many engines support this, so we should find a way to do the same.
- Offline-first: It goes without saying that one of the highest priorities is that search will keep working offline. The new search implementation will, of course, still not need a server.
- Span queries: Searches like "single page application" should rank documents higher when those words appear together. This removes the need for exact search within quotes, which is something many non-technical users don't even know is possible in search engines like Google or Bing. The goal is that entering a few words should be enough; no special syntax should be needed.
- Compound words: The current search allows indexing words like PascalCase as Pascal and Case by using clever lookaheads, but this also means that searching for the entire term in lowercase, pascalcase, will not return any results. This should be fixed in a way that both can be found.
  - Search: find PascalCase, pascalcase and case at the same time #6632
- Document hierarchy: The search index should be organized hierarchically, so that the explicit navigation structure and implicit table of contents hierarchy yield more context to search results, helping to disambiguate repetitive documentation.
  - Search: breadcrumbs to signal context of a document #3787
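
To make the "Better accuracy" and "Compound words" items above more concrete, here is a minimal sketch in Python. It is not the new engine's code or API: the names `tokenize`, `build_index` and `search` are hypothetical and exist only for illustration. It assumes a toy in-memory inverted index, indexes a compound word both as a whole and as its parts, and lets the caller choose between AND and OR as the combinator.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Hypothetical tokenizer: keeps whole words and their PascalCase parts."""
    tokens = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # Index the entire term in lowercase, e.g. "pascalcase"
        tokens.add(word.lower())
        # Also index the parts at case boundaries, e.g. "pascal" and "case"
        for part in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", word):
            tokens.add(part.lower())
    return tokens


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str, combinator: str = "AND") -> set[str]:
    """Combine the postings of all query terms with AND (default) or OR."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    if combinator == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


docs = {
    "a": "Use PascalCase for class names",
    "b": "Install the search plugin",
    "c": "Install the theme",
}
index = build_index(docs)
```

With this toy index, `search(index, "pascalcase")` and `search(index, "case")` both find document `a`, and `search(index, "install plugin")` returns only document `b`, whereas passing `combinator="OR"` returns both `b` and `c`. Ranking, typo tolerance and span queries are deliberately left out.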

Here's a list of ideas, partially based on open change requests, which we will implement after the first version is out and has reached a stable state. We believe all of those features will be great additions:

This list is far from complete. We have so many more ideas, which we'll share when the time has come. We'll keep this issue updated, so feel free to subscribe or check back from time to time. We hope to push out the first candidate before the end of this year! Thank you for your patience and for your trust in Material for MkDocs.


strausmann · 2023-11-09T06:38:34Z

Great ideas and great features for search. The most important thing for us is that search continues to work completely offline and without a web server. We use MkDocs for documentation, and it has to work offline on a plane or on a ship.

squidfunk · 2023-11-09T06:51:26Z

@strausmann that is our priority. It will definitely work offline (our prototype already does), but we'll also add interesting new features like search federation (merging search indexes with other MkDocs sites) for which you obviously need to be online. All of those are optional and will degrade gracefully when offline, of course.

strausmann · 2023-11-09T06:56:37Z

Search federation is of course one of the most interesting features for us too. The documentation also runs on a web server. Several MkDocs instances run side by side for different topics. If the search for one MkDocs instance now also returns the contents of the other instances, that's brilliant. Of course, these instances then run on a web server in a closed environment.

squidfunk · 2023-11-09T06:59:16Z

Thanks for sharing your setup – that sounds like a perfect test case once we have a prototype. If you like, you can subscribe to #5230 and give it a try once we have the first version out ☺️

strausmann · 2023-11-09T07:00:34Z

We'd be very happy to test it. We are excited.

squidfunk · 2023-11-12T13:55:42Z

1st search preview is ready in #6321 – We encourage you to try it on your project and give feedback in #6321 ☺️

Note

It's still the same UI/UX, as we're currently focusing on internals. However, this PR fundamentally changes the search results, so we'd be interested to learn if you feel that it works better or worse in your documentation project. We'll be continuing to work on the internals and other parts mentioned in the OP while awaiting your feedback ☺️

squidfunk · 2023-11-20T14:06:07Z

2nd search preview is ready in #6372 – We encourage you to try it on your project and give feedback in #6372 ☺️

Note

It's still the same UI/UX, as we're currently focusing on internals. However, this PR fundamentally changes the search results, so we'd be interested to learn if you feel that it works better or worse in your documentation project. We'll be continuing to work on the internals and other parts mentioned in the OP while awaiting your feedback ☺️

AutonomousCat · 2023-11-30T00:20:38Z

I started using a Python library that uses MkDocs and this theme, and the search experience has been rather stressful compared to what I'm used to with Sphinx, so I'm glad to see that a search overhaul has already started.

My number 1 feature request would be a dedicated search page, and an option for the search bar to take you to it. I feel like the small window approach cannot fit all projects, for example API wrappers, where the results are large, and rightly so. It simply takes too much effort to go through all the "<#> more on this page" entries and scroll through, only to possibly pass what you're looking for multiple times because of the small area.

That's really my main issue with search.

ctalr-jb · 2023-12-18T15:29:39Z

I'm loving the direction of this new search implementation. With the previews so far, I'm seeing a massive improvement in performance on larger sets of docs. Aside from the return of previous features, I'm definitely interested in seeing the "document metadata" and "federated search" ideas come to fruition for my own use cases.

Aruelius · 2023-12-21T09:21:22Z

I have been using mkdocs-material for over a year. It's good, but the support for Chinese search is not perfect. Both jieba and Lunr.js have very limited support for Chinese, and I know you have also been working on improving Chinese search, thank you very much!

In fact, when I was preparing to write the 2.0 version of my project's documentation, I thought that I would need to use a new framework, such as Nextra, VitePress, dumi etc., because those frameworks natively support Chinese search.

But I just saw this issue, and I was excited. It gave me hope, and I'll wait for the better search to be released.

Merry Christmas and Happy New Year!

squidfunk · 2023-12-21T09:51:38Z

@Aruelius could you share some links to SSGs that support better Chinese search than Material for MkDocs? We're very interested in improving support, and checking some existing solutions is always a good idea. As we're writing everything from scratch, now is the best time to investigate. Please don't only share links to the SSGs, but to resources that explain how search works in those SSGs, i.e., documentation pages, blog posts, repositories. Thank you!

Also please understand that "Chinese search is not perfect" is very hard for me to turn into actionable items. I don't speak Chinese. I'm essentially trying to improve search for a language I don't understand. I will need support from Chinese speaking users. Let's create a better search experience together.

squidfunk · 2023-12-21T09:58:29Z

FWIW, a quick search surfaced that VitePress supports two search providers:

- Local search, implemented using minisearch, which does not support Chinese
- Algolia search, which is a hosted solution (does not work offline or without an Internet connection) and supports Chinese and other languages, but has the drawback of being a third-party service. While their free offering for Open Source projects is nice (DocSearch), their paid solution is ridiculously expensive.

We're aiming to build one of the most powerful search solutions that are Open Source (not like Algolia) and can run in the browser or on the Edge, but in-browser search just cannot compete with a hosted solution. That being said, if you would like to work together on this, I'd be happy to know exactly what you expect from Chinese search, what doesn't work correctly, and maybe if you found any Open Source solutions for this problem, because IMHO, to-date, Material for MkDocs is one of the very, very few SSGs that support Chinese search at all without a third-party service.

Nonetheless, we need to improve it!

Aruelius · 2023-12-21T10:03:27Z

> @Aruelius could you share some links to SSGs that support better Chinese search than Material for MkDocs? We're very interested in improving support, and checking some existing solutions is always a good idea. As we're writing everything from scratch, now is the best time to investigate. Please don't only share links to the SSGs, but to resources that explain how search works in those SSGs, i.e., documentation pages, blog posts, repositories. Thank you!
>
> Also please understand that "Chinese search is not perfect" is very hard for me to turn into actionable items. I don't speak Chinese. I'm essentially trying to improve search for a language I don't understand. I will need support from Chinese speaking users. Let's create a better search experience together.

I'm happy to contribute everything I can.

This is FlexSearch (https://github.com/nextapps-de/flexsearch), which is what Nextra is using. Maybe it can help you.

Lexachoc · 2024-02-09T19:28:06Z

I am new to Material for MkDocs. The built-in search worked well for me until I had a Markdown page with symbols and LaTeX equations, as below:

| Symbol | Description |
| --- | --- |
| A_in | absorbance $A_{in}=-\ln[I/I_0]=-\ln\tau_{in}$ |
| $A_{in}$ | absorbance $A_{in}=-\ln[I/I_0]=-\ln\tau_{in}$ |

Using the browser's Ctrl+F with Ain, I can find the symbol in the first row, but not the one in the second row.
With the built-in search, however, neither row can be found by entering Ain. That's not intuitive for me.

So it would be very useful if the search bar had the ability to search for substrings (including subscripts and superscripts), like the built-in Ctrl+F in the browser, or even better, to search for LaTeX.

I would expect to enter Ain and get a result preview with rendered symbols (equations) instead of the LaTeX syntax.
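
One way to picture the request above is a normalization step applied to both the indexed symbols and the query. The sketch below is only an illustration in Python, not how Material for MkDocs handles this: `normalize_symbol` is a hypothetical helper, and it assumes that simply dropping math delimiters, braces and subscript or superscript markers is good enough for short symbols like the ones in the table.

```python
import re


def normalize_symbol(text: str) -> str:
    """Hypothetical helper: reduce 'A_in' and '$A_{in}$' to the same key."""
    # Drop LaTeX math delimiters, braces, backslashes and sub/superscript markers
    stripped = re.sub(r"[$_^{}\\]", "", text)
    return stripped.lower()


symbols = ["A_in", "$A_{in}$"]
query = "Ain"

# Substring match on the normalized forms, similar in spirit to Ctrl+F
matches = [s for s in symbols if normalize_symbol(query) in normalize_symbol(s)]
```

Under these assumptions, both table rows match the query `Ain`. Full LaTeX expressions, and rendering equations in result previews, would need considerably more than this.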

NFanoe · 2024-04-26T09:59:29Z

Have you considered some kind of faceted search? When we search for something, we get a ton of API stuff first. It would be great to be able to filter that away, or filter it in, based on maybe a metadata tag or even just a path.

squidfunk · 2024-04-26T10:29:38Z

Yes, filters (faceted search) will definitely be supported ☺️
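
For readers wondering what such filtering could look like, here is a small Python sketch of the idea raised above: filtering results out by path, or in by a metadata tag. It is an assumption-laden illustration, not the planned API. The `Result` class, the `apply_filters` function and their parameters are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical search result with a location (path) and metadata tags."""
    location: str
    title: str
    tags: set[str] = field(default_factory=set)


def apply_filters(results, exclude_prefix=None, require_tag=None):
    """Keep results that pass both the optional path filter and the tag filter."""
    kept = []
    for result in results:
        # Filter away everything below a given path, e.g. the API reference
        if exclude_prefix and result.location.startswith(exclude_prefix):
            continue
        # Filter in only results carrying a given metadata tag
        if require_tag and require_tag not in result.tags:
            continue
        kept.append(result)
    return kept


results = [
    Result("api/client/", "Client", {"api"}),
    Result("guides/setup/", "Setup", {"guide"}),
    Result("blog/2024/release/", "Release notes", {"blog"}),
]
without_api = apply_filters(results, exclude_prefix="api/")
only_guides = apply_filters(results, require_tag="guide")
```

Here `without_api` keeps the guide and the blog post, and `only_guides` keeps just the setup guide. A real implementation would more likely apply such filters inside the index rather than on the final result list.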

NFanoe · 2024-07-12T12:31:47Z

I missed both previews. Any news of a new preview date (or a release)? 🥇

squidfunk · 2024-07-12T13:44:54Z

Yes, this year. Sorry for the silence – we're working very hard on another huge topic right now that has to predate the new search functionality, and we'll be resuming adding the finishing touches immediately after that. There will be a huge announcement later this year. We'll announce this here as well 🤟

squidfunk · 2024-08-19T08:33:41Z

Update: please see our latest blog post.

This is not the announcement I was talking about in my last comment, but a heads up on why things take longer than expected. We felt it was time to shed some light on what's happening behind the scenes ☺️

samuelcolvin · 2024-11-18T19:13:05Z

Better search in mkdocs material is desperately needed! What we have now is not fit for purpose.

@squidfunk do you have a timeline for better search?

squidfunk · 2024-11-19T06:22:31Z

Yes, search is going to be released at the beginning of next year, after we've made the announcement on the new direction of Material for MkDocs we're currently working on. We've talked about the big problems we're solving right now in our latest blog post (as linked above). Please know that we're working very hard behind the scenes – in fact, I have nothing else on my table right now, besides regular maintenance of this project.

Please understand that we cannot release the search before we release the next big update that we're working on. It will be a game changer and set new standards for creating and building technical documentation. We're absolutely sure about that.

