DocFetcher Server 1.1 Release

DocFetcher Server 1.1 is out. This release is mostly a collection of usability improvements that smooth out some of the rough edges of the first release. Most items on the changelog are more or less significant, so this release announcement simply reproduces the entire changelog:

  • The application could not be run on older Linux distributions due to a glibc compatibility issue. It now runs on Linux distributions with glibc 2.17 or newer.
  • Among the indexing settings, there's now a new setting for skipping content indexing for all files that are bigger than a certain maximum file size. The filenames of these files can still be indexed. With this new setting, it's now possible to skip large files that may cause the application to run out of memory during indexing.
  • In the Admin Area in the indexes table, there's a new column "Visible". By ticking and unticking the checkboxes in that column, you can control which indexes are transmitted to the clients, and which indexes are kept only on the server side. This is useful if you have some very large and rarely used indexes; keeping them on the server side when they're not needed will make the web interface load faster.
  • In the result table, you can now press the arrow-up and arrow-down keys to navigate to the previous or next result, with the contents of the preview pane updated accordingly. However, for this to work properly, you need to turn off "Automatically scroll to first match in preview pane" in the user preferences.
  • For tablet users, there's now a button above the result table for downloading the first selected result. Before, downloading results was only possible via the result table's context menu, which is difficult to open on tablets.
  • For technical reasons, the users that are counted towards the application's client limit are identified by browser session rather than by IP address. Among other things, this means accessing the web interface from multiple browsers on the same computer counts as multiple clients rather than as a single client. For some users, this can be quite inconvenient. In addition, it was also inconvenient for a single user to switch between different computers. For these use cases, there's now a workaround called "session stealing", which means that when the client limit is reached, new users may take over existing sessions, thus kicking their previous owners out of the web interface. The intended use is for users to kick "themselves" out in order to more easily switch between browsers and/or computers. For instances of DocFetcher Server with a client limit greater than 1, session stealing is off by default and must be enabled in the Admin Area on the Access tab.
  • It's now possible to inject custom CSS and JavaScript into the web interface, on the server side, via the files misc/custom.css and misc/custom.js. Note that no support or HTML stability guarantees are provided for such customizations, as explained in the comments in the custom.css and custom.js files.
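To make the "session stealing" item above more concrete, here is a minimal sketch of how a client limit based on browser sessions might behave. This is not DocFetcher Server's actual code: the class SessionRegistry, its method names, and in particular the choice to evict the oldest session are assumptions made purely for illustration.

```python
from collections import OrderedDict


class SessionRegistry:
    """Hypothetical model of a client limit counted per browser session."""

    def __init__(self, client_limit, allow_stealing):
        self.client_limit = client_limit
        self.allow_stealing = allow_stealing
        # Maps session id -> user name, ordered from oldest to newest.
        self.sessions = OrderedDict()

    def connect(self, session_id, user):
        """Register a session and return the id of the evicted session, if any.

        Raises RuntimeError when the limit is reached and stealing is off.
        """
        if session_id in self.sessions:
            return None  # already counted, nothing changes
        evicted = None
        if len(self.sessions) >= self.client_limit:
            if not self.allow_stealing:
                raise RuntimeError("client limit reached")
            # Assumption: the oldest session is the one that gets taken over.
            evicted, _ = self.sessions.popitem(last=False)
        self.sessions[session_id] = user
        return evicted


# Same user, two browsers, client limit of 1: the second browser takes over.
registry = SessionRegistry(client_limit=1, allow_stealing=True)
registry.connect("firefox-session", "alice")
kicked = registry.connect("chrome-session", "alice")
```

In this model, two browsers on the same computer count as two clients, which matches the behavior described above. With stealing enabled, the second browser simply replaces the first instead of being rejected.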

DocFetcher Server 1.0 Release

After over a year of development, DocFetcher Server 1.0 is now finally out!

For those not in the know, DocFetcher Server is a cousin of DocFetcher and DocFetcher Pro, featuring a proper implementation of the long-requested multi-user and remote-access support that is poorly implemented in DocFetcher and unavailable in DocFetcher Pro.

In essence, DocFetcher Server is a background process that runs on a server computer, indexes files on that computer, and makes those files searchable and downloadable for one or more clients through their web browsers. Typically, you'd want to deploy this kind of software on a server machine in a private or company network, or deploy it on a rented server machine for remote access to your files. Accordingly, DocFetcher Server is geared more towards businesses than individuals, and this combined with the substantially more complex server technology under the hood is why the software is situated at a higher price range than DocFetcher Pro.

During the first 3 weeks after launch, until August 21, 2022, DocFetcher Server will be available at a reduced price (15% off). This will hopefully compensate for any early-release bugs. If you do find any bugs, please help get them fixed by reporting them to Support.

To be sure, development of DocFetcher Server took much longer than initially expected. Like, how hard can it be to build a web UI on top of the existing DocFetcher Pro core? --- Well, turns out, very hard, for two reasons: First, the existing desktop UI turned out to be a lot bigger and deeper than expected, and taking it to the web revealed all kinds of hidden features that took a considerable amount of time to reimplement. And second, designing a web UI turned out to be not only considerably more complex than, but also vastly different from designing a desktop UI, so that in the end very little existing UI code could be reused. --- You may not realize this, but a web UI is basically an HTML page pretending to be a user interface. Naturally, all kinds of hacks are involved to make this happen.

Unfortunately, because of these difficulties, DocFetcher Server currently lacks some of the more advanced and/or less frequently used features of DocFetcher Pro, notably the ability to load and save indexing settings, CSV export of search results and indexing errors, and the file size and Custom Types filters. For a complete list of missing features, see this page. --- Reimplementing all of these would probably have added three months or more to the development process, and the madness had to stop somewhere.

Speaking of the past, DocFetcher Server was formerly announced as "DocFetcher Pro Server", but in the end the "Pro" was dropped for the sake of brevity. It's still "Pro" software though, even more so than DocFetcher Pro!

So, now that DocFetcher Server is out, what about the future of the DocFetcher project? Of course, there are plans for DocFetcher Pro 2.0 and DocFetcher Server 2.0, but no, don't expect them to come out in the near future. The thing is, the DocFetcher project has been on a development sprint for over two years now (since early 2020), which resulted in DocFetcher Pro, a major bugfix release of DocFetcher, and DocFetcher Server. During this sprint, cleanup and maintenance activities mostly fell by the wayside, and this is not sustainable in the long run.

Consequently, the DocFetcher project will now enter a prolonged cleanup and maintenance phase, which will probably produce very little in terms of visible, flashy results, but will produce a lot of internal changes that contribute significantly to the long-term future of the project. To give just two concrete examples:

  1. Since the very beginning, before 2007, DocFetcher has been developed in a development environment known as the Eclipse IDE. Unfortunately, the latter has fallen out of favor with the development community for some time now, and some vital tools needed to develop DocFetcher are no longer being updated. That's why the DocFetcher project sooner or later needs to get off that sinking ship and migrate to a new development environment. On that occasion, the currently used programming language Scala needs to be upgraded from the aging Scala 2 to the new Scala 3 as well. All this takes a lot of work and will produce exactly zero new features.
  2. DocFetcher Pro and DocFetcher Server will be equipped with so-called "unit tests", which is programming jargon for automated testing of the software. Among other things, unit tests serve as a safeguard against new major features and changes breaking existing functionality. --- That's just what's needed when new major features and changes finally get implemented for DocFetcher Pro 2.0 and DocFetcher Server 2.0. Unit tests themselves produce exactly zero new features, unfortunately, but they support the addition of new features down the road.

So, hopefully, DocFetcher Server 1.0 was worth the long wait, for those who were waiting, and hopefully whatever big thing comes next will be worth the wait too. Until then, you can expect to see some more bugfixing in DocFetcher, DocFetcher Pro and DocFetcher Server.

On a final note, until now everyone who bought DocFetcher Pro was automatically subscribed to the DocFetcher Pro newsletter. With the arrival of DocFetcher Server, this has to change a little: The DocFetcher Pro newsletter will cover both DocFetcher Pro and DocFetcher Server news, but if you bought only one of them, you will only receive the subset of the newsletter that pertains to the product you bought. If you want the full newsletter instead, you have to subscribe manually on the Subscribe page. This news article right here will be the first and last article about a DocFetcher Server release that DocFetcher Pro users will receive, unless they subscribe to the full newsletter.

DocFetcher Pro 1.16 Release

DocFetcher Pro 1.16 has just been released. A crash related to RAR archives was fixed, and in the result table the date display in the "Last Modified" column was changed to a fixed "yyyy-MM-dd, HH:mm" format. This format no longer depends on the system locale.
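For illustration, here is what a fixed, locale-independent timestamp format looks like in practice. The sketch uses Python rather than DocFetcher Pro's own code, and the function name format_last_modified is made up for this example. The pattern "%Y-%m-%d, %H:%M" is the Python counterpart of "yyyy-MM-dd, HH:mm".

```python
from datetime import datetime

# Python strftime equivalent of the pattern "yyyy-MM-dd, HH:mm".
# It contains only numeric fields, so the output does not vary with the locale.
LAST_MODIFIED_FORMAT = "%Y-%m-%d, %H:%M"


def format_last_modified(timestamp: datetime) -> str:
    """Format a modification time the same way regardless of system locale."""
    return timestamp.strftime(LAST_MODIFIED_FORMAT)


print(format_last_modified(datetime(2022, 3, 5, 9, 7)))  # 2022-03-05, 09:07
```

A side benefit of this year-first format is that sorting the strings alphabetically also sorts them chronologically.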

As for DocFetcher Pro Server, the upcoming search server based on DocFetcher Pro, the release timeframe needs to be pushed back again, unfortunately. It will not be ready by the end of Q1 2022, and is now tentatively scheduled for Q2 or Q3 2022. The delay this time was mostly due to a single major distraction: Having to relocate to another country for personal reasons. This caused development to go off the rails for a while. Since then, things have quieted down again and the project is back on track.

Moreover, the release estimate of Q2 or Q3 is now a little more conservative, as development of DocFetcher Pro Server turned out to be a huge undertaking that basically requires rewriting half the code base. To be more specific, in order to transform DocFetcher Pro, a traditional desktop application, into a proper web application, the entire user interface needs to be swapped out and rewritten.

At present, DocFetcher Pro Server is in an early alpha stage and contains all the basic features, but still suffers from too many holes and problems to be ready for real-world use. This is now what remains to be tackled before the software can be released.

DocFetcher Pro 1.15 Release

DocFetcher Pro 1.15 has just been released. This is just a minor maintenance release with fixes for three bugs: Two crashes related to tar.gz and 7z archives, respectively, and one minor issue with the preview pane. For details, please see the changelog.

There hasn't been much activity on DocFetcher Pro lately, mainly because of the focus on the upcoming DocFetcher Pro Server. Another reason is that most bugs in DocFetcher Pro seem to have already been fixed.

For those not in the know, DocFetcher Pro Server is a variant of DocFetcher Pro that is intended to be run as an indexing and search server, and that can be accessed by one or more clients remotely through the web browser.

Regarding the release date of DocFetcher Pro Server, there's bad news, unfortunately: It will likely not be ready by the end of 2021, as previously announced. The release is now tentatively scheduled for Q1 2022. There isn't any single big reason why this happened, just many small reasons that accumulated and conspired to cause DocFetcher Pro Server to fall behind schedule. That being said, the project continues to trot along at a steady pace.

As to the current state of DocFetcher Pro Server, there's now a working prototype that runs on all supported platforms, i.e., Windows, Linux and macOS. Searching, filtering by file location and the preview pane all work. However, there are also many important gaps that still need to be filled --- e.g., a login screen with accompanying user and password management, so you can access your search server over the internet without giving everybody else on the internet access to the server as well.

DocFetcher Pro 1.14 Release

DocFetcher Pro 1.14 has just been released. It's been about two months since the previous release, and since then a grab bag of fixes and changes has accumulated. All in all, three bugs have been fixed, and the various changes mostly revolve around preventing more bugs and crashes. The most notable changes are as follows:

  1. It turns out that some programs, such as Thunderbird, like using files without file extension as gigantic data dumps, e.g., "Trash" or "Trash-1". These files can often be 1 GB or more in size. However, DocFetcher Pro will by default try to index files without file extension as plain text files, and will likely fall over and die if it runs into these gigantic files. To prevent this, there's now a new setting on the indexing dialog that by default will make the program exclude files having no file extension and being bigger than a certain file size from indexing. (And if you're wondering why this new file size limit is applied only to files without file extension, and not to all files, that's because a reasonable value for the file size limit depends on the file type. For instance, Excel files may reasonably be capped at, say, 100 MB, while PST files should be allowed to be 2 GB in size or more. A file-type-dependent file size limit may be added in a future major release.)
  2. In case of a fatal indexing crash, the program now reports the path of the last file it worked on. This helps locate the file that likely caused the crash.
  3. Previously, indexes that could not be loaded due to a broken tree.xml file in them were silently ignored. Now such index loading failures will be reported at startup. This helps you with identifying and removing broken indexes that are uselessly occupying disk space.
  4. Result table: For emails, the value in the size column now includes not only the email body size, but also the size of any attachments. Note that you must rebuild your indexes before this new size value is displayed.
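The exclusion rule from item 1 can be sketched roughly as follows. This is only an illustration of the described behavior, not the program's actual implementation. The function name should_skip and the 50 MB limit are hypothetical, since the announcement doesn't state the default value.

```python
import os

# Hypothetical limit; the real default is not given in the announcement.
MAX_SIZE_WITHOUT_EXTENSION = 50 * 1024 * 1024  # 50 MB


def should_skip(path: str, size_bytes: int,
                max_size: int = MAX_SIZE_WITHOUT_EXTENSION) -> bool:
    """Return True for files that have no extension AND exceed the size limit.

    Files with an extension are never skipped by this rule, since a sensible
    size limit for them would depend on the file type.
    """
    _, extension = os.path.splitext(os.path.basename(path))
    return extension == "" and size_bytes > max_size


one_gb = 1024 ** 3
print(should_skip("Mail/Trash", one_gb))           # True: no extension, huge
print(should_skip("Mail/Trash-1", 4096))           # False: no extension, small
print(should_skip("mail/archive.pst", 2 * one_gb)) # False: has an extension
```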

For the full list of fixes and changes, please see the changelog.

DocFetcher Pro 1.13 Release

DocFetcher Pro 1.13 has just been released.

The bundled Java runtime has been downgraded from Java 16 to Java 11 due to some reports of stability issues. --- Apparently, it's unwise to live on the bleeding edge of Java technology. The downgrade has virtually no impact on performance: In the previously discussed benchmark, going from Java 16 to Java 11 results in only about 100 ms of additional index loading time.

In addition to the Java runtime downgrade, this release fixes a handful of crashes and some other issues. For details, see the changelog.

DocFetcher Pro 1.12 Release: Java Downgrade

Yesterday's release of DocFetcher Pro 1.12 brought an upgrade of the bundled Java runtime from Java 8 to Java 16. Since then, there have been reports of macOS users being unable to launch the new 1.12 release, specifically users running macOS 11.3 and 11.4.

The likely reason for this launch issue is an incompatibility between the latest macOS versions and Java 16, and downgrading to Java 11 seems to resolve the problem. Therefore, the DocFetcher Pro 1.12 release files for macOS have been replaced on Gumroad. They have the same filenames as before, but now come bundled with Java 11. Give these new files a try if you experienced the 1.12 launch issue. The release files for Windows and Linux remain unchanged, if you've bought any of those.

In terms of performance, there's virtually no difference between Java 11 and Java 16. Specifically, these are the measured index loading times with the benchmark from the previous post:

  • DocFetcher Pro 1.11 + Java 8: 63.048 s (100%)
  • DocFetcher Pro 1.12 + Java 8: 15.316 s (24%)
  • DocFetcher Pro 1.12 + Java 11: 8.824 s (14%)
  • DocFetcher Pro 1.12 + Java 16: 8.726 s (14%)

As you can see from the figures, the downgrade from Java 16 to Java 11 results in an increase of index loading time by a measly 100 ms, which may very well be a statistical fluke. As before, all load times are averages over 10 test runs each.
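If you want to check the arithmetic yourself, the percentages and the Java 11 vs. Java 16 difference follow directly from the four measured averages. The variable names below are chosen just for this snippet.

```python
# Measured average index loading times in seconds, as listed above.
load_times = {
    "1.11 + Java 8": 63.048,
    "1.12 + Java 8": 15.316,
    "1.12 + Java 11": 8.824,
    "1.12 + Java 16": 8.726,
}

baseline = load_times["1.11 + Java 8"]

# Loading time relative to the 1.11 baseline, rounded to whole percent.
relative = {name: round(100 * t / baseline) for name, t in load_times.items()}

# Extra loading time of Java 11 compared to Java 16, in milliseconds.
delta_ms = round(
    (load_times["1.12 + Java 11"] - load_times["1.12 + Java 16"]) * 1000
)

print(relative)  # percentages: 100, 24, 14, 14
print(delta_ms)  # 98
```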

DocFetcher Pro 1.12 Release

DocFetcher Pro 1.12 has just been released. This release fixes a 7z- and a tar-archive-related crash, and brings dramatic performance improvements with respect to the handling of large indexes. The performance improvements were implemented partly in response to user feedback, and partly in preparation for the upcoming DocFetcher Pro Server, which will likely have to deal with much larger data sets than the current non-server version.

For some context: DocFetcher Pro is a full rewrite of the original DocFetcher, so under the hood they are actually two completely different programs. During the rewrite, the emphasis was on correctness and long-term maintainability, while performance was less of a concern. As a result, DocFetcher Pro used to be, up until now, generally slower than DocFetcher, although this difference in performance was not noticeable until you tried to index hundreds of thousands of files. When you did index that much data, the program would exhibit various kinds of slowness: It took forever to load indexes during startup, exiting the program became slow, and checking and unchecking folders in the Search Scope pane also became slow.

In the new 1.12 release, these performance issues were addressed with a major restructuring of some performance-critical code sections, numerous smaller code optimizations here and there, and an upgrade of the bundled Java runtime from Java 8 to Java 16. In a benchmark with an index comprising about 2.8 million files, these changes led to dramatic performance improvements, notably a reduction of index loading time from 63.048 s down to 8.726 s, which is a reduction down to about 14%, or roughly 1/7. (Note: The two loading times are averages over 10 test runs each.) Other index-related operations are now also significantly faster.

DocFetcher Pro 1.11 Release

DocFetcher Pro 1.11 has just been released. The previously announced 7z v0.4 support is now implemented, meaning DocFetcher Pro can now read 7z archives created with the latest versions of 7-Zip.

In addition to 7z v0.4 support, an Outlook indexing crash has been fixed, and a --disable-auto-index-update launch parameter has been added. For details, see the changelog.

Some users have expressed dissatisfaction with DocFetcher Pro's lack of a global hotkey, which in the Windows and Linux versions of DocFetcher can be pressed to bring the program window to the front. In DocFetcher Pro, this is in fact also possible, with the help of additional software. For details, see the question "Can you bring back the global hotkey from DocFetcher?" on the DocFetcher Pro FAQ.

With last week's release of DocFetcher 1.1.24 and the addition of 7z v0.4 support, the next major item on the development agenda is now DocFetcher Pro Server. As stated before, this will be a huge undertaking, so it's going to take quite a while, and furthermore, no reliable release date can be given at this time. That being said, it will probably be finished before the year is over.

DocFetcher Pro 1.10 Release

DocFetcher Pro 1.10 has just been released. This time, two problems were fixed: First, performance issues and a crash related to the preview pane line numbers that were introduced in the previous 1.9 release. And second, with type-ahead search enabled, phrase search and proximity search worked incorrectly when combined with wildcards. For details, see the changelog.

The development agenda posted earlier still applies.