News – DocFetcher Pro

DocFetcher Pro 1.20 Release

Posted on December 24, 2024 by Nam-Quang Tran

DocFetcher Pro 1.20 is out. This is an emergency bugfix addressing the following issues:

On macOS 15 Sequoia, the GUI was broken.
The application couldn’t be started if the system language didn’t match any of the languages into which the GUI was translated.
On macOS with right-to-left GUI layout (i.e., Arabic or Hebrew), the columns of the search result table were not right-to-left.

For the full list of changes in this release and previous releases, please see the changelog.

DocFetcher Pro 1.19 Release

Posted on December 21, 2024 by Nam-Quang Tran

DocFetcher Pro 1.19 is out. Besides various minor bugfixes and changes, this new release features GUI translations for 23 languages:

Arabic (MSA)
Chinese (Simplified)
Chinese (Traditional)
Danish
Dutch
Finnish
French
German
Greek
Hebrew
Hindi
Hungarian
Italian
Japanese
Korean
Norwegian Bokmål
Polish
Portuguese
Spanish
Swedish
Turkish
Ukrainian
Vietnamese

Formerly, DocFetcher Pro was English-only and thus lagged behind the free DocFetcher in the GUI translation department. Now it has caught up with and surpassed DocFetcher, featuring 9 additional GUI translations. Moreover, it allows changing the GUI language in the preferences.

The user manual hasn’t been translated yet. This is currently in the works.

In related news, DocFetcher Pro now offers two new word segmentation options in the preferences, „Chinese“ and „Japanese“. These need to be selected to get usable search results when dealing with Chinese and Japanese text, respectively.
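To illustrate why a dedicated segmentation option matters, here is a minimal Python sketch. It is not DocFetcher Pro's actual implementation; the function names and the character-bigram approach are assumptions chosen only for illustration. Chinese text has no spaces between words, so splitting on whitespace yields one giant token that no ordinary query would match, whereas even a crude segmentation produces searchable units.

```python
def whitespace_tokens(text: str) -> list:
    """Split on whitespace, which works for space-delimited languages."""
    return text.split()


def char_bigrams(text: str) -> list:
    """Overlapping two-character tokens: a crude stand-in for real word segmentation."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


sentence = "我喜欢搜索文档"  # roughly: "I like to search documents"

# Whitespace splitting returns the entire sentence as one token.
print(whitespace_tokens(sentence))

# Bigrams include "搜索" ("search"), so a query for that word can match.
print(char_bigrams(sentence))
```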

As for bugfixes and changes, arguably the most noteworthy are:

The minimum/maximum file size filter had a bug where if you entered something like 5 MB, then closed and restarted the application, it would still display 5 MB, but internally use 5 KB, leading to incorrect search result filtering. Curiously, this bug has been around since the first version of DocFetcher Pro without anybody noticing and reporting it. — And it probably would’ve remained, had it not been for the massive GUI overhaul during the GUI translation.
Judging by the number of support emails, one of the most poorly understood parts of the DocFetcher Pro GUI is the Custom Types pane, and specifically the purpose of the „Other“ checkbox in it. If you want to know how it works, please refer to the page „Custom Types“ in the user manual. Suffice it to say here that if there are no checkboxes other than the „Other“ checkbox in the Custom Types pane, and that checkbox is unticked, all search results are filtered out. The new DocFetcher Pro version detects this specific case and shows a helpful error message instead of just displaying no results, thereby putting this little bit of confusion to rest.
The indexing of filenames has been slightly improved. Now it works consistently regardless of whether or not whitespace is used as a separator in the filename.
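To get a feel for how much the file size filter bug in the first item above could distort results, here is a small Python sketch. The function names are hypothetical, and the use of binary units (1 KB = 1024 bytes) is an assumption; the sketch only shows the arithmetic, not how DocFetcher Pro stores its settings.

```python
# Assumed binary unit factors; the actual factors used by the application are not stated.
UNIT_FACTORS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(value: int, unit: str) -> int:
    """Convert a value with a unit, e.g. (5, "MB"), to a byte count."""
    return value * UNIT_FACTORS[unit]


def passes_max_size(file_size_bytes: int, max_value: int, max_unit: str) -> bool:
    """Return True if the file is no larger than the configured maximum."""
    return file_size_bytes <= to_bytes(max_value, max_unit)


one_megabyte_file = to_bytes(1, "MB")

# What the GUI displayed: a 5 MB maximum, which a 1 MB file passes.
print(passes_max_size(one_megabyte_file, 5, "MB"))  # True

# What was used internally after a restart: 5 KB, which the same file fails.
print(passes_max_size(one_megabyte_file, 5, "KB"))  # False
```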

For the full list of bugfixes and changes in this release, please see the changelog.

DocFetcher Server 1.3 Release

Posted on August 06, 2023 by Nam-Quang Tran

DocFetcher Server 1.3 is out. This release comes with an assortment of bugfixes and two new features that could be considered quite important depending on your use case.

The first new feature is that the web interface can now be opened with a ?q= URL parameter to immediately run a search with the query specified via the URL parameter. Here’s an example of what the full URL to the web interface might look like: https://example.com/search/?q=dog cat

One use case for the new URL parameter is launching DocFetcher Server searches programmatically or from a terminal. Another use case is that you can now select some text on a website, and then with the help of a browser extension quickly initiate a DocFetcher Server search with the selected text as the query.
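As a minimal sketch of the programmatic use case, the following Python snippet builds such a URL with the standard library. The base URL is the placeholder from the example above and the function name is hypothetical. The query is percent-encoded here, so the space in "dog cat" becomes %20; that is the safe choice when the URL is assembled by a script rather than typed into a browser.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url: str, query: str) -> str:
    """Append the query as a percent-encoded ?q= parameter."""
    return f"{base_url}?{urlencode({'q': query}, quote_via=quote)}"


url = build_search_url("https://example.com/search/", "dog cat")
print(url)  # https://example.com/search/?q=dog%20cat
```

Passing the resulting string to `webbrowser.open` from Python's standard library would then open the search in the default browser.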

The second new feature is a new path mapping facility. This addresses the following use case: Let’s say you have a DocFetcher Server instance running on a Linux server and indexing documents located or mounted at /path/to/documents. Furthermore, the clients connecting to the DocFetcher Server instance all happen to be running on Windows. The problem is that the indexed documents are also accessible to the clients, but at a completely different mount point, e.g., X:\docs, and for some reason or other the clients need to directly open the files under X:\docs, rather than download copies of them through the web interface. Handling this use case was not possible in previous DocFetcher Server versions; now it is. In the new version, you can configure the DocFetcher Server instance to modify file paths on their way to the clients so that for example /path/to/documents is seen by clients as X:\docs.
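Conceptually, the mapping amounts to swapping one path prefix for another and switching the separator style. The Python sketch below shows the idea using the two example paths from above. It is only an illustration: the function name is hypothetical, and the actual configuration and behavior of DocFetcher Server's path mapping are not shown here.

```python
from pathlib import PurePosixPath, PureWindowsPath


def map_path(server_path: str, server_root: str, client_root: str) -> str:
    """Rewrite a server-side POSIX path into the corresponding Windows client path."""
    # relative_to raises ValueError if server_path is not located under server_root.
    relative = PurePosixPath(server_path).relative_to(server_root)
    return str(PureWindowsPath(client_root, *relative.parts))


mapped = map_path(
    "/path/to/documents/reports/2023/q1.pdf",  # hypothetical indexed file
    "/path/to/documents",
    "X:\\docs",
)
print(mapped)  # X:\docs\reports\2023\q1.pdf
```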

An important caveat here is that because the web interface is forced to run in a browser sandbox, it cannot directly open files under X:\docs with the client’s local viewer application (such as Microsoft Word). So the new DocFetcher Server version comes with an additional workaround to make the process of directly opening files less painful: You can now click the file icon of a result to copy the result’s file path to the clipboard. Afterwards, you can paste the copied file path into your file manager to open the file. (Note: The action to perform when clicking the file icon can be set in the Admin Area. The default action is to download the file, not to copy its path.)

As for bugfixes in the new DocFetcher Server version, perhaps the most notable ones concern the checkbox states in the Search Scope pane, the processing of PDF annotations, and RTF bodies of Outlook emails:

As you probably know, to filter search results by location, you can tick and untick the checkboxes in the Search Scope pane in the bottom-left of the web interface. In previous versions, there was a major bug though: The checkbox states were reset after the associated index was updated or rebuilt on the server side. This became a noticeable problem if indexes were updated frequently.
Previously, there were certain PDF annotations that were indexed and searchable, but not shown in the preview pane. Also, PDF annotations placed on empty pages were completely ignored.
Previously, the RTF bodies of Outlook emails were completely ignored. Now they are properly indexed just like plain text and HTML email bodies.

For the full list of changes in this release, please see the changelog.

DocFetcher Pro 1.18 Release

Posted on August 06, 2023 by Nam-Quang Tran

DocFetcher Pro 1.18 is out. This release contains an assortment of bugfixes that accumulated since the previous release. The most notable bugfixes are:

Previously, there were certain PDF annotations that were indexed and searchable, but not shown in the preview pane. Also, PDF annotations placed on empty pages were completely ignored.
Previously, the RTF bodies of Outlook emails were completely ignored. Now they are properly indexed just like plain text and HTML email bodies.

For the full list of bugfixes in this release, please see the changelog.

DocFetcher Pro 1.17 Release

Posted on December 22, 2022 by Nam-Quang Tran

DocFetcher Pro 1.17 is out. It’s been a while since the previous release (about 9 months), mostly due to DocFetcher Server, and since then a number of bugfixes and a few small feature additions have accumulated. In total, 8 bugs and crashes were fixed, some related to indexing, some related to the Search Scope pane, and some related to the preview pane.

As for features, the new release allows setting a file size limit on the files to be indexed, e.g., „don’t index files bigger than 500 MB“. The significance of this size limit is that it greatly reduces the risk of the application running out of memory while trying to index enormous files. The other new features are a new „Copy With Path“ entry in the context menu of the preview pane, and an encoding override setting for HTML files in the Advanced Settings file.
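The idea behind the size limit can be sketched in a few lines of Python: check the file's size on disk before reading any of its contents. The constant, the function name and the use of binary megabytes are assumptions made for this illustration, not the application's actual code.

```python
import os

# "500 MB" from the example above, assuming 1 MB = 1024 * 1024 bytes.
MAX_INDEX_SIZE_BYTES = 500 * 1024 * 1024


def should_index_content(path: str, limit: int = MAX_INDEX_SIZE_BYTES) -> bool:
    """Return True if the file is small enough to be read for indexing.

    Only the file's metadata is consulted, so an enormous file is
    rejected before any of its contents are loaded into memory.
    """
    return os.path.getsize(path) <= limit
```

An indexer following this approach would call the check once per file and skip the files for which it returns False.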

For a detailed listing of the bugfixes and features, please see the changelog.

DocFetcher Server 1.2 Release

Posted on December 21, 2022 by Nam-Quang Tran

DocFetcher Server 1.2 is out. This release fixes two indexing-related crashes and a major bug concerning index updates. The bug, as explained in the changelog: If a subfolder within an indexed folder was renamed, moved or removed, the files in that subfolder were not properly updated in the index, causing obsolete files to show up in the search results alongside the current ones. The obsolete files will disappear once you upgrade to the new release and update all your indexes.

It appears that recently, an unknown number of emails sent to the official support email address (support .. docfetcherpro.com) were lost. They were likely blocked on their way through Google’s servers for unknown reasons. If you sent an email to the support address and didn’t get a response, that’s why. Please resend your email to the address currently listed on the support page if your issue still persists. Apologies for the inconvenience!

DocFetcher Server 1.1 Release

Posted on August 30, 2022 by Nam-Quang Tran

DocFetcher Server 1.1 is out. This release is mostly a collection of usability improvements to smooth out some of the rough edges of the first release. Most items on the changelog are more or less significant, so in this release announcement there isn’t really much to do except copy and paste the entire changelog:

The application could not be run on older Linux distributions due to a glibc compatibility issue. It now runs on Linux distributions with glibc 2.17 or newer.
Among the indexing settings, there’s now a new setting for skipping content indexing for all files that are bigger than a certain maximum file size. The filenames of these files can still be indexed. With this new setting, it’s now possible to skip large files that may cause the application to run out of memory during indexing.
In the Admin Area in the indexes table, there’s a new column „Visible“. By ticking and unticking the checkboxes in that column, you can control which indexes are transmitted to the clients, and which indexes are kept only on the server side. This is useful if you have some very large and rarely used indexes; keeping them on the server side when they’re not needed will make the web interface load faster.
In the result table, you can now press the arrow-up and arrow-down keys to navigate to the previous or next result, with the contents of the preview pane updated accordingly. However, for this to work properly, you need to turn off „Automatically scroll to first match in preview pane“ in the user preferences.
For tablet users, there’s now a button above the result table for downloading the first selected result. Before, downloading results was only possible via the result table’s context menu, which is difficult to open on tablets.
For technical reasons, the users that are counted towards the application’s client limit are identified by browser session rather than by IP address. Among other things, this means accessing the web interface from multiple browsers on the same computer counts as multiple clients rather than as a single client. For some users, this can be quite inconvenient. In addition, it was also inconvenient for a single user to switch between different computers. For these use cases, there’s now a workaround called „session stealing“, which means that when the client limit is reached, new users may take over existing sessions, thus kicking their previous owners out of the web interface. The intended use is for users to kick „themselves“ out in order to more easily switch between browsers and/or computers. For instances of DocFetcher Server with a client limit greater than 1, session stealing is off by default and must be enabled in the Admin Area on the Access tab.
It’s now possible to inject custom CSS and JavaScript into the web interface, on the server side, via the files misc/custom.css and misc/custom.js. Note that no support or HTML stability guarantees are provided for such customizations, as explained in the comments in the custom.css and custom.js files.
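The session stealing workaround described in the list above can be modeled roughly as follows. This Python sketch is a hypothetical model, not DocFetcher Server's code: the class and method names are invented, and evicting the oldest session is an assumed policy, since the text above does not say which existing session gets taken over.

```python
class SessionRegistry:
    """Tracks browser sessions against a client limit (hypothetical model)."""

    def __init__(self, client_limit: int, allow_stealing: bool) -> None:
        self.client_limit = client_limit
        self.allow_stealing = allow_stealing
        # Maps session id to a user label; dicts preserve insertion order.
        self.sessions = {}

    def connect(self, session_id: str, user: str) -> bool:
        """Try to admit a session; return True if it may use the web interface."""
        if session_id in self.sessions:
            return True  # this browser session is already counted
        if len(self.sessions) < self.client_limit:
            self.sessions[session_id] = user
            return True
        if not self.allow_stealing:
            return False  # limit reached and stealing is disabled
        # Limit reached: take over the oldest session (assumed policy),
        # which kicks its previous owner out.
        oldest = next(iter(self.sessions))
        del self.sessions[oldest]
        self.sessions[session_id] = user
        return True


registry = SessionRegistry(client_limit=1, allow_stealing=True)
registry.connect("browser-a", "alice")
registry.connect("browser-b", "alice")  # alice switches browsers
print(list(registry.sessions))  # ['browser-b']
```

The usage at the end mirrors the intended scenario: a single user kicking „themselves“ out to switch from one browser to another.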

DocFetcher Server 1.0 Release

Posted on July 30, 2022 by Nam-Quang Tran

After over a year of development, DocFetcher Server 1.0 is now finally out!

For those not in the know, DocFetcher Server is a cousin of DocFetcher and DocFetcher Pro, featuring a proper implementation of the long-requested multi-user and remote-access support that is poorly implemented in DocFetcher and unavailable in DocFetcher Pro.

In essence, DocFetcher Server is a background process that runs on a server computer, indexes files on that computer, and makes those files searchable and downloadable for one or more clients through their web browsers. Typically, you’d want to deploy this kind of software on a server machine in a private or company network, or deploy it on a rented server machine for remote access to your files. Accordingly, DocFetcher Server is geared more towards businesses than individuals, and this combined with the substantially more complex server technology under the hood is why the software is situated at a higher price range than DocFetcher Pro.

During the first 3 weeks after launch, until August 21, 2022, DocFetcher Server will be available at a reduced price (15% off). This will hopefully compensate for any early-release bugs. If you do find any bugs, please help get them fixed by reporting them to Support.

To be sure, development of DocFetcher Server took much longer than initially expected. Like, how hard can it be to build a web UI on top of the existing DocFetcher Pro core? — Well, turns out, very hard, for two reasons: First, the existing desktop UI turned out to be a lot bigger and deeper than expected, and taking it to the web revealed all kinds of hidden features that took a considerable amount of time to reimplement. And second, designing a web UI turned out to be not only considerably more complex than, but also vastly different from designing a desktop UI, so that in the end very little existing UI code could be reused. — You may not realize this, but a web UI is basically an HTML page pretending to be a user interface. Naturally, all kinds of hacks are involved to make this happen.

Unfortunately, because of these difficulties, DocFetcher Server currently lacks some of the more advanced and/or less frequently used features of DocFetcher Pro, notably the ability to load and save indexing settings, CSV export of search results and indexing errors, and the file size and Custom Types filters. For a complete list of missing features, see this page. — Reimplementing all of these would probably have added three months or more to the development process, and the madness had to stop somewhere.

Speaking of the past, DocFetcher Server was formerly announced as „DocFetcher Pro Server“, but in the end the „Pro“ was dropped for the sake of brevity. It’s still „Pro“ software though, even more so than DocFetcher Pro!

So, now that DocFetcher Server is out, what about the future of the DocFetcher project? Of course, there are plans for DocFetcher Pro 2.0 and DocFetcher Server 2.0, but no, don’t expect them to come out in the near future. The thing is, the DocFetcher project has been on a development sprint for over two years now (since early 2020), resulting in DocFetcher Pro, a major bugfix release of DocFetcher, and DocFetcher Server. During this sprint, cleanup and maintenance activities mostly fell by the wayside, and this is not sustainable in the long run.

Consequently, the DocFetcher project will now enter a prolonged cleanup and maintenance phase, which will probably produce very little in terms of visible, flashy results, but will produce a lot of internal changes that contribute significantly to the long-term future of the project. To give just two concrete examples:

Since the very beginning, before 2007, DocFetcher has been developed in a development environment known as the Eclipse IDE. Unfortunately, the latter has fallen out of favor with the development community for some time now, and some vital tools needed to develop DocFetcher are no longer being updated. That’s why the DocFetcher project sooner or later needs to get off that sinking ship and migrate to a new development environment. On that occasion, the currently used programming language Scala needs to be upgraded from the aging Scala 2 to the new Scala 3 as well. All this takes a lot of work and will produce exactly zero new features.
DocFetcher Pro and DocFetcher Server will be equipped with so-called „unit tests“, which is programming jargon for automated testing of the software. Among other things, unit tests serve as a safeguard against new major features and changes breaking existing functionality. — That’s just what’s needed when new major features and changes finally get implemented for DocFetcher Pro 2.0 and DocFetcher Server 2.0. Unit tests themselves produce exactly zero new features, unfortunately, but they support the addition of new features down the road.

So, hopefully, DocFetcher Server 1.0 was worth the long wait, for those who were waiting, and hopefully whatever big thing comes next will be worth the wait too. Until then, you can expect to see some more bugfixing in DocFetcher, DocFetcher Pro and DocFetcher Server.

On a final note, until now everyone who bought DocFetcher Pro was automatically subscribed to the DocFetcher Pro newsletter. With the arrival of DocFetcher Server, this has to change a little: The DocFetcher Pro newsletter will cover both DocFetcher Pro and DocFetcher Server news, but if you bought only one of them, you will only receive the subset of the newsletter that pertains to the product you bought. If you want the full newsletter instead, you have to subscribe manually on the Subscribe page. This news article right here will be the first and last article about a DocFetcher Server release that DocFetcher Pro users will receive, unless they subscribe to the full newsletter.

DocFetcher Pro 1.16 Release

Posted on March 15, 2022 by Nam-Quang Tran

DocFetcher Pro 1.16 has just been released. A crash related to RAR archives was fixed, and in the result table the date display in the „Last Modified“ column was changed to a fixed „yyyy-MM-dd, HH:mm“ format. This format no longer depends on the system locale.
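For illustration, the same fixed format can be reproduced in Python as shown below. The pattern „yyyy-MM-dd, HH:mm“ corresponds to the strftime format string used here; the function name is hypothetical, and the sketch only demonstrates the format, not the application's code.

```python
from datetime import datetime


def format_last_modified(timestamp: datetime) -> str:
    """Format a timestamp as year-month-day, 24-hour time."""
    # Only numeric fields are used, so no locale-specific month or weekday names appear.
    return timestamp.strftime("%Y-%m-%d, %H:%M")


print(format_last_modified(datetime(2022, 3, 15, 9, 5)))  # 2022-03-15, 09:05
```

A side benefit of this format is that sorting the strings alphabetically also sorts them chronologically.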

As for DocFetcher Pro Server, the upcoming search server based on DocFetcher Pro, the release timeframe needs to be pushed back again, unfortunately. It will not be ready by the end of Q1 2022, and is now tentatively scheduled for Q2 or Q3 2022. The delay this time was mostly due to a single major distraction: Having to relocate to another country for personal reasons. This caused development to go off the rails for a while. Since then, things have quieted down again and the project is back on track.

Moreover, the release estimate of Q2 or Q3 is now a little more conservative, as development of DocFetcher Pro Server turned out to be a huge undertaking that basically requires rewriting half the code base. To be more specific, in order to transform DocFetcher Pro, a traditional desktop application, into a proper web application, the entire user interface needs to be swapped out and rewritten.

At present, DocFetcher Pro Server is in an early alpha stage and contains all the basic features, but still suffers from too many holes and problems to be ready for real-world use. Fixing these is what remains to be tackled before the software can be released.

DocFetcher Pro 1.15 Release

Posted on November 09, 2021 by Nam-Quang Tran

DocFetcher Pro 1.15 has just been released. This is just a minor maintenance release with fixes for three bugs: Two crashes related to tar.gz and 7z archives, respectively, and one minor issue with the preview pane. For details, please see the changelog.

There hasn’t been much activity on DocFetcher Pro lately, mainly because of the focus on the upcoming DocFetcher Pro Server. Another reason is that most bugs in DocFetcher Pro seem to have already been fixed.

For those not in the know, DocFetcher Pro Server is a variant of DocFetcher Pro that is intended to be run as an indexing and search server, and that can be accessed by one or more clients remotely through the web browser.

Regarding the release date of DocFetcher Pro Server, there’s bad news, unfortunately: It will likely not be ready by the end of 2021, as previously announced. The release is now tentatively scheduled for Q1 2022. There isn’t any single big reason why this happened, just many small reasons that accumulated and conspired to cause DocFetcher Pro Server to fall behind schedule. That being said, the project continues to trot along at a steady pace.

As to the current state of DocFetcher Pro Server, there’s now a working prototype that runs on all supported platforms, i.e., Windows, Linux and macOS. Searching, filtering by file location and the preview pane all work. However, there are also many important gaps that still need to be filled — e.g., a login screen with accompanying user and password management, so you can access your search server over the internet without giving everybody else on the internet access to the server as well.