Feature #208
Similarity check
| Status: | New | Start: | |
| Priority: | High | Due date: | |
| Assigned to: | | % Done: | 0% |
| Category: | extension ideas | | |
| Target version: | 2.X | | |
Description
http://libpuzzle.pureftpd.org/project/libpuzzle/php
The libpuzzle README has tips for indexing large image collections:
http://download.pureftpd.org/pub/pure-ftpd/misc/libpuzzle/doc/README
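To make the indexing idea concrete, here is a minimal sketch of one way a large collection could be pre-filtered before running a full similarity comparison. It assumes each image already has a fixed-length signature (a byte string), cuts it into overlapping fixed-length "words", and treats images that share enough words at the same position as candidates. This does not call libpuzzle; the names `SignatureIndex`, `signature_words`, `WORD_LEN` and `MAX_WORDS`, and their values, are hypothetical and only for illustration.

```python
from collections import defaultdict

WORD_LEN = 10    # assumed word length, not a libpuzzle constant
MAX_WORDS = 100  # assumed cap on the number of words indexed per signature


def signature_words(sig, word_len=WORD_LEN, max_words=MAX_WORDS):
    """Yield (position, word) pairs for overlapping slices of a signature."""
    last = max(len(sig) - word_len + 1, 0)
    for pos in range(min(last, max_words)):
        yield pos, sig[pos:pos + word_len]


class SignatureIndex:
    """In-memory inverted index from (position, word) to image ids."""

    def __init__(self):
        self._words = defaultdict(set)

    def add(self, image_id, sig):
        # Register every positioned word of this signature.
        for key in signature_words(sig):
            self._words[key].add(image_id)

    def candidates(self, sig, min_shared=1):
        """Return ids sharing at least min_shared words at the same position."""
        counts = defaultdict(int)
        for key in signature_words(sig):
            for image_id in self._words.get(key, ()):
                counts[image_id] += 1
        return {i for i, c in counts.items() if c >= min_shared}
```

Only the candidates returned by such a lookup would then need the expensive full comparison, and raising `min_shared` trades recall for fewer false hits.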
History
Updated by velocity37 - 769 days ago
I have used Photool IMatch before, which includes many methods (slow and fast) for finding similar-looking images, and my results were less than great. While such features sound very useful, my personal experience has proven otherwise.
I plugged in a good twenty thousand images and ran several modes; my results were pretty much the same in each instance.
- Grayscale images were not effectively filtered in ANY mode. The "similar" images shown were just other grayscale images, and there are LOADS of them.
- I got MANY false hits when running the fast/medium scanning modes; I wouldn't expect this library to be much less or much more accurate than those modes.
There are also other things to consider with "similar" images. Perhaps the newly uploaded "similar" image is a higher-quality version, or is a 24-bit JPG while the older version was a 256-color dithered GIF. Is the user to choose which version is better? Would the user necessarily choose the right one given the option? Is an admin to make the choice, and the upload to be put on hold?
I suggest you try IMatch yourself, and you'll likely find what I did: it works, in the sense that you'll find some duplicates, but there are so many false positives, and it takes so much time to filter through the potential results, that in the end it's not even worth the effort.
I was only running a 20,000-image database in IMatch then; I can only imagine the number of false hits on a large-scale Shimmie.