Address the poor performance of the existing unique-name generation (#17944)

* Address the poor performance of the existing unique-name generation

As described in Issue 16849, the existing Tools::getUniqueName method
requires calling code to form a vector of existing names to be avoided.

This leads to poor performance, both from the O(n) cost of building such
a vector and from getUniqueName's O(n) algorithm for actually generating
the unique name (where 'n' is the number of pre-existing names).

This has a particularly noticeable cost in documents with large numbers
of DocumentObjects because generating both Names and Labels for each new
object incurs this cost. During an operation such as importing, this
results in O(n^2) time spent generating names.

The other major cost is in the saving of the temporary backup file,
which uses name generation for the "files" embedded in the Zip file.
Documents can easily need several such "files" for each object in the
document.

This update includes the following changes:

Create UniqueNameManager to keep a list of existing names organized in
a manner that eases unique-name generation. This class essentially acts
as a set of names, with the ability to add and remove names and check if
a name is already there, with the added ability to take a prototype name
and generate a unique form for it which is not already in the set.
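
As a rough illustration of the "set of names plus unique-name generation"
idea, here is a minimal sketch. SimpleNameSet and all of its members are
hypothetical names, not the class added by this commit. Unlike the real
class it ignores leading zeros and any non-digit name suffix, and it
simply hands out one more than the highest number in use for a base name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the idea behind UniqueNameManager.
// For each base name it records which trailing numbers are already in use.
class SimpleNameSet
{
public:
    void add(const std::string& name)
    {
        auto [base, number] = split(name);
        used[base].insert(number);
    }

    void remove(const std::string& name)
    {
        auto [base, number] = split(name);
        auto entry = used.find(base);
        if (entry == used.end()) {
            return;  // base name never registered
        }
        entry->second.erase(number);
        if (entry->second.empty()) {
            used.erase(entry);  // forget base names with no numbers left
        }
    }

    bool contains(const std::string& name) const
    {
        auto [base, number] = split(name);
        auto entry = used.find(base);
        return entry != used.end() && entry->second.count(number) > 0;
    }

    // Returns the bare base name if it was never used, otherwise the base
    // name followed by one more than the highest number currently in use.
    std::string makeUnique(const std::string& prototype) const
    {
        const std::string base = split(prototype).first;
        auto entry = used.find(base);
        if (entry == used.end()) {
            return base;
        }
        return base + std::to_string(*entry->second.rbegin() + 1);
    }

private:
    // Splits "Box12" into {"Box", 12}; a name without digits gets number 0.
    // Assumption: digit strings are short enough for std::stoul.
    static std::pair<std::string, unsigned long> split(const std::string& name)
    {
        std::size_t end = name.size();
        while (end > 0 && std::isdigit(static_cast<unsigned char>(name[end - 1]))) {
            --end;
        }
        const std::string digits = name.substr(end);
        return {name.substr(0, end), digits.empty() ? 0UL : std::stoul(digits)};
    }

    std::map<std::string, std::set<unsigned long>> used;
};
```

Lookups and generation in this sketch cost a map lookup rather than a scan
over every existing name, which is the property the change is after.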

Eliminate Tools::getUniqueName

Make DocumentObject naming use the new UniqueNameManager class

Make DocumentObject Label naming use the new UniqueNameManager class.
Labels are not always unique; unique labels are generated only if the
settings at the time request it (and other conditions are met). Because
of this, Label management additionally requires keeping a map of counts
for labels which already exist more than once.
These collections are maintained via notifications of value changes on
the Label properties of the objects in the document.
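
The bookkeeping for non-unique labels can be pictured with the following
sketch. LabelRegistry and its members are hypothetical names chosen for
illustration, and the containers are an assumption; the sketch only shows
how a set of labels plus a map of counts for duplicated labels stays
consistent as labels are added, removed, or changed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical sketch: a set of labels in use, plus a count for every
// label that is currently used by more than one object.
class LabelRegistry
{
public:
    void add(const std::string& label)
    {
        auto dup = duplicateCounts.find(label);
        if (dup != duplicateCounts.end()) {
            ++dup->second;  // already duplicated, one more user
        }
        else if (!labels.insert(label).second) {
            duplicateCounts[label] = 2;  // second user of an existing label
        }
    }

    void remove(const std::string& label)
    {
        auto dup = duplicateCounts.find(label);
        if (dup == duplicateCounts.end()) {
            labels.erase(label);  // the only user is gone
        }
        else if (--dup->second == 1) {
            duplicateCounts.erase(dup);  // back to a single user
        }
    }

    // Would be called from a notification that a Label value changed.
    void onLabelChanged(const std::string& oldLabel, const std::string& newLabel)
    {
        remove(oldLabel);
        add(newLabel);
    }

    bool contains(const std::string& label) const
    {
        return labels.count(label) > 0;
    }

private:
    std::unordered_set<std::string> labels;
    std::unordered_map<std::string, std::size_t> duplicateCounts;
};
```

A label only leaves the set when its last user is removed, so a rename of
one of two identically labelled objects does not free the label.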

Add Document::containsObject(DocumentObject*) for a definitive
test of an object being in a Document. This is needed because
DocumentObjects can be in a sort of limbo (e.g. when they are in the
Undo/Redo lists) where they have a parent linkage to the Document but
should not participate in Label collision checks.

Rename Document.getStandardObjectName to getStandardObjectLabel
to better represent what it does.

Use new UniqueNameManager for Writer internal filenames within the zip
file.

Eliminate unneeded Reader::FileNames collection. The file names
already exist in the FileList collection elements. The only existing
use for the FileNames collection was to determine if there were any
files at all, and with FileList and FileNames being parallel
vectors, they both had the same length so FileList could be used
for this test.

Use UniqueNameManager for document names and labels. This uses ad hoc
UniqueNameManager objects created on the spot on the assumption that
document creation is relatively rare and there are few documents, so
although the cost is O(n), n itself is small.

Use an ad hoc UniqueNameManager to name new DynamicProperty entries.
The manager is only built if a property with the proposed name already
exists. That existence check is more-or-less O(log(n)) and almost never
finds a collision, so in the common case it avoids the O(n) cost of
building the UniqueNameManager. If there is a collision, an ad hoc
UniqueNameManager is built and discarded after use.
The property management classes have a bit of a mess of methods
including several to populate various collection types with all
existing properties. Rather than introducing yet another such
collection-specific method to fill a UniqueNameManager, a
visitProperties method was added which calls a passed function for
each property. The existing code would be simpler if existing
fill-container methods all used this.
Ideally the PropertyContainer class would keep a central directory of
all properties ("static", Dynamic, and exposed by ExtensionContainer and
other derivations) and a permanent UniqueNameManager. However, the
Property management is a bit of a mess, making such a change a project
unto itself.
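
The visitor approach and the "check first, build only on collision"
strategy might look roughly like the sketch below. PropertyBag, Property
and uniquePropertyName are hypothetical names, not the real property
classes, and the numbering loop is a naive stand-in for the
UniqueNameManager. The visitor returns void, matching the later change
that removed any short-circuit return.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal property type; the real classes are far richer.
struct Property
{
    std::string name;
};

class PropertyBag
{
public:
    void addProperty(const std::string& name)
    {
        properties.emplace(name, Property {name});
    }

    // Cheap existence check, roughly O(log(n)) with an ordered map.
    bool hasProperty(const std::string& name) const
    {
        return properties.count(name) > 0;
    }

    // Calls visitor once per property; there is no early exit.
    void visitProperties(const std::function<void(const Property&)>& visitor) const
    {
        for (const auto& entry : properties) {
            visitor(entry.second);
        }
    }

    // A fill-container method expressed in terms of the visitor.
    std::vector<std::string> getPropertyNames() const
    {
        std::vector<std::string> names;
        visitProperties([&names](const Property& p) { names.push_back(p.name); });
        return names;
    }

private:
    std::map<std::string, Property> properties;
};

// Only pays the O(n) cost of collecting every name when there is a collision.
std::string uniquePropertyName(const PropertyBag& bag, const std::string& proposed)
{
    if (!bag.hasProperty(proposed)) {
        return proposed;  // common case: no collision, nothing to build
    }
    std::set<std::string> taken;
    bag.visitProperties([&taken](const Property& p) { taken.insert(p.name); });
    // Naive numbering, for illustration only.
    for (unsigned i = 1;; ++i) {
        std::string candidate = proposed + std::to_string(i);
        if (taken.count(candidate) == 0) {
            return candidate;
        }
    }
}
```

Any other container can be filled the same way by passing a different
lambda, which is why a single visitor can replace several
collection-specific fill methods.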

The unit tests for Tools::getUniqueName have been changed to test
UniqueNameManager::makeUniqueName instead.
This revealed a small regression insofar as passing a prototype name
like "xyz1234" to the old code would yield "xyz1235" whether or
not "xyz1234" already existed, while the new code will return the next
name above the currently-highest name on the "xyz" model, which could
be "xyz" or "xyz1".

* Correct wrong case on include path

* Implement suggested code changes
Also change the semantics of visitProperties to not have any short-circuit return

* Remove reference through undefined iterator

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix up some comments for Doxygen

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This commit is contained in:
Kevin Martin
2024-12-13 11:54:46 -05:00
committed by GitHub
parent b1f93bc51e
commit 83202d8ad6
28 changed files with 721 additions and 435 deletions

@@ -33,130 +33,214 @@
#include "Interpreter.h"
#include "Tools.h"
namespace Base
void Base::UniqueNameManager::PiecewiseSparseIntegerSet::Add(uint value)
{
struct string_comp
etype newSpan(value, 1);
iterator above = Spans.lower_bound(newSpan);
if (above != Spans.end() && above->first <= value) {
// The found span includes value so there is nothing to do as it is already in the set.
return;
}
// Set below to the next span down, if any
iterator below;
if (above == Spans.begin()) {
below = Spans.end();
}
else {
below = above;
--below;
}
if (above != Spans.end() && below != Spans.end()
&& above->first - below->first + 1 == below->second) {
// below and above have a gap of exactly one between them, and this must be value
// so we coalesce the two spans (and the gap) into one.
newSpan = etype(below->first, below->second + above->second + 1);
Spans.erase(above);
above = Spans.erase(below);
}
if (below != Spans.end() && value - below->first == below->second) {
// value is adjacent to the end of below, so just expand below by one
newSpan = etype(below->first, below->second + 1);
above = Spans.erase(below);
}
else if (above != Spans.end() && above->first - value == 1) {
// value is adjacent to the start of above, so just expand above down by one
newSpan = etype(above->first - 1, above->second + 1);
above = Spans.erase(above);
}
// else value is not adjacent to any existing span, so just make a new span for it
Spans.insert(above, newSpan);
}
void Base::UniqueNameManager::PiecewiseSparseIntegerSet::Remove(uint value)
{
// s1 and s2 must be numbers represented as string
bool operator()(const std::string& s1, const std::string& s2)
{
if (s1.size() < s2.size()) {
return true;
}
if (s1.size() > s2.size()) {
return false;
}
return s1 < s2;
etype newSpan(value, 1);
iterator at = Spans.lower_bound(newSpan);
if (at == Spans.end() || at->first > value) {
// The found span does not include value so there is nothing to do, as it is already not in
// the set.
return;
}
static std::string increment(const std::string& s)
{
std::string n = s;
int addcarry = 1;
for (std::string::reverse_iterator it = n.rbegin(); it != n.rend(); ++it) {
if (addcarry == 0) {
break;
}
int d = *it - 48;
d = d + addcarry;
*it = ((d % 10) + 48);
addcarry = d / 10;
}
if (addcarry > 0) {
std::string b;
b.resize(1);
b[0] = addcarry + 48;
n = b + n;
}
return n;
if (at->second == 1) {
// value is the only one in this span, just remove the span
Spans.erase(at);
}
};
class unique_name
else if (at->first == value) {
// value is the first in this span, trim the lower end
etype replacement(at->first + 1, at->second - 1);
Spans.insert(Spans.erase(at), replacement);
}
else if (value - at->first == at->second - 1) {
// value is the last in this span, trim the upper end
etype replacement(at->first, at->second - 1);
Spans.insert(Spans.erase(at), replacement);
}
else {
// value is in the middle of the span, so we must split it.
etype firstReplacement(at->first, value - at->first);
etype secondReplacement(value + 1, at->second - ((value + 1) - at->first));
// Because erase returns the iterator after the erased element, and insert returns the
// iterator for the inserted item, we want to insert secondReplacement first.
Spans.insert(Spans.insert(Spans.erase(at), secondReplacement), firstReplacement);
}
}
bool Base::UniqueNameManager::PiecewiseSparseIntegerSet::Contains(uint value) const
{
public:
unique_name(std::string name, const std::vector<std::string>& names, int padding)
: base_name {std::move(name)}
, padding {padding}
{
removeDigitsFromEnd();
findHighestSuffix(names);
}
std::string get() const
{
return appendSuffix();
}
private:
void removeDigitsFromEnd()
{
std::string::size_type pos = base_name.find_last_not_of("0123456789");
if (pos != std::string::npos && (pos + 1) < base_name.size()) {
num_suffix = base_name.substr(pos + 1);
base_name.erase(pos + 1);
}
}
void findHighestSuffix(const std::vector<std::string>& names)
{
for (const auto& name : names) {
if (name.substr(0, base_name.length()) == base_name) { // same prefix
std::string suffix(name.substr(base_name.length()));
if (!suffix.empty()) {
std::string::size_type pos = suffix.find_first_not_of("0123456789");
if (pos == std::string::npos) {
num_suffix = std::max<std::string>(num_suffix, suffix, Base::string_comp());
}
}
}
}
}
std::string appendSuffix() const
{
std::stringstream str;
str << base_name;
if (padding > 0) {
str.fill('0');
str.width(padding);
}
str << Base::string_comp::increment(num_suffix);
return str.str();
}
private:
std::string num_suffix;
std::string base_name;
int padding;
};
} // namespace Base
std::string
Base::Tools::getUniqueName(const std::string& name, const std::vector<std::string>& names, int pad)
{
if (names.empty()) {
return name;
}
Base::unique_name unique(name, names, pad);
return unique.get();
iterator at = Spans.lower_bound(etype(value, 1));
return at != Spans.end() && at->first <= value;
}
std::string Base::Tools::addNumber(const std::string& name, unsigned int num, int d)
std::tuple<uint, uint> Base::UniqueNameManager::decomposeName(const std::string& name,
std::string& baseNameOut,
std::string& nameSuffixOut) const
{
std::stringstream str;
str << name;
if (d > 0) {
str.fill('0');
str.width(d);
auto suffixStart = std::make_reverse_iterator(GetNameSuffixStartPosition(name));
nameSuffixOut = name.substr(name.crend() - suffixStart);
auto digitsStart = std::find_if_not(suffixStart, name.crend(), [](char c) {
return std::isdigit(c);
});
baseNameOut = name.substr(0, name.crend() - digitsStart);
uint digitCount = digitsStart - suffixStart;
if (digitCount == 0) {
// No digits in name
return std::tuple<uint, uint> {0, 0};
}
str << num;
return str.str();
else {
return std::tuple<uint, uint> {
digitCount,
std::stoul(name.substr(name.crend() - digitsStart, digitCount))};
}
}
void Base::UniqueNameManager::addExactName(const std::string& name)
{
std::string baseName;
std::string nameSuffix;
uint digitCount;
uint digitsValue;
std::tie(digitCount, digitsValue) = decomposeName(name, baseName, nameSuffix);
baseName += nameSuffix;
auto baseNameEntry = UniqueSeeds.find(baseName);
if (baseNameEntry == UniqueSeeds.end()) {
// First use of baseName
baseNameEntry =
UniqueSeeds.emplace(baseName, std::vector<PiecewiseSparseIntegerSet>()).first;
}
if (digitCount >= baseNameEntry->second.size()) {
// First use of this digitCount
baseNameEntry->second.resize(digitCount + 1);
}
PiecewiseSparseIntegerSet& baseNameAndDigitCountEntry = baseNameEntry->second[digitCount];
// Name should not already be there
assert(!baseNameAndDigitCountEntry.Contains(digitsValue));
baseNameAndDigitCountEntry.Add(digitsValue);
}
std::string Base::UniqueNameManager::makeUniqueName(const std::string& modelName,
int minDigits) const
{
std::string namePrefix;
std::string nameSuffix;
decomposeName(modelName, namePrefix, nameSuffix);
std::string baseName = namePrefix + nameSuffix;
auto baseNameEntry = UniqueSeeds.find(baseName);
if (baseNameEntry == UniqueSeeds.end()) {
// First use of baseName, just return it with no unique digits
return baseName;
}
// We don't care about the digit count of the suggested name, we always use at least the most
// digits ever used before.
int digitCount = baseNameEntry->second.size() - 1;
uint digitsValue;
if (digitCount < minDigits) {
// Caller is asking for more digits than we have in any registered name.
// We start the longer digit string at 000...0001 even though we might have shorter strings
// with larger numeric values.
digitCount = minDigits;
digitsValue = 1;
}
else {
digitsValue = baseNameEntry->second[digitCount].Next();
}
std::string digits = std::to_string(digitsValue);
if (digitCount > digits.size()) {
namePrefix += std::string(digitCount - digits.size(), '0');
}
return namePrefix + digits + nameSuffix;
}
void Base::UniqueNameManager::removeExactName(const std::string& name)
{
std::string baseName;
std::string nameSuffix;
uint digitCount;
uint digitsValue;
std::tie(digitCount, digitsValue) = decomposeName(name, baseName, nameSuffix);
baseName += nameSuffix;
auto baseNameEntry = UniqueSeeds.find(baseName);
if (baseNameEntry == UniqueSeeds.end()) {
// name must not be registered, so nothing to do.
return;
}
auto& digitValueSets = baseNameEntry->second;
if (digitCount >= digitValueSets.size()) {
// First use of this digitCount, name must not be registered, so nothing to do.
return;
}
digitValueSets[digitCount].Remove(digitsValue);
// an element of digitValueSets may now be newly empty and so may other elements below it
// Prune off all such trailing empty entries.
auto lastNonemptyEntry =
std::find_if(digitValueSets.crbegin(), digitValueSets.crend(), [](auto& it) {
return it.Any();
});
if (lastNonemptyEntry == digitValueSets.crend()) {
// All entries are empty, so the entire baseName can be forgotten.
UniqueSeeds.erase(baseName);
}
else {
digitValueSets.resize(digitValueSets.crend() - lastNonemptyEntry);
}
}
bool Base::UniqueNameManager::containsName(const std::string& name) const
{
std::string baseName;
std::string nameSuffix;
uint digitCount;
uint digitsValue;
std::tie(digitCount, digitsValue) = decomposeName(name, baseName, nameSuffix);
baseName += nameSuffix;
auto baseNameEntry = UniqueSeeds.find(baseName);
if (baseNameEntry == UniqueSeeds.end()) {
// base name is not registered
return false;
}
if (digitCount >= baseNameEntry->second.size()) {
// First use of this digitCount, name must not be registered, so not in collection
return false;
}
return baseNameEntry->second[digitCount].Contains(digitsValue);
}
std::string Base::Tools::getIdentifier(const std::string& name)
{
if (name.empty()) {