Address the poor performance of the existing unique-name generation (part 1) (#18589)

* Change Address the poor performance of the existing unique-name generation

As described in Issue 16849, the existing Tools::getUniqueName method
requires calling code to form a vector of existing names to be avoided.

This leads to poor performance both in the O(n) cost of building such a
vector and also getUniqueName's O(n) algorithm for actually generating
the unique name (where 'n' is the number of pre-existing names).

This has  particularly noticeable cost in documents with large numbers
of DocumentObjects because generating both Names and Labels for each new
object incurs this cost. During an operation such as importing this
results in an O(n^2) time spent generating names.

The other major cost is in the saving of the temporary backup file,
which uses name generation for the "files" embedded in the Zip file.
Documents can easily need several such "files" for each object in the
document.

This is the first part of the correction, adding an efficient class for
managing sets of unique names.

New class UniqueNameManager keeps a list of existing names organized in
a manner that eases unique-name generation. This class essentially acts
as a set of names, with the ability to add and remove names and check if
a name is already there, with the added ability to take a prototype name
and generate a unique form for it which is not already in the set.

a new unit test for UniqueNameManager has been added as well.

There is a small regression, compared to the existing unique-name code,
insofar as passing a prototype name like "xyz1234" to the old code would
yield "xyz1235" whether or not "xyz1234" already existed, while the new
code will return the next name above the currently-highest name on the
"xyz" model, which could be "xyz" or "xyz1" or "xyz0042"..
This commit is contained in:
Kevin Martin
2025-01-13 11:57:53 -05:00
committed by GitHub
parent 9459b85081
commit 6c6904453d
5 changed files with 537 additions and 0 deletions

View File

@@ -267,6 +267,7 @@ SET(FreeCADBase_CPP_SRCS
Translate.cpp
Type.cpp
TypePyImp.cpp
UniqueNameManager.cpp
Uuid.cpp
Vector3D.cpp
VectorPyImp.cpp
@@ -332,6 +333,7 @@ SET(FreeCADBase_HPP_SRCS
Tools3D.h
Translate.h
Type.h
UniqueNameManager.h
Uuid.h
Vector3D.h
ViewProj.h

View File

@@ -0,0 +1,270 @@
/***************************************************************************
* Copyright (c) 2024 Kevin Martin <kpmartin@papertrail.ca> *
* *
* This file is part of FreeCAD. *
* *
* FreeCAD is free software: you can redistribute it and/or modify it *
* under the terms of the GNU Lesser General Public License as *
* published by the Free Software Foundation, either version 2.1 of the *
* License, or (at your option) any later version. *
* *
* FreeCAD is distributed in the hope that it will be useful, but *
* WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU *
* Lesser General Public License for more details. *
* *
* You should have received a copy of the GNU Lesser General Public *
* License along with FreeCAD. If not, see *
* <https://www.gnu.org/licenses/>. *
* *
***************************************************************************/
#include "PreCompiled.h"
#ifndef _PreComp_
#include <tuple>
#include <vector>
#include <string>
#include <set>
#endif
#include "UniqueNameManager.h"
void Base::UniqueNameManager::PiecewiseSparseIntegerSet::add(unsigned int value)
{
SpanType newSpan(value, 1);
// Look for the smallest entry not less than newSpan.
// Bear in mind that overlapping spans are neither less than nor greater than ech other.
auto above = spans.lower_bound(newSpan);
if (above != spans.end() && above->first <= value) {
// A span was found that includes value so there is nothing to do as it is already in the
// set.
return;
}
// Set below to the next span below 'after', if any, otherwise to spans.end().
// Logically, we want spans.begin()-1 so 'below' is always the entry before 'after',
// but that is not an allowed reference so spans.end() is used
std::set<SpanType, Comparer>::iterator below;
if (above == spans.begin()) {
// No spans are less than newSpan
// (possibly spans is empty, in which case above == spans.end() too)
below = spans.end();
}
else {
// At least one span is less than newSpan,
// and 'above' is the next span above that
// (or above == spans.end() if all spans are below newSpan)
below = above;
--below;
}
// Determine whether the span above (if any) and/or the span below (if any)
// are adjacent to newSpan and if so, merge them appropriately and remove the
// original span(s) that was/were merged, updating newSpan to be the new merged
// span.
if (above != spans.end() && below != spans.end()
&& above->first - below->first + 1 == below->second) {
// below and above have a gap of exactly one between them, and this must be value
// so we coalesce the two spans (and the gap) into one.
newSpan = SpanType(below->first, below->second + above->second + 1);
spans.erase(above);
above = spans.erase(below);
}
else if (below != spans.end() && value - below->first == below->second) {
// value is adjacent to the end of below, so just expand below by one
newSpan = SpanType(below->first, below->second + 1);
above = spans.erase(below);
}
else if (above != spans.end() && above->first - value == 1) {
// value is adjacent to the start of above, so just expand above down by one
newSpan = SpanType(above->first - 1, above->second + 1);
above = spans.erase(above);
}
// else value is not adjacent to any existing span, so just make anew span for it
spans.insert(above, newSpan);
}
void Base::UniqueNameManager::PiecewiseSparseIntegerSet::remove(unsigned int value)
{
SpanType newSpan(value, 1);
auto at = spans.lower_bound(newSpan);
if (at == spans.end() || at->first > value) {
// The found span does not include value so there is nothing to do, as it is already not in
// the set.
return;
}
if (at->second == 1) {
// value is the only in this span, just remove the span
spans.erase(at);
}
else if (at->first == value) {
// value is the first in this span, trim the lower end
SpanType replacement(at->first + 1, at->second - 1);
spans.insert(spans.erase(at), replacement);
}
else if (value - at->first == at->second - 1) {
// value is the last in this span, trim the upper end
SpanType replacement(at->first, at->second - 1);
spans.insert(spans.erase(at), replacement);
}
else {
// value is in the moddle of the span, so we must split it.
SpanType firstReplacement(at->first, value - at->first);
SpanType secondReplacement(value + 1, at->second - ((value + 1) - at->first));
// Because erase returns the iterator after the erased element, and insert returns the
// iterator for the inserted item, we want to insert secondReplacement first.
spans.insert(spans.insert(spans.erase(at), secondReplacement), firstReplacement);
}
}
bool Base::UniqueNameManager::PiecewiseSparseIntegerSet::contains(unsigned int value) const
{
auto at = spans.lower_bound(SpanType(value, 1));
return at != spans.end() && at->first <= value;
}
std::tuple<std::string, std::string, unsigned int, unsigned int>
Base::UniqueNameManager::decomposeName(const std::string& name) const
{
auto suffixStart = getNameSuffixStartPosition(name);
auto digitsStart = std::find_if_not(suffixStart, name.crend(), [](char c) {
return std::isdigit(c);
});
unsigned int digitCount = digitsStart - suffixStart;
return std::tuple<std::string, std::string, unsigned int, unsigned int> {
name.substr(0, name.crend() - digitsStart),
name.substr(name.crend() - suffixStart),
digitCount,
digitCount == 0 ? 0U : std::stoul(name.substr(name.crend() - digitsStart, digitCount))};
}
bool Base::UniqueNameManager::sameBaseName(const std::string& first,
const std::string& second) const
{
auto firstSuffixStart = getNameSuffixStartPosition(first);
auto secondSuffixStart = getNameSuffixStartPosition(second);
if (firstSuffixStart - first.crbegin() != secondSuffixStart - second.crbegin()) {
// The suffixes are different lengths
return false;
}
auto firstDigitsStart = std::find_if_not(firstSuffixStart, first.crend(), [](char c) {
return std::isdigit(c);
});
auto secondDigitsStart = std::find_if_not(secondSuffixStart, second.crend(), [](char c) {
return std::isdigit(c);
});
return std::equal(firstDigitsStart, first.crend(), secondDigitsStart, second.crend());
}
void Base::UniqueNameManager::addExactName(const std::string& name)
{
auto [baseName, nameSuffix, digitCount, digitsValue] = decomposeName(name);
baseName += nameSuffix;
auto baseNameEntry = uniqueSeeds.find(baseName);
if (baseNameEntry == uniqueSeeds.end()) {
// First use of baseName
baseNameEntry =
uniqueSeeds.emplace(baseName, std::vector<PiecewiseSparseIntegerSet>()).first;
}
if (digitCount >= baseNameEntry->second.size()) {
// First use of this digitCount
baseNameEntry->second.resize(digitCount + 1);
}
PiecewiseSparseIntegerSet& baseNameAndDigitCountEntry = baseNameEntry->second[digitCount];
if (baseNameAndDigitCountEntry.contains(digitsValue)) {
// We already have at least one instance of the name.
// Increment the duplicateCounts entry for that name,
// making one if none is present with an initial count of 1
// representing the singleton element we already have.
duplicateCounts.try_emplace(name, 1U).first->second++;
return;
}
baseNameAndDigitCountEntry.add(digitsValue);
}
std::string Base::UniqueNameManager::makeUniqueName(const std::string& modelName,
std::size_t minDigits) const
{
auto [namePrefix, nameSuffix, digitCount, digitsValue] = decomposeName(modelName);
std::string baseName = namePrefix + nameSuffix;
auto baseNameEntry = uniqueSeeds.find(baseName);
if (baseNameEntry == uniqueSeeds.end()) {
// First use of baseName, just return it with no unique digits
return baseName;
}
// We don't care about the digit count or value of the suggested name,
// we always use at least the most digits ever used before.
digitCount = baseNameEntry->second.size() - 1;
if (digitCount < minDigits) {
// Caller is asking for more digits than we have in any registered name.
// We start the longer digit string at 000...0001 even though we might have shorter strings
// with larger numeric values.
digitCount = minDigits;
digitsValue = 1;
}
else {
digitsValue = baseNameEntry->second[digitCount].next();
}
std::string digits = std::to_string(digitsValue);
if (digitCount > digits.size()) {
namePrefix += std::string(digitCount - digits.size(), '0');
}
return namePrefix + digits + nameSuffix;
}
void Base::UniqueNameManager::removeExactName(const std::string& name)
{
auto duplicateCountFound = duplicateCounts.find(name);
if (duplicateCountFound != duplicateCounts.end()) {
// The name has duplicates. Decrement the duplicate count.
if (--duplicateCountFound->second <= 1) {
// After removal, there are no duplicates, only a single instance.
// Remove the duplicateCounts entry.
duplicateCounts.erase(duplicateCountFound);
}
return;
}
auto [baseName, nameSuffix, digitCount, digitsValue] = decomposeName(name);
baseName += nameSuffix;
auto baseNameEntry = uniqueSeeds.find(baseName);
if (baseNameEntry == uniqueSeeds.end()) {
// name must not be registered, so nothing to do.
return;
}
auto& digitValueSets = baseNameEntry->second;
if (digitCount >= digitValueSets.size()) {
// First use of this digitCount, name must not be registered, so nothing to do.
return;
}
digitValueSets[digitCount].remove(digitsValue);
// an element of digitValueSets may now be newly empty and so may other elements below it
// Prune off all such trailing empty entries.
auto lastNonemptyEntry =
std::find_if(digitValueSets.crbegin(), digitValueSets.crend(), [](auto& it) {
return !it.empty();
});
if (lastNonemptyEntry == digitValueSets.crend()) {
// All entries are empty, so the entire baseName can be forgotten.
uniqueSeeds.erase(baseName);
}
else {
digitValueSets.resize(digitValueSets.crend() - lastNonemptyEntry);
}
}
bool Base::UniqueNameManager::containsName(const std::string& name) const
{
if (duplicateCounts.find(name) != duplicateCounts.end()) {
// There are at least two instances of the name
return true;
}
auto [baseName, nameSuffix, digitCount, digitsValue] = decomposeName(name);
baseName += nameSuffix;
auto baseNameEntry = uniqueSeeds.find(baseName);
if (baseNameEntry == uniqueSeeds.end()) {
// base name is not registered
return false;
}
if (digitCount >= baseNameEntry->second.size()) {
// First use of this digitCount, name must not be registered, so not in collection
return false;
}
return baseNameEntry->second[digitCount].contains(digitsValue);
}

View File

@@ -0,0 +1,155 @@
/***************************************************************************
* Copyright (c) 2024 Kevin Martin <kpmartin@papertrail.ca> *
* *
* This file is part of FreeCAD. *
* *
* FreeCAD is free software: you can redistribute it and/or modify it *
* under the terms of the GNU Lesser General Public License as *
* published by the Free Software Foundation, either version 2.1 of the *
* License, or (at your option) any later version. *
* *
* FreeCAD is distributed in the hope that it will be useful, but *
* WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU *
* Lesser General Public License for more details. *
* *
* You should have received a copy of the GNU Lesser General Public *
* License along with FreeCAD. If not, see *
* <https://www.gnu.org/licenses/>. *
* *
***************************************************************************/
#ifndef SRC_BASE_UNIQUENAMEMANAGER_H_
#define SRC_BASE_UNIQUENAMEMANAGER_H_
#ifndef FC_GLOBAL_H
#include <FCGlobal.h>
#endif
#include <vector>
#include <string>
#include <set>
#include <map>
#include <algorithm>
#include <utility>
#include <tuple>
// ----------------------------------------------------------------------------
namespace Base
{
/// @brief
/// This class acts as a multiset of strings with the capability of generating
/// other names that are not in the collection. Unique names are generated
/// by inserting digits into a base name. Normally these are appended to the
/// base name, but a derived class can override getNameSuffixStartPosition
/// to place the digits elsewhere in the base name.
// This class does not support copy-/move-construction
// NOLINTNEXTLINE(cppcoreguidelines-special-member-functions)
class BaseExport UniqueNameManager
{
protected:
// This method returns a reverse const iterator on name that references the position
// of the start of the suffix (or name.crbegin() if no suffix).
// It must return the same suffix length (name.crbegin() - returnValue) for both
// unique names (one containing digits) and the corresponding base name (with no digits).
virtual std::string::const_reverse_iterator
getNameSuffixStartPosition(const std::string& name) const
{
return name.crbegin();
}
private:
class PiecewiseSparseIntegerSet
{
public:
PiecewiseSparseIntegerSet() = default;
private:
// Each pair being <lowest, count> represents the span of integers from lowest to
// (lowest+count-1) inclusive
using SpanType = std::pair<unsigned int, unsigned int>;
// This span Comparer class is analogous to std::less and treats overlapping spans as being
// neither greater nor less than each other
class Comparer
{
public:
bool operator()(SpanType lhs, SpanType rhs) const
{
// The equality case here is when lhs is below and directly adjacent to rhs.
return lhs.second + lhs.first <= rhs.first;
}
};
// spans is the set of spans. Adjacent spans are coalesced so there are always gaps between
// the entries.
std::set<SpanType, Comparer> spans;
public:
void add(unsigned int value);
void remove(unsigned int value);
bool contains(unsigned int value) const;
bool empty() const
{
return spans.empty();
}
void clear()
{
spans.clear();
}
unsigned int next() const
{
if (spans.empty()) {
return 0;
}
auto last = spans.end();
--last;
return last->first + last->second;
}
};
// Keyed as uniqueSeeds[baseName][digitCount][digitValue] iff that seed is taken.
// We need the double-indexing so that Name01 and Name001 can both be indexed, although we only
// ever allocate off the longest for each name i.e. uniqueSeeds[baseName].size()-1 digits.
std::map<std::string, std::vector<PiecewiseSparseIntegerSet>> uniqueSeeds;
// Counts of inserted strings that have duplicates, i.e. more than one instance in the
// collection. This does not contain entries for singleton names.
std::map<std::string, unsigned int> duplicateCounts;
/// @brief Break a uniquified name into its parts
/// @param name The name to break up
/// @return a tuple(basePrefix, nameSuffix, uniqueDigitCount, uniqueDigitsValue);
/// The two latter values will be (0,0) if name is a base name without uniquifying digits.
std::tuple<std::string, std::string, unsigned int, unsigned int>
decomposeName(const std::string& name) const;
public:
UniqueNameManager() = default;
virtual ~UniqueNameManager() = default;
/// Check if two names are unique forms of the same base name
bool sameBaseName(const std::string& first, const std::string& second) const;
/// Register a name in the collection.
/// This collection acts as a multiset, so multiple registrations of the same
/// name are counted, and an equal number of removals is necessary to fully
/// remove the name, allowing it to be returned by makeUniqueName
void addExactName(const std::string& name);
/// Using the given modelName as a model, generate a name that is not already
/// in the collection of names. The model name may already contain uniquifying digits
/// which will be stripped and replaced with other digits as needed.
std::string makeUniqueName(const std::string& modelName, std::size_t minDigits = 0) const;
/// Remove a registered name so it can be generated again. If the name was added
/// several times this decrements the count, and the name is only fully removed when
/// the count of removes equals or exceeds the count of adds.
void removeExactName(const std::string& name);
/// Test if the given name is already in the collection
bool containsName(const std::string& name) const;
/// @brief Empty (clear) out the contents from this collection
void clear()
{
uniqueSeeds.clear();
duplicateCounts.clear();
}
};
} // namespace Base
#endif // SRC_BASE_UNIQUENAMEMANAGER_H_

View File

@@ -22,6 +22,7 @@ target_sources(
${CMAKE_CURRENT_SOURCE_DIR}/Tools.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Tools2D.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Tools3D.cpp
${CMAKE_CURRENT_SOURCE_DIR}/UniqueNameManager.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Unit.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Vector3D.cpp
${CMAKE_CURRENT_SOURCE_DIR}/ViewProj.cpp

View File

@@ -0,0 +1,109 @@
/***************************************************************************
* Copyright (c) 2024 Kevin Martin <kpmartin@papertrail.ca> *
* *
* This file is part of FreeCAD. *
* *
* FreeCAD is free software: you can redistribute it and/or modify it *
* under the terms of the GNU Lesser General Public License as *
* published by the Free Software Foundation, either version 2.1 of the *
* License, or (at your option) any later version. *
* *
* FreeCAD is distributed in the hope that it will be useful, but *
* WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU *
* Lesser General Public License for more details. *
* *
* You should have received a copy of the GNU Lesser General Public *
* License along with FreeCAD. If not, see *
* <https://www.gnu.org/licenses/>. *
* *
***************************************************************************/
#include <gtest/gtest.h>
#include <string>
#include <Base/UniqueNameManager.h>
// NOLINTBEGIN(cppcoreguidelines-*,readability-*)
TEST(UniqueNameManager, NonconflictingBaseNameReturnsSameName)
{
// Check that a model name that is a base name that does not conflict with
// any existing name returns the base name
EXPECT_EQ(Base::UniqueNameManager().makeUniqueName("Body"), "Body");
}
TEST(UniqueNameManager, NonconflictingUniqueNameReturnsBaseName)
{
// Check that a model name that is a unique name whose base name does not conflict with
// any existing name returns the base name
EXPECT_EQ(Base::UniqueNameManager().makeUniqueName("Body123"), "Body");
}
TEST(UniqueNameManager, ManyDigitsInModelNameIgnored)
{
// Check that the number of digits in the model name does not affect the number
// of digits in the returned name.
Base::UniqueNameManager manager;
manager.addExactName("Body");
EXPECT_EQ(manager.makeUniqueName("Body12345", 3), "Body001");
}
TEST(UniqueNameManager, ConflictingBaseNameReturnsUniqueName)
{
// Check that a model name that conflicts with an existing name returns a unique form of
// the base name
Base::UniqueNameManager manager;
manager.addExactName("Body");
EXPECT_EQ(manager.makeUniqueName("Body", 1), "Body1");
}
TEST(UniqueNameManager, ConflictingUniqueNameReturnsNewUniqueName)
{
// Check that a given unique name already in the collection causes the return of
// a new unique name.
Base::UniqueNameManager manager;
manager.addExactName("Body");
manager.addExactName("Body001");
EXPECT_EQ(manager.makeUniqueName("Body001", 3), "Body002");
}
TEST(UniqueNameManager, VerifyUniqueDigitCount)
{
// Check that when a unique name is generated it contains at least the specified number
// of inserted digits
Base::UniqueNameManager manager;
manager.addExactName("Body");
EXPECT_EQ(manager.makeUniqueName("Body", 3), "Body001");
}
TEST(UniqueNameManager, UniqueNameLargerThanAnyName)
{
// Check that the name returned is larger (using digits) that any wxisting name
// even if the given model name is not itself in the collection
Base::UniqueNameManager manager;
manager.addExactName("Body001");
EXPECT_EQ(manager.makeUniqueName("Body", 3), "Body002");
}
TEST(UniqueNameManager, Issue18504)
{
// Check that the management of spans of assigned unique digit values is
// correct
std::string name;
Base::UniqueNameManager manager;
manager.addExactName("Origin007");
manager.addExactName("Origin");
manager.addExactName("Origin008");
manager.addExactName("Origin010");
manager.addExactName("Origin011");
manager.addExactName("Origin013");
manager.addExactName("Origin016");
name = manager.makeUniqueName("Origin", 3);
manager.addExactName(name);
name = manager.makeUniqueName("Origin", 3);
manager.addExactName(name);
name = manager.makeUniqueName("Origin", 3);
EXPECT_NE(name, "Origin010");
}
// NOLINTEND(cppcoreguidelines-*,readability-*)