====================
Package Cache System
====================

Launchpad caches details of published packages (sources and binaries)
in order to get better performance on searches.

Cached information for sources is stored in the
DistributionSourcePackageCache table, including:

 * distribution, IDistribution where the source was published.
 * archive, IArchive where the source was published.
 * sourcepackagename, respective ISourcePackageName.
 * name, source name in text format.
 * binpkgnames, text containing all the binary package names generated
   by the source.
 * binpkgsummaries, text containing all summaries of the binaries
   generated by this source.
 * binpkgdescriptions, text containing all descriptions of the binaries
   generated by this source.
 * fti, a tsvector generated on insert or update with the text indexes
   for name, binpkgnames, binpkgsummaries and binpkgdescriptions.

    >>> from lp.registry.interfaces.distribution import IDistributionSet
    >>> from lp.soyuz.model.distributionsourcepackagecache import (
    ...     DistributionSourcePackageCache)

    >>> ubuntu = getUtility(IDistributionSet)['ubuntu']
    >>> ubuntu_caches = DistributionSourcePackageCache._find(ubuntu)

    >>> ubuntu_caches.count()
    10

    >>> for name in sorted([cache.name for cache in ubuntu_caches]):
    ...     print name
    alsa-utils
    cnews
    commercialpackage
    evolution
    foobar
    libstdc++
    linux-source-2.6.15
    mozilla-firefox
    netapplet
    pmount

Cached information for binaries is stored in the
DistroSeriesPackageCache table, including:

 * distroseries, IDistroSeries where the binary is published.
 * archive, IArchive where the binary is published.
 * binarypackagename, respective IBinaryPackageName.
 * name, binary name in text format.
 * summary, binary summary in text format.
 * description, binary description in text format.
 * summaries, binary summaries in text format.
 * descriptions, binary descriptions in text format.
 * fti, a tsvector generated on insert or update with the text indexes
   for name, summary, description, summaries and descriptions.
    >>> from lp.soyuz.model.distroseriespackagecache import (
    ...     DistroSeriesPackageCache)

    >>> warty = ubuntu['warty']
    >>> warty_caches = DistroSeriesPackageCache._find(warty)

    >>> warty_caches.count()
    5

    >>> for name in sorted([cache.name for cache in warty_caches]):
    ...     print name
    at
    foobar
    linux-2.6.12
    mozilla-firefox
    pmount

Unlike sources, where multiple generated binaries are very common,
multiple binaries with the same name are only possible when versions
differ across architectures.

By building these caches we get good performance on full and partial
term searches.

    >>> ubuntu.searchSourcePackages('mozilla').count()
    1

    >>> ubuntu.searchSourcePackages('moz').count()
    1

    >>> ubuntu.searchSourcePackages('biscoito').count()
    0

The cache update is performed by cronscripts/update-pkgcache.py, which
removes obsolete records, updates existing ones and adds new records
while scanning all the publishing records. This script runs
periodically in our infrastructure. See usage at
package-cache-script.txt.


Dealing with Source Caches
==========================

A SourcePackage that has the status DELETED will be deleted from the
cache when the cache is updated.

    >>> foobar_in_ubuntu = ubuntu.getSourcePackage('foobar')
    >>> foobar_rel = foobar_in_ubuntu.releases[0]
    >>> foobar_pub = foobar_rel.publishing_history[0]
    >>> foobar_pub.status.name
    'DELETED'

    >>> ubuntu.searchSourcePackages('foobar').count()
    1

    >>> foobar_cache = DistributionSourcePackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distribution=ubuntu, name='foobar')

    >>> foobar_cache is not None
    True

Source cache updates are driven by distribution:
DistributionSourcePackageCache offers a method for removing the
obsolete records of a given IDistribution from the cache.

Let's use a fake logger object:

    >>> from lp.services.log.logger import FakeLogger
    >>> DistributionSourcePackageCache.removeOld(
    ...     ubuntu, archive=ubuntu.main_archive, log=FakeLogger())
    DEBUG Removing source cache for 'foobar' (10)

    >>> import transaction
    >>> transaction.commit()

    >>> ubuntu.searchSourcePackages('foobar').count()
    0

    >>> foobar_cache = DistributionSourcePackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distribution=ubuntu, name='foobar')

    >>> foobar_cache is None
    True

A source package that has the status PUBLISHED will be added to the
cache the next time the cache is updated.

    >>> cdrkit_in_ubuntu = ubuntu.getSourcePackage('cdrkit')
    >>> cdrkit_rel = cdrkit_in_ubuntu.releases[0]
    >>> cdrkit_pub = cdrkit_rel.publishing_history[0]
    >>> cdrkit_pub.status.name
    'PUBLISHED'

    >>> ubuntu.searchSourcePackages('cdrkit').count()
    0

    >>> cdrkit_cache = DistributionSourcePackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distribution=ubuntu, name='cdrkit')

    >>> cdrkit_cache is None
    True

We can invoke the cache updater directly for the distribution:

    >>> updates = DistributionSourcePackageCache.updateAll(
    ...     ubuntu, archive=ubuntu.main_archive, ztm=transaction,
    ...     log=FakeLogger())
    DEBUG ...
    DEBUG Considering source 'cdrkit'
    DEBUG Creating new source cache entry.
    ...
    DEBUG Considering source 'mozilla-firefox'
    ...

    >>> print updates
    10

Now we see that the 'cdrkit' source is part of the caches and can be
reached via searches:

    >>> ubuntu.searchSourcePackages('cdrkit').count()
    1

    >>> cdrkit_cache = DistributionSourcePackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distribution=ubuntu, name='cdrkit')

    >>> cdrkit_cache is not None
    True


Dealing with Binary Caches
==========================

A BinaryPackage that has the status DELETED will be deleted from the
cache when the cache is updated.

    >>> foobar_bin_in_warty = warty.getBinaryPackage('foobar')
    >>> foobar_bin_rel = foobar_in_ubuntu.releases[0]
    >>> foobar_bin_pub = foobar_rel.publishing_history[0]
    >>> foobar_bin_pub.status.name
    'DELETED'

    >>> warty.searchPackages('foobar').count()
    1

    >>> foobar_bin_cache = DistroSeriesPackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distroseries=warty, name='foobar')

    >>> foobar_bin_cache is not None
    True

Binary cache updates are driven by distroseries:
DistroSeriesPackageCache offers a method for removing the obsolete
records of a given IDistroSeries from the cache:

    >>> DistroSeriesPackageCache.removeOld(
    ...     warty, archive=ubuntu.main_archive, log=FakeLogger())
    DEBUG Removing binary cache for 'foobar' (8)

    >>> transaction.commit()

    >>> warty.searchPackages('foobar').count()
    0

    >>> foobar_bin_cache = DistroSeriesPackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distroseries=warty, name='foobar')

    >>> foobar_bin_cache is None
    True

A binary package that has been published since the last update of the
cache will be added to it.

    >>> cdrkit_bin_in_warty = warty.getBinaryPackage('cdrkit')
    >>> cdrkit_bin_pub = cdrkit_bin_in_warty.current_publishings[0]
    >>> cdrkit_bin_pub.status.name
    'PUBLISHED'

    >>> warty.searchPackages('cdrkit').count()
    0

    >>> cdrkit_bin_cache = DistroSeriesPackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distroseries=warty, name='cdrkit')

    >>> cdrkit_bin_cache is None
    True

We can invoke the cache updater directly for the distroseries:

    >>> updates = DistroSeriesPackageCache.updateAll(
    ...     warty, archive=ubuntu.main_archive, ztm=transaction,
    ...     log=FakeLogger())
    DEBUG Considering binary 'at'
    ...
    DEBUG Considering binary 'cdrkit'
    DEBUG Creating new binary cache entry.
    ...
    DEBUG Considering binary 'mozilla-firefox'
    ...

    >>> print updates
    6

Transaction handling behaves exactly as it does for source caches,
except that it commits in full batches of 100 elements.

    >>> transaction.commit()

Now we see that the 'cdrkit' binary is part of the caches and can be
reached via searches:

    >>> warty.searchPackages('cdrkit').count()
    1

    >>> cdrkit_bin_cache = DistroSeriesPackageCache.selectOneBy(
    ...     archive=ubuntu.main_archive, distroseries=warty, name='cdrkit')

    >>> cdrkit_bin_cache is not None
    True


PPA package caches
==================

Package caches are also populated for PPAs, allowing users to search
for them based on the packages currently published in their context.

We will use Celso's PPA.

    >>> from lp.registry.interfaces.person import IPersonSet
    >>> cprov = getUtility(IPersonSet).getByName('cprov')

With empty cache contents in the Archive table we can't even find a PPA
by owner name.

    >>> print ubuntu.searchPPAs(text='cprov').count()
    0

Sampledata contains stub counters.

    >>> print cprov.archive.sources_cached
    3

    >>> print cprov.archive.binaries_cached
    3

We have to issue 'updateArchiveCache' to include the owner 'name' and
'displayname' fields in the archive caches.

    >>> cprov.archive.updateArchiveCache()

Now Celso's PPA can be found via searches and the package counters got
reset, reflecting that nothing is cached in the database yet.

    >>> print ubuntu.searchPPAs(text='cprov')[0].displayname
    PPA for Celso Providelo

    >>> print cprov.archive.sources_cached
    0

    >>> print cprov.archive.binaries_cached
    0

The sampledata contains no package caches, so attempts to find 'pmount'
(a source), 'firefox' (a binary name term) or 'shortdesc' (a term used
in the pmount binary summary) fail.

    >>> ubuntu.searchPPAs(text='pmount').count()
    0

    >>> ubuntu.searchPPAs(text='firefox').count()
    0

    >>> ubuntu.searchPPAs(text='warty').count()
    0

    >>> ubuntu.searchPPAs(text='shortdesc').count()
    0

If we populate the package caches and update the archive caches, the
same queries work, pointing to Celso's PPA.

    >>> source_updates = DistributionSourcePackageCache.updateAll(
    ...     ubuntu, archive=cprov.archive, ztm=transaction, log=FakeLogger())
    DEBUG ...
    DEBUG Considering source 'pmount'
    ...

    >>> binary_updates = DistroSeriesPackageCache.updateAll(
    ...     warty, archive=cprov.archive, ztm=transaction,
    ...     log=FakeLogger())
    DEBUG Considering binary 'mozilla-firefox'
    ...
    >>> cprov.archive.updateArchiveCache()

    >>> cprov.archive.sources_cached == source_updates
    True

    >>> print cprov.archive.sources_cached
    3

    >>> cprov.archive.binaries_cached == binary_updates
    True

    >>> print cprov.archive.binaries_cached
    2

    >>> print ubuntu.searchPPAs(text='cprov')[0].displayname
    PPA for Celso Providelo

    >>> print ubuntu.searchPPAs(text='pmount')[0].displayname
    PPA for Celso Providelo

    >>> print ubuntu.searchPPAs(text='firefox')[0].displayname
    PPA for Celso Providelo

    >>> print ubuntu.searchPPAs(text='warty')[0].displayname
    PPA for Celso Providelo

    >>> print ubuntu.searchPPAs(text='shortdesc')[0].displayname
    PPA for Celso Providelo

The method which populates the archive caches also cleans the texts up
to work around the current FTI limitation (see bug #207969). It performs
the following tasks:

 * Exclude punctuation ([,.;:!?])
 * Store only unique lower case words

We remove all caches related to Celso's PPA.

    >>> celso_source_caches = DistributionSourcePackageCache.selectBy(
    ...     archive=cprov.archive)

    >>> celso_binary_caches = DistroSeriesPackageCache.selectBy(
    ...     archive=cprov.archive)

    >>> from zope.security.proxy import removeSecurityProxy
    >>> def purge_caches(caches):
    ...     for cache in caches:
    ...         naked_cache = removeSecurityProxy(cache)
    ...         naked_cache.destroySelf()

    >>> purge_caches(celso_source_caches)
    >>> purge_caches(celso_binary_caches)

Now, when we update the caches for Celso's PPA, only the owner
information will be available; no package information will be cached.

    >>> cprov.archive.updateArchiveCache()

    >>> print cprov.archive.sources_cached
    0

    >>> print cprov.archive.binaries_cached
    0

    >>> print cprov.archive.package_description_cache
    celso providelo cprov

We insert a new source cache, pointing to Celso's PPA, with texts
containing punctuation and duplicated words.

    >>> from lp.registry.interfaces.sourcepackagename import (
    ...     ISourcePackageNameSet)
    >>> cdrkit_name = getUtility(ISourcePackageNameSet).queryByName('cdrkit')

    >>> unclean_cache = DistributionSourcePackageCache(
    ...     archive=cprov.archive,
    ...     distribution=ubuntu,
    ...     sourcepackagename=cdrkit_name,
    ...     name=cdrkit_name.name,
    ...     binpkgnames='cdrkit-bin cdrkit-extra',
    ...     binpkgsummaries='Ding! Dong? Ding,Dong. Ding; DONG: ding dong')

Note that 'binpkgdescription' and 'changelog' are not considered yet,
and we have no binary cache.

Let's update the archive cache and see how it goes.

    >>> cprov.archive.updateArchiveCache()

Only one source is cached, and the 'package_description_cache' contains
only unique, lowercase words free of any punctuation.

    >>> print cprov.archive.sources_cached
    1

    >>> print cprov.archive.binaries_cached
    0

    >>> print cprov.archive.package_description_cache
    ding providelo celso cdrkit cdrkit-bin dong ubuntu cdrkit-extra cprov

Let's remove the unclean cache and update Celso's PPA cache, so
everything will be back to normal.

    >>> removeSecurityProxy(unclean_cache).destroySelf()
    >>> cprov.archive.updateArchiveCache()


Package Counters
================

We also store counters for the number of Sources and Binaries
published in the RELEASE pocket of a DistroSeries:

    >>> warty.sourcecount
    3

    >>> warty.binarycount
    4

Since we have modified the publication list for warty in order to test
the caching system, we expect similar changes in the counters.
IDistroSeries provides a method to update its own cache:

    >>> warty.updatePackageCount()

New values were stored:

    >>> warty.sourcecount
    6

    >>> warty.binarycount
    6

Only PENDING and PUBLISHED publications are considered. We will use
`SoyuzTestPublisher` for creating convenient publications.

    >>> from lp.services.database.sqlbase import commit
    >>> from lp.soyuz.enums import (
    ...     PackagePublishingStatus)
    >>> from lp.soyuz.tests.test_publishing import (
    ...     SoyuzTestPublisher)
    >>> from lp.testing.layers import LaunchpadZopelessLayer

    >>> test_publisher = SoyuzTestPublisher()

    >>> LaunchpadZopelessLayer.switchDbUser('launchpad')

    >>> unused = test_publisher.setUpDefaultDistroSeries(warty)
    >>> test_publisher.addFakeChroots()

Let's create one source with a single binary in PENDING status.

    >>> pending_source = test_publisher.getPubSource(
    ...     sourcename='pending-source',
    ...     status=PackagePublishingStatus.PENDING)

    >>> pending_binaries = test_publisher.getPubBinaries(
    ...     binaryname="pending-binary", pub_source=pending_source,
    ...     status=PackagePublishingStatus.PENDING)

    >>> print len(
    ...     set(pub.binarypackagerelease.name for pub in pending_binaries))
    1

And one source with a single binary in PUBLISHED status.

    >>> published_source = test_publisher.getPubSource(
    ...     sourcename='published-source',
    ...     status=PackagePublishingStatus.PUBLISHED)

    >>> published_binaries = test_publisher.getPubBinaries(
    ...     binaryname="published-binary", pub_source=published_source,
    ...     status=PackagePublishingStatus.PUBLISHED)

    >>> print len(
    ...     set(pub.binarypackagerelease.name for pub in published_binaries))
    1

    >>> commit()
    >>> LaunchpadZopelessLayer.switchDbUser(test_dbuser)

Exactly 2 new sources and 2 new binaries will be counted.

    >>> warty.updatePackageCount()

    >>> warty.sourcecount
    8

    >>> warty.binarycount
    8

Let's create one source with a single binary in DELETED status.

    >>> LaunchpadZopelessLayer.switchDbUser('launchpad')

    >>> deleted_source = test_publisher.getPubSource(
    ...     sourcename='pending-source',
    ...     status=PackagePublishingStatus.DELETED)

    >>> deleted_binaries = test_publisher.getPubBinaries(
    ...     binaryname="pending-binary", pub_source=deleted_source,
    ...     status=PackagePublishingStatus.DELETED)

    >>> print len(
    ...     set(pub.binarypackagerelease.name for pub in deleted_binaries))
    1

    >>> LaunchpadZopelessLayer.switchDbUser(test_dbuser)

Distroseries package counters will not count DELETED publications.
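The counting rule stated above (only PENDING and PUBLISHED publications are considered) can be pictured with a small stand-alone sketch. It is not Launchpad's implementation: `Publication`, `COUNTED_STATUSES` and `count_packages` are hypothetical names, and counting distinct package names is an assumption made for the illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a publishing record; not a Launchpad class.
Publication = namedtuple("Publication", ["name", "status"])

# Statuses that contribute to the counters, as stated in the text.
COUNTED_STATUSES = frozenset({"PENDING", "PUBLISHED"})


def count_packages(publications):
    """Count distinct names having at least one counted publication.

    Assumption: the counter is per package name, so several publications
    of the same name are counted once.
    """
    return len(
        {pub.name for pub in publications if pub.status in COUNTED_STATUSES}
    )


publications = [
    Publication("pending-source", "PENDING"),
    Publication("published-source", "PUBLISHED"),
    # A DELETED publication does not add to the total.
    Publication("pending-source", "DELETED"),
]
total = count_packages(publications)
print(total)
```

With this rule, adding a DELETED publication leaves the total unchanged, which is the behaviour the counters are expected to show.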
    >>> warty.updatePackageCount()

    >>> warty.sourcecount
    8

    >>> warty.binarycount
    8

A similar mechanism is offered by IDistroArchSeries, but only for
binaries (of course):

    >>> warty_i386 = warty['i386']

    >>> warty_i386.package_count
    5

Invoke the counter updater on this architecture:

    >>> warty_i386.updatePackageCount()

New values were stored:

    >>> warty_i386.package_count
    9


DistroSeriesBinaryPackage cache lookups
=======================================

The DistroSeriesBinaryPackage and DistroArchSeriesBinaryPackage
objects use a DistroSeriesPackageCache record to present the summary
and description of the context binary package.

    >>> from lp.soyuz.interfaces.binarypackagename import (
    ...     IBinaryPackageNameSet)
    >>> foobar_name = getUtility(IBinaryPackageNameSet).queryByName('foobar')

    >>> primary_cache = DistroSeriesPackageCache(
    ...     archive=ubuntu.main_archive, distroseries=warty,
    ...     binarypackagename=foobar_name, summary='main foobar',
    ...     description='main foobar description')

The DistroSeriesBinaryPackage.

    >>> foobar_binary = warty.getBinaryPackage('foobar')
    >>> foobar_binary.cache == primary_cache
    True

    >>> print foobar_binary.summary
    main foobar

    >>> print foobar_binary.description
    main foobar description

The DistroArchSeriesBinaryPackage.

    >>> warty_i386 = warty['i386']
    >>> foobar_arch_binary = warty_i386.getBinaryPackage('foobar')
    >>> foobar_arch_binary.cache == primary_cache
    True

    >>> print foobar_arch_binary.summary
    main foobar

    >>> print foobar_arch_binary.description
    main foobar description

This lookup mechanism will continue to work even after we have added a
cache entry for a PPA package with the same name.

    >>> ppa_cache = DistroSeriesPackageCache(
    ...     archive=cprov.archive, distroseries=warty,
    ...     binarypackagename=foobar_name, summary='ppa foobar')

    >>> foobar_binary = warty.getBinaryPackage('foobar')
    >>> foobar_binary.cache != ppa_cache
    True

    >>> foobar_arch_binary = warty_i386.getBinaryPackage('foobar')
    >>> foobar_arch_binary.cache != ppa_cache
    True


Content class declarations for FTI
==================================

For the FTI to work properly with Storm queries, the fti column on the
content class must be declared as a RawStr; otherwise unicode errors are
likely to be thrown.

    >>> transaction.commit()
    >>> binary_name = getUtility(
    ...     IBinaryPackageNameSet).queryByName('commercialpackage')
    >>> cache_with_unicode = DistroSeriesPackageCache(
    ...     archive=ubuntu.main_archive, distroseries=warty,
    ...     binarypackagename=binary_name, summaries=u"“unicode“")
    >>> transaction.commit()
    >>> cache = DistroSeriesPackageCache.get(id=cache_with_unicode.id)

The following line fails unless the content class FTI column is RawStr:

    >>> cache.fti
    "'\xc3\xa2\xc2\x80\xc2\x9cunicode\xc3\xa2\xc2\x80\xc2\x9c':1B"


Disabled archives caches
========================

Once recognized as disabled, archives have their caches purged, so they
won't be listed in package searches anymore.

First, we rebuild and examine the caches for Celso's PPA.

    # Helper functions for completely rebuilding and dumping cache
    # contents for a given Archive.
    >>> from lp.services.log.logger import BufferLogger
    >>> logger = BufferLogger()
    >>> def rebuild_caches(archive):
    ...     DistributionSourcePackageCache.removeOld(
    ...         ubuntu, archive=archive, log=logger)
    ...     DistributionSourcePackageCache.updateAll(
    ...         ubuntu, archive=archive, ztm=transaction, log=logger)
    ...     for series in ubuntu.series:
    ...         DistroSeriesPackageCache.removeOld(
    ...             series, archive=archive, log=logger)
    ...         DistroSeriesPackageCache.updateAll(
    ...             series, archive=archive, ztm=transaction, log=logger)
    ...     archive.updateArchiveCache()
    >>> def print_caches(archive):
    ...     source_caches = DistributionSourcePackageCache.selectBy(
    ...         archive=archive)
    ...     binary_caches = DistroSeriesPackageCache.selectBy(
    ...         archive=archive)
    ...     print '%d sources cached [%d]' % (
    ...         archive.sources_cached, source_caches.count())
    ...     print '%d binaries cached [%d]' % (
    ...         archive.binaries_cached, binary_caches.count())
    >>> def print_search_results(text, user=None):
    ...     commit()
    ...     LaunchpadZopelessLayer.switchDbUser('launchpad')
    ...     for ppa in ubuntu.searchPPAs(text, user=user):
    ...         print ppa.displayname
    ...     LaunchpadZopelessLayer.switchDbUser(test_dbuser)

    >>> def enable_archive(archive):
    ...     commit()
    ...     LaunchpadZopelessLayer.switchDbUser('launchpad')
    ...     archive.enable()
    ...     commit()
    ...     LaunchpadZopelessLayer.switchDbUser(test_dbuser)

    >>> def disable_archive(archive):
    ...     commit()
    ...     LaunchpadZopelessLayer.switchDbUser('launchpad')
    ...     archive.disable()
    ...     commit()
    ...     LaunchpadZopelessLayer.switchDbUser(test_dbuser)

    >>> rebuild_caches(cprov.archive)

    >>> print_caches(cprov.archive)
    3 sources cached [3]
    2 binaries cached [2]

    >>> print_search_results('pmount')
    PPA for Celso Providelo

When Celso's PPA gets disabled, the indexes remain in the DB.

    >>> disable_archive(cprov.archive)

    >>> print_caches(cprov.archive)
    3 sources cached [3]
    2 binaries cached [2]

However, the disabled PPA is not included in search results for
anonymous requests or requests from users with no view permission to
Celso's PPA.

    >>> print_search_results('pmount')

    >>> no_priv = getUtility(IPersonSet).getByName('no-priv')
    >>> print_search_results('pmount', user=no_priv)

Only the owner of the PPA can still find it until the cache records are
removed.

    >>> print_search_results('pmount', user=cprov)
    PPA for Celso Providelo

When the indexes are rebuilt, the cache records are removed and not even
the owner is able to find the disabled PPA.
    >>> rebuild_caches(cprov.archive)

    >>> print_caches(cprov.archive)
    0 sources cached [0]
    0 binaries cached [0]

    >>> print_search_results('pmount', user=cprov)

If, by any chance, the disabled PPA gets re-enabled, the cache records
will be re-created when the indexes are rebuilt and the PPA becomes
publicly searchable again.

    >>> enable_archive(cprov.archive)

    >>> rebuild_caches(cprov.archive)

    >>> print_caches(cprov.archive)
    3 sources cached [3]
    2 binaries cached [2]

    >>> print_search_results('cprov')
    PPA for Celso Providelo
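
As a closing illustration, the text clean-up rules listed under "PPA package caches" (exclude the punctuation [,.;:!?] and store only unique lower case words) can be sketched in plain Python. This is a minimal sketch and not the code Launchpad runs: `clean_cache_text` is a hypothetical helper, replacing punctuation with a space (rather than deleting it) is an assumption chosen so that 'Ding,Dong.' yields two words, and the result is sorted only to make the output deterministic.

```python
import re

# The punctuation characters named in the text: , . ; : ! ?
PUNCTUATION = re.compile(r"[,.;:!?]")


def clean_cache_text(*texts):
    """Return the unique lower-case words found in ``texts``.

    Hypothetical helper for illustration only. Punctuation is replaced
    with a space (an assumption), words are lower-cased, duplicates are
    dropped and the result is sorted for a stable order.
    """
    words = set()
    for text in texts:
        if not text:
            continue
        words.update(PUNCTUATION.sub(" ", text).lower().split())
    return sorted(words)


cleaned = clean_cache_text(
    "cdrkit-bin cdrkit-extra",
    "Ding! Dong? Ding,Dong. Ding; DONG: ding dong",
)
print(" ".join(cleaned))
```

Hyphens are not in the punctuation set, so names such as 'cdrkit-bin' survive as single words, matching the 'package_description_cache' contents shown earlier.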