file query api
Closed, ResolvedPublic

Description

apt-file is rubbish and super slow.

Make a REST micro server to query contents conveniently, and have it backed by an actual decompressed database so it's not super slow. Currently considering bolt or mongo as the database.

NB: this needs to be backed by an on-disk database, as the archive.ubuntu contents are almost 2GiB of decompressed data.

Each archive has a unique id:

$ curl localhost:8080/v1/archives 
["archive.neon.kde.org/user/dists/xenial","archive.ubuntu.com/ubuntu/dists/xenial"]

Using this id one can find contents in that archive

$ curl localhost:8080/v1/find/archive.neon.kde.org/user/dists/xenial\?q=\*KF5ConfigConfig.cmake  
{"usr/lib/x86_64-linux-gnu/cmake/KF5Config/KF5ConfigConfig.cmake":["libkf5config-dev"]}
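A minimal client-side sketch in Go of the request and response shapes shown above. The helper names `findURL` and `parseFind` are made up for illustration and are not part of the service. The sketch assumes the response is always a JSON object mapping a file path to a list of package names, as in the example output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// findURL builds a /v1/find request URL for one archive id and one glob.
// Hypothetical helper; base is wherever the service happens to run.
func findURL(base, archiveID, query string) string {
	// The archive id contains slashes and is appended to the path as-is.
	// The glob goes into the q parameter and is percent-encoded.
	v := url.Values{}
	v.Set("q", query)
	return base + "/v1/find/" + archiveID + "?" + v.Encode()
}

// parseFind decodes a find response: file path -> packages shipping it.
func parseFind(body []byte) (map[string][]string, error) {
	result := map[string][]string{}
	if err := json.Unmarshal(body, &result); err != nil {
		return nil, err
	}
	return result, nil
}

func main() {
	fmt.Println(findURL("http://localhost:8080",
		"archive.neon.kde.org/user/dists/xenial", "*KF5ConfigConfig.cmake"))

	body := []byte(`{"usr/lib/x86_64-linux-gnu/cmake/KF5Config/KF5ConfigConfig.cmake":["libkf5config-dev"]}`)
	result, err := parseFind(body)
	if err != nil {
		panic(err)
	}
	for path, pkgs := range result {
		fmt.Println(path, "is in", pkgs)
	}
}
```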

Multiple ids can be grouped together into an ordered archive_pool:

$ curl localhost:8080/v1/pools
{"neon":["archive.neon.kde.org/user/dists/xenial","archive.ubuntu.com/ubuntu/dists/xenial"]}
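What "ordered" means for lookups is not spelled out here. One possible reading, sketched in Go below, is that archives are queried in pool order and the first archive that knows a path wins for that path. That precedence rule, the `lookupFunc` type, and `findInPool` are all assumptions of this sketch, not the actual implementation.

```go
package main

import "fmt"

// lookupFunc is a hypothetical per-archive query returning path -> packages.
type lookupFunc func(archiveID, query string) map[string][]string

// findInPool queries each archive of the named pool in its listed order.
// Assumption: for a given path, the earliest archive in the pool wins.
func findInPool(pools map[string][]string, pool, query string, lookup lookupFunc) map[string][]string {
	out := map[string][]string{}
	for _, id := range pools[pool] {
		for path, pkgs := range lookup(id, query) {
			if _, seen := out[path]; !seen {
				out[path] = pkgs
			}
		}
	}
	return out
}

func main() {
	pools := map[string][]string{
		"neon": {
			"archive.neon.kde.org/user/dists/xenial",
			"archive.ubuntu.com/ubuntu/dists/xenial",
		},
	}
	// Fake data standing in for real per-archive lookups.
	fake := func(archiveID, query string) map[string][]string {
		if archiveID == "archive.neon.kde.org/user/dists/xenial" {
			return map[string][]string{"usr/bin/foo": {"foo-neon"}}
		}
		return map[string][]string{
			"usr/bin/foo": {"foo"},
			"usr/bin/bar": {"bar"},
		}
	}
	fmt.Println(findInPool(pools, "neon", "*", fake))
}
```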

For consideration

It may or may not be handy to query all supported archives at once (i.e. /find/*?q=Foo.cmake). This would have to return a different, vastly more complicated payload though.
Maybe allow a query list so one can query multiple files in one GET.
Ability to upload one's sources.list and have the API return the relevant archives (NB: with fragmented sources in sources.list.d this becomes vastly more weird).
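For the query-list idea, one possible shape is simply repeating the q parameter in a single GET. The Go sketch below shows only the parsing side; the repeated-parameter convention and the `parseQueries` name are assumptions, since nothing here settles how such a list would be passed.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseQueries extracts every q value from a raw query string such as
// "q=*Foo.cmake&q=bin/zsh5". Hypothetical helper for illustration.
func parseQueries(rawQuery string) ([]string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	// url.Values keeps repeated keys in their order of appearance.
	return values["q"], nil
}

func main() {
	queries, err := parseQueries("q=*Foo.cmake&q=bin/zsh5")
	if err != nil {
		panic(err)
	}
	for _, q := range queries {
		fmt.Println("would look up:", q)
	}
}
```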

Related Objects

		Status	Assigned	Task
		Resolved	sitter	T3926 file query api
				Restricted Maniphest Task

sitter created this task. Oct 3 2016, 9:43 AM

https://github.com/apachelogger/neon-contents-grapple

A concern with the code is that one file can currently only be in one package. This is due to how the bucket is implemented right now: it stores each parsed key,value pair as such, while in fact one key can have multiple values.

example contents

/bin/bash bash
/bin/bash bash-fork

To represent this in the database we'd have to actually store key,[value,...]. Completely forgot to consider that.
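A minimal Go sketch of the key,[value,...] idea, assuming the simplified "path package" line format of the example above. `parseContents` and `encodeValue` are hypothetical names. JSON is just one convenient way to turn the list into the byte value a key/value store such as bolt expects; the actual encoding used is not stated here.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// parseContents builds path -> []package from lines of "path package".
// The same path on several lines accumulates several packages.
func parseContents(data string) (map[string][]string, error) {
	index := map[string][]string{}
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Split on the last whitespace so paths containing spaces survive.
		i := strings.LastIndexAny(line, " \t")
		if i < 0 {
			continue // blank or malformed line
		}
		path := strings.TrimSpace(line[:i])
		pkg := line[i+1:]
		index[path] = append(index[path], pkg)
	}
	return index, sc.Err()
}

// encodeValue serializes the package list into a single byte value.
func encodeValue(pkgs []string) ([]byte, error) {
	return json.Marshal(pkgs)
}

func main() {
	index, err := parseContents("/bin/bash bash\n/bin/bash bash-fork\n")
	if err != nil {
		panic(err)
	}
	value, err := encodeValue(index["/bin/bash"])
	if err != nil {
		panic(err)
	}
	fmt.Printf("/bin/bash => %s\n", value)
}
```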

sitter updated the task description. Oct 7 2016, 3:59 PM

sitter updated the task description. Oct 7 2016, 4:04 PM

sitter updated the task description. Oct 10 2016, 1:52 PM

So, the pool concept seems fairly lackluster. Globbing all of ubuntu takes ages, and ultimately client-side constraints and expectations would make micro queries substantially faster than macro queries. The latter get obscenely inefficient with large repos involved.

sitter added a subtask: Restricted Maniphest Task. Oct 11 2016, 12:02 PM

sitter claimed this task. Oct 11 2016, 12:11 PM

sitter triaged this task as Low priority.

sitter moved this task from Discussing to Doing on the Neon board.

Temporarily at http://build.neon.kde.org:5757/v1/pools until the sysadmin ticket gets processed.

bcooksley closed subtask Restricted Maniphest Task as Resolved. Oct 24 2016, 10:32 AM

Now at https://contents.neon.kde.org/v1/pools

Querying ubuntu is potentially too slow:

q=bin/zsh5 takes <2 seconds
q=*/zsh5 takes >17 seconds

Current theory is that the globbing library allocates strings.

sitter mentioned this in T3722: Developer Environment. Nov 8 2016, 1:22 PM

Improved query speed substantially by querying through 512 concurrent routines. This increases CPU load quite a bit, but makes */zsh5 return in less than 10 seconds. Might have to reduce the concurrency; I suspect 512 is still a bit much.
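A rough Go sketch of the bounded-concurrency idea, with the worker count as a parameter so it can be tuned down from 512. This is not the actual code. `compileGlob` is a stand-in for the real globbing library, which is not named here, and the sketch assumes `*` may match across `/`, as the `*/zsh5` query implies.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
	"sync"
)

// compileGlob turns a glob whose * matches any run of characters
// (including "/") into an anchored regexp. Stand-in for a glob library.
func compileGlob(glob string) *regexp.Regexp {
	parts := strings.Split(glob, "*")
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p)
	}
	return regexp.MustCompile("(?s)^" + strings.Join(parts, ".*") + "$")
}

// findConcurrent matches paths against glob with at most `workers`
// goroutines and returns the hits sorted for a stable result.
func findConcurrent(paths []string, glob string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	re := compileGlob(glob) // *regexp.Regexp is safe for concurrent use
	jobs := make(chan string)
	hits := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if re.MatchString(p) {
					hits <- p
				}
			}
		}()
	}
	// Feed the workers, then close hits once all of them are done.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(hits)
	}()

	var out []string
	for p := range hits {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	paths := []string{"usr/bin/zsh5", "bin/zsh5", "usr/bin/zsh", "usr/share/zsh5/doc"}
	fmt.Println(findConcurrent(paths, "*/zsh5", 4))
}
```

Feeding every path through a shared channel keeps the number of goroutines fixed regardless of archive size, which is the knob to turn if 512 proves too heavy on CPU.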

Waiting for some input from Aleix before this can be concluded. Testing is somewhat missing, but testing this is a bit cumbersome.

sitter updated the task description. Nov 9 2016, 4:22 PM

sitter updated the task description.

sitter updated the task description. Nov 10 2016, 1:14 PM

Now has a /doc folder with API docs generated with http://apidocjs.com/ (ideally we'd use swagger to autogenerate stuff, but gin still has no swagger support). Either way, the API seems to work well.

sitter updated the task description. Dec 22 2016, 1:27 PM

Can this be closed, or does it still need input from @apol?

The API is in use, this can be closed I'd say.

Closing this then.
