file query api
Closed, Resolved, Public

Description

apt-file is rubbish and super slow.

Make a REST micro server to query contents conveniently, and back it with an actual decompressed database so it's not super slow. Currently considering bolt or mongo as the database.

NB: this needs to be backed by an on-disk database, as the archive.ubuntu.com Contents data is almost 2 GiB once decompressed.

Each archive has a unique ID:

$ curl localhost:8080/v1/archives 
["archive.neon.kde.org/user/dists/xenial","archive.ubuntu.com/ubuntu/dists/xenial"]

Using this ID, one can search the contents of that archive:

$ curl localhost:8080/v1/find/archive.neon.kde.org/user/dists/xenial\?q=\*KF5ConfigConfig.cmake  
{"usr/lib/x86_64-linux-gnu/cmake/KF5Config/KF5ConfigConfig.cmake":["libkf5config-dev"]}
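A rough sketch in Go of what a find lookup has to do, assuming the archive's contents are already loaded as a map from path to packages. It also assumes `*` matches across `/`, since the example above matches `*KF5ConfigConfig.cmake` against a deep path, which the standard library's `path.Match` would reject. `matchGlob` and `find` are hypothetical names, not the project's actual code.

```go
package main

// matchGlob reports whether name matches pattern. '*' matches any run of
// bytes, including '/', and '?' matches exactly one byte. It walks both
// strings in place and allocates nothing.
func matchGlob(pattern, name string) bool {
	p, n := 0, 0
	starP, starN := -1, 0 // last '*' in pattern and where in name it started absorbing
	for n < len(name) {
		switch {
		case p < len(pattern) && pattern[p] == '*':
			starP, starN = p, n // remember the star, first try matching zero bytes
			p++
		case p < len(pattern) && (pattern[p] == '?' || pattern[p] == name[n]):
			p++
			n++
		case starP >= 0:
			starN++ // backtrack: let the last '*' absorb one more byte
			p, n = starP+1, starN
		default:
			return false
		}
	}
	for p < len(pattern) && pattern[p] == '*' {
		p++ // trailing stars match the empty remainder
	}
	return p == len(pattern)
}

// find returns every path in contents that matches q, together with the
// packages shipping it: the shape of the /v1/find response above.
func find(contents map[string][]string, q string) map[string][]string {
	out := make(map[string][]string)
	for path, pkgs := range contents {
		if matchGlob(q, path) {
			out[path] = pkgs
		}
	}
	return out
}
```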

Multiple IDs can be grouped together into an ordered archive_pool:

$ curl localhost:8080/v1/pools
{"neon":["archive.neon.kde.org/user/dists/xenial","archive.ubuntu.com/ubuntu/dists/xenial"]}

For consideration

  • it may or may not be handy to query all supported archives at once (e.g. /find/*?q=Foo.cmake), though the response payload would have to be different and vastly more complicated
  • maybe allow a query list so one can query multiple files in one GET
  • ability to upload one's sources.list and have the API return the relevant archives (NB: with sources fragmented across sources.list.d this becomes vastly more weird)
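For the query-list idea, one option is to repeat the `q` parameter in a single GET (e.g. `?q=*/zsh5&q=*KF5ConfigConfig.cmake`). A sketch using only Go's `net/url` to pull that list out of a request's query string; the repeated-parameter format and `queriesFrom` are assumptions, not something the task settled on.

```go
package main

import (
	"net/url"
	"strings"
)

// queriesFrom extracts every non-empty q parameter, in order, from a raw
// query string. Hypothetical helper: repeating q is just one possible
// encoding for a query list.
func queriesFrom(rawQuery string) ([]string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err // e.g. a malformed %-escape
	}
	var qs []string
	for _, q := range values["q"] {
		if q = strings.TrimSpace(q); q != "" {
			qs = append(qs, q)
		}
	}
	return qs, nil
}
```

A handler would then run one lookup per entry and return the results keyed by query.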

Related Objects

Status: Resolved, Assigned: sitter
sitter created this task. Oct 3 2016, 9:43 AM
sitter added a comment (edited). Oct 7 2016, 3:43 PM

https://github.com/apachelogger/neon-contents-grapple

Code concern: right now one file can only be in one package, because the bucket stores each parsed key,value pair as-is, while in fact one key can have multiple values.

example contents

/bin/bash bash
/bin/bash bash-fork

To represent this in the database we'd actually have to store key,[value,...]. Completely forgot to consider that.
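A minimal sketch of the key,[value,...] layout, assuming the value is a JSON-encoded `[]string`. A plain Go map of byte slices stands in for the on-disk bucket (a bolt bucket likewise maps `[]byte` keys to `[]byte` values); `bucket`, `addPackage`, `load` and `packages` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"strings"
)

// bucket is a stand-in for an on-disk key/value bucket: path -> JSON []string.
type bucket map[string][]byte

// addPackage appends pkg to the list stored under path instead of
// overwriting it, so one path can belong to several packages.
func (b bucket) addPackage(path, pkg string) error {
	var pkgs []string
	if raw, ok := b[path]; ok {
		if err := json.Unmarshal(raw, &pkgs); err != nil {
			return err
		}
	}
	for _, p := range pkgs {
		if p == pkg {
			return nil // already recorded
		}
	}
	raw, err := json.Marshal(append(pkgs, pkg))
	if err != nil {
		return err
	}
	b[path] = raw
	return nil
}

// load feeds lines of the simplified "path package" form from the example
// above into the bucket, skipping blank or malformed lines.
func (b bucket) load(contents string) error {
	for _, line := range strings.Split(contents, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		if err := b.addPackage(fields[0], fields[1]); err != nil {
			return err
		}
	}
	return nil
}

// packages decodes the list stored for path (nil if the path is unknown).
func (b bucket) packages(path string) ([]string, error) {
	raw, ok := b[path]
	if !ok {
		return nil, nil
	}
	var pkgs []string
	err := json.Unmarshal(raw, &pkgs)
	return pkgs, err
}
```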

sitter updated the task description. Oct 7 2016, 3:59 PM
sitter updated the task description. Oct 7 2016, 4:04 PM
sitter updated the task description. Oct 10 2016, 1:52 PM

So, the pool concept seems fairly lackluster. Globbing all of Ubuntu takes ages, and client-side constraints and expectations would ultimately make micro queries substantially faster than macro queries. The latter get obscenely inefficient when large repos are involved.

sitter added a subtask: Restricted Maniphest Task. Oct 11 2016, 12:02 PM
sitter claimed this task. Oct 11 2016, 12:11 PM
sitter triaged this task as Low priority.
sitter moved this task from Discussing to Doing on the Neon board.

Temporarily available at http://build.neon.kde.org:5757/v1/pools until the sysadmin ticket gets processed.

bcooksley closed subtask Restricted Maniphest Task as Resolved. Oct 24 2016, 10:32 AM

Querying Ubuntu is potentially too slow:

q=bin/zsh5 takes <2 seconds
q=*/zsh5 takes >17 seconds

The current theory is that string allocations in the globbing library are the bottleneck.

Improved query speed substantially by querying through 512 concurrent routines. This increases CPU load quite a bit, but makes */zsh5 return in under 10 seconds. Might have to reduce the concurrency; I suspect 512 is still a bit much.
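The concurrent query could look roughly like this: split the archive's paths into one chunk per worker, match each chunk in its own goroutine, and merge the results. A sketch only; `concurrentFind` is a hypothetical helper, the worker count is a parameter so it can be turned down from 512 if CPU load is a problem, and the matcher is passed in rather than tied to a particular globbing library.

```go
package main

import (
	"sort"
	"sync"
)

// concurrentFind matches q against paths using up to `workers` goroutines,
// each scanning one contiguous chunk, and returns the matches sorted.
func concurrentFind(paths []string, q string, workers int, match func(pattern, name string) bool) []string {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(paths) + workers - 1) / workers // ceil; 0 only when paths is empty
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []string
	)
	for start := 0; start < len(paths); start += chunk {
		end := start + chunk
		if end > len(paths) {
			end = len(paths)
		}
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			var local []string // collect locally so the lock is taken once per chunk
			for _, p := range part {
				if match(q, p) {
					local = append(local, p)
				}
			}
			mu.Lock()
			out = append(out, local...)
			mu.Unlock()
		}(paths[start:end])
	}
	wg.Wait()
	sort.Strings(out) // goroutines finish in arbitrary order
	return out
}
```

Chunking per worker rather than spawning one goroutine per path keeps scheduling overhead low; each goroutine still does a full linear scan of its share.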

Waiting for some input from Aleix before this can be concluded. Testing is somewhat lacking, but testing this is a bit cumbersome.

sitter updated the task description. Nov 9 2016, 4:22 PM
sitter updated the task description.
sitter updated the task description. Nov 10 2016, 1:14 PM
sitter moved this task from Doing to Review on the Neon board. Nov 24 2016, 1:18 PM

Now has a /doc folder with API docs generated with http://apidocjs.com/ (ideally we'd use Swagger to autogenerate them, but gin still has no Swagger support). Either way, the API seems to work well.

sitter updated the task description. Dec 22 2016, 1:27 PM
bshah added subscribers: apol, bshah. Jul 17 2018, 4:51 AM

Can this be closed, or does it still need input from @apol?

apol added a comment. Jul 19 2018, 12:58 AM

The API is in use, this can be closed I'd say.

bshah closed this task as Resolved. Jul 19 2018, 8:19 AM

Closing this then.