View Full Version : DHT capacity?
RickH
June 29th, 2007, 02:24 AM
I've been playing with ("beta testing") the Mojito DHT stuff lately, and I'm curious about something. If I'm reading the code correctly, it looks like published file hashes will generally expire in 1 hour, and are automatically republished every half hour to keep them fresh. But the publisher task only appears to publish about 1 file per minute on average... which makes me wonder how many files the DHT was designed and expected to hold per host?
Many hosts share a thousand or more files, most of which are probably uploaded at least a few times a week, making most all of them eligible for publication under the current rules (uploaded at least once, but not within the last 3 hours) once the host has been up for a couple days. To republish a thousand files once per hour, the minimum required to keep them from being dropped, would require republishing a hash every 3.6 seconds. And as it takes quite a few messages back and forth to publish a hash (with all the Find Node requests and replication factor and all), the host will essentially be fully occupied doing nothing but publishing hashes all the time. And alternatively, if the speed stays at 1 file per minute, you obviously can't keep more than 60 file hashes alive and in circulation.
So, are the current settings just for test purposes, and the hash expiration time is planned to be way higher (like at least a day or two) after the DHT goes live? Or are you guys expecting hosts to only publish maybe 20 or 30 hashes each? Or did I just completely misread the code? Something ain't right...
BTW, I love the PlasmaLamp! That's a really excellent bit of visualization design. Info-dense yet intuitive, and cool looking too. Awesome. :cool:
zab
June 29th, 2007, 04:03 PM
A lot of very good questions there. I can answer with certainty that we do not expect to increase the value expiration time as that opens several cans of worms, but we may definitely tighten the definition of a rare file. We'll be doing ongoing studies and optimization as the Mojito gets rolled out.
RickH
June 30th, 2007, 02:55 AM
I know you're not ready to work on this yet, but I was doing some more playing around and thought I should get this noted down for future reference, before I forget.
The main bottleneck for the slow publication rate seems to be the Find Nodes phase. Watching the PlasmaLamp, you can see that each publication starts with several (4-5 or more) waves of Find Node messages separated by about 10 seconds each, finally followed by the Store Value message burst.
I was able to combine the multiple waves of Find Node queries into one by increasing FIND_NODE_PARALLEL_LOOKUPS high enough. That immediately sped up the publication rate from 1 per minute or less to 4 or 5 per minute.
That leaves the 10 second delay after the Find Node wave as the major remaining time sink. The code appears to always wait for the full 10 second timeout period after each wave of queries, even though all the responses are generally received within the first couple of seconds. It would speed things up immensely if the code could track the incoming responses and immediately proceed with further processing if all of them are received before the timeout period, instead of always waiting 10 seconds and then checking to see what happened. 10 seconds would be the rare, worst-case time instead of inserting ~8 seconds of usually needless delay. Combined with the above change, this should give something like 15-20 publications per minute.
Hope this helps.
zab
June 30th, 2007, 03:36 AM
I don't think there is a hard timeout that we wait for FIND_NODE requests; we do however wait until all nodes that have been queried respond, even if we have found the target node. The reason is that this extra communication is what keeps the routing tables fresh.
If we were to shut off the lookup as soon as we find the node, or even worse - complete the iterative step before all nodes on that step have been queried and answered we would make the next few hours' worth of queries faster, but after that we'd start losing the DHT properties.
RickH
June 30th, 2007, 10:38 AM
I didn't expect it to stop when it finds the target node; as you say, all the responses are needed to keep the routing info updated and to find all the nearest nodes. It just appears that there's considerable dead time between Find Node message bursts, and it would be much faster if each burst followed immediately on the heels of the previous one. It gives the appearance of a fixed 10 second period for each Find Node step, even though all responses are typically received within 1 or 2 seconds.
The delay isn't caused by responses straggling in late. I'm watching the messages fly around in the PlasmaLamp, and each message wave is pretty short and distinct. The burst of Find Node messages and responses fly back and forth for about 2 seconds, followed by a long pause with no messages either way (and no real CPU load), followed by the next Find Node burst or the Store Value burst. The timing is extremely reliable and consistent.
If there's no fixed timeout, then I'd expect each message burst to be triggered immediately after the last response of the previous burst is processed, so that 4 Find Node bursts plus the Store Value burst would take maybe 10 seconds total, not the minute or more it's currently taking. If it's not caused by a fixed timeout, is it happening on the other end? That is, are the iterative steps somehow spawned on a fixed interval instead of each being started as quickly as possible after the prior one?
roger
July 9th, 2007, 04:52 PM
Hi RickH,
let me try to answer some of the questions. What you're seeing there are timeouts. A bit less than 30% of all Nodes in the DHT are dead and fail to respond. The lookup algorithm maintains FIND_NODE_PARALLEL_LOOKUPS number of parallel lookups. It does not wait until all responses arrive or timeout before it sends out the next round of request messages.
Increasing the FIND_NODE_PARALLEL_LOOKUPS Setting is not a solution and does not scale at all. By definition you must only know 1 (one) Node at each step that is closer to the target than the previous one. In a perfect world with no dead Nodes and "reliable UDP" you would set FIND_NODE_PARALLEL_LOOKUPS to 1. The world is unfortunately not perfect and we need a bit redundancy. The redundancy is however totally wasted work and puts just load on the DHT.
I believe the whole thing behaves also log/proportional to the size of the DHT. That means doubling the size of DHT will nullify the gained speedup of an increased FIND_NODE_PARALLEL_LOOKUPS value.
Regards storing. It's technically done in round-robin fashion. Sure, if publishing all the values takes longer than one hour the first values start to expire before the publisher is done but at the end it doesn't really matter. There is so much redundancy in the Network that somebody else will publish a value under the same key and participating DHT Nodes need anyways a high uptime. So if you can't find a value now you'll find it in the next hour.
The actual solution is to perform parallel FIND_NODE+STORE operations. Mojito must coexist with LimeWire though and we've no experience regards resource utilization yet.
Oh and it's great to hear you like the visualizations. :)
RickH
July 10th, 2007, 01:14 AM
There is so much redundancy in the Network that somebody else will publish a value under the same key and participating DHT Nodes need anyways a high uptime. So if you can't find a value now you'll find it in the next hour.
Hi, Roger. Files with that much redundancy aren't what I think of as rare. I'm frequently looking for (or serving) files that may only have 1 or maybe 2-3 copies existing in the entire network. And if the server has 1500+ other files, and can only keep 60 of them published at once, then the rare one I want is only published for maybe an hour a day. If I'm lucky, there may be 2-3 copies, and I'd find one after searching for 8-12 hours. That's OK for resuming an interrupted download if you're patient, but not very useful for searching for additional sources for swarmed downloads, which needs results well before the original download completes.
Ideally, you'd want all uncommon files to stay published pretty much all the time, if there were some way to manage that. Common files don't really need to be published in the DHT at all, but the less copies of a file are available, the more valuable it is for the few sources available to stay published.
It might be cool if the expiration time was related to the number of sources. If a node storing a given value has 20 sources stored, expiring them after an hour is fine. But if there's only 2 sources logged, I'd think it would be more useful and valuable to keep them around for a long time than to worry about the small cost of a false positive if the source has gone. ExpirationHours = max(24/NumSources, 1) or something similar, maybe? Or 24-NumSources*2, to make it more linear? Hmm.
The actual solution is to perform parallel FIND_NODE+STORE operations. Mojito must coexist with LimeWire though and we've no experience regards resource utilization yet.
Ah, right, that's more like it. Spawn off NumThreads = NumFiles / 60 publication threads, each responsible for publishing files with HashCode % NumThreads = ThreadID or some such, and their execution staggered at FindNodeTimeout / NumThreads intervals to smooth out the traffic load (and/or a small random sleep() in each thread to de-synchronize them) and you could keep all your files published.
It would create quite a message load trying to keep a couple thousand files published, but the messages are small, so it might not be all that much actual bandwidth consumed. Interestingly, the major burden of the traffic would be on the publisher, as his traffic would be spread pretty uniformly around the entire network. Any given random node's unsolicited inbound traffic should scale much more gently, in relation to the total size of the network and the average number of files published per node. So a server that really wanted to publish a lot of files could choose to accept the load and do so, without undue burden on the network as a whole, I think. Nice.
roger
July 10th, 2007, 05:38 PM
That's OK for resuming an interrupted download if you're patient, but not very useful for searching for additional sources for swarmed downloads, which needs results well before the original download completes.
You should never query the DHT for additional sources. You query it only if you're out of sources.
It might be cool if the expiration time was related to the number of sources. If a node storing a given value has 20 sources stored, expiring them after an hour is fine. But if there's only 2 sources logged, I'd think it would be more useful and valuable to keep them around for a long time than to worry about the small cost of a false positive if the source has gone. ExpirationHours = max(24/NumSources, 1) or something similar, maybe? Or 24-NumSources*2, to make it more linear? Hmm.
This works only if the Nodes understand the values they are storing and only if they're able to verify the current status of the values. That's something only the Host application can do (e.g. LimeWire) and only to a limited degree.
vBulletin® v3.7.1, Copyright ©2000-2009, Jelsoft Enterprises Ltd.