random thoughts
This topic contains 5 replies, has 2 voices, and was last updated by Fragensteller 1 month, 3 weeks ago.


Fragensteller: I was wondering whether one of you could answer me this. If the algorithm technically contains a sheer infinite amount of data, and every set of data is traceable, what keeps us from building devices with endless storage capacity? In addition, we would just need an extra program that keeps track of the paths. And for updating information, we would just have to change the pathways instead of the information itself. Or are there obstacles that I'm too unaware to consider? I'm not a programmer, btw.
Greetings
Haplo: That is an interesting thought. If someone had the algorithm for the universal slideshow, they could have all their pictures take up zero hard-drive space on their computer. You could even store your pictures (their reference numbers, that is) in a paper notebook! Just don't lose the algorithm! Actually, that wouldn't be practical, because the numbers can get really huge.
But you could have the algorithm store the reference number in a save file along with a description that you write to remember which picture it is, then simply render the picture whenever you want to view it. You could "store" billions of photos in just a few megabytes.
Additional features:
A "photo album" made up of photo descriptions, each linked to its huge number, which can be used to render the photo when it's clicked.
A slideshow which renders only your saved reference numbers.
An additional scrambler function which encrypts the reference numbers with a password, so that only the authorized user may render their photos.
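A minimal sketch of what such an album could look like (the descriptions, numbers, and file name here are all made up for illustration; the rendering algorithm itself is assumed to exist elsewhere):

```python
import json
import os
import tempfile

# Hypothetical "photo album": maps a description you wrote down to the
# huge reference number that would let the algorithm re-render the photo.
# These numbers are invented placeholders, not real references.
album = {
    "beach sunset, July 2019": 48219340577123948812734990121,
    "grandma's 80th birthday": 9912734498120034588127734,
}

# "Storing" the photos costs only the descriptions and the numbers.
path = os.path.join(tempfile.gettempdir(), "album.json")
with open(path, "w") as f:
    json.dump(album, f)

# Later: load the album and pick a reference number to re-render.
with open(path) as f:
    loaded = json.load(f)

ref = loaded["beach sunset, July 2019"]  # would be fed to the renderer
```

The scrambler feature would simply encrypt these numbers before they are written to disk.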
I've actually experimented with building something similar to this idea in the past, but the same issue is present. Haplo hinted at it: "the numbers can get really huge".
You see, while it is possible to build something like the proposed idea, it quickly becomes unfeasible because of where the data is located. The numbers become so huge that just storing the location number takes up more disk space than the actual data itself.
To give you a very simple example, let's say you want to store a simple piece of text. One way to do this is to give every word a number; then you would just store those numbers and connect the words. According to a quick search, there are 171,476 words in the English language. In binary, that's 101001110111010100, so it "costs" 18 bits to store any word's number. Now let's look at what it would cost to store "i like pie". There are 10 characters in that text. If we store them as plain ASCII, that's 10 bytes. A byte is 8 bits, so 10 × 8 = 80 bits. However, there are 3 words in that text, so with our dictionary approach it would be 3 × 18 = 54 bits. So far so good. There's just one problem: that's merely compression.
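The arithmetic above is easy to check in a few lines (the 171,476-word count is taken from the post as given; Python here is just for illustration):

```python
# Bits needed to write the word count 171,476 in binary.
vocab_size = 171_476
bits_per_word = vocab_size.bit_length()  # number of binary digits: 18

text = "i like pie"
ascii_bits = len(text) * 8               # plain ASCII: 8 bits per character
dict_bits = 3 * bits_per_word            # three word indices at 18 bits each

print(bits_per_word)          # 18
print(ascii_bits, dict_bits)  # 80 54
```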
Now we need to turn the data into a seed which we can feed into an algorithm to regenerate our data. For non-programmers, think of the number pi. What we would have to do is find our 54-digit-long compressed sequence somewhere in the digits of pi. The problem becomes obvious: it's easy enough to find a sequence of maybe 5 or so digits, but 54 is another story. Our compressed text might first appear around the 10^200th digit of pi (if we're supremely lucky), and that position alone is a 201-digit number, which takes roughly 668 bits to store. Uh oh, that's already more than eight times the 80 bits of the original text. And that's just for a 10-character piece of text. Forget about a full page, or even something as small as a floppy disk's worth of data!
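You can get a feel for how quickly positions grow with a toy search over the digits of pi. The sketch below generates digits in pure Python using Machin's formula (the specific search strings are just examples; in only 10,000 digits, a long sequence usually never shows up at all, which `find` reports as -1):

```python
def pi_digits(n):
    """Return the first n digits of pi as a string, e.g. '3141592653'."""
    # Machin's formula with integer arithmetic:
    # pi = 16*atan(1/5) - 4*atan(1/239), scaled up to keep n digits.
    def arctan_inv(x, scale):
        # atan(1/x) = 1/x - 1/(3x^3) + 1/(5x^5) - ...
        total = term = scale // x
        x2, k, sign = x * x, 3, -1
        while term:
            term //= x2
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    scale = 10 ** (n + 10)  # ten guard digits against rounding error
    pi = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    return str(pi)[:n]

digits = pi_digits(10_000)

# A short sequence is found almost immediately...
print(digits.find("26535"))
# ...but a longer one may not appear at all in a window this small.
print(digits.find("0123456789"))
```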
It's a fun challenge to try and solve, but if you do it, you're pretty much guaranteed to instantly become the richest person on Earth and win a Nobel Prize, plus several other awards 🙂
Fragensteller: First of all, thanks for your answers. I get the problem now; I wasn't aware of how huge these numbers can get.
It sounds a bit naive, but wouldn't it be possible to split the sequences up into smaller, more doable pieces, regenerate those pieces, and then simply add them up again? Like if we took every 6 digits in that sequence of 54, gave each group a number to mark its location, and later assembled them in that order.
Or would that demand even more processing resources?

It might be doable, but then you have to store the locations of each of those smaller pieces, which will most likely still add up to a large size. And yes, it would consume more processing resources, but that just means it'll take more time to decompress (or "unzip") the data.
I'll use my example from above again, but illustrate the actual set of data required to store the phrase in pi. With one small tweak, though: we can compress the data further by using only the top 10,000 most-used English words, since people don't use all ~171k on a daily basis. There's a handy repository here:
https://github.com/first20hours/google-10000-english

So, the words "i like pie" are located on lines:
i = 14
like = 96
pie = 6321

Another tweak we can employ: instead of using binary, we can represent the positions in plain decimal. Since we're only working with 10,000 words, we only need 5 digits. That's a lot better than 18 binary digits! It also addresses the concern you raised about smaller, workable pieces.
i = 00014
like = 00096
pie = 06321

Now, you might be thinking, "Why do we need the zeroes?", and the answer is quite simple. Pi starts 3.1415926…, so our first number, 14, sits literally in the 1st position! But the problem is that you don't know where to stop. What if you actually meant 141? Or 1415? You need a constant length so that you always know exactly how many digits to read.
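The fixed-width trick is just zero-padding; in code (word and line numbers taken from the post as given):

```python
# Word positions in the 10,000-word list, as quoted in the thread.
positions = {"i": 14, "like": 96, "pie": 6321}

# Pad every position to a fixed width of 5 digits: 10,000 entries never
# need more, and constant length removes any ambiguity about where one
# number ends when several are concatenated.
encoded = "".join(str(positions[w]).zfill(5) for w in "i like pie".split())
print(encoded)  # 000140009606321
```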
Using this handy site, we can search the first 2 billion digits of pi: http://www.subidiom.com/pi/pi.asp
Our positions are:
i = 260,307th digit
like = 55,483rd digit
pie = 29,093rd digit

We'll need a constant-length representation again for the pi position. Since 2,000,000,000 is 10 digits, the above numbers would be stored like this in memory:
0000260307
0000055483
0000029093

Or, concatenated: 000026030700000554830000029093
That way, the computer would read the first 10 digits, go to that position in pi, grab the 5 digits starting there, and then refer to our list to see which word that was. Then it reads the next 10 digits, and so on…
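The read-back loop described above might look like this sketch. Since fetching 2 billion digits of pi is out of scope here, the digits of pi and the word list are stubbed with exactly the positions and values quoted in the thread:

```python
# Stub standing in for pi: maps a starting position to the 5 digits
# found there (values as quoted in the thread). Real code would index
# into an actual digit stream of pi.
PI_LOOKUP = {260307: "00014", 55483: "00096", 29093: "06321"}

# Stub word list: line number -> word (values from the thread).
WORDS = {14: "i", 96: "like", 6321: "pie"}

stored = "000026030700000554830000029093"

words = []
for i in range(0, len(stored), 10):   # read 10 digits at a time
    pi_pos = int(stored[i:i + 10])    # position inside pi
    line_no = int(PI_LOOKUP[pi_pos])  # the 5 digits found at that position
    words.append(WORDS[line_no])      # look the word up by line number

print(" ".join(words))  # i like pie
```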
But look at that piece of data. Since the computer has to store it in binary, that’s:
1010110001000001101110001001100011000101110010010110110010011001111111111110110100101

That's 85 bits. "i like pie" is 10 characters (including spaces); 10 bytes would be 80 bits. So we're already consuming 5 more bits to store the phrase than the raw version takes.
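The 85-bit figure checks out; Python's arbitrary-precision integers make it easy to verify:

```python
stored = "000026030700000554830000029093"  # the 30-digit position string

n = int(stored)                  # note: leading zeros drop away here
compressed_bits = n.bit_length() # bits needed for the big number: 85
raw_bits = len("i like pie") * 8 # plain ASCII for the phrase: 80

print(compressed_bits, raw_bits)  # 85 80
```

If anything, 85 bits is generous: the fixed-width scheme actually needs the leading zeros too, and keeping all 30 decimal digits costs about 100 bits.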
Fragensteller: I'm quite impressed. I don't think I've ever had a question this complex answered so simply and understandably, given that this was an untouched topic for me. Thank you for taking the time. Still, I find the idea of unlimited storage space incredibly fascinating, and I'm hopeful that one day somebody will think outside the box enough to achieve it regardless of today's limitations.
Greetings
