Searching across pages
This topic contains 13 replies, has 2 voices, and was last updated by Jonathan Basile 5 months, 3 weeks ago.
Derek Miller
Dear Jonathan,
Thank you for this gorgeous site. I was wondering about the ability to search across multiple pages.
I’m guessing you need to limit the number of search characters to keep query times manageable. But it would be interesting (since pages are the “complete” unit at the moment) to see where passages continue “correctly” onto the next page. So, the first 3200 characters of Lincoln’s second inaugural address should be followed on the next page by the word “until.” Might some trans-page search be possible?
Manuel Reinsperger
Hello Jonathan,
I too find this website amazing and wanted to ask something about its search algorithm as well.
How can you perform the search with random English words?
I would love it if you could tell me this and maybe a little more about the technical side of this fantastic library, and if possible even show some code.
Thanks in advance,
Manuel Reinsperger
Derek Miller
To second Manuel’s point, a glimpse at your algorithm (both directions) would be amazing. Do you have a GitHub page for the project?
Might there be a programmatic way to search for something? I was thinking of curating a book made solely of libraryofbabel pages (i.e., slicing a book into 3200 character chunks and then creating it as a series of pages bookmarked here).
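A minimal sketch of the slicing step described above, assuming the library's 29-character set is the lowercase letters plus space, comma, and period, and that a page holds 3200 characters. The names `normalize` and `paginate` are made up for illustration, and looking each page up in the library is left out because the thread does not describe a programmatic interface.

```python
import string

# Assumed 29-character set: a-z, space, comma, period.
CHARSET = string.ascii_lowercase + " ,."
PAGE_LENGTH = 3200  # characters per page (40 lines of 80)


def normalize(text: str) -> str:
    """Lowercase the text and drop anything outside the assumed character set."""
    collapsed = " ".join(text.lower().split())  # newlines and tabs become single spaces
    return "".join(ch for ch in collapsed if ch in CHARSET)


def paginate(text: str) -> list[str]:
    """Split normalized text into 3200-character pages, space-padding the last one."""
    clean = normalize(text)
    pages = [clean[i:i + PAGE_LENGTH] for i in range(0, len(clean), PAGE_LENGTH)]
    if pages:
        pages[-1] = pages[-1].ljust(PAGE_LENGTH)
    return pages
```

Each string returned by `paginate` could then be searched for as an exact page match and bookmarked in order.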
Derek Miller
One more thought (which might be totally useless): You’ve been using the page as the meaningful unit, but what if you worked combinatorially first on lines of eighty characters, rather than on pages? So, the number of possible lines is 29**80. Then the pages work combinatorially on line locations? I’d need to understand your code (and also, math) better, but might that simplify the scale of the calculations?
Hey Derek and Manuel,
So many interesting ideas! I’ll do my best to touch on everything.
So the truth is that, just on the basis of statistics, I can guarantee that you’ll never find two specific pages one after the other, even if you spend your life looking. The chances would be much lower than your chance of quantum tunneling through a wall. The only way to match consecutive pages of text would be to increase the seed state of the pseudo-random number generator that powers the site – this is possible, and something I’m presently experimenting with. Right now I have a successful model for full 410-page books, but it’s just a little too slow for the web. Once I tidy up some things in my GUI, I’m going to get back to experimenting with that. Right now the numbers the CGI programs are working with are around 2^16000 – to produce full books I need to step that up to 2^2000000, give or take. Daunting!
The best description of the search function can be found on one of the theory pages, Grains of Sand. To summarize, the book pages work by using the book location as the seed of a pseudo-random number generator, and the search simply inverts that – in fact, it’s not really a “search,” in the sense that it doesn’t work by reading through text until it finds a match. Its method is more like gematria – it converts the text into a number, and inverts the PRNG algorithm to find the seed which produces it (again, there’s a more patient description of that on the theory page mentioned above). So to match with random English words, I simply tack some random words onto the searched-for term before running through the algorithm in reverse.
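To make the "invert the generator" idea concrete, here is a small sketch. It is not the site's code: it swaps in a single affine map modulo 29^3200 as a stand-in generator, with arbitrary constants, and it pads with random characters rather than random English words. The character set and every function name are assumptions for illustration. A single affine step mixes poorly, so a real generator would need more scrambling, but the inversion principle is the same.

```python
import random
import string

CHARSET = string.ascii_lowercase + " ,."   # assumed 29-character set
BASE = len(CHARSET)                         # 29
PAGE_LENGTH = 3200
MODULUS = BASE ** PAGE_LENGTH               # number of distinct pages

# Hypothetical constants. MULTIPLIER must not be a multiple of 29,
# otherwise it has no inverse modulo 29^3200.
MULTIPLIER = 1103515245
INCREMENT = 12345
MULTIPLIER_INVERSE = pow(MULTIPLIER, -1, MODULUS)  # needs Python 3.8+


def text_to_number(text: str) -> int:
    """Read a page of text as a base-29 integer."""
    number = 0
    for ch in text:
        number = number * BASE + CHARSET.index(ch)
    return number


def number_to_text(number: int) -> str:
    """Write an integer as exactly PAGE_LENGTH base-29 digits."""
    digits = []
    for _ in range(PAGE_LENGTH):
        number, digit = divmod(number, BASE)
        digits.append(CHARSET[digit])
    return "".join(reversed(digits))


def page_at(location: int) -> str:
    """Forward direction: location -> page text."""
    return number_to_text((MULTIPLIER * location + INCREMENT) % MODULUS)


def locate(page: str) -> int:
    """Inverse direction: page text -> the location that produces it."""
    return (text_to_number(page) - INCREMENT) * MULTIPLIER_INVERSE % MODULUS


def search(phrase: str, rng: random.Random) -> int:
    """Pad the phrase with random characters to a full page, then invert."""
    offset = rng.randrange(PAGE_LENGTH - len(phrase) + 1)
    filler = [rng.choice(CHARSET) for _ in range(PAGE_LENGTH)]
    filler[offset:offset + len(phrase)] = phrase
    return locate("".join(filler))
```

Calling `search("hello world", random.Random())` returns a location whose page, regenerated with `page_at`, contains the phrase. No text is ever scanned; the location is computed directly from the padded page.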
I do want to share my code and make the project as useful as I can to other programmers. I also hope that people will go on to create more universal libraries (there is no one “universal” library) with other character sets – ideographic ones intrigue me the most. But I’m pretty new to programming and don’t know how people usually go about this. Is GitHub the place you would recommend? Would you recommend getting a Creative Commons license before posting it? Are there any security risks for the site if I don’t redact certain portions of the code?
Like I said, I want this project to be as helpful to everyone as it can be, whether through its books or through its code, but I want to make sure I go about it in the right way. At the very least, I think I should make my code a little more readable and better organized (comments, etc.) rather than just throwing it all out there in disarray. For now I want to focus on completing the website, and then I hope to turn to sharing the code and helping others build similar libraries.
(That being said, I hope anyone reading will let me know if you have an interest in creating a “universal” library in another language)
As for your last suggestion, Derek: the simplest algorithm, of course, would just start from the 29 characters in our character set and vary those – but that’s just what the algorithm does! Unfortunately, what is necessary, no matter what size elements you start with, is to seed your random number generator (or whatever function you work with) with values large enough that it can produce the requisite number of unique states: in our case 29^3200 ~= 2^15500, give or take. Otherwise your patterns will start to repeat before you’ve passed through all the possible permutations.
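The size estimate above can be checked with a few lines of arithmetic. This sketch only restates the numbers in the post; the 64-bit comparison is an added illustration of why a small seed cannot cover the library.

```python
import math

CHARSET_SIZE = 29
PAGE_LENGTH = 3200

# Bits needed to give every possible page its own seed: log2(29^3200).
bits_per_page = PAGE_LENGTH * math.log2(CHARSET_SIZE)
print(f"29^{PAGE_LENGTH} is about 2^{bits_per_page:.0f}")

# A generator with fewer state bits must leave pages out: by the pigeonhole
# principle, 2^64 seeds can reach at most 2^64 of the 29^3200 pages.
small_state_bits = 64
reachable_fraction_log2 = small_state_bits - bits_per_page
print(f"a {small_state_bits}-bit seed reaches at most "
      f"2^{reachable_fraction_log2:.0f} of all pages")
```

The exact exponent comes out a little above 15,500, which matches the "give or take" in the post.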
Derek Miller
Dear Jonathan,
Thank you for the thorough reply. I promise, I did read the theory page first!
I think GitHub would be the place to post your code, when you’re ready to do so. I’m not nearly expert enough to suggest how to share code in a way that minimizes security risks; sorry.
On your reply in the last paragraph: I think I understand what you’re saying. My point (and I think it’s still valid) is that you needn’t think about pages as the minimum unit; you might instead use lines. There are “only” 29^80 unique lines in the library. They’re then combined in units of 40, making the possible pages: (29^80)^40 = 29^3200. And combined again for 410 pages, giving you the number of unique books: (29^3200)^410 = 29^1312000. I guess my question, though, is, wouldn’t working on lines rather than pages make the number of possible states more manageable? Then the problem is how to make the proper random combinations among that set of lines.
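The counts in this post can be verified with exact integer arithmetic, since Python integers have arbitrary precision. The sketch below also prints how many seed bits each unit needs, which shows that a page assembled from 40 independently chosen lines still needs 40 times the bits of one line. The constant names are made up for illustration.

```python
import math

CHARS, LINE, LINES_PER_PAGE, PAGES_PER_BOOK = 29, 80, 40, 410

lines = CHARS ** LINE
pages = lines ** LINES_PER_PAGE

# Exact check of the page identity from the post: (29^80)^40 == 29^3200.
assert pages == CHARS ** (LINE * LINES_PER_PAGE) == CHARS ** 3200

# Exponents multiply, so the book exponent is 3200 * 410.
book_exponent = LINE * LINES_PER_PAGE * PAGES_PER_BOOK

bits_per_char = math.log2(CHARS)
bits = {
    "line": LINE * bits_per_char,
    "page": LINE * LINES_PER_PAGE * bits_per_char,
    "book": book_exponent * bits_per_char,
}
for unit, size in bits.items():
    print(f"one {unit} needs about {size:,.0f} bits of seed")
```

Working line by line makes each individual number smaller, but the total amount of state needed to reach every page is the same either way.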
Maybe I’d understand better with your code. Or maybe I’ll never understand! Anyway, it’s been fun to contemplate.
Manuel Reinsperger
Hey Jonathan,
Thanks for your answer! I too promise that I read the theory site, which was one of the things that made me even more curious.
If you are really that new to programming then you certainly have learned quite a ton!
I can’t say that I’m a security expert, but I know quite a few things about open-source development and a little about data security. If you want your code and creation to be visible to as wide an audience as possible, then GitHub is certainly a great option. I would certainly like a version where the Anglishize option could be a Germanize option, or one for any other language for that matter.
One thing you would have to remove before publishing is anything like database passwords, or for that matter any passwords or credentials at all.
Also, I think that with a little work, Derek’s algorithm idea could be beneficial for the library.
Eric Nitardy
“On some shelf in some hexagon, it was argued, there must exist a book that is the cipher and perfect compendium of all other books, and some librarian must have examined that book; this librarian is analogous to a god.”
Hi Jonathan,
Once you decide to release the source in some form, and I hope you do, you might link to a page in the library that describes the algorithm and its inverse.
That you can do that and likely fit it on a single page marks your library as a very rational one. You note in your Grains of Sand section that you might have chosen an algorithm whose inverse cannot be calculated in a practical fashion. There are yet other algorithms that may not, even in principle, be inverted, that is, any description of the inverse would be nearly as long as the library. These might be called half-rational libraries. There might be other unbuildable libraries for which both the ordering and the inverse have no reasonably sized (rational) description.
Eric
Hey Manuel – thank you for the advice – it’s all very helpful.
If you have a link to a plaintext German lexicon I could make a Germanize (hmm…verdeutschen?) option pretty easily.
I’m fairly certain that Derek and I are talking about more or less the same thing – I mean I think that what he’s describing is what the algorithm does already. Right now it computes a number from the book location, maps that to a random value, and then does a base conversion from ten to 29 to create a block of text in our 29-character set. So it isn’t treating the full page as a unit in the way Derek suggests, and if anything I think it would be adding a step to treat lines as units – instead of
letter = randomvalue % 29;
page += characterset[letter];
we would have to start from
line = randomvalue % 29^80
then convert that value into a line of letters. Unless I misunderstand Derek’s idea.
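A runnable sketch of the two approaches being compared, assuming the 29-character set is the lowercase letters plus space, comma, and period. The snippets in the post do not show how `randomvalue` advances between letters, so the repeated division by 29 here is an assumption, and all function names are hypothetical.

```python
import string

CHARSET = string.ascii_lowercase + " ,."   # assumed 29-character set
BASE = len(CHARSET)
LINE_LENGTH = 80


def digits_to_text(value: int, length: int) -> str:
    """Peel off `length` base-29 digits, one character per step."""
    text = ""
    for _ in range(length):
        value, letter = divmod(value, BASE)   # letter = value % 29
        text += CHARSET[letter]
    return text


def page_by_characters(random_value: int, length: int = 3200) -> str:
    """Character-at-a-time conversion, as in the first snippet."""
    return digits_to_text(random_value, length)


def page_by_lines(random_value: int, lines: int = 40) -> str:
    """Line-at-a-time conversion: take value % 29^80, then expand that line."""
    page = ""
    for _ in range(lines):
        random_value, line_value = divmod(random_value, BASE ** LINE_LENGTH)
        page += digits_to_text(line_value, LINE_LENGTH)
    return page
```

Under these assumptions both functions return the same page for the same input, so the line-based version only adds an intermediate step. Note that in Python the power operator is `**`; `29^80` would be a bitwise XOR.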
Manuel Reinsperger
Hello Jonathan,
You could use http://sourceforge.net/projects/germandict/ to Germanize (verdeutschen) the text ;D
As for Derek’s idea, I don’t know if you are on the same page, or if I even am on the same page, so… yeah.
Thanks, Manuel!
I’m still thrilled by this text-theremin you’ve created. I’ll add verdeutschen to my list – should get to it soon.
Hey Manuel and Derek,
I’ve been doing more work on the site and starting to understand some of the more basic ideas of programming a little better as I go along. I understand now the lookup table that you both were describing, and you are exactly right that it would be faster. I’m using something similar for the image library I’m working on, which will have about 10^1000000 possible images. Hopefully once that’s done I can apply what I’ve learned to the text library and make a more efficient 410-page algorithm.
I’ll keep you posted on the results!
Edgar Chávez García
Hello, everybody! I think it would be interesting not only to search across pages, as Derek Miller suggests, but also to make crossed searches, e.g. for characters contained in a page of a book with a given title, and/or located in a certain hexagon. This would be useful because each time the user sends the same search query, they get different matches.
I’m working on a new version of the site which will have all 410-page books, and title searches will be much more versatile when it is finished.
I think searching a hexagon will remain more or less impossible though – unless you want to do it the old-fashioned way and look through each page yourself. To search through a hexagon it would be necessary to generate all 262400 pages and read through the text, which I expect would be too taxing on my server. Also, keep in mind that even when searching through an entire hexagon it’s basically impossible to match a string longer than six characters.
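A rough estimate of the numbers in this reply. It assumes a hexagon holds 4 walls of 5 shelves of 32 volumes of 410 pages, and it treats every character as uniformly random and independent; both are modelling assumptions rather than details taken from the site's code.

```python
WALLS, SHELVES, VOLUMES, PAGES = 4, 5, 32, 410   # assumed hexagon layout
CHARS_PER_PAGE = 3200
CHARSET_SIZE = 29

pages_per_hexagon = WALLS * SHELVES * VOLUMES * PAGES


def expected_matches(length: int) -> float:
    """Expected occurrences of one specific string in one hexagon,
    treating every character as uniformly random and independent."""
    positions = pages_per_hexagon * (CHARS_PER_PAGE - length + 1)
    return positions / CHARSET_SIZE ** length


for length in range(4, 9):
    print(f"{length} characters: {expected_matches(length):.4g} expected matches")
```

Under these assumptions the expected count is roughly 1.4 for a six-character string and about 0.05 for seven characters, which is consistent with six characters being the practical limit.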