
Welcome to AtSameIP 2.0

atsameip.com was basically an experiment. i saw another similar site and thought "i can think of a way to make a site like that", so i started building my own version.

you might think that's like stealing an idea, but even if i did get the idea from another site, i couldn't know how it worked inside. so i had to come up with my own way of producing the wanted result, and that's basically the fun part.

so my first idea on how to make this was a table with 1 row for each IP, where each row has a text field containing the list of domains associated with that IP.

and to make sure the data doesn't get too big, instead of just 1 "ips" table i made 1 for each first octet of an IP (from 1 to 255), like this: "ips1", "ips2", "ips3" ... "ips255".

and well that allowed me to store as much data in as little space as possible.
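to give you an idea, here's roughly what that layout could look like (just a sketch i'm writing for this post, with sqlite standing in for mysql, and the table/column names are made up):

code:
# rough sketch of the original layout: one shard table per first octet,
# one row per IP, all its domains packed into a single text field
import sqlite3

db = sqlite3.connect("atsameip_old.db")
for octet in range(1, 256):
    db.execute(f"""CREATE TABLE IF NOT EXISTS ips{octet} (
        ip      TEXT PRIMARY KEY,   -- e.g. '66.102.7.99'
        domains TEXT                -- e.g. 'example.com other.com ...'
    )""")
db.commit()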

and the basic idea is that whenever someone searches for a domain, the site pings that domain and adds it to the list, so as more people come to check out their domains, the database grows.
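in sketch form the flow is something like this (again just illustrative, reusing the toy schema from above; this is not the real code):

code:
# the "search = ping + record" flow: resolve the domain, then append it
# to the text list of its IP if it's not already there
import socket, sqlite3

def record_domain(db: sqlite3.Connection, domain: str) -> str:
    ip = socket.gethostbyname(domain)     # the "ping": resolve domain to IP
    table = f"ips{ip.split('.')[0]}"      # pick the shard from the first octet
    row = db.execute(f"SELECT domains FROM {table} WHERE ip = ?", (ip,)).fetchone()
    domains = row[0].split() if row else []
    if domain not in domains:
        domains.append(domain)
        db.execute(f"INSERT OR REPLACE INTO {table} (ip, domains) VALUES (?, ?)",
                   (ip, " ".join(domains)))
        db.commit()
    return ip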

but in practice, right after i started, this wasn't very good: the site being unknown and no one visiting it, the database was not growing much and the results were poor compared to the competitor sites.

so i decided to build some scripts that would import lots of domains into my database.

the first primitive experiments included a script that generated random domain names and just tried them all, adding the ones that pinged (think xo23hjd.com), and a crazed web crawler that visited random urls, saved all the urls found on the pages, then visited those and went on, pinging all the domains it encountered. that one ended up completely hosing my server.
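the random-name experiment was about as dumb as it sounds; a toy version of it could look like this (purely illustrative, this is not the original script):

code:
# generate random 7-character .com names and keep the ones that resolve
import random, socket, string

def random_domain() -> str:
    name = "".join(random.choices(string.ascii_lowercase + string.digits, k=7))
    return name + ".com"

def try_random_domains(n: int) -> list[str]:
    found = []
    for _ in range(n):
        d = random_domain()
        try:
            socket.gethostbyname(d)   # keep only names that actually resolve
            found.append(d)
        except socket.gaierror:
            pass
    return found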

but finally i started using a huge list of domains (it only seems to have .com domains, but lots of them) provided online by some site that i'd rather not name (at least until i'm finished crawling their list...)

anyways, then i started having a more interesting amount of results, but i had 2 problems:

first, with the quantity of domains getting large (a few million, on a few hundred thousand IPs), i had a problem caused by the fact that some of the IPs had a very large amount of domains pointing to them. that made the rows of these IPs heavy to read, since they contained a long "text" value. and sadly, since the whole list of domains was in 1 text field, even when you wanted to show only a few of the domains of an IP, you still had to read the whole list from the DB and parse it. so the result was that atsameip was generally fast, but as soon as one of the IPs with a large amount of domains pointing to it was part of the results, it made the query quite long. that's a problem i wanted to find an intelligent way to fix.
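to make the problem concrete, here's what reading a few domains of an IP looked like with that layout (same toy schema as above):

code:
# to show even 10 domains, the entire text blob for that IP still has to be
# read from the DB and split; the limit only happens after all the work is done
def first_domains(db, ip: str, limit: int = 10) -> list[str]:
    table = f"ips{ip.split('.')[0]}"
    row = db.execute(f"SELECT domains FROM {table} WHERE ip = ?", (ip,)).fetchone()
    if row is None:
        return []
    return row[0].split()[:limit]   # whole list parsed just to keep 10 entries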

another problem i had was that importing (and testing) all the domains from that huge online list was taking a very long time (there might have been at least a few hundred thousand, maybe even a million pages to crawl, and each crawl was taking many hours), and the process was also using a lot of CPU on my DB (i have other sites on that server...)

but for the first version of the site, that's how it was... i eventually stopped crawling the huge list of domains, having only crawled the first pages (now i have a lot of domains that start with 000000, like 00000000123sex.com etc...), and i was not thinking much about the site, using it once in a while...


time passed, and i was creating various other sites and continuously improving my programming skills. and one day i noticed that i was starting to get some traffic on atsameip.com. so i went back to look at the site and asked myself "how can i improve it with my new skills?"

so first i wanted to restart the script that crawls the huge list of domains and get them all in, once and for all, thinking that maybe if i orchestrated the mysql queries differently i might be able to make it work faster.

when i started trying to write a more efficient script, i felt it was hard to build something efficient with the way my information was currently stored (255 separate tables with 1 row per IP and all the domains bundled in 1 text field for each IP). so after spending a while trying to work out something efficient on the old system, i started getting obsessed with building a brand new, more efficient system: my idea was to create a single table where each domain would have a row containing its name and its IP. of course, that would mean 1 huge table with looots of rows... and it would ultimately use more space on the server, because some IP addresses would be repeated. but it would allow me to manage the data a lot more easily: easily verify whether a domain is already listed, and really select only 10 domains from a certain IP even if there are thousands. i felt the new system might fix both of my problems: the one where IPs with lots of domains slow the thing down, and the one where crawling the list of domains is very slow.
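in sketch form, the new layout is something like this (sqlite again standing in for mysql; the index names are made up):

code:
# the 2.0 idea: one row per (domain, ip) pair, plus indexes so both slow
# operations become cheap index lookups
import sqlite3

db = sqlite3.connect("atsameip_new.db")
db.execute("""CREATE TABLE IF NOT EXISTS domains (
    name TEXT,   -- 'example.com'
    ip   TEXT    -- '93.184.216.34' (repeated across rows, hence the extra space)
)""")
db.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_name_ip ON domains (name, ip)")
db.execute("CREATE INDEX IF NOT EXISTS idx_ip ON domains (ip)")

def is_listed(name: str) -> bool:
    # "is this domain already listed?" is now a single index lookup
    return db.execute("SELECT 1 FROM domains WHERE name = ?",
                      (name,)).fetchone() is not None

def sample_domains(ip: str, limit: int = 10) -> list[str]:
    # really select only 10 domains of an IP, even if there are thousands
    rows = db.execute("SELECT name FROM domains WHERE ip = ? LIMIT ?", (ip, limit))
    return [r[0] for r in rows]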

So i started building the new system.

first i created that new "domains" table that would contain all the domains, and then i wrote a script to import all the domains from the old tables into the new table.
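conceptually the import does something like this (a sketch of the idea, not the actual script):

code:
# walk the 255 old shard tables, split each text blob, and insert
# one (name, ip) row per domain into the new table
def migrate(old_db, new_db):
    for octet in range(1, 256):
        for ip, blob in old_db.execute(f"SELECT ip, domains FROM ips{octet}"):
            rows = [(name, ip) for name in (blob or "").split()]
            new_db.executemany(
                "INSERT OR IGNORE INTO domains (name, ip) VALUES (?, ?)", rows)
        new_db.commit()   # commit per shard, not per row, or it crawls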

and then i found out that importing this data was taking forever! as i'm writing this i can't be exactly sure, but i suspect that the old tables contained about 2 or 3 million domains, and now, after many days, i have managed to transfer about half of it into the new table!! lol. i'm still continuing until i've transferred it all, of course. i've also improved the script many times since its first version, and i guess it's now about as fast as i can make it for the moment.

then i made a new version of the script that crawls that huge list of domains, adapted to the new all-in-one table, and since the new table makes it much easier to verify whether a domain exists, i was able to ridiculously reduce the time it takes to crawl and test the domains on the list. let's just say that at the speed it was going before, it would have tied up my database for about 30 years before it was done reading the whole list; now i estimate it will take about 3 days.
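the win comes almost entirely from that existence check; the crawl loop is basically this (a sketch, reusing is_listed from above):

code:
# each candidate from the list costs one index lookup instead of reading
# and parsing a text blob; dead domains are skipped
import socket

def crawl_batch(db, candidates: list[str]) -> None:
    for name in candidates:
        if is_listed(name):        # already in the DB, skip immediately
            continue
        try:
            ip = socket.gethostbyname(name)
        except socket.gaierror:
            continue               # domain doesn't resolve, skip it
        db.execute("INSERT OR IGNORE INTO domains (name, ip) VALUES (?, ?)",
                   (name, ip))
    db.commit()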

and then i also made an entirely new version of the homepage to work with the new system, and since i was waiting for this super long data transfer to be finished, i started adding lots of options, like country flags next to IPs, whois information, a forum and a contact us page.

so you might think that all is well in wonderland: just wait for the data transfer to be finished, kick in the new homepage, and bing, we have a super update of the atsameip system.

but sadly the new system, even if it was working fantastically, had a little problem: it was veeery slow!!

while the old system was, let's say, generally 9/10 fast and once in a while 2/10 fast, the new system was steadily 4/10 fast... not good at all...

of course, while i'm still importing data and have all these crawler/table-transfer scripts running, it makes everything even slower. but i tested with all these temporary scripts stopped, and while it was a little bit faster, it was still slow enough to be annoying and generally not nearly as fast as the old system....

so at the moment the old system is still on the homepage, looking and working exactly like it always did. but there's a link on the home page to try the new version of atsameip, which i'm still working on.

it has lots of new features that i'm proud of, including country flags and full support for internationalized domain names. but i'm still working on some ways of increasing the speed of this thing before i release it on the front page.

and i must say that even if it was not easy, i did manage to imagine a way to drastically speed up the new system. it involves pre-organizing a massive amount of information, which a script is working on at the moment; i estimate it will take several hours, so that gives me some time to write this long story.
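i won't go into the details yet, but the idea boils down to precomputing the answers instead of querying for them. something in this spirit (a totally speculative sketch, not the real script; the cache here is just a dict):

code:
# precompute, for every IP, the small ready-to-serve slice a search needs,
# so live queries never have to touch the big table directly
def build_caches(db, cache: dict[str, list[str]], per_ip: int = 10) -> None:
    for (ip,) in db.execute("SELECT DISTINCT ip FROM domains"):
        cache[ip] = sample_domains(ip, per_ip)   # store the finished answer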

but basically, my point is that if all goes well, probably in a couple of hours, days max, the new system shall be ready, full of new options and Fast! i'll keep you posted on the development.


post #1
-Edit-

so in the end, atsameip 2.0 is still not ready for action. but i have not given up on it!

i'm not finished copying the old data into the new table; that's taking a lot longer than i thought it would. it must have been 3 or 4 days already... and it's hard to tell how long it will still take, but i think somewhere between 1 and 3 more days.

pre-creating the cache for each IP is also taking longer than i would have hoped. and i suspect some of the cache information is getting deleted. maybe a bug! i have to investigate this.

so in the end i feel that this creation of the new atsameip 2.0 is taking too long, so i decided to instead incorporate all the new features i wanted to include with the new 2.0 system into the 1.0 system, and take more time to get the 2.0 system ready. the things i wanted to include in my initial update were these:

  • new reverse DNS system (atsameip 2.0)
  • whois information
  • country flags
  • support for internationalized domain names
  • new site sections (forum, contact us... whois page..)


but what i'm gonna do now instead is throw in these new features:
  • whois information
  • country flags
  • support for internationalized domain names
  • new site sections (forum, contact us... whois page..)

on the old atsameip 1.0 (it's a bit of work, cos while these features were built into the 2.0 system, the 1.0 system was built without them in mind)


i've already started; at the moment i've got these added already:
  • whois information
  • new site sections (forum, contact us... whois page..)


now i'm gonna build these tonight:
  • country flags
  • support for internationalized domain names


while i'm still waiting for the full data to finish transferring into the 2.0 database.
post #4
-Edit-

today, 4 days later, the data transfer is still not done, but it's really getting there. it's in the 216.x.x.x range now (it goes up to 255.x.x.x). it's hard to plan how long it will take, cos the number of domains on each IP varies a lot. for example, in 66.x.x.x there were 40000 IP addresses with domains (the number of domains varies on each IP too), while in 189.x.x.x there were only about 200 IP addresses. but in 216.x.x.x there are 39000.

anyways i just want you to know that the atsameip 2.0 system is still in the oven!

while i'm waiting for this giant data transfer to be completed, i'm studying other subjects that were lacking from my (self-)education, like how to set up SSL, run my own mail server and, maybe if i have the time, how to run my own DNS server.

all that while watching movies and tv-show episodes like Family Guy and playing a bit of Gothic 3...
post #7
incredible! the data transfer is finally finished! now i will work on the caches a bit, trying to make it so that the database is only like a backup used to rebuild the caches, and only the caches are used to answer queries... if possible.
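the shape of it is something like this (a minimal sketch of the idea, reusing sample_domains from before; what the real cache storage looks like is a separate question):

code:
# answer from the cache when possible; fall back to the database only to
# rebuild a missing entry, so the DB becomes just the backup
def lookup(db, cache: dict[str, list[str]], ip: str) -> list[str]:
    if ip not in cache:
        cache[ip] = sample_domains(ip)   # rebuild this entry from the DB
    return cache[ip]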
post #8
ok, so the 2.0 system has been released for a little while and it's working fine, but it's a little bit slow. i think 1 of the main problems is these domains that point to many different IP addresses. that usually happens with big sites like google.com or yahoo.com. when i designed all this i was not aware that this existed... i'd seen it before, but i didn't really think of it... so anyways, the way things are built now, when a domain keeps changing IP it prevents the cache system from working.

you might notice that getting the sites at the same IP as a big site like google.com or cnn.com takes a few seconds, while most others should be rather fast.

Well anyways, i've started to develop an update that will take this into consideration by allowing a domain to point to a group of IP addresses instead of just 1. i think this could help the system respond much faster in general.
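the core of the change is resolving every address of a domain instead of just the first one; roughly like this (a sketch; the storage change itself isn't shown):

code:
# fetch all IPv4 addresses of a domain, then store one row per (name, ip)
import socket

def record_all(db, domain: str) -> None:
    _, _, addrs = socket.gethostbyname_ex(domain)   # every A record, not just one
    for ip in addrs:
        db.execute("INSERT OR IGNORE INTO domains (name, ip) VALUES (?, ?)",
                   (domain, ip))
    db.commit()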
post #14
Et voila! after a few days of procrastination, thinking and testing, i finally altered the system to allow domains to be bound to many IPs at once, as is possible in the DNS. this will finally allow the cache to work properly on those domains that do point to more than 1 IP.

while i was at it, i made the whole text of info returned by the system get printed on the page when a domain is pinged, because i judged that this little text could be informative.

and there you have it. i'm pretty satisfied with the system for now. i will work on bugs as i discover them; please let me know if you find any.
post #15
a guest
you really smart guy....
post #19
a guest
quote from a guest on post #19
you really smart guy....

post #20
Thank you!

Atsameip 2.0 has been in operation for a while now and i'm pretty satisfied with it, but i still find it's too slow and, most especially, it uses way too much CPU on my server, which runs a bunch of other things too!

That's why i can tell you that very soon i'll start building Atsameip 3.0, which will look and work exactly the same as 2.0 but will be much faster and lighter!

so stay tuned, and thanks a lot for using Atsameip!
post #23
i started building atsameip 3.0 today. my objective is to complete it before September.
post #24
the development of atsameip 3.0 went marvelously well.

i had already designed it in my head, so i just went ahead and coded my idea. it only took a day and a half, and then it was all built, all working, and i was very happy.

i started pumping the data from the current atsameip (2.0) into the new system, and everything looked fast and fantastic.

after i finished pumping about 1% of the data into the new system, my server started saying "no space left on device". not only was i unable to pump any more data into my system, but many of my other sites that freely create files for their caches stopped working, and i was not able to create any more folders on my system!!! i googled the issue and discovered that i was out of inodes!!!

code:
df -i

to see how many inodes you have left on your linux system.

i had no idea!!! i had no idea there was a maximum number of files the server itself can contain, and that it includes everything!! all the sites, even all the system files, even every single folder or symlink.

on this server, the atsameip server, there are about 4.6 million inodes.

that means the server can only contain a maximum of 4.6 million files. that's it, that's all...

i really wish i had known that before; now i'm really learning the hard way. there are hours, days, maybe even months of work for me to redo now, cos i had nooo idea i had to limit the number of files on the system. i thought i only had to check the space left.... motherfucker....
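by the way, if you want to check inode headroom from a script instead of the shell, os.statvfs in python gives you the same numbers that df -i prints (on linux at least):

code:
# total and free inode counts for the filesystem holding /
import os

st = os.statvfs("/")
print(f"inodes total: {st.f_files}, free: {st.f_ffree}")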


so where is atsameip 3.0 now?

well, atsameip 3.0 is dead now; you won't hear from it again. but instead of giving up, i think i'll try to come up with what i'll call atsameip 3.1. just like atsameip 3.0, atsameip 3.1 will not use any database, cos that's too slow with 5 million rows; it will use a homemade file system. the only difference is that this file system will need to keep to a limited number of files and folders!!

to be able to imagine how to design the next file system, i've been thinking and thinking, but i have not yet come up with an idea that i feel is good enough to start building.

so i guess that means that now i need to keep thinking... thinking a lot! until i can imagine a homemade filesystem that will contain all my info, be fast, and use a limited number of files and folders...

so i'll be thinking... and i'll let you know when i manage to wrap my mind around some viable concept!
post #25
