Discussion:
Highly specialized web server
Alex Kovach
2003-12-09 04:22:04 UTC
In the past I have developed dozens of web applications with apache,
mysql or oracle, and perl. I've also developed java servlets on Tomcat.

I'm about to embark on designing the largest site I've ever done,
hopefully with the ability to serve 30,000 - 40,000 simultaneous users.

Here are a few thoughts in my head:

- databases exist for those too lazy to manage their own data
structures. Assuming the database would be run from memory anyway, it
would be best to make custom data structures to replace each table.
- the 'best' solution is to create your own multi-threaded program that
listens on port 80. Writing it in assembly is best; C/C++ would work as
well.

In short, I don't mind re-inventing the wheel, if it will be faster.

Some questions I'm asking ... how much of a performance hit do I take by
adding the overhead of apache? a database? mod_perl? java JVM?

AK
R.F. Pels
2003-12-09 20:45:51 UTC
Post by Alex Kovach
I'm about to embark on designing the largest site I've ever done,
hopefully with the ability to serve 30,000 - 40,000 simultaneous users.
Why bother? Just have a look at finance.yahoo.com. These guys really have a
gazillion records in a MySQL database and probably run the site using a
combination of apache and backhand.

There's absolutely no need to create your own webserver or database or
whatever. Proper tuning of the components, in combination with running the
thing from a server farm, easily does the trick. In this case, the simple
idea of 'throughput' is what matters most. In other words, divide and
conquer.
--
Ruurd
.o.
..o
ooo
Alex Kovach
2003-12-10 01:48:39 UTC
Hi RFP, thanks for the reply.
Post by R.F. Pels
Why bother? Just have a look at finance.yahoo.com. These guys really have a
gazillion records in a MySQL database and probably run the site using a
combination of apache and backhand.
finance.yahoo.com has a trivial design with respect to data flow. The
quotes are all read-only. And that makes replication very, very easy.

The application in question will have frequent database writes, so we
now have to worry about table locking.

-A
R.F. Pels
2003-12-10 20:16:17 UTC
Post by Alex Kovach
Post by R.F. Pels
Why bother? Just have a look at finance.yahoo.com. These guys really have
a gazillion records in a MySQL database and probably run the site using a
combination of apache and backhand.
finance.yahoo.com has a trivial design with respect to data flow. The
quotes are all read-only. And that makes replication very, very easy.
On the dissemination part of the system, yes, that's true.
Post by Alex Kovach
The application in question will have frequent database writes. We now
have to worry about tables locking.
Again, as I said: divide and conquer. Think about how useful or necessary
it is to store all data in the same database at the same time. Is it
possible to manipulate per-user data separately, and can those
manipulations be divided over more than one database, with the changes
propagated to a central database at a convenient time? Is propagating all
the per-user data even necessary?

Example: if you have 30-40K simultaneous users, how about distributing
them over, let's say, 30-40 different database engines?
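
Ruurd's suggestion amounts to partitioning (sharding) users by a key. A minimal sketch in C, assuming each user is identified by an unsigned user number and each database engine by an index from 0 to `num_shards - 1`; `shard_for_user` is a hypothetical name, not something from the thread:

```c
#include <assert.h>

/* Map a user number to the database engine (shard) that owns that
   user's data. num_shards is a hypothetical deployment parameter,
   e.g. somewhere in the 30-40 range suggested above. */
unsigned shard_for_user(unsigned usernum, unsigned num_shards)
{
    assert(num_shards > 0);          /* modulo by zero is undefined */
    return usernum % num_shards;     /* same user -> same shard, always */
}
```

Because every read and write for one user lands on the same engine, writes by users on different engines never contend for the same table locks. The catch with plain modulo placement is that changing `num_shards` later moves most users to a different engine, so the count has to be planned up front or replaced by a lookup table.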

Food for thought?
--
Ruurd
.o.
..o
ooo
Chris Shepherd
2003-12-10 17:19:35 UTC
Post by Alex Kovach
I'm about to embark on designing the largest site I've ever done,
hopefully with the ability to serve 30,000 - 40,000 simultaneous users.
From your later statements in your post, I am guessing this is the
first time you've done ANY kind of enterprise deployment of more than
1,000 users?
Post by Alex Kovach
- databases exist for those too lazy to manage their own data
structures. Assuming the database would be run from memory anyway, it
would be best to make custom data structures to replace each table.
Actually, that's not true. Databases exist for those who want to take
advantage of the various indexing and searching algorithms that have
been properly developed with decades of funding and experience put into
their design. I doubt you will be able to write a database solution that
is even half as efficient as those provided by the major players
(Oracle, IBM, etc.). People do not often want to reinvent the wheel,
especially when there are a number of experts on wheel design (in this
case, anyway) who know a lot more about it.
Post by Alex Kovach
- the 'best' solution is to create your own multi-threaded program
that listens on port 80. writing it in assembly is best, c/c++ would
work as well.
You want to reimplement a webserver, and database system because you
think it will be faster? Do you have any idea how much time that will
add to your development time?
Post by Alex Kovach
In short, I don't mind re-inventing the wheel, if it will be faster.
Some questions I'm asking ... how much of a performance hit do I take by
adding the overhead of apache? a database? mod_perl? java JVM?
Compared to what?

I have to say, your plan sounds very silly. You really, honestly should
stick to designing the app and database. If you have any competence at
either, you can design such a system and implement it on a load-balanced
basis.
--
Chris Shepherd
Alex Kovach
2003-12-12 02:14:34 UTC
Post by Chris Shepherd
From your later statements in your post, I am guessing this is the
first time you've done ANY kind of enterprise deployment of more than
1,000 users?
At work, I am part of a team that does so, yes. But this is the first
time I would be doing it alone.
Post by Chris Shepherd
their design. I doubt you will be able to write a database solution that
is even half as efficient as those provided by the major players
I agree! But I'm not trying to make a database, I'm trying to make an
application. My plan is to use specialized data structures for each type
of data. I guarantee you that I can make a C program that looks up a
username from a user number faster than any database system, IBM, Oracle,
or otherwise. How? Make an array of fixed-size strings. Now I have
O(1) access time, and no overhead of ODBC, indexing, etc.
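
The lookup Alex describes can be sketched in a few lines of C. This is a minimal sketch, assuming a hypothetical cap of 40,000 user numbers and names of at most 31 characters; `MAX_USERS`, `NAME_LEN`, `set_username`, and `get_username` are illustrative names, not from the thread:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limits, chosen for illustration only. */
#define MAX_USERS 40000
#define NAME_LEN  32      /* includes the terminating '\0' */

/* One fixed-width slot per user; the user number is the array index. */
static char usernames[MAX_USERS][NAME_LEN];

/* Store a name for a user number. Returns 0 on success, -1 if the
   number is out of range or the name does not fit in a slot. */
int set_username(unsigned usernum, const char *name)
{
    assert(name != NULL);
    if (usernum >= MAX_USERS || strlen(name) >= NAME_LEN)
        return -1;
    strcpy(usernames[usernum], name);
    return 0;
}

/* O(1) lookup: a bounds check plus one address computation.
   Returns NULL for an out-of-range number or an unused slot. */
const char *get_username(unsigned usernum)
{
    if (usernum >= MAX_USERS || usernames[usernum][0] == '\0')
        return NULL;
    return usernames[usernum];
}
```

This covers only the single-box, in-memory case: it does nothing about persistence, concurrent writers, or looking a user up by name, which are the points raised in the replies.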

I spent 4 years doing this to get my CS degree. I might as well spend
one month doing this for myself.
Post by Chris Shepherd
You want to reimplement a webserver, and database system because you
think it will be faster? Do you have any idea how much time that will
add to your development time?
Again, I don't need the database system. Socket programs are easy to
implement, and the HTTP protocol is simple. I can store images and such
on an apache box, but do page creation on this specialized server.
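
The request line is the simplest part of HTTP, and even it needs careful handling of partial reads, line endings, and length limits. A minimal sketch in C, assuming only HTTP/1.0 and HTTP/1.1 request lines and hypothetical size limits (`parse_request_line`, `METHOD_MAX`, and `TARGET_MAX` are illustrative names):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical size limits for this sketch. */
#define METHOD_MAX 8
#define TARGET_MAX 256

struct request_line {
    char method[METHOD_MAX];   /* e.g. "GET" */
    char target[TARGET_MAX];   /* e.g. "/index.html" */
    int  http_minor;           /* 0 for HTTP/1.0, 1 for HTTP/1.1 */
};

/* Parse "METHOD SP TARGET SP HTTP/1.x" terminated by CRLF (or bare LF)
   from the first len bytes of buf, which need not be NUL-terminated.
   Returns 0 on success, -1 if the line is malformed, too long, or
   incomplete (in which case a real server would read more bytes). */
int parse_request_line(const char *buf, size_t len, struct request_line *out)
{
    const char *end, *sp1, *sp2;
    size_t mlen, tlen, vlen;

    assert(buf != NULL && out != NULL);
    end = memchr(buf, '\n', len);
    if (end == NULL)
        return -1;                      /* no complete line yet */
    if (end > buf && end[-1] == '\r')
        end--;                          /* strip the CR of CRLF */

    sp1 = memchr(buf, ' ', (size_t)(end - buf));
    if (sp1 == NULL)
        return -1;
    sp2 = memchr(sp1 + 1, ' ', (size_t)(end - sp1 - 1));
    if (sp2 == NULL)
        return -1;

    mlen = (size_t)(sp1 - buf);
    tlen = (size_t)(sp2 - sp1 - 1);
    vlen = (size_t)(end - sp2 - 1);
    if (mlen == 0 || mlen >= METHOD_MAX || tlen == 0 || tlen >= TARGET_MAX)
        return -1;
    /* The version must be exactly "HTTP/1.0" or "HTTP/1.1". */
    if (vlen != 8 || memcmp(sp2 + 1, "HTTP/1.", 7) != 0 ||
        (sp2[8] != '0' && sp2[8] != '1'))
        return -1;

    memcpy(out->method, buf, mlen);
    out->method[mlen] = '\0';
    memcpy(out->target, sp1 + 1, tlen);
    out->target[tlen] = '\0';
    out->http_minor = sp2[8] - '0';
    return 0;
}
```

The sketch deliberately stops at the first line: headers, request bodies, keep-alive, URL decoding, and timeouts for slow clients would all still have to be written and tested before such a server could replace Apache for page creation.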
Post by Chris Shepherd
I have to say, your plan sounds very silly.
So did Columbus's :)
Chris Shepherd
2003-12-14 18:31:33 UTC
Post by Alex Kovach
I agree! But I'm not trying to make a database, I'm trying to make an
application. My plan is to use specialized data structures for each type
of data. I guarantee you that I can make a C program that looks up a
username from a user number faster than any database system, IBM, Oracle,
or otherwise. How? Make an array of fixed-size strings. Now I have
O(1) access time, and no overhead of ODBC, indexing, etc.
You intend to write everything in C, by yourself, and you believe you
can achieve O(1) access time? Considering the large-scale deployment you
are looking at, I'd have to question such an application's scalability.
On one box, that may be true. When you have to scale it and cluster it
with two or three boxes, you are already dropping your response time to
that of a DB server. The added advantage of a database server is that
all of that code has been thoroughly written, tested, debugged, etc., by
a huge group of people. You may not want to call your 'data structures'
a database, but in essence you are still reinventing the wheel, and for
your other applications to work with it, you will have to create a query
language, etc. I really don't see why you think this is even worth your
time.
Post by Alex Kovach
I spent 4 years doing this to get my CS degree. I might as well spend
one month doing this for myself.
What works in a theoretical instance does not always function in the
real world.
Post by Alex Kovach
Again, I don't need the database system. Socket programs are easy to
implement, and the HTTP protocol is simple. I can store images and such
on an apache box, but do page creation on this specialized server.
Since you evidently weren't looking for actual input, but rather just an
argument with any who would suggest you do things otherwise, I wish you
luck.
Post by Alex Kovach
Post by Chris Shepherd
I have to say, your plan sounds very silly.
So did Columbus's :)
The difference here (and why it is absurd for you to make that
comparison) is that the 'alternate route to India' (scalable data
storage and web services) has already been 'discovered' (implemented)
countless times before.
--
Chris Shepherd
NicK
2003-12-18 20:52:17 UTC
The specialized data structure server could be good for certain cases,
but beyond a certain point, it may need some kind of indexing... unless
you have thought of something I haven't.
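
The need for indexing shows up as soon as a login arrives with a username rather than a user number: the fixed-size array Alex describes answers number-to-name in O(1), but name-to-number needs a second structure. A minimal sketch of such an index in C, assuming a fixed-size open-addressing hash table with linear probing and the 32-bit FNV-1a hash; `index_put`, `index_get`, `INDEX_SLOTS`, and the sizes are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: about 60% full at 40,000 users. */
#define INDEX_SLOTS 65536u   /* power of two, so masking replaces modulo */
#define NAME_LEN    32       /* includes the terminating '\0' */

struct index_slot {
    char     name[NAME_LEN]; /* empty string marks an unused slot */
    unsigned usernum;
};

static struct index_slot name_index[INDEX_SLOTS];

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert or update name -> usernum. Returns 0 on success, -1 if the
   name is empty or too long, or the table is full. */
int index_put(const char *name, unsigned usernum)
{
    uint32_t i, n;

    assert(name != NULL);
    if (name[0] == '\0' || strlen(name) >= NAME_LEN)
        return -1;
    i = fnv1a(name) & (INDEX_SLOTS - 1u);
    for (n = 0; n < INDEX_SLOTS; n++) {
        struct index_slot *slot = &name_index[i];
        if (slot->name[0] == '\0' || strcmp(slot->name, name) == 0) {
            strcpy(slot->name, name);        /* new entry, or overwrite */
            slot->usernum = usernum;
            return 0;
        }
        i = (i + 1u) & (INDEX_SLOTS - 1u);   /* linear probing, wrapping */
    }
    return -1;                               /* table full */
}

/* Look up a name. Stores the user number in *out and returns 0,
   or returns -1 if the name is not present. */
int index_get(const char *name, unsigned *out)
{
    uint32_t i, n;

    assert(name != NULL && out != NULL);
    i = fnv1a(name) & (INDEX_SLOTS - 1u);
    for (n = 0; n < INDEX_SLOTS; n++) {
        const struct index_slot *slot = &name_index[i];
        if (slot->name[0] == '\0')
            return -1;                       /* empty slot: not present */
        if (strcmp(slot->name, name) == 0) {
            *out = slot->usernum;
            return 0;
        }
        i = (i + 1u) & (INDEX_SLOTS - 1u);
    }
    return -1;
}
```

Even this small index leaves out deletion (linear probing needs tombstones for that), resizing, persistence, and locking for concurrent writers.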

Also, an array of fixed-length strings isn't really advisable. I've
noticed that using the required data type (such as boolean or integer)
can sometimes be easier on the load. Try a test run feeding a million
numbers into a String array (with conversion and all) and then try it
with an int array.
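
NicK's experiment can be set up in C, the language Alex plans to use, rather than with Java-style `String`s. A minimal sketch, assuming a hypothetical fixed width of 12 characters per value; `DIGITS`, `store_as_strings`, `sum_strings`, and `sum_ints` are illustrative names. No timings are claimed here: the point is that the string version pays an `snprintf` on every write and a `strtol` on every read, and uses 12 bytes per value instead of `sizeof(int)` (4 on typical platforms):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fixed width: "-2147483648" (11 chars) plus '\0'
   fits any 32-bit int. */
#define DIGITS 12

/* Store values as fixed-length decimal strings. */
void store_as_strings(char (*dst)[DIGITS], const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        snprintf(dst[i], DIGITS, "%d", src[i]);   /* format on every write */
}

/* Sum the string-encoded values. */
long long sum_strings(char (*arr)[DIGITS], size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += strtol(arr[i], NULL, 10);        /* parse on every read */
    return total;
}

/* Sum the same data kept as native ints. */
long long sum_ints(const int *arr, size_t n)
{
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += arr[i];                          /* no conversion at all */
    return total;
}
```

To run the million-number comparison, fill an `int` array of that size, then time `store_as_strings` plus `sum_strings` against `sum_ints` (for example with `clock()` from `<time.h>`) on the target machine.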

BTW, Oracle and other DB servers do provide fixed-length character data
types (CHAR), but I wouldn't want to stop you from being the catalyst
that brought about a new kind of software into the I.T. world.


------------
Reply to newsgroup; I get more than 80 spam mail messages and so I just
hit delete if it looks like spam.