How/where to store (and retrieve) millions of small data files?



There is an application that generates small (30-300 KB) files containing XML, HTML, and JSON. It can produce tens of thousands of these files per day. Each file is connected to a certain activity, which has an identifier, and the files are grouped into about 250 categories. Some additional metadata, such as the generation date, should be stored as well. These files occasionally (roughly once per minute on average) have to be analyzed by a human, so they should be easy to find using the activity identifiers. It would also be useful to be able to search them for a given text (this could be slower). After about three weeks to a month, the files could be moved to some kind of archive, which should still be searchable, but would be queried much less often.
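

To make this concrete, here is a rough sketch of the metadata record I imagine keeping alongside each file (all the field names are placeholders I made up):

    # Sketch of the per-file metadata I have in mind; the field
    # names (activity_id, category, ...) are placeholders.
    example_record = {
        "activity_id": "ACT-000123",        # identifier of the generating activity
        "category": "invoices",             # one of about 250 categories
        "content_type": "application/xml",  # XML, HTML or JSON
        "generated_at": "2014-05-14T09:30:00Z",
        "size_bytes": 48211,
        "archived": False,                  # flipped after ~3 weeks
    }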


I'd like to ask about storage (and search) options that would offer scalability, since the output volume could multiply many times in the future.


I've heard of MongoDB, but I don't know much about it or whether it's feasible here.
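

If I understand the MongoDB documentation correctly, something like the following would cover storing a file with its metadata and finding it again by activity identifier. This is only a sketch using Python and pymongo's GridFS, with made-up names (database "reports", the metadata fields, the sample file):

    import datetime
    import gridfs
    from pymongo import MongoClient, ASCENDING

    client = MongoClient()     # assumes a local mongod is running
    db = client["reports"]     # hypothetical database name
    fs = gridfs.GridFS(db)     # files land in the fs.files / fs.chunks collections

    # Store one generated file together with its metadata.
    file_id = fs.put(
        b"<report>...</report>",
        filename="ACT-000123.xml",
        metadata={"activity_id": "ACT-000123", "category": "invoices"},
    )

    # Index the activity identifier so lookups stay fast as volume grows.
    db.fs.files.create_index([("metadata.activity_id", ASCENDING)])

    # Retrieve all files belonging to one activity.
    for doc in db.fs.files.find({"metadata.activity_id": "ACT-000123"}):
        data = fs.get(doc["_id"]).read()

    # GridFS records uploadDate automatically, so the candidates for
    # archiving are simply the files older than ~3 weeks:
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(weeks=3)
    old_files = db.fs.files.find({"uploadDate": {"$lt": cutoff}})

The part I can't tell from the docs is the full-text search: as far as I understand, the GridFS chunks themselves can't be text-indexed, so the searchable text would presumably have to be extracted into a regular collection first. Is that right, or is there a better tool for that part?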


I am sorry if this question overlaps with an existing one, but I was unable to find the right keywords to search for.

