Thursday, June 19, 2008

Web Browsers and Memory Fragmentation

I've been following Stuart Parmenter's blog posts about improving memory usage in Firefox 3. The most interesting part has been his excellent work on reducing memory fragmentation. By making pretty pictures of memory fragmentation under various malloc implementations, he was able to identify jemalloc as the best option for Firefox, and the improvement made by switching was pretty impressive.

But I suspect that ultimately it won't be enough. The advent of Ajax is dramatically changing the way browsers use memory. Rather than loading a fairly static DOM and occasionally doing a few small manipulations in JavaScript as the site is used, modern web apps mutate the DOM much more extensively, adding and removing sections repeatedly as new data is received from the server. And they're going to be doing that kind of thing more and more.

There's an interesting problem here which is unique to browsers. JavaScript doesn't give programmers direct access to pointers, so an implementation is free to move objects around in memory however it pleases as it executes code. Modern JavaScript implementations take advantage of that abstraction, and manage memory allocation with generational garbage collectors that compact memory by relocating objects as they operate. But the DOM, which also has to be manipulated by browser code in C or C++, is reference-counted. Reference counting can't migrate objects between regions of memory, so it depends on the underlying malloc implementation to control memory fragmentation. Even a very smart malloc can't always manage that. DOMs in particular should be challenging, because they contain large numbers of strings of varying sizes.
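To make the difference concrete, here is a toy sketch in Python. It is not how any browser or malloc actually works: the heap is just a list of cells, and the helper names largest_free_run and compact are made up for illustration. It shows a heap that is half free yet can't satisfy a large request until its live objects are moved together.

```python
def largest_free_run(heap):
    """Return the length of the longest run of free (None) cells."""
    best = run = 0
    for cell in heap:
        run = run + 1 if cell is None else 0
        best = max(best, run)
    return best


def compact(heap):
    """Slide all live cells to the front, as a moving collector could."""
    live = [cell for cell in heap if cell is not None]
    return live + [None] * (len(heap) - len(live))


# A toy 16-cell heap holding eight 2-cell objects, labelled 0..7.
heap = [obj for obj in range(8) for _ in range(2)]

# Free every other object. Half the heap is free, but only in 2-cell holes.
heap = [None if cell is not None and cell % 2 == 0 else cell for cell in heap]

free_cells = heap.count(None)             # total free space
before = largest_free_run(heap)           # biggest request that fits as-is
after = largest_free_run(compact(heap))   # biggest request after compaction
```

A non-moving allocator is stuck with the "before" number no matter how cleverly it places new objects. A compacting collector gets the "after" number because it is allowed to relocate the survivors.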

Reference counting introduces memory fragmentation problems in any language, but the situation is particularly bad in browsers. JavaScript programs and DOMs on different web pages share the same process memory space. The fragmentation caused by one page can interact with the memory used by other pages, and even after a problematic page is closed, the fragmented memory often can't be reclaimed. With ordinary applications, fragmented memory is all neatly reclaimed when the process terminates. Right now, that doesn't work for web apps, because closing a page doesn't end the browser process that holds its memory.

But this is a problem for browsers, not for web apps. Individual web app authors won't be motivated to care if your browser uses a lot of memory and causes your machine to start swapping once or twice a day, especially if the blame is shared among many popular sites. Unless one site is much worse than its peers, high memory usage makes the browser look bad, not the app.

So I think browsers will eventually experience pressure to fix this problem. As the correct metaphor for a web page moves farther from "document" and closer to "application", maybe it makes sense for browsers to act more like operating systems. Local memory for a web page could be allocated in dedicated regions, and could all be bulk-reclaimed when the page is closed.
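As a rough sketch of the dedicated-region idea, assuming a simple bump allocator (the PageArena class and its method names are hypothetical, not taken from any browser):

```python
class PageArena:
    """Hypothetical per-page memory region: bump allocation, bulk release."""

    def __init__(self, size):
        self.buffer = bytearray(size)
        self.top = 0  # offset of the next free byte

    def alloc(self, nbytes):
        """Reserve nbytes and return their offset, or raise if the arena is full."""
        if self.top + nbytes > len(self.buffer):
            raise MemoryError("arena exhausted")
        offset = self.top
        self.top += nbytes
        return offset

    def release_all(self):
        """Reclaim everything at once, as when the page is closed."""
        self.top = 0


page = PageArena(64)
first = page.alloc(40)    # offset 0
second = page.alloc(20)   # offset 40
page.release_all()        # page closed: the whole region is free again
reused = page.alloc(64)   # a full-size request fits, nothing is fragmented
```

Because nothing from one page ever lives in another page's region, closing the page returns the whole region in one step, however badly the page fragmented it while it was open.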

That model is simple, but it still allows individual pages to fragment their own memory and consume more and more over time. Maybe a better answer is to find a way to use a generational garbage collector on the DOM. The familiar OS model of per-process virtual memory management was designed for running programs written in languages like C and C++, and that's why it's appealing for handling browser DOMs. But JavaScript is a high-level, functional, garbage-collected language. If browsers are little operating systems that run JavaScript applications, maybe they should operate more like Lisp machine operating systems than like Unix.

Tuesday, June 17, 2008

Private Pastes (and Projects) for codepad.org

I've noticed that some pastes on codepad.org include confidentiality notices in their copyright boilerplate. You own (or your employer owns) the copyright on code you paste on codepad.org, but it is a public forum! Pasted code appears on the "Recent Pastes" page, for example, where anyone can see it, and search engines can index it.

It makes sense that some people might want to use the site without revealing their code to the public, though. To support that kind of usage, I've made it possible to flag a paste as private, so that it won't be publicly linked on the site, and will include a noindex meta tag to advise search engines not to index it. Just in case, I went and flagged all the pastes in the database that included the words "confidential" or "copyright" as private pastes.
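For anyone curious what the noindex part looks like, here is a minimal sketch. The render_head function and its parameters are made up for illustration and are not codepad's actual code. The meta tag itself is the standard robots tag.

```python
from html import escape


def render_head(title, private):
    """Build a page <head>, adding a robots noindex tag for private pastes."""
    lines = ["<head>", f"<title>{escape(title)}</title>"]
    if private:
        # Advisory only: well-behaved crawlers honor it, nothing enforces it.
        lines.append('<meta name="robots" content="noindex">')
    lines.append("</head>")
    return "\n".join(lines)


private_head = render_head("my paste", private=True)
public_head = render_head("my paste", private=False)
```

The tag is a request, not access control, which is why keeping the paste off the public listing pages matters just as much.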

While I was at it, I added a similar feature for codepad projects. Now you can set up a private codepad project to collaborate on code that you'd rather not show the whole world.

Since paste URLs are randomly generated, they shouldn't be vulnerable to the kind of URL-guessing attacks SmugMug had problems with earlier this year. Still, these measures aren't really good enough for protecting important trade secrets — especially for projects, where the name might be guessable. If it seems like there's demand for it, I'll consider adding access control lists in the future.
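To illustrate why random URLs resist guessing, here is a sketch of one way to generate them. The 8-character length and the alphanumeric alphabet are assumptions for the example, not a description of codepad's real scheme, and new_paste_id is a hypothetical name.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def new_paste_id(length=8):
    """Return a random ID drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


paste_id = new_paste_id()
keyspace = len(ALPHABET) ** 8  # number of possible 8-character IDs
```

With these assumed parameters there are roughly 2 x 10^14 possible IDs, so enumerating URLs is impractical. A project name chosen by a person has nothing like that much randomness, which is why projects are the weaker case.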