[nycphp-talk] session variables: seven deadly sins

Paul Houle paul at devonianfarm.com
Tue Dec 26 22:11:46 EST 2006


Allen Shaw wrote:
> Really?  That's a surprising assertion, though I'll agree my surprise 
> probably comes more from my own lack of insight than a flaw in your 
> argument. Of course a quick google shows a few people hold that 
> session vars are "evil," but I can't find much to back up the idea.
>
> Can you elaborate, or give us a few links on the topic?
    I'll try to reply to this and some other people who replied to my 
previous message.

    I'll start with my background.  I've often been the person who the 
buck stops with -- somebody else develops an application that almost 
works (perhaps even puts it in production) and then I have to clean up 
the mess.  The app might be written in PHP,  Java,  Cold Fusion,  Perl,  
you name it.  I've learned to see session variables as a "bad smell".

    When I develop my own applications,  I use cookies for 
personalization and caching.  I use the authentication system described in

http://cookies.lcs.mit.edu/pubs/webauth:sec10-slides.ps.gz

    This mechanism can carry a "session id",  which in turn can be used 
as a key against application state stored in a relational database.  I 
think through the boundary cases,  and find that my greenfield apps 
behave predictably -- my only woe is that browsers have a lot of 
undocumented behavior connected with cookies,  form handling,  and 
caching.  These are all problems that you still need to fight with if 
you use sessions;  see the comments for

http://www.php.net/manual/en/function.session-cache-limiter.php

----

    The context of this is that the average web application is poor in 
the areas of usability and security:  recent studies show that 80% of 
web applications have serious security problems

http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf

    Jakob Nielsen's website has been chronicling the sorry state of web 
application usability:

http://www.useit.com/

    Perhaps the top 20% of programmers can write applications with 
$_SESSION that don't have serious security and usability problems,  but 
what about the other 80%?

----

(1)  Session variables are treacherous.  Odd things can happen in 
boundary cases,  such as when sessions expire,  or when you are targeted 
by session fixation attacks.

http://shiflett.org/articles/security-corner-feb2004

    I've looked at many apps that use sessions that seem to be 
working...  Until you walk away for two hours,  come back,  and discover 
that you're logged in as somebody else.  I suppose I could have spent 
hours or days tracking down an intermittent problem,  which involved 
some confluence of browser oddness (IE was fine,  Firefox was screwy),  
the behavior of the session system,  and crooked logic in the 
application.  Or I could use cryptographically signed cookies to 
implement an authentication system which won't give me surprises in the 
future.
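
    To make the signed-cookie idea concrete,  here is a minimal sketch,  
not the exact scheme from the MIT slides.  The function names,  the 
"user|expires|mac" layout,  and the choice of HMAC-SHA256 are my own 
assumptions for illustration;  it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helpers: names and cookie layout are assumptions.

// Build a cookie value of the form "user|expires|mac".
function make_auth_cookie(string $user, int $expires, string $secret): string
{
    // rawurlencode() guarantees the user name cannot contain '|'.
    $payload = rawurlencode($user) . '|' . $expires;
    return $payload . '|' . hash_hmac('sha256', $payload, $secret);
}

// Return the user name if the cookie is authentic and unexpired, else null.
function check_auth_cookie(string $cookie, string $secret, int $now): ?string
{
    $parts = explode('|', $cookie);
    if (count($parts) !== 3) {
        return null;
    }
    [$user, $expires, $mac] = $parts;
    $expected = hash_hmac('sha256', $user . '|' . $expires, $secret);
    // hash_equals() compares in constant time.
    if (!hash_equals($expected, $mac)) {
        return null;
    }
    if (!ctype_digit($expires) || (int)$expires < $now) {
        return null;
    }
    return rawurldecode($user);
}
```

    The server keeps only the secret.  Because the expiry time is inside 
the signed payload,  a client cannot extend its own login by editing the 
cookie.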

Anybody can write applications that work 95% of the time with 
$_SESSION.  Getting the other 5% right requires a deep understanding of 
state and statelessness on the web...  Which is what (many) people are 
trying to avoid when they use $_SESSION variables.

    There are more than twenty configuration variables that affect the 
way sessions work under PHP.  Incorrect configuration of any of these 
can cause applications to fail,  often in intermittent ways.  The use of 
a custom session handler can have unpredictable effects on security,  
reliability and performance.

    Other languages are a lot worse than PHP -- the use of the "scope" 
concept in languages such as Cold Fusion and Tango makes it easy to use 
a session variable without realizing it...  Resulting in an application 
that "works" sometimes,  but fails in mysterious ways.

(2) Session variables are bound to a particular language.  In the real 
world,  I work with legacy systems that might be written in other 
languages.  I might have some old pages in Cold Fusion that work just 
fine,  and I won't rework them in PHP until I've got a good reason.  If 
users can set a customization parameter,  such as the background of a 
page,  it's easy to write a cookie that all languages can read.  
Applications stuck in the session variable roach motel aren't as 
maintainable and portable.

(3) PHPSESSID.  Do I need to say more?  Consider the client that wants 
user tracking and can't accept cookies,  so all the pages on their
site look like

http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**

    Three months later they come back and wonder why their site isn't 
being indexed in Google.  Yes,  there's a saner way to use this 
feature,  but this "cure" for privacy violation is worse than the cookie 
"disease",  since session ids will leak out through referrers,  
bookmarks,  links that people cut-and-paste...

(4) The back button.  When somebody asks a question about sessions on a 
forum,  they'll usually ask another question a few days or weeks later:  
"How do I disable the back button?"

    The underlying problem is a deep aspect of the structure of the 
web.  There is certain state information that's particular to a request 
(GET and POST variables) and certain state information that has a more 
persistent scope (cookies,  session information,  a relational 
database.)  The back button makes it possible for these two things to 
get out of sync.

    Ultimately,  we need a systematic strategy to deal with this.  One 
pattern is to put the complete state of the application in form 
variables.  Applications that use this pattern always work perfectly 
with the back button.  This pattern doesn't always work (hitting the 
back button shouldn't cancel your order on an e-commerce site),  but it 
works often...  For instance,  you can use hidden variables to hold onto 
form variables for complicated forms that spread over several pages.
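
    As a rough sketch of the hidden-variable pattern for a multi-page 
form:  each page re-emits the fields collected so far as hidden inputs.  
The helper name hidden_fields() is hypothetical,  and the sketch assumes 
flat scalar values.

```php
<?php
// Hypothetical helper: render already-collected fields as hidden inputs
// so the next page of the form carries them along.
function hidden_fields(array $state): string
{
    $html = '';
    foreach ($state as $name => $value) {
        // Escape both name and value so user data cannot break the markup.
        $html .= sprintf(
            '<input type="hidden" name="%s" value="%s">' . "\n",
            htmlspecialchars((string)$name, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string)$value, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}
```

    Page two of the form would echo hidden_fields() with the fields 
posted from page one.  The values come back from the browser,  so they 
have to be validated again on every request.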

(5) Multiple windows.  I think it's a human right to be able to have 
more than one window open on a web site.  If I'm shopping,  for 
instance, I'd like to be able to look at two products simultaneously.  
An application that keeps state in form variables doesn't care how many 
windows you have open.  If you're looking for jobs at an organization that uses 
taleo.net's software,  you'll find that it uses trickery to prevent you 
from having more than one window open...  So you can't look at two jobs 
at once,  or look at the job description while you're filling out the 
application.  I suspect that they did this because they don't want to 
spend forever debugging "race conditions" that could be caused by a user 
acting in two windows simultaneously.

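    Session variables introduce problems of locking.  With the default 
file-based handler,  PHP holds an exclusive lock on the session from 
session_start() until the script ends (or calls session_write_close()),  
so requests in the same session run one at a time.  This hurts the 
performance of pages that use dynamically generated images and 
Javascript,  and can mysteriously stall AJAX applications.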
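
    One common way to shorten the time the lock is held is to copy what 
you need out of the session and close it early.  This is a sketch under 
the assumption that the rest of the request only reads session data;  
the function name snapshot_session() is hypothetical.

```php
<?php
// Hypothetical helper: read the session, then release its lock at once.
function snapshot_session(): array
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    // Copy the data; later changes to $_SESSION will not be saved.
    $copy = isset($_SESSION) ? $_SESSION : [];
    // Write the session and release the lock so parallel requests proceed.
    session_write_close();
    return $copy;
}
```

    A script that generates an image or answers an AJAX call could call 
this first,  then do its slow work without blocking other requests from 
the same user.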

(6) Scalability,  Reliability,  and all that.  This is a tricky one,  
because it depends on particulars.  Sessions can be lightning-fast in 
systems that keep them in RAM,  such as Java and Cold Fusion.  The 
default session handler in PHP uses files,  and is probably faster than 
a relational database in a direct comparison:  however,  the session 
handler will load all of the data into RAM,  whereas a relational 
implementation may only need to load information when it's needed.  
Keeping information in POST variables or cookies also involves a 
tradeoff -- this is as scalable as it gets so far as server resources,  
but requires that the state be passed back and forth between the browser 
and server.  This is no big deal if the state is 500 bytes.  It's 
unacceptable if the state is 500 megabytes.  In most cases,  it starts 
looking expensive when we're passing an extra 10k-100k around.

I've recently been working on a legacy app that contains a query (select 
a subset of items) and reporting (display user-selected fields of those 
items) function.  The interface between those modules is simple:  the 
query system passes a comma-separated list of item identifiers to the 
reporting system.  I like this,  because it means that one system can 
be changed without affecting the other.  (In this case I had to update 
the app to work with a changed database schema,  so both sides needed 
some work.)

I discovered that the app was passing the item list as a session 
variable.  This worked,  unless I was using the application in two 
windows at a time.  In that case,  a query in one window would change 
the report delivered in another window.  I thought about it,  and 
realized that in this case,  result sets would always be under about 
10k,  and usually be around 1k.  Therefore,  it made sense to pass this 
as a hidden variable in the form and ditch the session variable.
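
    On the receiving side,  the reporting module has to treat the list 
as untrusted input once it travels through the browser.  A minimal 
sketch,  assuming the item identifiers are non-negative integers (the 
original app's identifier format isn't stated) and using a hypothetical 
function name:

```php
<?php
// Hypothetical helper: parse a comma-separated id list posted from a
// hidden form field, rejecting anything that is not a plain integer.
function parse_item_ids(string $csv): array
{
    if ($csv === '') {
        return [];
    }
    $ids = [];
    foreach (explode(',', $csv) as $piece) {
        $piece = trim($piece);
        // ctype_digit() is false for '', signs, and decimals.
        if (!ctype_digit($piece)) {
            throw new InvalidArgumentException("Bad item id: $piece");
        }
        $ids[] = (int)$piece;
    }
    return $ids;
}
```

    Because each window posts its own list,  a query in one window can 
no longer change the report shown in another.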

This shows the kind of problems that regularly turn up in the 
applications that developers "throw over the wall" to testers and 
clients.  Choose a session variable,  and your application behaves 
mysteriously for a user who didn't respect the "one window at a time" 
assumption you made.  Passing hidden variables in forms,  on the other 
hand,  might work OK when you're testing with a small data set over a 
LAN,  but could rapidly become a performance nightmare for dialup users 
using a production database.

Performance can be improved in a number of ways:  for instance,  by 
delta-encoding the sorted item list (sending the differences between 
successive identifiers) and compressing it,  or by creating a "form 
scope" variable that's stored on the server and keyed against a unique 
identifier in the form.  Either way,  quality web applications take 
quality thought.
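
    One reading of the compression idea is plain delta encoding:  sort 
the identifiers and send the gaps between them,  which are usually 
shorter than the identifiers themselves.  This sketch assumes integer 
identifiers whose order doesn't matter;  both function names are 
hypothetical.

```php
<?php
// Hypothetical helpers: delta-encode a list of integer ids.

// Sort the ids and emit each one as the gap from its predecessor.
function delta_encode(array $ids): string
{
    sort($ids);
    $prev = 0;
    $gaps = [];
    foreach ($ids as $id) {
        $gaps[] = $id - $prev;
        $prev = $id;
    }
    return implode(',', $gaps);
}

// Rebuild the sorted id list by summing the gaps.
function delta_decode(string $encoded): array
{
    if ($encoded === '') {
        return [];
    }
    $ids = [];
    $prev = 0;
    foreach (explode(',', $encoded) as $gap) {
        $prev += (int)$gap;
        $ids[] = $prev;
    }
    return $ids;
}
```

    For example,  the ids 1000,  1001 and 1005 encode as "1000,1,4".  The 
result could be compressed further before it goes into the form.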

(7) Lack of engineered application state:  Engineered Application State 
is the gem of database-backed web applications.

If you keep the state of your application in a relational database,  you 
need to ~design~ the state of your application.  You need to ~think~ 
every time you add or change a table in your relational database.  With 
sessions,  by contrast,  you can add a new variable to your application 
as easily as typing '$'.

Desktop apps keep the application state in a tangle of pointers.  C and 
C++ applications tend to contain 5 or more defects per thousand lines of 
code.  Errors show up in data structures over time,  just as mutations 
occur in your cells.  Memory leaks,  application hangs,  and crashes are 
cancers caused by these mutations.

PHP apps die at the end of each request,  and are reborn for the next 
request.  They don't accumulate errors over time.  Web application 
environments such as Java and Cold Fusion that involve a long-running 
process regularly hang or crash and require restarts.  When was the last 
time you had to restart PHP?

A database protects you from errors in multiple ways.  Transactions,  
for instance,  protect against data corruption caused by crashing 
scripts.  It's easy to write

$_SESSION["logged_in"]=true;

in one place and

$_SESSION["logged-in"]=false;

in another,  introducing unpredictable behavior and security holes.  A 
relational database will give you an error if you try something like that.
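
To illustrate the difference without a database at hand,  here is a 
sketch of a state container with a fixed list of fields,  which rejects 
a misspelled name much as a table rejects an unknown column.  The class 
AppState and its single field are hypothetical,  and this is an analogy 
rather than a replacement for a real schema;  it assumes PHP 7.4 or 
later.

```php
<?php
// Hypothetical class: application state with a declared set of fields.
final class AppState
{
    // The "schema": every allowed field and its default value.
    private const FIELDS = ['logged_in' => false];

    private array $data = self::FIELDS;

    public function set(string $name, $value): void
    {
        $this->check($name);
        $this->data[$name] = $value;
    }

    public function get(string $name)
    {
        $this->check($name);
        return $this->data[$name];
    }

    // Fail loudly on names that were never declared.
    private function check(string $name): void
    {
        if (!array_key_exists($name, self::FIELDS)) {
            throw new InvalidArgumentException("Unknown state field: $name");
        }
    }
}
```

With this in place,  setting "logged-in" throws an exception instead of 
quietly creating a second variable.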

-------------

Can users of $_SESSION avoid the seven deadly sins?

Yes.

In practice they don't.









