Item1403: Use of uninitialized value at lib/Foswiki.pm
Priority: Normal
Current State: Closed
Released In: 1.0.5
Target Release: patch
Applies To: Engine
Component:
Branches:
lib/Foswiki.pm
Index: core/lib/Foswiki.pm
===================================================================
--- core/lib/Foswiki.pm (revisão 3356)
+++ core/lib/Foswiki.pm (cópia de trabalho)
@@ -1779,8 +1779,9 @@
if ($cgiQuery) {
my $agent = $cgiQuery->user_agent();
if ($agent) {
- $agent =~ m/([\w]+)/;
- $extra .= ' ' . $1;
+ if($agent =~ m/([\w]+)/) {
+ $extra .= ' ' . $1;
+ }
}
}
}
--
JoenioCosta
Thanks for the report and fix.
Checked into SVN
--
KennethLavrsen - 02 Apr 2009
As I raised on IRC, for me, this only adresses half the problem there (not to mention we could use only one if, which would be me being picky).
Issue is that the user agent will be cut to its first word. I'm not sure it's by design, and even less sure it's the intended behavior, and the one a user might expect.
I would personally fix it that way:
Index: core/lib/Foswiki.pm
===================================================================
--- core/lib/Foswiki.pm (revisão 3356)
+++ core/lib/Foswiki.pm (cópia de trabalho)
@@ -1779,8 +1779,9 @@
if ($cgiQuery) {
my $agent = $cgiQuery->user_agent();
- if ($agent) {
- $agent =~ m/([\w]+)/;
- $extra .= ' ' . $1;
+ if($agent && $agent =~ /^(.+)$/) {
+ $extra .= ' ' . $1; # Untaint it
}
}
}
--
OlivierRaginel - 08 Apr 2009
I agree with Babar's improved fix. Joenio's fix addressed the symptom, but not the root problem.
--
CrawfordCurrie - 08 Apr 2009
Sorry, I had put a
\S
in the pattern above. This is just as bad as \w, as it won't match anything that has spaces, which is for me the root cause of the issue.
--
OlivierRaginel - 08 Apr 2009
The original issue was that when a client does not have any agent name - you ended up with warnings about uninitialized value.
THAT problem was correctly fixed by Joenio Costa and that is what I checked in.
If there is an additional problem then it is not yet described in this bug report.
The original code took the first full word in the agent string and appended it. The suggested patch above puts the entire string. You certainly do not want to put the entire long long loooong agent string in the log entry. The log format is designed to put a single word there.
I cannot support Oliviers patch as it will make the logs impossible to read. If someone needs the entire agent string look in the Apache log.
Another thing is: "What is the value of always seeing Mozilla?"
All browsers start with Mozilla/# where # is a version. So the value of this string is close to zero.
If we are to change the logging we should try and do something more intelligent. But I would not want to put too much CPU power into "giving the lamb a fifth leg". In fact I would not do anything unless someone comes up with a brilliant idea what to put in the column instead of Mozilla or a blank which is what you always get today.
So what is the big problem with "fixing the symptom and not the root cause?????"
My view would be to remove the agent string from this log column and reserve it only for the other messages we also show in that column. Like the email address of the person that registers.
--
KennethLavrsen - 13 Apr 2009
As alluded to by Kenneth, the change would alter the output format of the logs, and as such, even if deemed useful (which I agree it may not be), is probably more appropriate for a minor release rather than a patch release.
--
IsaacLin - 13 Apr 2009
Whatever.
--
OlivierRaginel - 13 Apr 2009
I imagine the original reason for putting the agent string in the log was to make it susceptible to analysis. Since one-word agent strings are rarely useful (most browsers pretend to be something else at some point in their devious lives) the full string is required to have any hope of decoding the real user agent. So I still support Babar's change. However since that same information is available from the server logs, I do wonder why it was ever added to this log.
--
CrawfordCurrie - 13 Apr 2009
For the record, I ended up to the same conclusion as Crawford, and as I have way less history of "our" project than you guys, I let you decide what's best.
I just think we're hiding things under the carpet.
And also, I do support Kenneth raising it, and incorporating Joenio's changes. I just didn't felt it should end there, that's all, so I wouldn't have marked this as "closed", even though, as Kenneth notes, it fixes the bug. Or maybe create a new Task stating that this user-agent string in the logs should be reviewed / enhanced / explained, ....
--
OlivierRaginel - 13 Apr 2009
The current feature uses that column for multiple things. And one is adding the first word in the agent string.
For the Foswiki log to be useful it must add something the Apache log does not do.
The Apache log is typically difficult to read by the human eye because it contains too much information. Each line is a kilometer long and all the JS and graphics and CSS files are also listed. The Foswiki log is brief, only contains the page views, and adds some extra information from the inside like revisions of topics, reprev etc. This is what makes it useful.
The Agent string is only added when the user is guest. My guess is that the one that coded it, wanted the information to quickly see what was loading the server. And the typical thing that loads a server is either a site sucking tool or a search bot. I believe many site sucking bots either hide themselves as a real browser or say they are Mozilla. But those should be blocked by the Apache config if you follow the advice from
ApacheConfigGenerator.
Some of the search bots have their name as first word. msnbot as an example. Google presents itself as Mozilla but has the Googlebot later in the string.
The reason I am against showing the whole agent string is that it drowns the other use of the field. The log becomes difficult to read for a human. If the purpose of the Agent string is to show which search agent.
Maybe we should just simply make a little regex with a list of the top 10 search agent strings represented by the single word that describes it (msnbot, Googlebot ...) and let the matched word appear. Mozilla will be one and shown if nothing else matched. By defining the regex right this should work without too much overhead. I can try and experiment a little and make a proposal that makes the agent a single word but with a little more value than today.
A compromize could be to match a single word first. And if no match put the entire agent string. This will make the log format readable in 99% of the log and a little less readable where some site sucker has operated. But then this may be OK as this is then the information you are most likely looking for.
Let me play a little.
I will open a new enhancement report when I have something and keep this one in Waiting for Release.
--
KennethLavrsen - 14 Apr 2009