Validator.nu is validation 2.0.
RELAX NG validation—XML syntax and Compact Syntax
Schematron 1.5 validation (standalone schemas only—ISO Schematron or Schematron embedded in RELAX NG are not supported)
XML 1.0 and HTML5 parsing.
Validator.nu does not check for XML 1.0 validity constraints. That is, DTD validation is not performed.
Validator.nu does not perform the duties of a “validating SGML parser” as defined in ISO 8879. In fact, this service does not have any SGML functionality at all. In particular, the HTML 4.01 support uses the HTML5 parser with some additional error conditions.
Validator.nu has two facets: generic (complex UI) and (X)HTML5 (simple UI).
Enter the URL (
data URL to be
exact) of the document you want to validate in the field labeled
“Document” and submit the form. That’s all it takes in most
In the (X)HTML5 facet, the parser and the schema will be chosen
based on the HTTP
Content-Type of the document. In the
generic facet, the parser will be chosen based on the HTTP
Content-Type and a preset schema will be chosen based on
the root namespace (for XML) or the doctype (for
For simplicity, the HTML5 facet only shows UI for validation by URL. Validation by text area and by file upload are available in the generic facet.
Here are bookmarklets:
There is a command-line script that uploads documents from the local filesystem to the (X)HTML5 validator. Integration into vim is available.
When the field for schemas is left empty, the validator will try to
choose a schema on its own. If you are not happy with the guessed
preset, you can specify a schema either by selecting a preset or by
entering a space-separated list of schema URLs (
data URLs). In addition to actual schemas, you may use
certain special URLs to invoke checkers
that seem like special schemas but aren’t actually implemented as
If the automatic choice of parser does not work for you, you can
choose the parser manually. The choice of parser affects the HTTP
Accept request header that is sent.
When the lax option is set,
text/plain are allowed as XML content types and
text/plain is allowed as an HTML content type and, if
the URL ends with
.rnc, as a Compact Syntax content
type. Also, in the lax mode the US-ASCII default for
XML types is not enforced.
Normally, schemas using the RELAX NG XML syntax, Schematron schemas
and the XML documents to be validated are expected to be served
using an XML content type. Schemas using the RELAX NG Compact Syntax
are expected to be served using
content type. (The unregistered
content type is also understood.) HTML documents are expected to be
When the “Show Image Report” checkbox is set, a report concerning the textual
img elements in the XHTML namespace is shown for accessibility
You may check the “Show Source” checkbox to show the decoded source of the document being checked. Please note that the source may not be shown in its entirety if the parser encounters a fatal error. Moreover, the show source feature shows the decoded Unicode source. Erroneous byte sequences in the original source and characters that would render the validator output as non-conforming (e.g. U+0000) are not represented faithfully.
If you want to create your own alternative mode of input or want to call Validator.nu (or your own local copy) from within your own application, there is a RESTful Web service API. In addition to the modes of input that work from HTML forms, you can also POST the document to be checked as an HTTP entity body. In addition to the default HTML output, the messages are also available as XHTML, XML, JSON, GNU error format and plain text.
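As a rough illustration of POSTing a document as an HTTP entity body and reading the JSON output, here is a minimal Python sketch. The endpoint URL, the out=json query parameter and the shape of the JSON response (a top-level "messages" array with "type" and "message" keys) are assumptions made for this example; consult the Web service API documentation for the actual parameters. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint and "out=json" parameter are illustrative only;
# check the Web service API documentation before relying on them.
VALIDATOR_URL = "https://validator.nu/?out=json"


def build_request(document: bytes,
                  content_type: str = "text/html; charset=utf-8",
                  url: str = VALIDATOR_URL) -> urllib.request.Request:
    """Build a POST request that carries the document as the entity body."""
    return urllib.request.Request(
        url,
        data=document,
        headers={"Content-Type": content_type},
        method="POST",
    )


def summarize(response_body: bytes) -> list:
    """Turn a JSON response into (type, message) pairs.

    Assumes a top-level "messages" array whose items carry
    "type" and "message" keys.
    """
    payload = json.loads(response_body.decode("utf-8"))
    return [(m.get("type"), m.get("message"))
            for m in payload.get("messages", [])]


def validate(document: bytes) -> list:
    """Send the document and summarize the reply (needs network access)."""
    request = build_request(document)
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize(response.read())
```

Pointing the url argument of build_request at a private instance would let the same wrapper be reused locally. Building the request and parsing the reply are kept separate so that each part can be exercised without network access.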
text/html-compatible content models)
HTML5 with ARIA (unendorsed integration prototype)
Mike(tm) Smith has generated documentation for this schema.
XHTML 1.0 Strict with URL support. Generally suitable for HTML 4.01 Strict checking as well, although there are theoretically wrong corner cases. Uses backported HTML5 datatypes.
XHTML 1.0 Transitional with URL support. Generally suitable for HTML 4.01 Transitional checking as well, although there are theoretically wrong corner cases. Uses backported HTML5 datatypes.
XHTML 1.0 Frameset with URL support. Generally suitable for HTML 4.01 Frameset checking as well, although there are theoretically wrong corner cases. Uses backported HTML5 datatypes. Do not use. :-)
XHTML5 (XML-compatible content models)
XHTML5 with ARIA (unendorsed integration prototype), SVG 1.1, MathML 2.0 and holes for OpenMath, RDF and Inkscape cruft.
XHTML 1.0 (not 1.1), SVG 1.1 and MathML 2.0 with URL support.
XHTML 1.0 (not 1.1), Ruby, SVG 1.1 and MathML 2.0 with URL support.
A schema for XHTML Basic with URL support. Suitable for use with the HTML parser.
SVG 1.1 Full with URL support (Inkscape cruft not permitted).
The service supports a few special pseudo-schema URIs that map to checkers written in a Turing-complete programming language.
Checks (X)HTML table integrity. The current implementation should be considered a prototype that has not yet been updated to match the latest spec language for HTML5. (See more detailed discussion.)
Checks that constructs in the document tree are in the Unicode Normalization Form C and don’t start with a “composing character”. Using this pseudo-schema also enables normalization checking of source text. (See more detailed discussion.)
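To make the two conditions concrete, here is a small Python sketch of an NFC check on a single string. It is not the checker's implementation: the function name is hypothetical, and "composing character" is approximated here as a character with a nonzero canonical combining class, which is narrower than the full definition.

```python
import unicodedata


def nfc_problems(text: str) -> list:
    """Return a list of human-readable problems found in the string."""
    problems = []
    # Approximation: treat any character with a nonzero canonical
    # combining class as a "composing character".
    if text and unicodedata.combining(text[0]) != 0:
        problems.append("starts with a combining character")
    # A string is in NFC if normalizing it changes nothing.
    if unicodedata.normalize("NFC", text) != text:
        problems.append("not in Normalization Form C")
    return problems
```

For example, nfc_problems("cafe\u0301") reports that the string is not in NFC, because the letter e followed by a combining acute accent has a precomposed form.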
Checks the text content of the (X)HTML5
time elements for conformance. (This is a prototype
with liberties taken.)
Warns about RDF, OpenMath and Inkscape holes and about the use of
version="1.0" in SVG.
usemap attribute for referential integrity.
http://c.validator.nu/nfc/ http://c.validator.nu/text-content/ http://c.validator.nu/unchecked/ http://c.validator.nu/usemap/.
http://c.validator.nu/nfc/ http://c.validator.nu/unchecked/ http://c.validator.nu/usemap/.
Dumps parse events as warnings.
Your server cannot properly deal with an
header that does not have
*/* in it. Chances are that
you are using Apache 1.3, PHP and MultiViews together. MultiViews
thinks the type of your page is
which isn’t in the
Accept header. Apache 2 does not
have this problem.
No, Validator.nu does not give badges.
I have observed that once people are given badges they start to feel entitled to the badges and become hostile if the validation service is changed so that some documents that previously were proclaimed valid no longer are. I do not want to deliberately incite an opposition to bug fixes. I know some of the schemas are not as tight as the corresponding spec prose. If I make them tighter, consider it a bug fix. Moreover, the HTML 5 spec is still changing, so the schema will change as well. Finally, I may (and even intend to) change the namespace associations of preset schemas in the future.
In addition to the problem with changing the validator after badges have been awarded, badges don’t provide value to the readers of validated pages. Validation is a tool for you as a page author—not something your readers need to verify. However, if you are writing about Web authoring and want to refer others to Validator.nu, please, by all means feel free to link to Validator.nu.
By the time Ruby on Rails hit everyone’s radar, this project was already underway. However, Ruby would still have been a bad choice had I considered it seriously earlier. Ruby lacks a solid Unicode infrastructure. I’ve already been in a situation where I had to stop writing app code and spend time writing the very basics of Unicode infrastructure. I don’t want to be in that situation again. Ruby lacks solid XML infrastructure as well.
I chose Java over Python for three reasons: SAX, Jing and more experience with Java. Apart from Java feeling like a more secure choice because I had more experience with it, the choice between Java and Python also comes down to infrastructure. Having a platform-wide unified way for plugging together XML tools is extremely important when what you are doing entails plugging together XML tools efficiently.
Java is in a unique position when it comes to XML tool infrastructure. Java has a lot of XML-related libraries available and they pretty much all plug into the same interface. Not only is there a platform-wide XML API, it also happens to be one of the most complete and correct of the XML APIs around. From the point of view of RELAX NG, Java being the language Jing is written in is an extremely important consideration. Jing is a seriously good piece of software. Moreover, Java is the native language of the extensibility interface for RELAX NG datatype libraries.
While I’m on a soap box, I should mention that ICU4J is a seriously good piece of software, too, and having Java’s notion of Unicode frozen as UTF-16 from the dawn of time until eternity is very important considering the stability of infrastructure. It is a horribly bad idea that the meaning of Python programs changes (due to datatypes changing underneath) depending on how the interpreter was compiled. Unicode is optimized for 16-bit units. The stability of sticking to UTF-16 in RAM everywhere outweighs the theoretical purity of UTF-32 in RAM. (On disk and network, use UTF-8, of course.)
I do want to make the validator functionality available to applications that are not written in Java, though. This is why Validator.nu has a Web service interface that can be used either with the instance running at validator.nu or with your private instance running at localhost. I encourage you to write a wrapper library for the Web service in your favorite programming language.
I think DTDs are bad in four ways:
DTDs pollute the document with schema-specific syntax. Since the document itself declares the rules, the question answered by DTD validation is not the question that should be asked. DTD validation answers the question “Does this document conform to the rules it declares itself?” The interesting question is “Does this document conform to these rules?” when the person who asks the question chooses the rules the question is about.
DTDs mix a validation mechanism, an inclusion mechanism and an infoset augmentation mechanism. The inclusion mechanism is mainly used for character entities, which solve (but only if the DTD is processed and processing it is not required!) an input problem by burdening the recipient instead of keeping input matters between the editing software and the document author.
DTDs aren’t particularly expressive.
DTDs don’t support Namespaces in XML.
I hope providing an online validation service for RELAX NG removes the excuse that DTDs are needed for online validators.
“Validation” and “validator” in the name and the user interface of the service refer to the ISO/IEC FDIS 19757-2 definition of “validator” (which performs validation), to the Schematron “validation” function (which is performed by a validator), and to the HTML 5 definition of “validator”.
Schemas for XHTML 1.0 are used for HTML 4.01, because XHTML
1.0 is supposed to be a reformulation of HTML 4.01 in XML. However,
there are some subtle spec bugs introduced in the reformulation.
For this reason, some errors for HTML 4.01 are wrong. For example,
XHTML 1.0 (in the DTD) forbids the
name attribute on
form element, although it is allowed in HTML 4.01.
Please refer to the bug tracker for other known issues and for ideas for future development.
The preferred forum for reporting bugs, discussing issues, and getting help related to using the (X)HTML5 validator is the project's GitHub issue tracker.
ID/IDREF/IDREFS checking in RELAX NG is enabled for the benefit of those who use their own schemas and expect this feature to work. However, the preset schemas do not use RELAX NG ID/IDREF/IDREFS features, because the checking isn’t precise enough (cannot require that the referent is of a certain type) and using these features places really annoying restrictions on the schemas.
Comments are not exposed to the validation layer and, therefore, cannot be matched in Schematron.
The document is validated independently (but concurrently) against each schema. The Schematron validators do not see IDness assignments from the RELAX NG validators.
Embedded Schematron is not supported.
xml:id processing is performed. Also, the
id in no namespace is given IDness unless the
host element is a CML element. This means that both
id are matched by the XPath
function. SVG 1.2 IDness rules are not honored.
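The IDness rule described above can be sketched in Python as follows. This only illustrates the rule as stated here and is not the validator's code; the function name is hypothetical and the CML namespace URI is an assumption that should be checked before use.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumption: namespace URI used for CML elements in this sketch.
CML_NS = "http://www.xml-cml.org/schema"


def collect_ids(xml_text: str) -> dict:
    """Map each ID value to the tag of the element that carries it."""
    ids = {}
    for element in ET.fromstring(xml_text).iter():
        # xml:id always confers IDness.
        if XML_ID in element.attrib:
            ids[element.attrib[XML_ID]] = element.tag
        # id in no namespace confers IDness unless the host is a CML element.
        in_cml = element.tag.startswith("{" + CML_NS + "}")
        if "id" in element.attrib and not in_cml:
            ids[element.attrib["id"]] = element.tag
    return ids
```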
The following datatype libraries are supported:
NG DTD Compatibility library
The W3C XML
Schema Datatypes library
Datatype Library for HTML5 Datatypes
http://whattf.org/datatype-draft) This is not a
stable library, so you should not rely on it at this time.
The HTML parser emits
parse events as if it was parsing an equivalent XHTML flavor
document. Therefore, the schemas should assume lowercase element
names in the XHTML namespace and attributes in no namespace (except
lang attribute maps to
The HTML 4.01 parsing mode does not use an SGML parser. Instead, the HTML5 parser is used in an HTML 4.01 compatibility mode. The names of boolean attributes are repeated as values for compatibility with XHTML 1.0 schemas. (This does not happen in the HTML5 mode.)
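The repetition of boolean attribute names as values can be pictured with a short Python sketch. The function name and the set of attribute names are illustrative assumptions (a subset of HTML 4.01 boolean attributes), and this is not the parser's actual code.

```python
# Illustrative subset of HTML 4.01 boolean attributes (assumption).
BOOLEAN_ATTRIBUTES = frozenset(
    {"checked", "selected", "disabled", "readonly", "multiple"}
)


def expand_boolean_attributes(attributes: dict,
                              boolean_names: frozenset = BOOLEAN_ATTRIBUTES) -> dict:
    """Repeat the name of each boolean attribute as its value.

    Other attributes keep their original values.
    """
    return {
        name: (name if name in boolean_names else value)
        for name, value in attributes.items()
    }
```

With this sketch, an input element written as `<input type=checkbox checked>` would be reported with checked="checked", which is the form the XHTML 1.0 schemas expect.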
The source code and the dependencies can be obtained using a Python-based (no XML situps!) build script:
First, set the
JAVA_HOME environment variable properly.
mkdir checker
cd checker
svn co http://svn.versiondude.net/whattf/build/trunk/ build
python build/build.py all
This will download, build and run the system at
http://localhost:8888/. For other options, please run
python build/build.py --help instead. Please note that the
dependencies are big. The script will spend time downloading stuff.
The script requires Python, Subversion and JDK 5 or later (JDK 6 and Hardy’s OpenJDK work). (Tested on Mac OS X and Ubuntu. On Windows, the build completes but the app crashes on startup.) Note: The script wants to see a Sun-compatible
jar executable. Debian
fastjar will not work.
The above example starts a standalone HTTP server with debug
messages printed to the console. To use AJP13
instead, use --ajp=on. A log4j
configuration for deployment can be given using the --log4j=
option. There is a sample file in
extras/ is searched for additional jars for
the classpath. For example, if you configure log4j to send email, you
should put the Java
Mail API and JavaBeans
Activation Framework jars in
I would like to thank the Mozilla Foundation and the Mozilla Corporation for funding this project.
I would like to thank James Clark for writing Jing and for championing RELAX NG and XML. I would also like to thank everyone who tested the development builds, the writers of test cases and everyone who has developed library code and schemas that the service uses.
The XHTML 1.0 schemas were originally written by James Clark and have been improved by Petr Nálevka.
fantasai designed the (X)HTML5 schema framework, wrote the (X)HTML5 Core schemas and helped along the way when I added features.
The schemas for RELAX NG and XSLT were written by James Clark.
The principal author of the schema for DocBook is Norman Walsh.
The SVG schemas come from the W3C.
The MathML schema was written by Yutaka Furubayashi.
Test cases written by fantasai, Anne van Kesteren and Christoph Schneegans were very useful in developing this service.
This product includes software developed by The Apache Software Foundation (http://www.apache.org/).
This product uses The SAXON XSLT Processor from Michael Kay.
Focuses on HTML, XHTML, WML. Uses SGML DTDs and custom code for HTML. Uses XSD and custom code for XHTML. Recently added support for RSS and Atom, but that feature is still in flux.
Validates using the XSD implementation of XHTML 1.0.
Uses RELAX NG and Schematron for validating XHTML and HTML. (The XHTML 1.0 schemas offered here as presets are based on the schemas used in Relaxed.)
DTD-based SGML and XML validation.
Checks Atom and RSS feeds. Uses Python as the schema language. :-)
Checks CSS style sheets.
DTD-based SGML and XML validation.
This service is provided in the hope that it is useful. Neither Henri Sivonen nor anyone else has any obligation to provide this service to you. The service or any part thereof may be discontinued at any time without notice. There is absolutely no warranty. There is no guarantee of a level of service. If you need a guaranteed level of service, you should probably run your own instance of the software.
Please use the service reasonably. If you call it from your own blog, that’s cool. If you need a validator as a part of a massively traffic-generating blog hosting service, please run your own instance.
When you access the validation service, data about the access is logged for the purpose of understanding the use of the service, identifying popular resources for retrieval to local storage and acting on abuse.
The HTTP request/response pair between your user agent and the
service is logged in the “combined” format (without identd
check). The logged data includes the network address of the remote
host from which the request came, the HTTP authentication name (if
for whatever reason supplied; not requested by the service), the date
and time of the request, the first line of the request including the
HTTP version, the path part of the URL and the query string
containing the validator arguments, the HTTP “
header (where you came from) and HTTP “
header (the name and version of your browser).
Additionally, the URLs of the HTTP requests made by the validator are logged. Some internal error conditions may also be logged. When an internal error condition is logged, the log entry may include data entered by you or pertaining to the resources your request caused the validator to process. Finally, (X)HTML5 validation errors are logged for documents that are retrieved from the Web (i.e. for documents that are world-readable anyway).
The logs are readable by me (Henri Sivonen) and, technically, by the administrators of the hosting provider. I have no intent of sharing raw log entries with others (except with law enforcement officials if necessary). However, I reserve the right to publish aggregate statistics derived from the logs.