Update the software's packages and the JDK version in use

2022-12-15 00:44:03 -05:00
parent dd08af8f9f
commit 8eeeccb213
9 changed files with 884 additions and 799 deletions
+113
@@ -120,3 +120,116 @@ Google can understand a wide variety of custom sitemap formats that they made up
To generate a special type of sitemap, just use GoogleMobileSitemapGenerator, GoogleGeoSitemapGenerator, GoogleCodeSitemapGenerator, GoogleNewsSitemapGenerator, or GoogleVideoSitemapGenerator instead of WebSitemapGenerator.
You can't mix-and-match regular URLs with Google-specific sitemaps, so you'll also have to use a GoogleMobileSitemapUrl, GoogleGeoSitemapUrl, GoogleCodeSitemapUrl, GoogleNewsSitemapUrl, or GoogleVideoSitemapUrl instead of a WebSitemapUrl. Each of them has unique configurable options not available to regular web URLs.
<html><head><title>How to use SitemapGen4j</title></head>
<body>
<h1>How to use SitemapGen4j</h1>
SitemapGen4j is a library to generate XML sitemaps in Java.
<h2>What's an XML sitemap?</h2>
Quoting from <a href="http://sitemaps.org/index.php">sitemaps.org</a>:
<blockquote><p>Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.</p>
<p>Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.</p>
<p>Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.</p>
</blockquote>
<h2>Getting started</h2>
<p>The easiest way to get started is to just use the WebSitemapGenerator class, like this:</p>
<pre name="code" class="java">WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
wsg.addUrl("http://www.example.com/index.html"); // repeat multiple times
wsg.write();</pre>
<h2>Configuring options</h2>
But there are a lot of nifty options available for URLs and for the generator as a whole. To configure the generator, use a builder:
<pre name="code" class="java">WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.gzip(true).build(); // enable gzipped output
wsg.addUrl("http://www.example.com/index.html");
wsg.write();</pre>
To configure the URLs, construct a WebSitemapUrl with WebSitemapUrl.Options.
<pre name="code" class="java">WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
WebSitemapUrl url = new WebSitemapUrl.Options("http://www.example.com/index.html")
.lastMod(new Date()).priority(1.0).changeFreq(ChangeFreq.HOURLY).build();
// this will configure the URL with lastmod=now, priority=1.0, changefreq=hourly
wsg.addUrl(url);
wsg.write();</pre>
<h2>Configuring the date format</h2>
One important configuration option for the sitemap generator is the date format. The <a href="http://www.w3.org/TR/NOTE-datetime">W3C datetime standard</a> allows you to choose the precision of your datetime (anything from just specifying the year like "1997" to specifying the fraction of the second like "1997-07-16T19:20:30.45+01:00"); if you don't specify one, we'll try to guess which one you want, and we'll use the default timezone of the local machine, which might not be what you prefer.
<pre name="code" class="java">
// Use DAY pattern (2009-02-07), Greenwich Mean Time timezone
W3CDateFormat dateFormat = new W3CDateFormat(Pattern.DAY);
dateFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.dateFormat(dateFormat).build(); // actually use the configured dateFormat
wsg.addUrl("http://www.example.com/index.html");
wsg.write();</pre>
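<p>To see why the timezone matters for the DAY pattern, here is a minimal sketch that uses only the JDK's java.time classes rather than SitemapGen4j's W3CDateFormat. The names W3cDayFormatSketch and formatDay are made up for illustration. The same instant renders as two different days depending on the timezone applied.</p>

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of SitemapGen4j.
class W3cDayFormatSketch {
    // Formats an instant at W3C DAY precision (YYYY-MM-DD) in the given zone.
    static String formatDay(Instant instant, ZoneId zone) {
        return DateTimeFormatter.ofPattern("uuuu-MM-dd").withZone(zone).format(instant);
    }

    public static void main(String[] args) {
        Instant lateEvening = Instant.parse("2009-02-07T23:30:00Z");
        // In GMT/UTC the instant still falls on 7 February.
        System.out.println(formatDay(lateEvening, ZoneOffset.UTC));
        // One hour east of GMT it is already 8 February.
        System.out.println(formatDay(lateEvening, ZoneOffset.ofHours(1)));
    }
}
```

<p>Pinning the timezone explicitly, as the example above does with GMT, keeps the lastmod values stable no matter which machine generates the sitemap.</p>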
<h2>Lots of URLs: a sitemap index file</h2>
One sitemap can contain a maximum of 50,000 URLs. (Some sitemaps, like Google News sitemaps, can contain only 1,000 URLs.) If you need to put more URLs than that in a sitemap, you'll have to use a sitemap index file. Fortunately, WebSitemapGenerator can manage the whole thing for you.
<pre name="code" class="java">WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
for (int i = 0; i &lt; 60000; i++) wsg.addUrl("http://www.example.com/doc"+i+".html");
wsg.write();
wsg.writeSitemapsWithIndex(); // generate the sitemap_index.xml
</pre>
<p>That will generate two sitemaps for 60K URLs: sitemap1.xml (with 50K urls) and sitemap2.xml (with the remaining 10K), and then generate a sitemap_index.xml file describing the two.</p>
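<p>The split described above is plain ceiling division over the 50,000-URL limit. The following sketch only reproduces that arithmetic. SitemapSplitSketch and its methods are hypothetical names, not SitemapGen4j API; the generator itself handles the actual writing and file naming.</p>

```java
// Hypothetical arithmetic sketch; not part of SitemapGen4j.
class SitemapSplitSketch {
    // Same limit the library documents as its per-sitemap maximum.
    static final int MAX_URLS_PER_SITEMAP = 50000;

    // Number of sitemap files needed for urlCount URLs (ceiling division).
    static int fileCount(int urlCount) {
        return (urlCount + MAX_URLS_PER_SITEMAP - 1) / MAX_URLS_PER_SITEMAP;
    }

    // Number of URLs that end up in the last sitemap file.
    static int urlsInLastFile(int urlCount) {
        int remainder = urlCount % MAX_URLS_PER_SITEMAP;
        // A zero remainder means the last file is full (or there are no URLs at all).
        return remainder == 0 ? Math.min(urlCount, MAX_URLS_PER_SITEMAP) : remainder;
    }

    public static void main(String[] args) {
        int urlCount = 60000;
        System.out.println(fileCount(urlCount) + " sitemap files, "
                + urlsInLastFile(urlCount) + " URLs in the last one");
    }
}
```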
<p>It's also possible to carefully organize your sub-sitemaps. For example, it's recommended to group URLs with the same changeFreq together (have one sitemap for changeFreq "daily" and another for changeFreq "yearly"), so you can modify the lastMod of the daily sitemap without modifying the lastMod of the yearly sitemap. To do that, just construct your sitemaps one at a time using the WebSitemapGenerator, then use the SitemapIndexGenerator to create a single index for all of them.</p>
<pre name="code" class="java">WebSitemapGenerator wsg;
// generate foo sitemap
wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.fileNamePrefix("foo").build();
for (int i = 0; i &lt; 5; i++) wsg.addUrl("http://www.example.com/foo"+i+".html");
wsg.write();
// generate bar sitemap
wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.fileNamePrefix("bar").build();
for (int i = 0; i &lt; 5; i++) wsg.addUrl("http://www.example.com/bar"+i+".html");
wsg.write();
// generate sitemap index for foo + bar
SitemapIndexGenerator sig = new SitemapIndexGenerator("http://www.example.com", myFile);
sig.addUrl("http://www.example.com/foo.xml");
sig.addUrl("http://www.example.com/bar.xml");
sig.write();</pre>
<p>You could also use the SitemapIndexGenerator to incorporate sitemaps generated by other tools. For example, you might use Google's official Python sitemap generator to generate some sitemaps, and use WebSitemapGenerator to generate some sitemaps, and use SitemapIndexGenerator to make an index of all of them.</p>
<h2>Validate your sitemaps</h2>
<p>SitemapGen4j can also validate your sitemaps using the official XML Schema Definition (XSD). If you used SitemapGen4j to make the sitemaps, you shouldn't need to do this unless there's a bug in our code. But you can use it to validate sitemaps generated by other tools, and it provides an extra level of safety.</p>
<p>It's easy to configure the WebSitemapGenerator to automatically validate your sitemaps right after you write them (but this does slow things down, naturally).</p>
<pre name="code" class="java">WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.autoValidate(true).build(); // validate the sitemap after writing
wsg.addUrl("http://www.example.com/index.html");
wsg.write();</pre>
<p>You can also use the SitemapValidator directly to validate sitemaps. It has two methods: validateWebSitemap(File f) and validateSitemapIndex(File f).</p>
<h2>Google-specific sitemaps</h2>
<p>Google can understand a wide variety of custom sitemap formats that they made up, including Mobile sitemaps, Geo sitemaps, Code sitemaps (for Google Code search), Google News sitemaps, and Video sitemaps. SitemapGen4j can generate any or all of these different types of sitemaps.</p>
<p>To generate a special type of sitemap, just use GoogleMobileSitemapGenerator, GoogleGeoSitemapGenerator, GoogleCodeSitemapGenerator, GoogleNewsSitemapGenerator, or GoogleVideoSitemapGenerator instead of WebSitemapGenerator.</p>
<p>You can't mix-and-match regular URLs with Google-specific sitemaps, so you'll also have to use a GoogleMobileSitemapUrl, GoogleGeoSitemapUrl, GoogleCodeSitemapUrl, GoogleNewsSitemapUrl, or GoogleVideoSitemapUrl instead of a WebSitemapUrl. Each of them has unique configurable options not available to regular web URLs.</p>
</body>
</html>
+5 -6
@@ -83,7 +83,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.4</version>
<version>3.2.1</version>
<executions>
<execution>
<id>attach-sources</id>
@@ -96,16 +96,15 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.10.1</version>
<version>3.4.1</version>
<executions>
<execution>
<id>attach-javadocs</id>
<id>create-javadoc-jar</id>
<goals>
<goal>javadoc</goal>
<goal>jar</goal>
</goals>
<configuration>
<additionalparam>-Xdoclint:none</additionalparam>
</configuration>
<phase>package</phase>
</execution>
</executions>
</plugin>
@@ -6,27 +6,30 @@ import java.net.URL;
/**
* Builds a code sitemap for Google Code Search. To configure options, use {@link #builder(URL, File)}
*
* @author Dan Fabulich
* @see <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=75224">Creating Code Search Sitemaps</a>
*/
public class GoogleCodeSitemapGenerator extends SitemapGenerator<GoogleCodeSitemapUrl,GoogleCodeSitemapGenerator> {
public class GoogleCodeSitemapGenerator extends SitemapGenerator<GoogleCodeSitemapUrl, GoogleCodeSitemapGenerator> {
GoogleCodeSitemapGenerator(AbstractSitemapGeneratorOptions<?> options) {
super(options, new Renderer());
}
/** Configures the generator with a base URL and directory to write the sitemap files.
/**
* Configures the generator with a base URL and directory to write the sitemap files.
*
* @param baseUrl All URLs in the generated sitemap(s) should appear under this base URL
* @param baseDir Sitemap files will be generated in this directory as either "sitemap.xml" or "sitemap1.xml" "sitemap2.xml" and so on.
* @throws MalformedURLException
* @throws MalformedURLException Exception
*/
public GoogleCodeSitemapGenerator(String baseUrl, File baseDir)
throws MalformedURLException {
this(new SitemapGeneratorOptions(baseUrl, baseDir));
}
/**Configures the generator with a base URL and directory to write the sitemap files.
/**
* Configures the generator with a base URL and directory to write the sitemap files.
*
* @param baseUrl All URLs in the generated sitemap(s) should appear under this base URL
* @param baseDir Sitemap files will be generated in this directory as either "sitemap.xml" or "sitemap1.xml" "sitemap2.xml" and so on.
@@ -35,17 +38,21 @@ public class GoogleCodeSitemapGenerator extends SitemapGenerator<GoogleCodeSitem
this(new SitemapGeneratorOptions(baseUrl, baseDir));
}
/**Configures the generator with a base URL and a null directory. The object constructed
/**
* Configures the generator with a base URL and a null directory. The object constructed
* is not intended to be used to write to files. Rather, it is intended to be used to obtain
* XML-formatted strings that represent sitemaps.
*
* @param baseUrl All URLs in the generated sitemap(s) should appear under this base URL
* @param baseUrl
* @throws MalformedURLException Exception
*/
public GoogleCodeSitemapGenerator(String baseUrl) throws MalformedURLException {
this(new SitemapGeneratorOptions(new URL(baseUrl)));
}
/**Configures the generator with a base URL and a null directory. The object constructed
/**
* Configures the generator with a base URL and a null directory. The object constructed
* is not intended to be used to write to files. Rather, it is intended to be used to obtain
* XML-formatted strings that represent sitemaps.
*
@@ -55,7 +62,8 @@ public class GoogleCodeSitemapGenerator extends SitemapGenerator<GoogleCodeSitem
this(new SitemapGeneratorOptions(baseUrl));
}
/** Configures a builder so you can specify sitemap generator options
/**
* Configures a builder so you can specify sitemap generator options
*
* @param baseUrl All URLs in the generated sitemap(s) should appear under this base URL
* @param baseDir Sitemap files will be generated in this directory as either "sitemap.xml" or "sitemap1.xml" "sitemap2.xml" and so on.
@@ -65,7 +73,8 @@ public class GoogleCodeSitemapGenerator extends SitemapGenerator<GoogleCodeSitem
return new SitemapGeneratorBuilder<GoogleCodeSitemapGenerator>(baseUrl, baseDir, GoogleCodeSitemapGenerator.class);
}
/** Configures a builder so you can specify sitemap generator options
/**
* Configures a builder so you can specify sitemap generator options
*
* @param baseUrl All URLs in the generated sitemap(s) should appear under this base URL
* @param baseDir Sitemap files will be generated in this directory as either "sitemap.xml" or "sitemap1.xml" "sitemap2.xml" and so on.
@@ -18,7 +18,7 @@ public class GoogleCodeSitemapUrl extends WebSitemapUrl {
*/
public enum FileType {
/** A special value meaning that the URL is a compressed archive containing code.
* @see @see <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=75259">Supported archive suffixes</a>
* @see <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=75259">Supported archive suffixes</a>
*/
ARCHIVE("Archive"),
ADA("Ada"),
@@ -19,7 +19,10 @@ public class GoogleMobileSitemapUrl extends WebSitemapUrl {
this(new URL(url));
}
/** Specifies the url */
/**
* Specifies the url
* @param url
*/
public Options(URL url) {
super(url, GoogleMobileSitemapUrl.class);
}
@@ -5,7 +5,8 @@ import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
/** One configurable Google Video Search URL. To configure, use {@link Options}
/**
* One configurable Google Video Search URL. To configure, use {@link Options}
*
* @author Dan Fabulich
* @see Options
@@ -30,7 +31,9 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
private final Integer durationInSeconds;
private final String allowEmbed;
/** Options to configure Google Video URLs */
/**
* Options to configure Google Video URLs
*/
public static class Options extends AbstractSitemapUrlOptions<GoogleVideoSitemapUrl, Options> {
private URL playerUrl;
private URL contentUrl;
@@ -49,7 +52,8 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
private Integer durationInSeconds;
private Boolean allowEmbed;
/** Specifies a landing page URL, together with a "player" (e.g. SWF)
/**
* Specifies a landing page URL, together with a "player" (e.g. SWF)
*
* @param url the landing page URL
* @param playerUrl the URL of the "player" (e.g. SWF file)
@@ -61,7 +65,8 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
this.allowEmbed = allowEmbed;
}
/** Specifies a landing page URL, together with the URL of the underlying video (e.g. FLV)
/**
* Specifies a landing page URL, together with the URL of the underlying video (e.g. FLV)
*
* @param url the landing page URL
* @param contentUrl the URL of the underlying video (e.g. FLV)
@@ -71,7 +76,8 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
this.contentUrl = contentUrl;
}
/** Specifies a player URL (e.g. SWF)
/**
* Specifies a player URL (e.g. SWF)
*
* @param playerUrl the URL of the "player" (e.g. SWF file)
* @param allowEmbed when specifying a player, you must specify whether embedding is allowed
@@ -82,7 +88,9 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
return this;
}
/** Specifies the URL of the underlying video (e.g FLV) */
/**
* Specifies the URL of the underlying video (e.g FLV)
*/
public Options contentUrl(URL contentUrl) {
this.contentUrl = contentUrl;
return this;
@@ -102,7 +110,9 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
return this;
}
/** The title of the video. Limited to 100 characters. */
/**
* The title of the video. Limited to 100 characters.
*/
public Options title(String title) {
if (title != null) {
if (title.length() > 100) {
@@ -113,7 +123,9 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
return this;
}
/** The description of the video. Descriptions longer than 2048 characters will be truncated. */
/**
* The description of the video. Descriptions longer than 2048 characters will be truncated.
*/
public Options description(String description) {
if (description != null) {
if (description.length() > 2048) {
@@ -124,7 +136,9 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
return this;
}
/** The rating of the video. The value must be number in the range 0.0-5.0. */
/**
* The rating of the video. The value must be number in the range 0.0-5.0.
*/
public Options rating(Double rating) {
if (rating != null) {
if (rating < 0 || rating > 5.0) {
@@ -135,13 +149,17 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
return this;
}
/** The number of times the video has been viewed */
/**
* The number of times the video has been viewed
*/
public Options viewCount(int viewCount) {
this.viewCount = viewCount;
return this;
}
/** The date the video was first published, in {@link W3CDateFormat}. */
/**
* The date the video was first published, in {@link W3CDateFormat}.
*/
public Options publicationDate(Date publicationDate) {
this.publicationDate = publicationDate;
return this;
@@ -153,7 +171,7 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
* content. A single video could have several tags, although it might
* belong to only one category. For example, a video about grilling food
* may belong in the Grilling category, but could be tagged "steak",
* "meat", "summer", and "outdoor". Create a new <video:tag> element for
* "meat", "summer", and "outdoor". Create a new &lt;video:tag&gt; element for
* each tag associated with a video. A maximum of 32 tags is permitted.
*/
public Options tags(ArrayList<String> tags) {
@@ -167,7 +185,7 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
* content. A single video could have several tags, although it might
* belong to only one category. For example, a video about grilling food
* may belong in the Grilling category, but could be tagged "steak",
* "meat", "summer", and "outdoor". Create a new <video:tag> element for
* "meat", "summer", and "outdoor". Create a new &lt;video:tag&gt; element for
* each tag associated with a video. A maximum of 32 tags is permitted.
*/
public Options tags(Iterable<String> tags) {
@@ -184,7 +202,7 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
* content. A single video could have several tags, although it might
* belong to only one category. For example, a video about grilling food
* may belong in the Grilling category, but could be tagged "steak",
* "meat", "summer", and "outdoor". Create a new <video:tag> element for
* "meat", "summer", and "outdoor". Create a new &lt;video:tag&gt; element for
* each tag associated with a video. A maximum of 32 tags is permitted.
*/
public Options tags(String... tags) {
@@ -208,13 +226,17 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
return this;
}
/** Whether the video is suitable for viewing by children */
/**
* Whether the video is suitable for viewing by children
*/
public Options familyFriendly(boolean familyFriendly) {
this.familyFriendly = familyFriendly;
return this;
}
/** The duration of the video in seconds; value must be between 0 and 28800 (8 hours). */
/**
* The duration of the video in seconds; value must be between 0 and 28800 (8 hours).
*/
public Options durationInSeconds(int durationInSeconds) {
if (durationInSeconds < 0 || durationInSeconds > 28800) {
throw new RuntimeException("Duration must be between 0 and 28800 (8 hours):" + durationInSeconds);
@@ -225,7 +247,8 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
}
/** Specifies a landing page URL, together with a "player" (e.g. SWF)
/**
* Specifies a landing page URL, together with a "player" (e.g. SWF)
*
* @param url the landing page URL
* @param playerUrl the URL of the "player" (e.g. SWF file)
@@ -235,7 +258,8 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
this(new Options(url, playerUrl, allowEmbed));
}
/** Specifies a landing page URL, together with the URL of the underlying video (e.g. FLV)
/**
* Specifies a landing page URL, together with the URL of the underlying video (e.g. FLV)
*
* @param url the landing page URL
* @param contentUrl the URL of the underlying video (e.g. FLV)
@@ -244,7 +268,9 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
this(new Options(url, contentUrl));
}
/** Configures the url with options */
/**
* Configures the url with options
*/
public GoogleVideoSitemapUrl(Options options) {
super(options);
contentUrl = options.contentUrl;
@@ -279,73 +305,96 @@ public class GoogleVideoSitemapUrl extends WebSitemapUrl {
}
/** Retrieves the {@link Options#playerUrl}*/
/**
* Retrieves the {@link Options#playerUrl}
*/
public URL getPlayerUrl() {
return playerUrl;
}
/** Retrieves the {@link Options#contentUrl}*/
/**
* Retrieves the {@link Options#contentUrl}
*/
public URL getContentUrl() {
return contentUrl;
}
/** Retrieves the {@link Options#thumbnailUrl}*/
/**
* Retrieves the {@link Options#thumbnailUrl}
*/
public URL getThumbnailUrl() {
return thumbnailUrl;
}
/** Retrieves the {@link Options#title}*/
/**
* Retrieves the {@link Options#title}
*/
public String getTitle() {
return title;
}
/** Retrieves the {@link Options#description}*/
/**
* Retrieves the {@link Options#description}
*/
public String getDescription() {
return description;
}
/** Retrieves the {@link Options#rating}*/
/**
* Retrieves the {@link Options#rating}
*/
public Double getRating() {
return rating;
}
/** Retrieves the {@link Options#viewCount}*/
/**
* Retrieves the {@link Options#viewCount}
*/
public Integer getViewCount() {
return viewCount;
}
/** Retrieves the {@link Options#publicationDate}*/
/**
* Retrieves the {@link Options#publicationDate}
*/
public Date getPublicationDate() {
return publicationDate;
}
/** Retrieves the {@link Options#tags}*/
/**
* Retrieves the {@link Options#tags}
*/
public ArrayList<String> getTags() {
return tags;
}
/** Retrieves the {@link Options#category}*/
/**
* Retrieves the {@link Options#category}
*/
public String getCategory() {
return category;
}
/** Retrieves whether the video is {@link Options#familyFriendly}*/
/**
* Retrieves whether the video is {@link Options#familyFriendly}
*/
public String getFamilyFriendly() {
return familyFriendly;
}
/** Retrieves the {@link Options#durationInSeconds}*/
/**
* Retrieves the {@link Options#durationInSeconds}
*/
public Integer getDurationInSeconds() {
return durationInSeconds;
}
/** Retrieves whether embedding is allowed */
/**
* Retrieves whether embedding is allowed
*/
public String getAllowEmbed() {
return allowEmbed;
}
}
@@ -12,8 +12,10 @@ import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;
abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGenerator<U,THIS>> {
/** 50000 URLs per sitemap maximum */
abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGenerator<U, THIS>> {
/**
* 50000 URLs per sitemap maximum
*/
public static final int MAX_URLS_PER_SITEMAP = 50000;
private final URL baseUrl;
@@ -47,30 +49,33 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
gzip = options.gzip;
this.renderer = renderer;
if(options.suffixStringPattern != null && !options.suffixStringPattern.isEmpty()) {
if (options.suffixStringPattern != null && !options.suffixStringPattern.isEmpty()) {
fileNameSuffix = gzip ? options.suffixStringPattern + ".xml.gz" : options.suffixStringPattern + ".xml";
}
else {
} else {
fileNameSuffix = gzip ? ".xml.gz" : ".xml";
}
}
/** Add one URL of the appropriate type to this sitemap.
/**
* Add one URL of the appropriate type to this sitemap.
* If we have reached the maximum number of URLs, we'll throw an exception if {@link #allowMultipleSitemaps} is false,
* or else write out one sitemap immediately.
*
* @param url the URL to add to this sitemap
* @return this
*/
public THIS addUrl(U url) {
if (finished) throw new RuntimeException("Sitemap already printed; you must create a new generator to make more sitemaps");
if (finished)
throw new RuntimeException("Sitemap already printed; you must create a new generator to make more sitemaps");
UrlUtils.checkUrl(url.getUrl(), baseUrl);
if (urls.size() == maxUrls) {
if (!allowMultipleSitemaps) throw new RuntimeException("More than " + maxUrls + " urls, but allowMultipleSitemaps is false. Enable allowMultipleSitemaps to split the sitemap into multiple files with a sitemap index.");
if (!allowMultipleSitemaps)
throw new RuntimeException("More than " + maxUrls + " urls, but allowMultipleSitemaps is false. Enable allowMultipleSitemaps to split the sitemap into multiple files with a sitemap index.");
if (baseDir != null) {
if (mapCount == 0) mapCount++;
try {
writeSiteMap();
} catch(IOException ex) {
} catch (IOException ex) {
throw new RuntimeException("Closing of stream failed.", ex);
}
mapCount++;
@@ -81,9 +86,11 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
return getThis();
}
/** Add multiple URLs of the appropriate type to this sitemap, one at a time.
/**
* Add multiple URLs of the appropriate type to this sitemap, one at a time.
* If we have reached the maximum number of URLs, we'll throw an exception if {@link #allowMultipleSitemaps} is false,
* or write out one sitemap immediately.
*
* @param urls the URLs to add to this sitemap
* @return this
*/
@@ -92,9 +99,11 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
return getThis();
}
/** Add multiple URLs of the appropriate type to this sitemap, one at a time.
/**
* Add multiple URLs of the appropriate type to this sitemap, one at a time.
* If we have reached the maximum number of URLs, we'll throw an exception if {@link #allowMultipleSitemaps} is false,
* or write out one sitemap immediately.
*
* @param urls the URLs to add to this sitemap
* @return this
*/
@@ -103,9 +112,11 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
return getThis();
}
/** Add multiple URLs of the appropriate type to this sitemap, one at a time.
/**
* Add multiple URLs of the appropriate type to this sitemap, one at a time.
* If we have reached the maximum number of URLs, we'll throw an exception if {@link #allowMultipleSitemaps} is false,
* or write out one sitemap immediately.
*
* @param urls the URLs to add to this sitemap
* @return this
*/
@@ -114,9 +125,11 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
return getThis();
}
/** Add one URL of the appropriate type to this sitemap.
/**
* Add one URL of the appropriate type to this sitemap.
* If we have reached the maximum number of URLs, we'll throw an exception if {@link #allowMultipleSitemaps} is false,
* or else write out one sitemap immediately.
*
* @param url the URL to add to this sitemap
* @return this
*/
@@ -130,9 +143,11 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
}
}
/** Add multiple URLs of the appropriate type to this sitemap, one at a time.
/**
* Add multiple URLs of the appropriate type to this sitemap, one at a time.
* If we have reached the maximum number of URLs, we'll throw an exception if {@link #allowMultipleSitemaps} is false,
* or write out one sitemap immediately.
*
* @param urls the URLs to add to this sitemap
* @return this
*/
@@ -141,9 +156,11 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
return getThis();
}
/** Add one URL of the appropriate type to this sitemap.
/**
* Add one URL of the appropriate type to this sitemap.
* If we have reached the maximum number of URLs, we'll throw an exception if {@link #allowMultipleSitemaps} is false,
* or write out one sitemap immediately.
*
* @param url the URL to add to this sitemap
* @return this
*/
@@ -159,16 +176,19 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
@SuppressWarnings("unchecked")
THIS getThis() {
return (THIS)this;
return (THIS) this;
}
/** Write out remaining URLs; this method can only be called once. This is necessary so we can keep an accurate count for {@link #writeSitemapsWithIndex()}.
/**
* Write out remaining URLs; this method can only be called once. This is necessary so we can keep an accurate count for {@link #writeSitemapsWithIndex()}.
*
* @return a list of files we wrote out to disk
*/
public List<File> write() {
if (finished) throw new RuntimeException("Sitemap already printed; you must create a new generator to make more sitemaps");
if (!allowEmptySitemap && urls.isEmpty() && mapCount == 0) throw new RuntimeException("No URLs added, sitemap would be empty; you must add some URLs with addUrls");
if (finished)
throw new RuntimeException("Sitemap already printed; you must create a new generator to make more sitemaps");
if (!allowEmptySitemap && urls.isEmpty() && mapCount == 0)
throw new RuntimeException("No URLs added, sitemap would be empty; you must add some URLs with addUrls");
try {
writeSiteMap();
} catch (IOException ex) {
@@ -183,6 +203,7 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
* Each string in the list is a formatted list of URLs.
* We return a list because the URLs may not all fit --
* google specifies a maximum of 50,000 URLs in one sitemap.
*
* @return a list of XML-formatted strings
*/
public List<String> writeAsStrings() {
@@ -223,6 +244,8 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
/**
* After you've called {@link #write()}, call this to generate a sitemap index of all sitemaps you generated.
*
* @return
*/
public String writeSitemapsWithIndexAsString() {
return prepareSitemapIndexGenerator(null).writeAsString();
@@ -257,7 +280,7 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
} else {
fileNamePrefix = this.fileNamePrefix;
}
File outFile = new File(baseDir, fileNamePrefix+fileNameSuffix);
File outFile = new File(baseDir, fileNamePrefix + fileNameSuffix);
outFiles.add(outFile);
OutputStreamWriter out = null;
@@ -279,7 +302,7 @@ abstract class SitemapGenerator<U extends ISitemapUrl, THIS extends SitemapGener
} catch (SAXException e) {
throw new RuntimeException("Sitemap file failed to validate (bug?)", e);
} finally {
if(out != null) {
if (out != null) {
out.close();
}
}
@@ -32,7 +32,7 @@ import java.util.TimeZone;
* <li>MILLISECOND: YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
* </ol>
*
* Note that W3C timezone designators (TZD) are either the letter "Z" (for GMT) or a pattern like "+00:30" or "-08:00". This is unlike
* <p>Note that W3C timezone designators (TZD) are either the letter "Z" (for GMT) or a pattern like "+00:30" or "-08:00". This is unlike
* RFC 822 timezones generated by SimpleDateFormat, which omit the ":" like this: "+0030" or "-0800".</p>
*
* <p>This class allows you to either specify which format pattern to use, or (by default) to
@@ -1,111 +0,0 @@
<html><head><title>How to use SitemapGen4j</title></head>
<body>
<h1>How to use SitemapGen4j</h1>
SitemapGen4j is a library to generate XML sitemaps in Java.
<h2>What's an XML sitemap?</h2>
Quoting from <a href="http://sitemaps.org/index.php">sitemaps.org</a>:
<blockquote><p>Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.</p>
<p>Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.</p>
<p>Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.</p>
</blockquote>
<h2>Getting started</h2>
<p>The easiest way to get started is to just use the WebSitemapGenerator class, like this:</p>
<pre name="code" class="java">WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
wsg.addUrl("http://www.example.com/index.html"); // repeat multiple times
wsg.write();</pre>
<h2>Configuring options</h2>
Many more options are available, both for individual URLs and for the generator as a whole. To configure the generator, use a builder:
<pre name="code" class="java">WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.gzip(true).build(); // enable gzipped output
wsg.addUrl("http://www.example.com/index.html");
wsg.write();</pre>
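<p>For readers wondering what gzipped output amounts to, here is a minimal sketch using only the JDK's java.util.zip classes. It is not SitemapGen4j's own implementation; the class name GzipSitemapSketch and its helper methods are hypothetical and exist only for illustration.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of SitemapGen4j.
class GzipSitemapSketch {
    // Compress sitemap XML into gzip bytes (what a sitemap.xml.gz file would hold).
    static byte[] gzip(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress gzip bytes back into the original XML text.
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<urlset><url><loc>http://www.example.com/index.html</loc></url></urlset>";
        byte[] compressed = gzip(xml);
        System.out.println(gunzip(compressed).equals(xml));
    }
}
```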
To configure the URLs, construct a WebSitemapUrl with WebSitemapUrl.Options.
<pre name="code" class="java">WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
WebSitemapUrl url = new WebSitemapUrl.Options("http://www.example.com/index.html")
.lastMod(new Date()).priority(1.0).changeFreq(ChangeFreq.HOURLY).build();
// this will configure the URL with lastmod=now, priority=1.0, changefreq=hourly
wsg.addUrl(url);
wsg.write();</pre>
<h2>Configuring the date format</h2>
One important configuration option for the sitemap generator is the date format. The <a href="http://www.w3.org/TR/NOTE-datetime">W3C datetime standard</a> lets you choose the precision of your datetime, from just the year ("1997") to fractions of a second ("1997-07-16T19:20:30.45+01:00"). If you don't specify a format, the generator tries to guess which precision you want and uses the default timezone of the local machine, which might not be what you prefer.
<pre name="code" class="java">
// Use DAY pattern (2009-02-07), Greenwich Mean Time timezone
W3CDateFormat dateFormat = new W3CDateFormat(Pattern.DAY);
dateFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.dateFormat(dateFormat).build(); // actually use the configured dateFormat
wsg.addUrl("http://www.example.com/index.html");
wsg.write();</pre>
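<p>To make the precision levels concrete, the following sketch formats one instant at three W3C precisions using the JDK's SimpleDateFormat. It does not use SitemapGen4j's W3CDateFormat; the class name W3CPrecisionSketch and the format helper are hypothetical. It assumes Java 7 or later, where the "XXX" pattern letter produces colon-separated offsets (or "Z" for a zero offset).</p>

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of SitemapGen4j.
class W3CPrecisionSketch {
    // Format a date with the given pattern in the given timezone.
    static String format(String pattern, String zoneId, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date instant = new Date(0L); // the Unix epoch, midnight GMT on 1970-01-01

        // Year precision
        System.out.println(format("yyyy", "GMT", instant));
        // Day precision
        System.out.println(format("yyyy-MM-dd", "GMT", instant));
        // Second precision with a timezone designator
        System.out.println(format("yyyy-MM-dd'T'HH:mm:ssXXX", "GMT", instant));
        // The same instant seen from a timezone one hour ahead of GMT
        System.out.println(format("yyyy-MM-dd'T'HH:mm:ssXXX", "GMT+01:00", instant));
    }
}
```

<p>Note how the same instant yields a different local time once the timezone changes, which is why relying on the machine's default timezone can produce surprising lastmod values.</p>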
<h2>Lots of URLs: a sitemap index file</h2>
One sitemap can contain a maximum of 50,000 URLs. (Some sitemaps, like Google News sitemaps, can contain only 1,000 URLs.) If you have more URLs than that, you'll have to split them across several sitemaps and list those sitemaps in a sitemap index file. Fortunately, WebSitemapGenerator can manage the whole thing for you.
<pre name="code" class="java">WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
for (int i = 0; i &lt; 60000; i++) wsg.addUrl("http://www.example.com/doc"+i+".html");
wsg.write();
wsg.writeSitemapsWithIndex(); // generate the sitemap_index.xml
</pre>
<p>That will generate two sitemaps for 60K URLs: sitemap1.xml (with 50K URLs) and sitemap2.xml (with the remaining 10K), and then generate a sitemap_index.xml file describing the two.</p>
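<p>The splitting arithmetic can be sketched in plain Java as follows. This is an illustration of the idea only, not the library's actual code; the class SitemapSplitSketch, its split method, and the "sitemapN.xml" naming inside it are assumptions modeled on the file names mentioned above.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of SitemapGen4j.
class SitemapSplitSketch {
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    // Split URLs into consecutive chunks of at most maxPerFile entries,
    // keyed by an assumed file name (sitemap1.xml, sitemap2.xml, ...).
    static Map<String, List<String>> split(List<String> urls, int maxPerFile) {
        if (maxPerFile <= 0) {
            throw new IllegalArgumentException("maxPerFile must be positive");
        }
        Map<String, List<String>> files = new LinkedHashMap<>();
        int fileNumber = 1;
        for (int start = 0; start < urls.size(); start += maxPerFile) {
            int end = Math.min(start + maxPerFile, urls.size());
            files.put("sitemap" + fileNumber + ".xml", new ArrayList<>(urls.subList(start, end)));
            fileNumber++;
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 60000; i++) {
            urls.add("http://www.example.com/doc" + i + ".html");
        }
        for (Map.Entry<String, List<String>> file : split(urls, MAX_URLS_PER_SITEMAP).entrySet()) {
            System.out.println(file.getKey() + ": " + file.getValue().size() + " URLs");
        }
    }
}
```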
<p>It's also possible to carefully organize your sub-sitemaps. For example, it's recommended to group URLs with the same changeFreq together (have one sitemap for changeFreq "daily" and another for changeFreq "yearly"), so you can modify the lastMod of the daily sitemap without modifying the lastMod of the yearly sitemap. To do that, just construct your sitemaps one at a time using the WebSitemapGenerator, then use the SitemapIndexGenerator to create a single index for all of them.</p>
<pre name="code" class="java">WebSitemapGenerator wsg;
// generate foo sitemap
wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.fileNamePrefix("foo").build();
for (int i = 0; i &lt; 5; i++) wsg.addUrl("http://www.example.com/foo"+i+".html");
wsg.write();
// generate bar sitemap
wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.fileNamePrefix("bar").build();
for (int i = 0; i &lt; 5; i++) wsg.addUrl("http://www.example.com/bar"+i+".html");
wsg.write();
// generate sitemap index for foo + bar
SitemapIndexGenerator sig = new SitemapIndexGenerator("http://www.example.com", myFile);
sig.addUrl("http://www.example.com/foo.xml");
sig.addUrl("http://www.example.com/bar.xml");
sig.write();</pre>
<p>You could also use the SitemapIndexGenerator to incorporate sitemaps generated by other tools. For example, you might use Google's official Python sitemap generator to generate some sitemaps, and use WebSitemapGenerator to generate some sitemaps, and use SitemapIndexGenerator to make an index of all of them.</p>
<h2>Validate your sitemaps</h2>
<p>SitemapGen4j can also validate your sitemaps using the official XML Schema Definition (XSD). If you used SitemapGen4j to make the sitemaps, you shouldn't need to do this unless there's a bug in our code. But you can use it to validate sitemaps generated by other tools, and it provides an extra level of safety.</p>
<p>It's easy to configure the WebSitemapGenerator to automatically validate your sitemaps right after you write them (but this does slow things down, naturally).</p>
<pre name="code" class="java">WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
.autoValidate(true).build(); // validate the sitemap after writing
wsg.addUrl("http://www.example.com/index.html");
wsg.write();</pre>
<p>You can also use the SitemapValidator directly to validate sitemaps. It has two methods: validateWebSitemap(File f) and validateSitemapIndex(File f).</p>
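<p>As background, XSD validation in Java generally goes through the JDK's javax.xml.validation package. The sketch below shows that mechanism with a deliberately tiny toy schema; it is not the official sitemap XSD and not SitemapValidator's implementation, and the class name XsdValidationSketch, the TOY_XSD constant, and the isValid helper are all hypothetical.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch, not part of SitemapGen4j.
class XsdValidationSketch {
    // Toy schema for illustration: a <urlset> of <url> elements, each with one <loc>.
    // The real sitemap schema is namespaced and has more elements.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"urlset\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"url\" maxOccurs=\"unbounded\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"loc\" type=\"xs:anyURI\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the XML text validates against the toy schema.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(TOY_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Toy schema failed to compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document is malformed or violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<urlset><url><loc>http://www.example.com/</loc></url></urlset>"));
        System.out.println(isValid("<urlset><url><priority>1.0</priority></url></urlset>"));
    }
}
```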
<h2>Google-specific sitemaps</h2>
<p>Google can understand a wide variety of custom sitemap formats that they made up, including Mobile sitemaps, Geo sitemaps, Code sitemaps (for Google Code search), Google News sitemaps, and Video sitemaps. SitemapGen4j can generate any or all of these different types of sitemaps.</p>
<p>To generate a special type of sitemap, just use GoogleMobileSitemapGenerator, GoogleGeoSitemapGenerator, GoogleCodeSitemapGenerator, GoogleNewsSitemapGenerator, or GoogleVideoSitemapGenerator instead of WebSitemapGenerator.</p>
<p>You can't mix-and-match regular URLs with Google-specific sitemaps, so you'll also have to use a GoogleMobileSitemapUrl, GoogleGeoSitemapUrl, GoogleCodeSitemapUrl, GoogleNewsSitemapUrl, or GoogleVideoSitemapUrl instead of a WebSitemapUrl. Each of them has unique configurable options not available to regular web URLs.</p>
</body>
</html>