Thursday, 5 September 2013

JEE7. Real time browser output. Glassfish 4

I'm designing a web crawler. The main view consists of two parts:
- a list of seed pages
- a queue
You can add multiple seed pages from the list to the queue and start crawling.
Here is how it looks:

What I'm trying to achieve:
- when crawling of a page ends, its entry disappears from the queue automatically (without user interaction)
- the seed-page list is also updated in real time
My ideas for solving this:
- HTTP events
- polling (p:poll from PrimeFaces)
- WebSockets
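For the WebSockets option, the usual shape is a server-side registry of connected clients that the back end notifies whenever the queue changes. The sketch below models that in plain Java SE 8 so it runs outside GlassFish: a `Consumer<String>` stands in for a WebSocket session, and the class name `QueueBroadcaster` and the `removed:<url>` message format are hypothetical. In a real Java EE 7 endpoint the registry would be filled from the `@OnOpen`/`@OnClose` methods of a `@ServerEndpoint` class, and each send would go through `session.getBasicRemote().sendText(...)`.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.function.Consumer;

// Hypothetical push hub: keeps the connected clients and pushes a
// text message to each of them when the queue changes.
class QueueBroadcaster {

    // CopyOnWriteArraySet allows safe iteration while clients connect/disconnect
    private final Set<Consumer<String>> clients = new CopyOnWriteArraySet<>();

    // Would be called from an @OnOpen handler in a real endpoint
    void register(Consumer<String> client) {
        clients.add(client);
    }

    // Would be called from an @OnClose handler in a real endpoint
    void unregister(Consumer<String> client) {
        clients.remove(client);
    }

    // Called by the back end (e.g. Net) after a seed page leaves the queue
    void seedPageRemoved(String url) {
        String message = "removed:" + url; // hypothetical message format
        for (Consumer<String> client : clients) {
            client.accept(message);
        }
    }

    int clientCount() {
        return clients.size();
    }
}
```

A browser that receives `removed:<url>` could drop the matching row without reloading the page.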
On the back end, the application has two main classes: Net and Spider. Net
stores information about the queue and performs operations such as running
spiders, adding pages to the queue, and analyzing the results returned by
each spider. Spider just crawls a specific domain and, when finished, fires
its results to Net via a CDI event.
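The Spider-to-Net hand-off described here is an observer pattern: CDI's `Event<T>.fire()` delivers the payload to every matching `@Observes` method. A stripped-down Java SE 8 version of the same flow, with hypothetical names (`CrawlEvents`, `CrawlResult`) and none of CDI's qualifier or transaction-phase handling, might look like this:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical payload: which seed page was crawled and how many links were found
final class CrawlResult {
    final String seedUrl;
    final int linksFound;

    CrawlResult(String seedUrl, int linksFound) {
        this.seedUrl = seedUrl;
        this.linksFound = linksFound;
    }
}

// Minimal stand-in for CDI's Event<CrawlResult>: observers register, producers fire
class CrawlEvents {
    private final List<Consumer<CrawlResult>> observers = new CopyOnWriteArrayList<>();

    void observe(Consumer<CrawlResult> observer) {
        observers.add(observer);
    }

    // Like Event.fire(): synchronously notifies every registered observer
    void fire(CrawlResult result) {
        for (Consumer<CrawlResult> observer : observers) {
            observer.accept(result);
        }
    }
}
```

As in the CDI version, the producer only knows about the event type, not about the class that consumes it, so Spider stays unaware of Net.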
Net.java:
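import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.enterprise.event.Observes;
import javax.enterprise.event.TransactionPhase;
import javax.inject.Inject;

@Startup
@Singleton
public class Net {

    // Seed pages waiting in the crawl queue
    private Set<SeedPage> seedPages = Collections.synchronizedSet(new HashSet<SeedPage>());

    @Inject
    Spider spider;

    // .... (getSeedPages(), adding/removing queue entries, ...)

    public void startCrawling() {
        System.out.println("[NET] Starting crawling " + seedPages.size() + " pages...");
        // spider.crawl() is @Asynchronous, so each call returns immediately
        for (SeedPage seedPage : seedPages) {
            spider.crawl(seedPage.getUrl());
        }
    }

    // Called after the spider's transaction has committed
    public void onCrawlingResult(@Observes(during = TransactionPhase.AFTER_SUCCESS)
                                 @ComputationResults CrawlingResults results) {
        System.out.println("[NET] Crawling success.");
        System.out.println("[NET] Analyzing data.");
        // analyze & persist collected data
        // Assumes CrawlingResults carries the crawled URL (hypothetical getter) and
        // that a removeSeedPageFromQueue(String) is among the elided methods above
        removeSeedPageFromQueue(results.getPageUrl());
        System.out.println("[NET] Done ...");
    }

    // Called if the spider's transaction is rolled back
    public void onCrawlingFailure(@Observes(during = TransactionPhase.AFTER_FAILURE)
                                  @ComputationResults CrawlingResults results) {
        System.out.println("[NET] Crawling failure...");
        // handle failure
    }
}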
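One detail in Net worth checking: `Collections.synchronizedSet` makes individual calls atomic, but its Javadoc requires callers to synchronize on the set while iterating. If the elided `getSeedPages()` returns the live set, the `<h:dataTable>` iterates it outside any lock (a `@Singleton`'s default `@Lock(WRITE)` only covers the getter call itself), so a removal from `onCrawlingResult` during rendering could raise a `ConcurrentModificationException`. A minimal sketch of one fix, using a hypothetical `SeedQueue` helper and plain strings instead of `SeedPage`, hands out a copy instead:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: owns the queue and never leaks the live set
class SeedQueue {
    private final Set<String> urls = Collections.synchronizedSet(new HashSet<String>());

    void add(String url) {
        urls.add(url);
    }

    boolean remove(String url) {
        return urls.remove(url);
    }

    // Copy while holding the set's monitor, as the synchronizedSet Javadoc
    // requires for iteration; callers (e.g. a JSF table) iterate the copy.
    List<String> snapshot() {
        synchronized (urls) {
            return Collections.unmodifiableList(new ArrayList<>(urls));
        }
    }
}
```

The copy costs one allocation per render, which is usually negligible for a queue of a few dozen seed pages.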
Spider.java:
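import javax.ejb.Asynchronous;
import javax.ejb.Stateless;
import javax.enterprise.event.Event;
import javax.inject.Inject;

@Stateless
public class Spider {

    @Inject @ComputationResults
    Event<CrawlingResults> results;

    // Runs on a container thread; the caller (Net) does not wait for it
    @Asynchronous
    public void crawl(String pageUrl) {
        // Hypothetical constructor: the results should identify the page they belong to
        CrawlingResults crawlingResults = new CrawlingResults(pageUrl);
        // crawling process (fills crawlingResults)
        results.fire(crawlingResults);
    }
}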
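`@Asynchronous` makes `crawl()` return to Net immediately while the work continues on a container thread, and the two observers split the outcome into success and failure. Outside the container the same shape can be approximated with Java 8's `CompletableFuture`. This is only an analogy (there are no transactions here, so a thrown exception plays the role of a rollback), and `AsyncSpider` and `crawlPage` are hypothetical placeholders for the real crawling code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

class AsyncSpider {
    private final ExecutorService executor;

    AsyncSpider(ExecutorService executor) {
        this.executor = executor;
    }

    // Hypothetical crawl step: returns a placeholder result, fails on non-HTTP URLs
    static int crawlPage(String url) {
        if (!url.startsWith("http")) {
            throw new IllegalArgumentException("not a URL: " + url);
        }
        return url.length(); // placeholder "links found"
    }

    // Returns at once, like an @Asynchronous method; the outcome is routed to one
    // of two callbacks, mirroring the AFTER_SUCCESS / AFTER_FAILURE observers.
    CompletableFuture<Void> crawl(String url, Consumer<Integer> onSuccess,
                                  Consumer<Throwable> onFailure) {
        return CompletableFuture.supplyAsync(() -> crawlPage(url), executor)
                .handle((links, error) -> {
                    if (error == null) {
                        onSuccess.accept(links);
                    } else {
                        // error is typically a CompletionException wrapping the cause
                        onFailure.accept(error);
                    }
                    return null;
                });
    }
}
```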
And the corresponding presentation layer:
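import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;
import javax.inject.Named;

@Named
@ApplicationScoped
public class ComputationStatus {

    @Inject
    Net net;

    public Net getNet() {
        return net;
    }

    // Returning null keeps JSF on the current view after the action
    public String startCrawling() {
        net.startCrawling();
        return null;
    }
}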
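Regarding exposing Net through ComputationStatus: `getNet()` hands the page the whole singleton, including every mutating method. One common alternative is to let the presentation bean expose only what the view reads plus the action it triggers. The plain Java SE sketch below uses a hypothetical `QueueView` interface and a `FakeNet` stand-in for the injected Net so that it runs outside the container:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical narrow contract for the page: read the queue, start a crawl, nothing else
interface QueueView {
    List<String> getSeedPages();
    void startCrawling();
}

// Stand-in for the Net singleton, which has more operations than the page should see
class FakeNet {
    final List<String> queue = new ArrayList<>();
    boolean crawling;

    void startCrawling() { crawling = true; }

    void clearAll() { queue.clear(); } // deliberately not exposed through QueueView
}

// Presentation-side adapter: delegates only the allowed operations
class ComputationStatusView implements QueueView {
    private final FakeNet net;

    ComputationStatusView(FakeNet net) { this.net = net; }

    @Override
    public List<String> getSeedPages() {
        // Read-only copy, so the view cannot modify the back-end queue
        return Collections.unmodifiableList(new ArrayList<>(net.queue));
    }

    @Override
    public void startCrawling() {
        net.startCrawling();
    }
}
```

With this shape the facelet would bind to `#{computationStatus.seedPages}` rather than reaching through `#{computationStatus.net.seedPages}`.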
And a snippet from the JSF Facelets template (commonQueue.xhtml):
<h5>Queue</h5>
<h:form>
    <h:dataTable value="#{computationStatus.net.seedPages}" var="seed"
                 rendered="#{not empty computationStatus.net.seedPages}"
                 styleClass="table table-condensed">
        <h:column>#{seed.url}</h:column>
        <h:column>
            <h:commandButton
                action="#{seedPageController.removeSeedPageFromQueue(seed)}"
                value="Remove" styleClass="btn btn-danger btn-mini"/>
        </h:column>
    </h:dataTable>
    <h:commandButton action="#{computationStatus.startCrawling}"
                     value="Crawl" rendered="#{not empty computationStatus.net.seedPages}"
                     styleClass="btn btn-primary"/>
    &nbsp;&nbsp;
    <h:commandButton
        action="#{seedPageController.removeAllSeedPagesFromQueue}"
        value="Remove All" rendered="#{not empty computationStatus.net.seedPages}"
        styleClass="btn btn-danger btn-small"/>
</h:form>
What do you think of this architecture? How could it be improved? I'm also
wondering whether giving the view direct access to the singleton bean
through an application-scoped CDI bean is a good idea.
Also, how should I implement real-time status monitoring? I think the
easiest way would be to use p:poll from PrimeFaces. But which method should
be polled for changes: getSeedPages() from Net, or something from the
controller?
Thanks for any thoughts.
