Java jSoup: parse complicated xml tags

Last time I needed to parse complicated XML tags, I came up with the solution described in “Android: RSS reader with complicated xml…”.
This time I used the jsoup library, and the solution became much simpler and more accurate.

import android.util.Log;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.ArrayList;

public void getData(String keyword, ArrayList<ToldotItem> list) {
    String urlToRssFeed = keyword;
    Log.d(Helpers.TAG, urlToRssFeed);

    Document doc;
    try {
        // Use the XML parser: the default HTML parser treats <link> as an empty
        // element, so the URL inside <item><link>...</link> would be lost.
        doc = Jsoup.connect(urlToRssFeed)
                .parser(Parser.xmlParser())
                .get();
    } catch (IOException e) {
        Log.e(Helpers.TAG, "Could not load feed " + urlToRssFeed, e);
        return;
    }

    // Every <item> element is one entry of the feed; select directly on it,
    // there is no need to re-parse its outer HTML.
    Elements items = doc.select("item");
    for (Element element : items) {
        Log.d(Helpers.TAG, element.text());
        ToldotItem item = new ToldotItem();

        // Plain child tags: take their text content.
        item.title = element.select("title").text();
        item.pubDate = element.select("pubDate").text();
        item.link = element.select("link").text();
        item.description = element.select("description").text();
        item.guid = element.select("guid").text();
        item.author = element.select("author").text();

        // Namespaced tag <media:thumbnail url="..."/> is selected as "media|thumbnail".
        item.thumbnail = element.select("media|thumbnail").attr("url");

        // <media:content medium="video" url="..." fileSize="..." duration="..."/>
        Elements video = element.select("media|content[medium=video]");
        if (!video.isEmpty()) {
            Log.d(Helpers.TAG, "links: " + video.size());
            item.video = video.attr("url");
            item.videoFileSize = video.attr("fileSize");
            item.videoDuration = video.attr("duration");
        }

        // <media:content medium="audio" url="..." fileSize="..." duration="..."/>
        Elements audio = element.select("media|content[medium=audio]");
        if (!audio.isEmpty()) {
            item.audio = audio.attr("url");
            item.audioFileSize = audio.attr("fileSize");
            item.audioDuration = audio.attr("duration");
        }

        list.add(item);
    }
}
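The method above needs the network, Android's Log and the ToldotItem class, so it is awkward to try out on its own. Below is a minimal, self-contained sketch of the same selector logic, assuming jsoup is on the classpath. It parses an inline sample feed instead of a URL. The class name FeedDemo, the Entry fields and the sample feed are all made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class FeedDemo {
    // Hypothetical stand-in for ToldotItem, reduced to a few fields.
    static class Entry {
        String title;
        String link;
        String thumbnail;
        String videoUrl;
    }

    // Made-up sample feed using the Media RSS "media:" prefix.
    static final String SAMPLE =
            "<rss xmlns:media=\"http://search.yahoo.com/mrss/\"><channel>"
            + "<item>"
            + "<title>Episode 1</title>"
            + "<link>https://example.com/1</link>"
            + "<media:thumbnail url=\"https://example.com/1.jpg\"/>"
            + "<media:content medium=\"video\" url=\"https://example.com/1.mp4\"/>"
            + "<media:content medium=\"audio\" url=\"https://example.com/1.mp3\"/>"
            + "</item>"
            + "</channel></rss>";

    static List<Entry> parse(String xml) {
        // XML parser: keeps the text inside <link> and the prefixed tag names.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<Entry> entries = new ArrayList<>();
        for (Element item : doc.select("item")) {
            Entry e = new Entry();
            e.title = item.select("title").text();
            e.link = item.select("link").text();
            e.thumbnail = item.select("media|thumbnail").attr("url");
            // The attribute filter picks the video variant out of several <media:content> tags.
            e.videoUrl = item.select("media|content[medium=video]").attr("url");
            entries.add(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : parse(SAMPLE)) {
            System.out.println(e.title + " | " + e.link + " | " + e.thumbnail + " | " + e.videoUrl);
        }
    }
}
```

Because parse() takes a string, the selectors can be checked against a saved copy of a real feed before the code is wired into the Android app.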


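As you can see, namespaced tags like “<media:content>” are found with the selector “media|content”. jsoup's select() takes CSS-style selectors, not XPath, and in a selector the “:” between prefix and tag name is written as “|”.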
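The “|” simply stands in for the “:” of the prefixed tag name, so media|content matches elements whose tag name is literally media:content. Writing media:content inside a selector would instead be read as a “:” pseudo-selector. The sketch below assumes jsoup is on the classpath and uses a made-up two-element fragment and a hypothetical class name. It shows that the selector and getElementsByTag("media:content") find the same elements, and that an attribute filter narrows the match.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class NamespaceSelectorDemo {
    // Made-up fragment with two <media:content> elements.
    static final String XML =
            "<item>"
            + "<media:content medium=\"video\" url=\"v.mp4\"/>"
            + "<media:content medium=\"audio\" url=\"a.mp3\"/>"
            + "</item>";

    static Document parse() {
        return Jsoup.parse(XML, "", Parser.xmlParser());
    }

    public static void main(String[] args) {
        Document doc = parse();
        // Namespace selector: "prefix|name" matches the tag "prefix:name".
        Elements bySelector = doc.select("media|content");
        // The same elements, looked up by their literal tag name.
        Elements byTag = doc.getElementsByTag("media:content");
        System.out.println(bySelector.size() + " " + byTag.size());
        // An attribute filter narrows the namespaced match to the audio entry.
        System.out.println(doc.select("media|content[medium=audio]").attr("url"));
    }
}
```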
