alekseysotnikov/buran

Buran (meaning "Snowstorm" or "Blizzard") was the first spaceplane to be produced as part of the Soviet/Russian Buran programme. Wikipedia

Buran 🌀

Parse and generate RSS/Atom feeds in Clojure

Buran is a bidirectional feed library: parse any RSS/Atom feed into Clojure data structures, transform them with standard functions, and produce feeds in any format. Built on ROME Tools with a data-driven approach.

Buran can act as an aggregator for various feed formats by converting them into regular Clojure data structures. Consuming a feed yields a map that you can read or manipulate with regular functions such as filter, sort, assoc, and dissoc. After modifying it, you can have Buran generate a new feed from the map, for example in a different format (RSS 2.0, 1.0, 0.9x or Atom 1.0, 0.3).
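
Because a consumed feed is an ordinary map, this kind of manipulation needs nothing beyond clojure.core. The following is a minimal sketch rather than part of Buran's API: the feed literal is hand-written, hypothetical data in the :info/:entries shape used throughout this README, and no Buran function is called.

```clojure
;; Hypothetical feed data, shaped like the maps shown in this README.
(def feed
  {:info    {:feed-type "atom_1.0" :title "Clojure news"}
   :entries [{:title "Zippers explained" :author "Ann"}
             {:title "Advent of Code"    :author "Bob"}
             {:title "Async pipelines"   :author "Ann"}]})

;; Keep one author's entries, sort them by title, and switch the target format.
(def result
  (-> feed
      (update :entries (fn [es] (filterv #(= "Ann" (:author %)) es)))
      (update :entries (fn [es] (vec (sort-by :title es))))
      (assoc-in [:info :feed-type] "rss_2.0")))

(mapv :title (:entries result))
;; => ["Async pipelines" "Zippers explained"]
```

The result keeps the same :info/:entries shape, so it could then be handed to produce. The target format may require :info fields that this toy feed omits.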

Quick Start

;; Add to deps.edn
{:deps {buran/buran {:mvn/version "0.1.4"}}}

;; Or to project.clj
[buran "0.1.4"]

;; In your namespace
(ns your.app
  (:require [buran.core :as buran]))

;; Parse a feed
(def data (buran/consume-http "https://stackoverflow.com/feeds/tag?tagnames=clojure"))

;; Generate a feed
(buran/produce {:info {:feed-type "atom_1.0" :title "My Feed"}
                :entries [{:title "Hello" :description {:value "World"}}]})

Usage

Regardless of the feed format and whether you are consuming or producing a feed, Buran uses the same data structure every time. The API is small: consume, consume-http, and produce, plus helpers for manipulating feeds, including combine-feeds, filter-entries, sort-entries-by, and shrink. The basic workflow is to pass that data structure from one API function to the next. See the documentation for the various options and details.

Examples

Consume a feed from String

(def feed "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
           <feed xmlns=\"http://www.w3.org/2005/Atom\">
             <title>Feed title</title>
             <subtitle />
             <entry>
               <title>Entry title</title>
               <author>
                 <name />
               </author>
               <summary>entry description</summary>
             </entry>
           </feed>
           ")
(shrink (consume feed))
=>
{:info    {:feed-type "atom_1.0", 
           :title     "Feed title"},
 :entries [{:title       "Entry title", 
            :description {:value "entry description"}}]}

Produce a feed

(def feed {:info {:feed-type "atom_1.0"
                  :title     "Feed title"}
           :entries [{:title       "Entry title"
                      :description {:value "entry description"}}]})
(produce feed)
=>
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
 <feed xmlns=\"http://www.w3.org/2005/Atom\">\r
   <title>Feed title</title>\r
   <subtitle />\r
   <entry>\r
     <title>Entry title</title>\r
     <author>\r
       <name />\r
     </author>\r
     <summary>entry description</summary>\r
   </entry>\r
 </feed>
 "

Consume a feed over http

(consume-http "https://stackoverflow.com/feeds/tag?tagnames=clojure")
=>
{:info {...},
 :entries [...],
 :foreign-markup [...]}

Shrink a feed (remove nils, empty collections, empty maps, etc.)

(shrink (consume-http "https://stackoverflow.com/feeds/tag?tagnames=clojure"))
=>
{:info {:description "most recent 30 from stackoverflow.com",
        :feed-type "atom_1.0",
        :published-date #inst"2018-08-20T08:03:33.000-00:00",
        :title "Active questions tagged clojure - Stack Overflow",
        :link "https://stackoverflow.com/questions/tagged/?tagnames=clojure&sort=active",
        :uri "https://stackoverflow.com/feeds/tag?tagnames=clojure",
        :links [{:href "https://stackoverflow.com/questions/tagged/?tagnames=clojure&sort=active",
                 :type "text/html",
                 :rel "alternate",
                 :length 0}, ...]},
 :entries [{:description {:type "html", :value "<p>..."},
            :updated-date #inst"2018-08-20T06:16:12.000-00:00",
            :foreign-markup [...],
            :published-date #inst"2018-08-20T05:54:39.000-00:00",
            :title "Clojure evaluate lazy sequence",
            :author "Constantine",
            :categories [{:name "clojure", :taxonomy-uri "https://stackoverflow.com/tags"}, ...],
            :link "https://stackoverflow.com/questions/51924808/clojure-evaluate-lazy-sequence",
            :uri "https://stackoverflow.com/q/51924808",
            :authors [{:name "Constantine", :uri "https://stackoverflow.com/users/4201205"}],
            :links [{:href "https://stackoverflow.com/questions/51924808/clojure-evaluate-lazy-sequence",
                     :rel "alternate",
                     :length 0}]}, ...],
 :foreign-markup [...]}

Supported Formats

Format | Parse | Generate | Notes
Atom 1.0 | Yes | Yes | Full support
Atom 0.3 | Yes | Yes | Legacy
RSS 2.0 | Yes | Yes | Most common
RSS 1.0 | Yes | Yes | RDF-based
RSS 0.9x | Yes | Yes | Various variants
RSS 0.9 | Yes | Yes | Original

Basic API Reference

consume

Parse a feed from string, file, reader, or other sources.

;; Shortcut
(consume "<?xml version=\"1.0\"?><feed>...</feed>")

;; With options
(consume {:from             (java.io.File. "feed.xml")
                                        ; String, File, Reader, W3C DOM document, JDOM document, W3C SAX InputSource
          :validate         false       ; Indicates if the input should be validated
          :locale           (Locale/US) ; java.util.Locale
          :xml-healer-on    true        ; Healing trims leading chars from the stream (empty spaces and comments) until the XML prolog.
                                        ; Healing resolves HTML entities (from literal to code number) in the reader.
                                        ; The healing is done only with the File and Reader.
          :allow-doctypes   false       ; You should only activate it when the feeds that you process are absolutely trustful
          :throw-exception  false       ; false - return map with an exception, throw an exception otherwise
         })

Option | Type | Default | Description
:from | String, File, Reader, InputStream, W3C DOM, JDOM, SAX InputSource | required | Source to parse
:validate | boolean | false | Validate XML against DTD/schema
:locale | java.util.Locale | (Locale/US) | Locale for parsing
:xml-healer-on | boolean | true | Trim whitespace/comments before XML prolog; resolve HTML entities
:allow-doctypes | boolean | false | Allow DOCTYPE declarations (⚠️ security risk - only for trusted sources)
:throw-exception | boolean | false | If false, return error map; if true, throw exception

consume-http

Fetch and parse a feed over HTTP.

;; Shortcut
(consume-http "https://example.com/feed.xml")

;; With options
(consume-http {:from             "https://stackoverflow.com/feeds/tag?tagnames=clojure" 
                                                      ; <http url string>, URL, File, InputStream
               :headers          {"X-Header" "Value"} ; Request's HTTP headers map
               :lenient          true                 ; Indicates if the charset encoding detection should be relaxed
               :default-encoding "US-ASCII"           ; Supports: UTF-8, UTF-16, UTF-16BE, UTF-16LE, CP1047, US-ASCII
               ;; ... plus all options accepted by consume
              })

Option | Type | Default | Description
:from | String URL, java.net.URL, File, InputStream | required | URL or source to fetch
:headers | map | {} | HTTP headers (e.g., {"User-Agent" "MyApp"})
:lenient | boolean | true | Relaxed charset encoding detection
:default-encoding | String | "US-ASCII" | Fallback encoding: UTF-8, UTF-16, UTF-16BE, UTF-16LE, CP1047, US-ASCII
:content-type | String | nil | Override Content-Type header (used with InputStream)

Beware! Fetching with consume-http from an HTTP URL string or a java.net.URL is rudimentary and works only for the simplest cases. For instance, it does not follow HTTP 302 redirects. Consider fetching the feed with a dedicated HTTP library such as clj-http or http-kit and passing the result to consume.
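
One way to work around the redirect limitation is to fetch the XML yourself and give the resulting string to consume. The sketch below is an illustration under stated assumptions: it uses the JDK's built-in java.net.http.HttpClient (Java 11+) in place of clj-http or http-kit so that it needs no extra dependency, and fetch-body and the User-Agent value are hypothetical names chosen for the example.

```clojure
(import '(java.net URI)
        '(java.net.http HttpClient HttpClient$Redirect
                        HttpRequest HttpResponse$BodyHandlers))

;; One shared client that follows redirects (except HTTPS -> HTTP downgrades).
(def ^HttpClient client
  (-> (HttpClient/newBuilder)
      (.followRedirects HttpClient$Redirect/NORMAL)
      (.build)))

(defn fetch-body
  "Fetches url with a GET request and returns the response body as a String."
  [^String url]
  (let [request  (-> (HttpRequest/newBuilder (URI/create url))
                     (.header "User-Agent" "my-feed-reader") ; hypothetical value
                     (.GET)
                     (.build))
        response (.send client request (HttpResponse$BodyHandlers/ofString))]
    (.body response)))

(comment
  ;; With Buran on the classpath, the fetched XML string goes to consume:
  (buran.core/consume (fetch-body "https://example.com/feed.xml")))
```

This sketch does not check the response status, so a production version would probably want to inspect it before parsing.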

produce

Generate an RSS/Atom feed as a string, file, or DOM.

(produce {:feed            {:info {:feed-type "atom_1.0" ; Supports: atom_1.0, atom_0.3, rss_2.0, 
                                                         ; rss_1.0, rss_0.94, rss_0.93, rss_0.92, 
                                                         ; rss_0.91U (Userland), rss_0.91N (Netscape), 
                                                         ; rss_0.9
                                   :title "Feed title"}
                            :entries [{:title       "Entry 1 title"
                                       :description {:value "entry description"}}]
                            :foreign-markup nil}

          :to              :string ; <file path string>, :string, :w3cdom, :jdom, File, Writer
          :pretty-print    true    ; Pretty-print XML output
          :throw-exception false   ; false - return map with an exception, throw an exception otherwise
         })

Option | Type | Default | Description
:feed | map | nil (uses argument as feed) | Feed data structure to generate
:to | :string, :w3cdom, :jdom, String (file path), File, Writer | :string | Output destination
:pretty-print | boolean | true | Pretty-print XML output
:throw-exception | boolean | false | If false, return error map; if true, throw exception

shrink

Remove nil values and empty collections from feed data.

(shrink feed)
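
To make the effect concrete, here is a rough sketch of this kind of pruning written with clojure.walk. It is not Buran's implementation: prune and blank? are hypothetical names, and the choice to keep empty strings is an assumption that may differ from what shrink actually does.

```clojure
(require '[clojure.walk :as walk])

(defn blank?
  "True for nil and for empty collections; strings count as plain values."
  [v]
  (or (nil? v)
      (and (coll? v) (empty? v))))

(defn prune
  "Recursively drops nil values and empty collections from maps and vectors."
  [data]
  (walk/postwalk
    (fn [x]
      (cond
        (map-entry? x) x ; leave key/value pairs intact; the enclosing map filters them
        (map? x)       (into {} (remove (fn [[_ v]] (blank? v))) x)
        (vector? x)    (into [] (remove blank?) x)
        :else          x))
    data))

(prune {:info    {:title "Feed title" :subtitle nil :links []}
        :entries [{:title "Entry title" :author {:name nil}}]})
;; => {:info {:title "Feed title"}, :entries [{:title "Entry title"}]}
```

Because postwalk visits children before their parents, a map that becomes empty after pruning is itself removed from its parent.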

License

Copyright © 2018-2026 Aleksei Sotnikov

Distributed under the MIT License