PajekConvert1.1ReadMe.txt

----------------------------
|    PajekConverter v1.2   |
----------------------------

Documentation
Aug 3, 2001

Written by Skye Bender-deMoll
skyebend@santafe.edu
skyebend@bennington.edu
http://student.bennington.edu/~skyebend

Written for John Padgett, University of Chicago

PajekConverter is a basic utility (written in Java) for converting
tab-delimited text files into a format readable by the network analysis
and visualization software Pajek. Pajek is Windows-based freeware, written
by Vladimir Batagelj and Andrej Mrvar, University of Ljubljana, Slovenia,
downloadable from:
http://vlado.fmf.uni-lj.si/pub/networks/pajek/


0. CHANGES in PajekConverter 1.2:
- Added ability to read in text files rather than pasting into window

   CHANGES in PajekConverter 1.1:
- Added limited ability to process timecode (edges only)
- Added support for all 78 Pajek colors
- Made internal parsing routines more robust
- Added font information to *.net file which overrides Pajek's options
  (Pajek was inconsistent about modifying font info for EPS)


1. REQUIREMENTS

For PajekConverter to run, you must have a Java Virtual Machine installed
on your computer. PajekConverter was written and tested in Java 1.3, but it
should run ok under earlier versions as well. If you do not have Java
installed, download it from:
http://java.sun.com/j2se/1.3/

Because Pajek only runs under Windows, I have only tested PajekConverter
under Windows. In theory, it is completely cross-platform. However, Pajek
will only read files with PC line breaks (\r\n), so if you run
PajekConverter under UNIX, you may have to convert the output files before
Pajek will read them.


2. INSTALLATION

Copy the file "PajekConverter.jar" to whatever directory you would like to
run it from. The *.jar file contains all of the Java class files in a
compressed form, and will launch the program when double clicked.
Alternatively, it can be launched from the command line by typing the
command "PajekConverter.jar" from within its directory. (If your system
does not associate *.jar files with Java, "java -jar PajekConverter.jar"
is the standard way to launch a jar file.)


3. USING PAJEKCONVERTER

- Start PajekConverter by double clicking its icon, or from the command
  line (see above).
- The PajekConverter application should appear. It consists of two main
  windows (a data input window and a status window), a group of fields
  for setting column assignments, and five command buttons.
- Paste tab-delimited text from a spreadsheet or text file into the large
  top window (see Expected Input Format below).* If the first row contains
  a list of column headings rather than data to be parsed, make sure the
  "Ignore first row" checkbox under the input window is checked.
- Enter column assignments in the fields on the right (see Expected Input
  Format below). Each field takes a number which tells the program which
  column of the data is to be converted to the attribute. "1" corresponds
  to the first column, "2" to the second, etc. "ID" and "LinkID" must be
  specified; all others can be left as "0" (default).
- To use one specific attribute value for all nodes, enter the value in
  the appropriate column assignment field, prefixed by the "$" symbol.
  (The idea is the same as absolute as compared to relative addresses in
  spreadsheets.)
  Examples:
      enter "$ellipse" in the NodeShape field
      or enter "$Red" in the NodeColor or ArcColor fields
      or enter "$0.5" in the NodeSize field.
  Note: values are case sensitive. For a list of accepted colors and
  shapes, see below.
- If your data include timing information for edge creation, check the
  "Process Timecode" option.
- Click the "Parse Text" button. If "Process Timecode" is checked, a
  dialog will appear for entering the timecode column assignments. The
  program will then parse the text from the input window, reporting its
  status and any errors to the status (bottom) window.
  This should be fairly fast (4000 edges in about 5 seconds on this
  gateway machine).
- When parsing is complete, click one of the export buttons:
    - "Export network" will bring up a save dialog asking you where the
      network file should be saved. It reports to the status window how
      many nodes were exported.
    - "Export partition" will bring up a window asking which node class
      attribute you would like to export. Choose one and click ok. It will
      then bring up a save dialog asking you where the partition file
      should be saved.
    - "Export vector" will bring up a window asking which numeric node
      variable you would like to export. Choose one and click ok. It will
      then bring up a save dialog asking you where the vector file should
      be saved.
- "Reset/Clear network" will delete the contents of the windows and clear
  all of the internal variables, discarding all of the previously parsed
  network data.
- To close the program, click on the close box in the upper right hand
  corner.


4. EXPECTED INPUT FORMAT

PajekConverter expects data to be in a modified "arc list" format. Each
record (row) defines one arc (directed edge). The arc is directed from the
node with the id given in the ID column to the node given in the LinkId
column. If a record has no LinkId, no arc is drawn. The attributes in the
remaining columns are associated with the ID node, or with the arc, as
appropriate. If an ID occurs more than once (multiple arcs issuing from
one node), the node attributes from the first encountered record are used.

PajekConverter requires that each column be separated by a tab character
(\t), which is the default format when pasting from a spreadsheet.

The function and expected format of each of the column mappings are as
follows:

ID: The id of the node.
  - Any string (or number) which is a unique identifier of a node is valid.
  - Ids need not occur in any particular order, and do not have to
    correspond to Pajek ids or labels.
  - A node is created for each unique entry in this column.
  - Repeat instances of the id define additional edges from the node.

LinkId: The id of the node to which the arc connects.
  - Any string (or number) which occurs in the ID column is a valid LinkId.
  - An error will appear if a LinkId is found which does not correspond to
    the id of a node.

NodeLabel: The text which will appear as the label in Pajek.
  - Any string or number is valid.

NodeShape: Defines the shape as it will appear in Pajek (and EPS or SVG
           files).
  - Any string or number is valid.
  - Entries will be mapped to shape classes in the order they appear in
    the input file.
  - A key describing the mappings will be output to the status window.
  - If a shape name is found, the shape will be used for the corresponding
    node.
  - Pajek currently seems to accept a maximum of four shapes:
        "ellipse"  "box"  "diamond"  "cross"
    The cross shape will only show up when network images are exported,
    and it also has a fixed size.
  - If more than four shapes or shape classes occur in the file, the
    default (ellipse) will be used FOR ALL CLASSES.

NodeSize: Controls the size of the node as it appears in Pajek.
  - Any number is valid.
  - If an entry is found which is not a number, an error will be
    displayed, and the default size (1) will be used for that node.
  - Sizes of nodes will only be displayed if
    Options/Mark Vertices Using/Real sizes on/off is selected.
  - Sizes of nodes when exported by Pajek may be modified by other
    settings in Pajek. (see Pajek Notes below)

NodeColor: Controls the color of the node as it will appear when exported
           from Pajek, and as it will appear in Pajek after the Draw/Draw
           command.
  - Any string or number is valid.
  - Entries will be mapped to color classes in the order they appear in
    the input file.
  - A key describing the mappings will be output to the status window.
  - If a color name is found, the color will be used for the corresponding
    node.
  - Although Pajek currently accepts 78 different color names, many of the
    colors are indistinguishable.
    PajekConverter now accepts and uses all 78 anyway. (see file
    CRAYOLA.PDF in the Pajek documentation for color chips)

        Blue            ForestGreen     Orchid
        Green           Fuchsia         Peach
        Red             Goldenrod       PineGreen
        Yellow          GreenYellow     Pink
        BurntOrange     JungleGreen     Plum
        Purple          LFadedGreen     ProcessBlue
        Brown           LSkyBlue        RawSienna
        Tan             Lavender        RedOrange
        Gray            LightCyan       RedViolet
        Black           LightGreen      Rhodamine
        White           LightMagenta    RoyalBlue
        Apricot         LightOrange     RoyalPurple
        Aquamarine      LightPurple     RubineRed
        Bittersweet     LightYellow     Salmon
        BlueGreen       LimeGreen       SeaGreen
        BlueViolet      Magenta         Sepia
        BrickRed        Mahogany        SkyBlue
        CadetBlue       Maroon          SpringGreen
        Canary          Melon           TealBlue
        CarnationPink   MidnightBlue    Thistle
        Cerulean        Mulberry        Turquoise
        CornflowerBlue  NavyBlue        Violet
        Cyan            OliveGreen      VioletRed
        Dandelion       Orange          WildStrawberry
        DarkOrchid      Periwinkle      YellowGreen
        Emerald         OrangeRed       YellowOrange

  - If more than 78 colors or color classes occur in the file, the default
    (Blue) will be used FOR ALL CLASSES.
  - Node colors will only appear on screen in Pajek if
    Options/Colors/Vertices/As defined in input file is selected.
  ?- Node colors will always override class colors when exporting from
    Pajek to EPS or SVG.

Xcoord: The x coordinate at which the node will be drawn.
  - Any number is valid.
  - If numbers larger than 1.0 are used, the node may not appear on screen
    in Pajek until coordinates are re-scaled to fit the screen.
  - If an entry is found which is not a number, the default value (1.000)
    will be used for that node.

Ycoord: The y coordinate at which the node will be drawn.
  - same as Xcoord

ArcColor: The color of the arc that will be drawn in Pajek and when
          exported from Pajek.
  - see NodeColor

ArcWidth: The width of the arc that will be drawn when exported from Pajek.
  - Arc widths will not appear on screen in Pajek.

ArcWeight: The weight of the arc that will be used by the spring embedding
           algorithms in Pajek.
  - If no column is specified, ArcWeight defaults to 1.0.
  - Arcs with weight 0 will not be drawn.
  - Arcs with negative weights will be drawn as a dashed line.

Process Timecode: Whether or not to parse data on edge creation/deletion
                  times.
  When "Process Timecode" is checked, clicking "Parse Text" brings up a
  dialog for entering timecode information before beginning the parse. The
  fields function much the same as the column assignment fields:

  Start Time: Indicates the column of edge creation times.
    - Any positive integer is valid.
    - A fixed value can be entered with the "$" prefix.

  End Time: Indicates the column of edge deletion times.
    - Any positive integer is valid.
    - A fixed value can be entered with the "$" prefix.

  Duration: Used instead of End Time to create edges of fixed duration.
    Duration is not a column assignment. It essentially creates an end
    time for each edge equal to StartTime+Duration.
    - Any positive integer is valid.


5. OUTPUT FORMAT

NETWORK:
PajekConverter outputs a Pajek readable text file with *.net extension.
The file contains all of the node and arc attribute information. The
network is fully specified, meaning that attributes which have been left
to default in PajekConverter will be assigned default values rather than
being excluded from the file. In most cases this is advantageous, as it
will override some of Pajek's defaults. However, many of the attributes
(NodeColor, ArcColor, ArcWidth) are difficult or impossible to modify from
within Pajek, or will override desired output. Hopefully I will change
this in a later version (should there be any). In the meantime, the
solution may be the old standby of modifying the *.net file or the EPS (or
SVG) file directly.

TIMECODE:
PajekConverter can include timecode information in the *.net file. If the
"Process Timecode" option is checked, time codes, to be used with Pajek's
Generate In Time function, will be appended to each line in the network
file. At this point, time information is for edges only.
All nodes are given the time code [0-*], meaning that they have a duration
from 0 to infinity.

PARTITION:
PajekConverter outputs a Pajek readable text file with *.clu extension.
The file contains the values of the specified node attribute, mapped to
Pajek partition classes (integers) in the order in which they are
encountered.

VECTOR:
PajekConverter outputs a Pajek readable text file with *.vec extension.
The file contains the values of the specified numeric node attribute.


6. NOTES

- Due to a bug in the Java AWT library, there is a limit to the amount of
  text which can be pasted into the input window. Although this limit is
  very large, if your spreadsheet contains a large number of attributes
  for each node/arc, it might be best to paste in only the columns you are
  likely to use. If you are unable to paste into the input window, you
  have too much text selected.

- PajekConverter now specifies the label information in the *.net file.
  This means that it will override the settings of the "Export/Options"
  dialog in Pajek. I did this because Pajek was being inconsistent, and
  often disabling the dialog anyway, producing large, black, unreadable
  labels positioned over the nodes. Now you get small, blue labels,
  positioned to the right and slightly below the nodes (similar to how
  they appear in Pajek's draw window).

- PajekConverter only exports arcs, not edges. However, arcs can be
  converted to edges in Pajek using the Transform/Arcs->edges/All command.

- The ability to read and export time codes is now included, but only for
  arcs. That is, all nodes are given a duration from 0 to infinity.

- For simplicity, some node attributes can be exported as Vectors, others
  as Partitions. Partitions can be converted to vectors (and vice versa)
  in Pajek.

* MISSING COLUMNS
  Microsoft Excel seems to behave somewhat erratically when copying or
  exporting tab-delimited data.
  If the columns on the far right of the copied cells do not contain any
  data, in some cases Excel will not include the tab characters
  corresponding to the missing columns. This can cause problems, as
  PajekConverter expects the empty columns to still be indicated by tabs.
  If PajekConverter is throwing errors about missing columns, and you have
  checked that you have the right number, this is probably what is going
  on. The mysterious workaround in Excel is to select the blank columns
  and choose "Fill Down". This appears to essentially replace nothing with
  nothing, but for some reason it works.

- In some cases you may have node attribute information which you would
  like to use, but do not want to change the node attribute values on
  every single edge record. PajekConverter defines a node and its
  attributes according to the values present on the line where it first
  encounters a node ID. The node attributes on subsequent lines with that
  ID are ignored and only the arc information is read. This means that you
  can define nodes by including a list of them (with no LinkId) at the top
  of the text to be parsed.

  example1:
      NODEID  LINKID  COLOR   ARCWEIGHT
      node1   node2   Purple  2.7
      node1   node3   Orange  1.2
      node3   node2   Black   1.2

  example2:
      NODEID  LINKID  COLOR   ARCWEIGHT
      node1           Blue
      node2           Red
      node3           Green
      node1   node2   Purple  2.7
      node1   node3   Orange  1.2
      node3   node2   Black   1.2

  In example1, 3 nodes will be created with colors Purple, Orange, and
  Black. In example2, the three nodes will have the colors Blue, Red, and
  Green because they were encountered first.
  THE EMPTY COLUMNS (ESPECIALLY THE LAST ONE) MUST STILL BE INDICATED BY
  TABS FOR THIS TO WORK. Also, ALL of the node information must be
  included in the first node record.


8. TODO LIST...

- read me file/manual for common operations in Pajek
- Add font size/color/position control panel to PajekConverter
- allow timecode specification for nodes
- add ability to read text files from disk
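

ILLUSTRATION (not part of PajekConverter)

The first-occurrence rule and the MISSING COLUMNS problem described in the
notes above can be sketched in a few lines of Java. This is only an
illustration under stated assumptions, not PajekConverter's actual source:
the class name ArcListSketch, the fixed column order (ID, LinkId, Color,
ArcWeight), and the choice to reject short rows with an exception are all
hypothetical. It uses modern Java library calls, so it will not compile
under Java 1.3.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the input rules; NOT PajekConverter's real code. */
class ArcListSketch {
    /** Node id -> color from the first record whose ID column holds that id. */
    final Map<String, String> nodeColors = new LinkedHashMap<>();
    /** Each arc is stored as {fromId, toId, weightText}. */
    final List<String[]> arcs = new ArrayList<>();

    /** Parses tab-delimited rows with assumed columns ID, LinkId, Color, ArcWeight. */
    static ArcListSketch parse(String text, boolean ignoreFirstRow) {
        ArcListSketch result = new ArcListSketch();
        // Accept Windows (\r\n), old Mac (\r) and UNIX (\n) line breaks.
        String[] rows = text.split("\r\n|\r|\n");
        for (int i = ignoreFirstRow ? 1 : 0; i < rows.length; i++) {
            if (rows[i].isEmpty()) {
                continue; // skip blank lines
            }
            // A limit of -1 keeps trailing empty columns; without it, split()
            // would silently drop them.
            String[] cols = rows[i].split("\t", -1);
            if (cols.length < 4) {
                // This is the situation the MISSING COLUMNS note warns about:
                // the tabs for empty right-hand columns are absent.
                throw new IllegalArgumentException(
                        "row " + (i + 1) + ": expected 4 columns, found " + cols.length);
            }
            String id = cols[0];
            String linkId = cols[1];
            // Only the first record for an id defines the node's attributes.
            result.nodeColors.putIfAbsent(id, cols[2]);
            // A record with an empty LinkId defines a node but no arc.
            if (!linkId.isEmpty()) {
                result.arcs.add(new String[] {id, linkId, cols[3]});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "NODEID\tLINKID\tCOLOR\tARCWEIGHT\n"
                + "node1\t\tBlue\t\n"
                + "node2\t\tRed\t\n"
                + "node3\t\tGreen\t\n"
                + "node1\tnode2\tPurple\t2.7\n"
                + "node1\tnode3\tOrange\t1.2\n"
                + "node3\tnode2\tBlack\t1.2\n";
        ArcListSketch parsed = ArcListSketch.parse(input, true);
        System.out.println(parsed.nodeColors); // {node1=Blue, node2=Red, node3=Green}
        System.out.println(parsed.arcs.size() + " arcs"); // 3 arcs
    }
}
```

The input in main() is example2 from the notes, with every empty column
still marked by a tab. Removing the trailing tab from one of the node-only
rows makes this sketch fail with a "expected 4 columns" message, which is
the same kind of problem the "Fill Down" workaround in Excel avoids.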