Graphs Cheat Sheet RA34
A very popular kind of tree is the binary tree: each node can have at most 2 children.
An interesting application of binary trees is data compression using Huffman coding.
Instead of using 8 bits to represent every character of a text, the idea is to create a new code in which fewer bits are used to
represent more frequent characters and more bits are used to represent less frequent characters.
1. Count the frequency of the symbols in the text and sort them by frequency
2. Build a binary tree, called the Huffman tree, grouping the symbols by frequency
3. Traverse the Huffman tree using Depth-First Search to build a code dictionary
4. Encode the text using the code dictionary
• To build the tree, we use a priority queue holding each symbol with its frequency
• In a loop, remove from the queue the two entries with the lowest frequencies
• Create a parent node whose frequency is the sum of both, putting the entry with the lower frequency in the left node and the entry with the higher frequency in the right node
• Add the parent node, with the resulting sum, back into the queue
• Stop when only one element (the root of the tree) remains in the queue
Traverse the tree using Depth-First Search, appending a bit 0 when going to a left node and a bit 1 when going to a right node, until
a leaf node is reached. The bits collected along the path from the root form the code of that leaf's symbol.
How do we encode the text using the code dictionary? We just replace each symbol with its code from the dictionary.
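Compression Ratio = Uncompressed Size / Compressed Size
Space Saving = 1 - Compressed Size / Uncompressed Size
Example: Text = "Huffman gosta de batata e de estudar" (36 characters) x 8 bits = 288 bits; Encoded text = 130 bits
Compression Ratio = 288 / 130 ≈ 2.22; Space Saving = 1 - 130 / 288 ≈ 0.549 (about 54.9%)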
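The four Huffman steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the tuple-based tree representation, and the tie-breaking counter are assumptions made for this example, and the text is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Return the root of a Huffman tree built from a non-empty text.

    A leaf is a 1-tuple (symbol,); an internal node is a 2-tuple (left, right).
    """
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(freq, next(tie), (symbol,)) for symbol, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f_low, _, low = heapq.heappop(heap)    # lowest frequency -> left child
        f_high, _, high = heapq.heappop(heap)  # next lowest -> right child
        heapq.heappush(heap, (f_low + f_high, next(tie), (low, high)))
    return heap[0][2]

def build_codes(node, prefix="", codes=None):
    """Depth-first traversal: append 0 when going left, 1 when going right."""
    if codes is None:
        codes = {}
    if len(node) == 1:                    # leaf node
        codes[node[0]] = prefix or "0"    # a one-symbol text still needs one bit
    else:
        build_codes(node[0], prefix + "0", codes)
        build_codes(node[1], prefix + "1", codes)
    return codes

def encode(text, codes):
    """Replace every symbol by its code from the dictionary."""
    return "".join(codes[ch] for ch in text)

text = "Huffman gosta de batata e de estudar"
codes = build_codes(build_huffman_tree(text))
encoded = encode(text, codes)
uncompressed_bits = 8 * len(text)
compression_ratio = uncompressed_bits / len(encoded)
space_saving = 1 - len(encoded) / uncompressed_bits
print(len(encoded), uncompressed_bits)  # 130 288
```

The individual codes depend on how ties between equal frequencies are broken, but the total encoded length is the same for every valid Huffman tree of a given text.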
Topological Sort
• Topological sort is a method of arranging the nodes of a Directed Acyclic Graph (DAG), as a sequence, such that no vertex
appears in the sequence before its predecessor, i.e., if G contains an edge (u, v), then the node u appears before node v in the
sequence
• It is a linear arrangement of the graph in which all the edges point from left to right
Typical uses are situations where some tasks must be done before others:
• Management of software development projects // Performing some tasks or implementing some routines depends on others
being completed
• Process scheduling in operating systems
• Compiling software
Kahn's algorithm seems to be an easy way to obtain a topological sort and to detect cycles in graphs. However, what is its
main limitation? For cycle detection, it works only on directed graphs, since it depends on indegree information.
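A minimal sketch of Kahn's algorithm, assuming the graph is given as a dictionary that maps each node to a list of its successors. The function name and the convention of returning `None` when a cycle is present are choices made for this example.

```python
from collections import deque

def kahn_topological_sort(graph):
    """Return a topological order of a directed graph, or None if it has a cycle."""
    # Count incoming edges (indegree) for every node.
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1

    # Start with the nodes that have no predecessors.
    queue = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, []):
            indegree[v] -= 1          # "remove" the edge (u, v)
            if indegree[v] == 0:
                queue.append(v)

    # If some node was never released, it sits on a cycle.
    return order if len(order) == len(indegree) else None

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(kahn_topological_sort(dag))  # ['A', 'B', 'C', 'D']
```

If the returned order is shorter than the number of nodes, the remaining nodes never reached indegree 0, which is how the cycle is detected.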
What happens when we add the edge (H, G)? The algorithm stops because it finds a node marked as visiting that is adjacent to another node marked as visiting.
We can conclude that a topological sort is not possible because the graph contains a cycle.
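The "visiting" check can be sketched with a depth-first search that keeps three node states. This is an illustrative sketch: the state names, the function name, and the two-node example graph are assumptions (the original lecture figure is not reproduced here), and the recursion is only suitable for small graphs.

```python
def dfs_topological_sort(graph):
    """Return a topological order, or None when a cycle is found.

    graph: dict mapping each node to a list of its successors.
    """
    UNVISITED, VISITING, DONE = 0, 1, 2
    state = {u: UNVISITED for u in graph}
    order = []

    def visit(u):
        state[u] = VISITING
        for v in graph.get(u, []):
            if state.get(v, UNVISITED) == VISITING:
                return False              # visiting node adjacent to visiting node: cycle
            if state.get(v, UNVISITED) == UNVISITED and not visit(v):
                return False
        state[u] = DONE
        order.append(u)                   # a node is emitted after all its successors
        return True

    for u in graph:
        if state[u] == UNVISITED and not visit(u):
            return None
    return order[::-1]                    # reverse finishing order

print(dfs_topological_sort({"G": ["H"], "H": []}))     # ['G', 'H']
print(dfs_topological_sort({"G": ["H"], "H": ["G"]}))  # None, edge (H, G) closes a cycle
```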
Besides sorting the nodes in topological order, we can also use Kahn's algorithm to detect cycles in directed graphs and Tarjan's
algorithm to detect cycles in directed or undirected graphs.
A directed graph is said to be strongly connected if there is a path in each direction between each pair of nodes of the graph.
A graph may not be strongly connected as a whole and still contain a subgraph that is. Such a subgraph is called a strongly connected
component // Formally, a strongly connected component of a digraph G is a maximal subset of vertices C ⊆ V(G) such that for every pair of
vertices u, v from C, there is a path from u to v and from v to u.
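One way to find the strongly connected components is Kosaraju's two-pass DFS, sketched below. Kosaraju's algorithm is chosen here only because it is short; it is not named in these notes, and the function name and example graph are assumptions for illustration.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. graph: dict mapping node -> list of successors."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}

    # Build the reversed graph (every edge u -> v becomes v -> u).
    reverse = {u: [] for u in nodes}
    for u in graph:
        for v in graph[u]:
            reverse[v].append(u)

    visited = set()

    def dfs(u, adj, out):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs(v, adj, out)
        out.append(u)                     # record u when it is finished

    # Pass 1: DFS on the original graph to get the finishing order.
    finish_order = []
    for u in nodes:
        if u not in visited:
            dfs(u, graph, finish_order)

    # Pass 2: DFS on the reversed graph in decreasing finishing order.
    visited.clear()
    components = []
    for u in reversed(finish_order):
        if u not in visited:
            component = []
            dfs(u, reverse, component)
            components.append(set(component))
    return components

example = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
sccs = strongly_connected_components(example)
```

In the example, nodes 1, 2 and 3 reach each other in both directions, as do 4 and 5, while node 6 forms a component on its own.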
As noted, the same graph can have multiple spanning trees. In general, we are interested in two of them for weighted graphs:
Minimum Spanning Tree is the spanning tree with the lowest cost (the sum of all its edge weights) among all possible spanning trees of a
weighted graph. It is the most useful one for most problems.
Maximum Spanning Tree is the spanning tree with the highest cost among all possible spanning trees of a weighted graph.
To find a maximum spanning tree with Kruskal's algorithm, just sort the edges in DECREASING weight order.
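A short sketch of Kruskal's algorithm with a union-find structure, where a single flag switches between increasing and decreasing weight order. The function signature, the integer node labels 0..n-1, and the example edge list are assumptions for this illustration.

```python
def kruskal(num_nodes, edges, maximum=False):
    """Return the edges of a minimum (or maximum) spanning tree.

    edges: list of (weight, u, v) tuples with nodes labelled 0..num_nodes-1.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Find the representative of x's set, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Increasing order gives a minimum tree, decreasing order a maximum tree.
    for w, u, v in sorted(edges, reverse=maximum):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:              # the edge joins two different components
            parent[root_u] = root_v
            tree.append((w, u, v))
    return tree

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
min_tree = kruskal(4, edges)
max_tree = kruskal(4, edges, maximum=True)
print(sum(w for w, _, _ in min_tree), sum(w for w, _, _ in max_tree))  # 6 12
```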
Mathematical models to build graphs
• These models are called Graph Formation Models or Complex Network Models
• Models are used to simulate a network, or its process of formation, that represents a problem for which we do not have enough data or data access
• We can study and understand a problem, or related problems, based on the analysis of these simulated networks
The Erdős number describes the "collaborative distance" between the Hungarian mathematician Paul Erdős (who wrote more than 1,475
papers, including some related to Graph Theory) and another researcher, measured by paper co-authorship.
The GN,M model defines a fixed number of nodes (N) and a fixed number of edges (M). The M connections are chosen by picking pairs of nodes at random
(without self-loops or parallel edges).
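A minimal sketch of the GN,M construction, assuming nodes are labelled 0..N-1 and edges are undirected. The function name and the `seed` parameter are choices made for this example; listing all possible pairs is simple but only practical for small N.

```python
import random

def gnm_random_graph(n, m, seed=None):
    """Return m distinct undirected edges chosen uniformly among n nodes."""
    rng = random.Random(seed)
    # Every unordered pair (u, v) with u < v: no self-loops, no parallel edges.
    possible = [(u, v) for u in range(n) for v in range(u + 1, n)]
    if m > len(possible):
        raise ValueError("m exceeds the number of possible edges")
    return rng.sample(possible, m)

gnm_edges = gnm_random_graph(10, 15, seed=1)
```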
Both the GN,p and GN,M models may be inappropriate for modeling certain real-life phenomena:
• Absence of communities
• The node degree distribution does not follow a power law distribution; in general it follows a Poisson distribution, so the network is not scale-free
• Low clustering coefficient (we will see how to compute it in the next class), whereas in real complex networks the neighbors of a node
have a high probability of being connected to each other
• The network starts with a small initial number n0 of nodes, with edges connecting them at random
• At each time step, a new node is added with k edges that connect it to nodes already present in the network, where k ≤ n0
• To connect the new nodes to the old ones, a preferential attachment rule is employed
• The probability of connecting a new node i to an existing node j is proportional to the degree of j
• This process is repeated until the network reaches the desired number of nodes, or after t time steps
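The preferential attachment process (the rule used by the Barabási–Albert model) can be sketched as below. Several details are assumptions made for this example: the initial n0 nodes are connected as a simple path rather than at random, one node is added per time step, and the function name and parameters are hypothetical.

```python
import random

def preferential_attachment_graph(n, k, n0, seed=None):
    """Grow a network to n nodes; each new node brings k edges (k <= n0)."""
    if not (1 <= k <= n0 <= n) or n0 < 2:
        raise ValueError("need 1 <= k <= n0 <= n and n0 >= 2")
    rng = random.Random(seed)

    # Assumption: the seed network is a path over the n0 initial nodes.
    edges = [(i, i + 1) for i in range(n0 - 1)]

    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks node j with probability proportional to degree(j).
    endpoints = [node for edge in edges for node in edge]

    for new in range(n0, n):
        targets = set()
        while len(targets) < k:           # k distinct existing nodes
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

ba_edges = preferential_attachment_graph(50, 2, 3, seed=0)
```

Keeping a flat list of edge endpoints avoids recomputing degrees at every step: high-degree nodes simply occur more often in the list.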
Analysis of Complex Networks
• Path analysis: examines the relationships between nodes through the paths that connect them. Mostly used in shortest-distance problems
Measures based on shortest-path distances: Eccentricity // Diameter // Radius // Average Geodesic Distance
The eccentricity of a node u, denoted by e(u), is the maximum distance from u to any other node of the graph (its longest shortest path).
The diameter of a graph is the maximum eccentricity observed over all nodes.
The radius is similar to the diameter, but it takes the minimum eccentricity observed over all nodes.
The average geodesic distance is the ratio between the sum of the minimum paths of all possible pairs of nodes and the
maximum number of possible edges:
$$\frac{\sum_{i,j \in V,\ i \neq j} d(i,j)}{N(N-1)/2}$$
(each unordered pair {i, j} is counted once in the sum)
• The smaller the average geodesic distance, the more efficient the network is in terms of the number of hops from one node to reach others
• In other words, information propagates faster in the network
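The four distance measures can be computed with one breadth-first search per node. The sketch below assumes an undirected, unweighted, connected graph stored as a dictionary of neighbor lists; the function names and the path graph used as an example are illustrative.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_measures(graph):
    """Eccentricity, diameter, radius and average geodesic distance."""
    all_dist = {u: bfs_distances(graph, u) for u in graph}
    ecc = {u: max(d.values()) for u, d in all_dist.items()}
    n = len(graph)
    # Each unordered pair is seen twice (from i and from j), hence the / 2.
    pair_sum = sum(sum(d.values()) for d in all_dist.values()) / 2
    return {
        "eccentricity": ecc,
        "diameter": max(ecc.values()),
        "radius": min(ecc.values()),
        "average_geodesic_distance": pair_sum / (n * (n - 1) / 2),
    }

path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # 0 - 1 - 2 - 3
measures = distance_measures(path_graph)
print(measures["diameter"], measures["radius"])  # 3 2
```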
• Centrality analysis: Estimates how important a node is for the connectivity of the network. It helps to estimate the most
influential people in a social network or most frequently accessed web pages
Importance: If an incident occurs at DEN, it propagates worldwide much faster than an incident at CWB. What is the global impact
when the Google or PUCPR website goes down due to a cyber-attack? Just as when a highly connected node on the Internet breaks
down, the disruption of p53 has severe consequences.
Degree centrality - the most central node is the one with the highest number of connections. CALC: number of connections / (total vertices - 1)
LIMITATIONS
• We can have nodes with high degree (hubs) that are nevertheless peripheral (peripheral hubs)
• The propagation of information over the entire network by these nodes is limited or slow
• It is a local measure that considers only the node's neighborhood
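As a small illustration of the degree centrality calculation (connections divided by total vertices minus 1), assuming an undirected graph with at least two nodes stored as a dictionary of neighbor lists; the function name and the star graph are made up for this example.

```python
def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (N - 1)."""
    n = len(graph)
    return {u: len(neighbors) / (n - 1) for u, neighbors in graph.items()}

star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}   # node 0 is the hub
centrality = degree_centrality(star)
```

In the star graph the hub is connected to every other node, so it receives the maximum value 1, while each leaf receives 1/3.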
Coreness centrality (graph k-core)
• To compute the coreness centrality, we need to obtain the k-core of the graph
• The k-core is a subgraph in which all nodes have degree at least k
• The k-core is obtained by iteratively removing all nodes whose degree is smaller than k. After removing these nodes, the network is
re-analyzed to verify whether there are nodes left with fewer than k connections. If such nodes are present, they are also removed
• The process is repeated until the minimum degree in the network is at least k
LIMITATIONS
• In some cases, many nodes have a similar number of connections, making it difficult to really identify the most "important" nodes
• Like degree centrality, it is a local measure that does not consider the global position of the node in the network structure
• It discards nodes that serve as bridges between communities
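The iterative pruning that produces the k-core can be sketched as follows, assuming an undirected graph stored as a dictionary of neighbor sets (every edge listed in both directions). The helper `coreness`, which assigns to each node the largest k whose k-core still contains it, is an assumption about how the coreness value is derived from the k-cores.

```python
def k_core(graph, k):
    """Return the k-core as a dict node -> set of remaining neighbors."""
    core = {u: set(vs) for u, vs in graph.items()}
    changed = True
    while changed:
        changed = False
        # Remove every node that currently has fewer than k connections.
        for u in [u for u, vs in core.items() if len(vs) < k]:
            for v in core[u]:
                core[v].discard(u)
            del core[u]
            changed = True                # removals may expose new low-degree nodes
    return core

def coreness(graph):
    """Largest k such that the node still belongs to the k-core."""
    result, k, core = {}, 0, graph
    while core:
        for u in core:
            result[u] = k
        k += 1
        core = k_core(core, k)
    return result

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(k_core(g, 2)))  # [0, 1, 2]
print(coreness(g))           # {0: 2, 1: 2, 2: 2, 3: 1}
```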
Closeness centrality - A node is considered central by this measure if it is close to most of the other nodes of the network
• Nodes with a smaller average distance to all other nodes receive a closeness centrality value close to 1
• These nodes are considered important because the information they transmit reaches the other nodes faster
CALC UNDIRECTED: (N-1) / (sum of the distances to all other nodes)
CALC DIRECTED: (sum of 1/distance to every node it can reach) / (N-1)
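Both closeness calculations can be sketched with a breadth-first search, assuming they are based on shortest-path distances in an unweighted graph with at least two nodes. The function names are illustrative; the second variant (sum of inverse distances) is the form often called harmonic centrality, and it simply ignores nodes that cannot be reached.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(graph, u):
    """(N - 1) / sum of distances from u; assumes u reaches every other node."""
    total = sum(bfs_distances(graph, u).values())
    return (len(graph) - 1) / total if total else 0.0

def harmonic_closeness(graph, u):
    """Sum of 1/distance to each reachable node, divided by (N - 1)."""
    dist = bfs_distances(graph, u)
    return sum(1 / d for v, d in dist.items() if v != u) / (len(graph) - 1)

undirected_path = {0: [1], 1: [0, 2], 2: [1]}        # 0 - 1 - 2
directed_path = {"a": ["b"], "b": ["c"], "c": []}    # a -> b -> c
print(closeness(undirected_path, 1))                 # 1.0
print(harmonic_closeness(directed_path, "a"))        # 0.75
```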
Betweenness centrality
• A node is considered central if it lies on most of the shortest paths between all possible pairs of nodes, not counting paths in
which it is the start or the end
• The removal of a node with high betweenness centrality will significantly impact the information flow in the network, since it is part of most of the shortest paths
CALC: SUM over pairs (u, v) of (number of shortest paths from u to v that pass through i) / (number of shortest paths from u to v) // standardize by
dividing by (N-1)(N-2)/2 = maximum number of (u, v) pairs that exclude i
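A sketch of the betweenness calculation for an undirected, unweighted graph. Instead of enumerating every shortest path, it uses Brandes' accumulation technique, which is a choice made for this example and not something named in these notes; the function name and the example graphs are also illustrative.

```python
from collections import deque

def betweenness_centrality(graph, normalized=True):
    """Betweenness of every node; graph: dict node -> list of neighbors."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v is on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    for v in bc:
        bc[v] /= 2                            # each unordered pair was counted twice
        if normalized and n > 2:
            bc[v] /= (n - 1) * (n - 2) / 2
    return bc

path3 = {0: [1], 1: [0, 2], 2: [1]}           # 0 - 1 - 2
print(betweenness_centrality(path3))          # {0: 0.0, 1: 1.0, 2: 0.0}
```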
• Community analysis: Distance and density of relationships can be used to find groups of people interacting frequently with
each other in a social network or to understand the patterns of a group over time
Clustering coefficient
Clustering coefficient is a measure that indicates how much the nodes of a graph/network tend to form groups. In general, we call
these groups clusters; in social networks, it is usual to call them communities. Much evidence suggests that in most real-world
networks, nodes tend to form densely connected groups that are sparsely connected to other groups.
The global clustering coefficient of a graph is based on triplets of nodes. A triplet is an ordered set of three nodes that are
connected by two undirected edges (open triplet) or three undirected edges (closed triplet). A triangle graph therefore includes
three closed triplets, one centered on each of the nodes. The global clustering coefficient C is defined as:
C = number of closed triplets / number of all triplets (open and closed)
The local clustering coefficient Ci of a given node i is defined according to the equation:
$$C_i = \frac{2 L_i}{k_i (k_i - 1)}$$
ki is the degree of node i // Li is the number of links among the neighbors of node i
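Both clustering coefficients can be sketched as below for an undirected graph stored as a dictionary of neighbor sets. The local coefficient is computed as 2 Li / (ki (ki - 1)), the usual form for undirected graphs, and returning 0 for nodes with fewer than two neighbors is a convention assumed for this example.

```python
from itertools import combinations

def linked_neighbor_pairs(graph, i):
    """Number of links among the neighbors of node i (L_i)."""
    return sum(1 for u, v in combinations(graph[i], 2) if v in graph[u])

def local_clustering(graph, i):
    """Fraction of pairs of neighbors of i that are themselves connected."""
    k = len(graph[i])
    if k < 2:
        return 0.0                        # convention: no pairs of neighbors
    return 2 * linked_neighbor_pairs(graph, i) / (k * (k - 1))

def global_clustering(graph):
    """Closed triplets divided by all triplets (open and closed)."""
    closed = total = 0
    for i in graph:
        k = len(graph[i])
        total += k * (k - 1) // 2         # triplets centered on i
        closed += linked_neighbor_pairs(graph, i)
    return closed / total if total else 0.0

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(global_clustering(g))    # 0.6  (3 closed triplets out of 5)
```

Node 2 has three neighbors but only one link among them (0-1), so its local coefficient is 1/3, while nodes 0 and 1 have a local coefficient of 1.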
CHATGPT REVIEW
2. Kahn's and Tarjan's Algorithms
Kahn's and Tarjan's algorithms were developed to solve the problem of topological sorting in directed
acyclic graphs (DAGs).
• Kahn's algorithm: works by iteratively removing nodes with no predecessors and building the topological order.
• Tarjan's algorithm: uses a depth-first search (DFS) to find the topological order.
Both can be used to solve cycle detection problems and to plan tasks with dependencies.
3. Characteristics of Trees
2. Edges: A tree with n nodes always has n-1 edges.
a. Definition:
• Spanning Tree: a subgraph that connects all the vertices of the original graph with the smallest possible number of edges.
• Minimum Spanning Tree (MST): the spanning tree with the smallest total weight.
Kruskal's algorithm:
Prim's algorithm:
1. Starts with one node and adds the lowest-weight edges that connect nodes outside the tree.
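A minimal sketch of Prim's algorithm using a priority queue of candidate edges. The graph format (a dictionary mapping each node to a list of (weight, neighbor) pairs), the function name, and the example weights are assumptions for this illustration; the graph is assumed to be connected and undirected.

```python
import heapq

def prim(graph, start):
    """Return the edges (weight, u, v) of a minimum spanning tree."""
    in_tree = {start}
    # Candidate edges leaving the tree, ordered by weight.
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)     # lightest candidate edge
        if v in in_tree:
            continue                          # both ends already in the tree
        in_tree.add(v)
        tree.append((w, u, v))
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return tree

weighted = {
    0: [(1, 1), (4, 2)],
    1: [(1, 0), (3, 2), (2, 3)],
    2: [(4, 0), (3, 1), (5, 3)],
    3: [(2, 1), (5, 2)],
}
mst = prim(weighted, 0)
print(mst)  # [(1, 0, 1), (2, 1, 3), (3, 1, 2)]
```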
c. Real-World Problem:
One example is the planning of electrical grids, where the MST minimizes the installation cost of the transmission lines.
6. Dependency Hell
Graph theory, especially topological sorting on DAGs, can help with installing packages and
libraries in the right order.
Topological Sort:
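As an illustration of ordered installation, the sketch below uses `graphlib.TopologicalSorter` from the Python standard library (available since Python 3.9). The package names and the `install_order` helper are hypothetical; the input maps each package to the set of packages it depends on.

```python
from graphlib import TopologicalSorter, CycleError

def install_order(dependencies):
    """Return an installation order, or None if the dependencies are circular.

    dependencies: dict mapping package -> set of packages it depends on.
    """
    try:
        # static_order() yields each package after all of its dependencies.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None

# Hypothetical packages, for illustration only.
deps = {"app": {"web", "db"}, "web": {"http"}, "db": set(), "http": set()}
order = install_order(deps)
circular = install_order({"a": {"b"}, "b": {"a"}})   # None: a and b need each other
```

Any order in which every package comes after its dependencies is valid, so the exact position of independent packages (here `db` and `http`) may vary.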
1. Degree Distribution: In many complex networks, a few nodes have many neighbors, while most have few.
2. Small World: Most nodes can be reached from any other node in a small number of steps.
o Example: Friendship networks, where the friends of an individual are often friends with each other.
4. Robustness and Fragility: Complex networks are robust against random failures but fragile to targeted attacks.
A complex network formation model describes how real networks can grow and evolve over time.
2. Barabási–Albert model: Preferential attachment, where new nodes tend to connect to the most connected nodes.
3. Watts-Strogatz model: Introduces randomness into a regular network to create "small-world" properties.
The Barabási–Albert model is more suitable, since larger airports tend to have more connections, which reflects preferential
attachment.